TEXT2AFFORD: Probing Object Affordance Prediction abilities of Language Models solely from Text

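Summary

We investigate how much pre-trained language models (LMs) and pre-trained vision-language models (VLMs) know about object affordances.
A growing body of literature shows that pre-trained LMs (PTLMs) fail inconsistently and non-intuitively, indicating a lack of reasoning and grounding.
As a first step toward quantifying the effect of grounding (or its absence), we curate Text2Afford, a novel and comprehensive dataset of object affordances characterized by 15 affordance classes.
Unlike affordance datasets collected in the vision and language domains, we annotate in-the-wild sentences with objects and affordances.
Experimental results show that PTLMs have limited reasoning ability when it comes to uncommon object affordances.
We also observe that pre-trained VLMs do not necessarily capture object affordances effectively.
Through few-shot fine-tuning, we demonstrate improved affordance knowledge in both PTLMs and VLMs.
Our work contributes a new dataset for language grounding tasks, offers insights into LM capabilities, and advances the understanding of object affordances.
Code and data are available at https://github.com/sayantan11995/Affordance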

Summary (original)

We investigate the knowledge of object affordances in pre-trained language models (LMs) and pre-trained Vision-Language models (VLMs). A growing body of literature shows that PTLMs fail inconsistently and non-intuitively, demonstrating a lack of reasoning and grounding. To take a first step toward quantifying the effect of grounding (or lack thereof), we curate a novel and comprehensive dataset of object affordances — Text2Afford, characterized by 15 affordance classes. Unlike affordance datasets collected in vision and language domains, we annotate in-the-wild sentences with objects and affordances. Experimental results reveal that PTLMs exhibit limited reasoning abilities when it comes to uncommon object affordances. We also observe that pre-trained VLMs do not necessarily capture object affordances effectively. Through few-shot fine-tuning, we demonstrate improvement in affordance knowledge in PTLMs and VLMs. Our research contributes a novel dataset for language grounding tasks, and presents insights into LM capabilities, advancing the understanding of object affordances. Codes and data are available at https://github.com/sayantan11995/Affordance

arXiv information

Authors	Sayantan Adak, Daivik Agrawal, Animesh Mukherjee, Somak Aditya
Published	2024-07-23 08:07:40+00:00

Source and services used

arxiv.jp, Google
