Can Language Models Understand Physical Concepts?

Abstract

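Language models (LMs) are gradually becoming general-purpose interfaces in the interactive, embodied world, where understanding physical concepts is an essential prerequisite.
However, it is not yet clear whether LMs can understand the physical concepts of the human world.
To investigate this, we design VEC, a benchmark covering tasks on (i) visual concepts, such as the shape and material of objects, and (ii) embodied concepts learned from interaction with the world, such as the temperature of objects.
Zero-shot and few-shot prompting results show that understanding of certain visual concepts emerges as LMs are scaled up, but there remain basic concepts to which the scaling law does not apply.
For example, OPT-175B performs close to humans on the material concept, with a zero-shot accuracy of 85%, yet behaves like random guessing on the mass concept.
In contrast, vision-augmented LMs such as CLIP and BLIP achieve a human-level understanding of embodied concepts.
Analysis indicates that the rich semantics in visual representations can serve as a valuable source of embodied knowledge.
Inspired by this, we propose a distillation method that transfers embodied knowledge from VLMs to LMs, achieving a performance gain comparable to that obtained by scaling up the parameters of LMs 134x.
Our dataset is available at https://github.com/TobiasLee/VEC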

Abstract (original)

Language models (LMs) gradually become general-purpose interfaces in the interactive and embodied world, where the understanding of physical concepts is an essential prerequisite. However, it is not yet clear whether LMs can understand physical concepts in the human world. To investigate this, we design a benchmark VEC that covers the tasks of (i) Visual concepts, such as the shape and material of objects, and (ii) Embodied Concepts, learned from the interaction with the world such as the temperature of objects. Our zero (few)-shot prompting results show that the understanding of certain visual concepts emerges as scaling up LMs, but there are still basic concepts to which the scaling law does not apply. For example, OPT-175B performs close to humans with a zero-shot accuracy of 85% on the material concept, yet behaves like random guessing on the mass concept. Instead, vision-augmented LMs such as CLIP and BLIP achieve a human-level understanding of embodied concepts. Analysis indicates that the rich semantics in visual representation can serve as a valuable source of embodied knowledge. Inspired by this, we propose a distillation method to transfer embodied knowledge from VLMs to LMs, achieving performance gain comparable with that by scaling up the parameters of LMs 134x. Our dataset is available at https://github.com/TobiasLee/VEC

arXiv information

Authors	Lei Li, Jingjing Xu, Qingxiu Dong, Ce Zheng, Qi Liu, Lingpeng Kong, Xu Sun
Published	2023-05-23 13:36:55+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
