Visual Preference Inference: An Image Sequence-Based Preference Reasoning in Tabletop Object Manipulation

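Abstract

In robotic object manipulation, human preferences are often influenced by visual attributes of objects, such as color and shape.
These attributes play an important role when a robot is operated to interact with objects in line with human intent.
In this paper, we focus on the problem of inferring underlying human preferences from a sequence of raw visual observations in tabletop manipulation environments containing a variety of object types, a problem we call Visual Preference Inference (VPI).
To facilitate visual reasoning in the context of manipulation, we introduce the Chain-of-Visual-Residuals (CoVR) method.
CoVR uses a prompting mechanism that describes the differences between consecutive images (that is, the visual residuals) and combines those descriptions with the image sequence to infer the user's preference.
This approach significantly enhances the ability to understand and adapt to dynamic changes in the visual environment during manipulation tasks.
Our method outperforms baseline methods at extracting human preferences from visual sequences in both simulated and real-world environments.
Code and videos are available at https://joonhyung-lee.github.io/vpi/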
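
The prompting idea behind CoVR can be illustrated with a minimal sketch. This is not the authors' implementation: the two callables `describe_change` and `answer_query` are hypothetical stand-ins for calls to a vision-language model, and the prompt wording is an assumption made for illustration. The sketch only shows the flow described in the abstract: describe each consecutive image pair in text, then pass those descriptions together with the image sequence to infer a preference.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for vision-language model calls; not the authors' API.
DescribeChange = Callable[[bytes, bytes], str]       # (previous image, next image) -> text
AnswerQuery = Callable[[Sequence[bytes], str], str]  # (all images, prompt) -> preference


def visual_residuals(images: Sequence[bytes], describe_change: DescribeChange) -> List[str]:
    """Describe, in text, what changed between each pair of consecutive images."""
    return [describe_change(prev, nxt) for prev, nxt in zip(images, images[1:])]


def build_prompt(residuals: Sequence[str]) -> str:
    """Chain the change descriptions into one prompt ending with the preference question."""
    lines = [f"Step {i}: {text}" for i, text in enumerate(residuals, start=1)]
    # Assumed question wording, chosen only for this illustration.
    lines.append("Given the images and the changes above, what preference is the user following?")
    return "\n".join(lines)


def infer_preference(
    images: Sequence[bytes],
    describe_change: DescribeChange,
    answer_query: AnswerQuery,
) -> str:
    """Combine the text residuals with the full image sequence and query the model."""
    if len(images) < 2:
        raise ValueError("need at least two images to form a visual residual")
    residuals = visual_residuals(images, describe_change)
    return answer_query(images, build_prompt(residuals))


if __name__ == "__main__":
    # Stub models so the sketch runs without any external service.
    frames = [b"frame0", b"frame1", b"frame2"]
    describe = lambda a, b: f"scene changed from {a.decode()} to {b.decode()}"
    answer = lambda imgs, prompt: f"(stub answer for {len(imgs)} images)\n{prompt}"
    print(infer_preference(frames, describe, answer))
```

Passing the model calls in as plain callables keeps the sketch independent of any particular model or service; a real system would replace the stubs with its own image-captioning and question-answering calls.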

Abstract (original)

In robotic object manipulation, human preferences can often be influenced by the visual attributes of objects, such as color and shape. These properties play a crucial role in operating a robot to interact with objects and align with human intention. In this paper, we focus on the problem of inferring underlying human preferences from a sequence of raw visual observations in tabletop manipulation environments with a variety of object types, named Visual Preference Inference (VPI). To facilitate visual reasoning in the context of manipulation, we introduce the Chain-of-Visual-Residuals (CoVR) method. CoVR employs a prompting mechanism that describes the difference between the consecutive images (i.e., visual residuals) and incorporates such texts with a sequence of images to infer the user’s preference. This approach significantly enhances the ability to understand and adapt to dynamic changes in its visual environment during manipulation tasks. Furthermore, we incorporate such texts along with a sequence of images to infer the user’s preferences. Our method outperforms baseline methods in terms of extracting human preferences from visual sequences in both simulation and real-world environments. Code and videos are available at: https://joonhyung-lee.github.io/vpi/

arXiv information

Authors	Joonhyung Lee, Sangbeom Park, Yongin Kwon, Jemin Lee, Minwook Ahn, Sungjoon Choi
Published	2024-03-18 06:54:38+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
