Vision in Action: Learning Active Perception from Human Demonstrations

Summary

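This paper presents Vision in Action (ViA), an active perception system for bimanual robot manipulation.
ViA learns task-relevant active perception strategies (for example, searching, tracking, and focusing) directly from human demonstrations.
On the hardware side, ViA uses a simple yet effective 6-DoF robotic neck that allows flexible, human-like head movements.
To capture human active perception strategies, the authors design a VR-based teleoperation interface that gives the robot and the human operator a shared observation space.
To reduce the VR motion sickness caused by latency in the robot's physical movements, the interface uses an intermediate 3D scene representation: views are rendered in real time on the operator side, while the scene is updated asynchronously with the robot's latest observations.
Together, these design elements make it possible to learn robust visuomotor policies for three complex, multi-stage bimanual manipulation tasks involving visual occlusions, and the resulting policies significantly outperform baseline systems.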

Abstract (original)

We present Vision in Action (ViA), an active perception system for bimanual robot manipulation. ViA learns task-relevant active perceptual strategies (e.g., searching, tracking, and focusing) directly from human demonstrations. On the hardware side, ViA employs a simple yet effective 6-DoF robotic neck to enable flexible, human-like head movements. To capture human active perception strategies, we design a VR-based teleoperation interface that creates a shared observation space between the robot and the human operator. To mitigate VR motion sickness caused by latency in the robot’s physical movements, the interface uses an intermediate 3D scene representation, enabling real-time view rendering on the operator side while asynchronously updating the scene with the robot’s latest observations. Together, these design elements enable the learning of robust visuomotor policies for three complex, multi-stage bimanual manipulation tasks involving visual occlusions, significantly outperforming baseline systems.
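
The decoupling described in the abstract (real-time rendering on the operator side, asynchronous scene updates from the robot) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the 3D scene is represented, so the point-cloud buffer, the names `SceneBuffer` and `render_view`, and the rotation-plus-translation head pose are all assumptions made for illustration.

```python
import threading


class SceneBuffer:
    """Hypothetical holder for the latest 3D scene snapshot (a world-frame point cloud)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._points = []
        self._version = 0

    def update(self, points_world):
        # Called by the (slow) robot side whenever a new observation arrives.
        snapshot = [tuple(p) for p in points_world]
        with self._lock:
            self._points = snapshot
            self._version += 1

    def latest(self):
        # Called by the (fast) render loop; never waits for the robot to move.
        with self._lock:
            return self._points, self._version


def render_view(points_world, rotation, translation):
    """Express world-frame points in the operator's head frame.

    Assumes `rotation` is a 3x3 world-from-head rotation matrix (nested lists)
    and `translation` is the head position in the world frame, so that
    p_head = R^T (p_world - t).
    """
    view = []
    for p in points_world:
        d = [p[i] - translation[i] for i in range(3)]
        view.append(tuple(sum(rotation[i][j] * d[i] for i in range(3)) for j in range(3)))
    return view
```

In such a setup, a render thread would call `latest()` and `render_view()` on every frame with the current headset pose, while a separate thread calls `update()` at whatever rate the robot delivers observations. The operator's view then follows head motion immediately even when the scene content lags slightly behind.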

arXiv information

Authors	Haoyu Xiong, Xiaomeng Xu, Jimmy Wu, Yifan Hou, Jeannette Bohg, Shuran Song
Published	2025-06-18 17:43:55+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
