A Multi-Modal Interaction Framework for Efficient Human-Robot Collaborative Shelf Picking

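Summary

The growing presence of service robots in human-centric environments such as warehouses calls for seamless, intuitive human-robot collaboration.
This paper proposes a collaborative shelf-picking framework that combines multimodal interaction, physics-based reasoning, and task division to improve human-robot teamwork.
The framework lets the robot recognize human pointing gestures, interpret verbal cues and voice commands, and communicate through visual and auditory feedback.
It is powered by a Large Language Model (LLM) that uses Chain of Thought (CoT) reasoning and a physics-based simulation engine for safely retrieving boxes from cluttered stacks on shelves, for a relationship graph used in sub-task generation, for extraction sequence planning, and for decision making.
The authors validate the framework in real-world shelf-picking experiments: 1) Gesture-Guided Box Extraction, 2) Collaborative Shelf Clearing, and 3) Collaborative Stability Assistance.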

Abstract (original)

The growing presence of service robots in human-centric environments, such as warehouses, demands seamless and intuitive human-robot collaboration. In this paper, we propose a collaborative shelf-picking framework that combines multimodal interaction, physics-based reasoning, and task division for enhanced human-robot teamwork. The framework enables the robot to recognize human pointing gestures, interpret verbal cues and voice commands, and communicate through visual and auditory feedback. Moreover, it is powered by a Large Language Model (LLM) which utilizes Chain of Thought (CoT) and a physics-based simulation engine for safely retrieving cluttered stacks of boxes on shelves, relationship graph for sub-task generation, extraction sequence planning and decision making. Furthermore, we validate the framework through real-world shelf picking experiments such as 1) Gesture-Guided Box Extraction, 2) Collaborative Shelf Clearing and 3) Collaborative Stability Assistance.
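The abstract names a relationship graph and extraction sequence planning without giving details. The following is only a minimal sketch of how such planning could look, assuming the graph records which boxes rest directly on which others. The `supports` dictionary format and the `extraction_sequence` function are hypothetical names introduced here for illustration; this is a plain depth-first traversal, not the authors' LLM- and physics-based method.

```python
from typing import Dict, List, Set


def extraction_sequence(supports: Dict[str, Set[str]], target: str) -> List[str]:
    """Return a removal order that ends with the target box.

    ASSUMPTION (not from the paper): supports[b] is the set of boxes
    resting directly on box b, and the graph has no cycles.
    """
    order: List[str] = []
    visited: Set[str] = set()

    def visit(box: str) -> None:
        if box in visited:
            return
        visited.add(box)
        # Every box resting on this one has to be removed first.
        for upper in sorted(supports.get(box, set())):
            visit(upper)
        order.append(box)

    visit(target)
    return order


# Hypothetical shelf: B and C rest on A, and D rests on B.
supports = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": set(),
    "D": set(),
}
print(extraction_sequence(supports, "A"))  # ['D', 'B', 'C', 'A']
```

In this toy version the order depends only on what is stacked on what. A physics-based check, as described in the abstract, would additionally have to decide whether each removal leaves the remaining stack stable.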

arXiv information

Authors	Abhinav Pathak,Kalaichelvi Venkatesan,Tarek Taha,Rajkumar Muthusamy
Published	2025-04-09 05:42:33+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
