MindEye-OmniAssist: A Gaze-Driven LLM-Enhanced Assistive Robot System for Implicit Intention Recognition and Task Execution

Summary

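Gaze-based control is a promising and effective form of human-robot interaction in assistive robotic systems.
However, current gaze-based assistive systems mainly help users with basic grasping actions and therefore offer limited support.
Moreover, restricted intention recognition capability constrains the ability of an assistive system to provide diverse assistance functions.
This paper proposes an open implicit intention recognition framework powered by a Large Language Model (LLM) and a Vision Foundation Model (VFM), which processes gaze input and recognizes user intentions that are not confined to predefined or specific scenarios.
It also implements a gaze-driven LLM-enhanced assistive robot system (MindEye-OmniAssist) that recognizes the user's intentions through gaze and assists in completing tasks.
To achieve this, the system uses an open-vocabulary object detector, an intention recognition network, and an LLM to infer the user's full intention.
By integrating eye movement feedback with the LLM, it generates action sequences that help the user complete tasks.
Real-world experiments were conducted on assistive tasks, and the system achieved an overall success rate of 41/55 (about 75%) across various undefined tasks.
These preliminary results suggest that the proposed method could provide a more user-friendly human-computer interaction interface and, by supporting more complex and diverse tasks, significantly improve the versatility and effectiveness of assistive systems.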

Abstract (original)

A promising effective human-robot interaction in assistive robotic systems is gaze-based control. However, current gaze-based assistive systems mainly help users with basic grasping actions, offering limited support. Moreover, the restricted intent recognition capability constrains the assistive system’s ability to provide diverse assistance functions. In this paper, we propose an open implicit intention recognition framework powered by Large Language Model (LLM) and Vision Foundation Model (VFM), which can process gaze input and recognize user intents that are not confined to predefined or specific scenarios. Furthermore, we implement a gaze-driven LLM-enhanced assistive robot system (MindEye-OmniAssist) that recognizes user’s intentions through gaze and assists in completing task. To achieve this, the system utilizes open vocabulary object detector, intention recognition network and LLM to infer their full intentions. By integrating eye movement feedback and LLM, it generates action sequences to assist the user in completing tasks. Real-world experiments have been conducted for assistive tasks, and the system achieved an overall success rate of 41/55 across various undefined tasks. Preliminary results show that the proposed method holds the potential to provide a more user-friendly human-computer interaction interface and significantly enhance the versatility and effectiveness of assistive systems by supporting more complex and diverse task.

arXiv information

Authors	Zejia Zhang, Bo Yang, Xinxing Chen, Weizhuang Shi, Haoyuan Wang, Wei Luo, Jian Huang
Published	2025-03-17 15:06:14+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
