Language and Sketching: An LLM-driven Interactive Multimodal Multitask Robot Navigation Framework

Abstract

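Socially aware navigation systems have evolved to avoid a variety of obstacles appropriately while carrying out multiple tasks such as point-to-point navigation, human following, and guiding.
However, a notable gap remains: in human-robot interaction (HRI), communicating commands to a robot requires complex mathematical formulations.
Moreover, transitions between tasks lack the intuitive control and user-centric interactivity that users want.
This work proposes LIM2N, an LLM-driven interactive multimodal multitask robot navigation framework, to address these new challenges in the navigation field.
It does so first by introducing a multimodal interaction framework in which language and hand-drawn inputs serve as navigation constraints and control objectives.
Next, a reinforcement learning agent is built to handle multiple tasks using the received information.
Importantly, LIM2N enables smooth cooperation among reasoning over multimodal input, multitask planning, and the adaptation and processing of the intelligent sensing modules within a complex system.
Extensive experiments in both simulation and the real world demonstrate that LIM2N has a superior understanding of user needs and provides an improved interactive experience.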

Abstract (original)

The socially-aware navigation system has evolved to adeptly avoid various obstacles while performing multiple tasks, such as point-to-point navigation, human-following, and -guiding. However, a prominent gap persists: in Human-Robot Interaction (HRI), the procedure of communicating commands to robots demands intricate mathematical formulations. Furthermore, the transition between tasks does not quite possess the intuitive control and user-centric interactivity that one would desire. In this work, we propose an LLM-driven interactive multimodal multitask robot navigation framework, termed LIM2N, to solve the above new challenge in the navigation field. We achieve this by first introducing a multimodal interaction framework where language and hand-drawn inputs can serve as navigation constraints and control objectives. Next, a reinforcement learning agent is built to handle multiple tasks with the received information. Crucially, LIM2N creates smooth cooperation among the reasoning of multimodal input, multitask planning, and adaptation and processing of the intelligent sensing modules in the complicated system. Extensive experiments are conducted in both simulation and the real world demonstrating that LIM2N has superior user needs understanding, alongside an enhanced interactive experience.

arXiv information

Authors	Weiqin Zu, Wenbin Song, Ruiqing Chen, Ze Guo, Fanglei Sun, Zheng Tian, Wei Pan, Jun Wang
Published	2023-11-14 15:29:52+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
