CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot

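Summary

This paper introduces CognitiveDog, a pioneering quadruped robot equipped with a Large Multimodal Model (LMM) that can both communicate verbally with humans and physically interact with its environment through object manipulation.
The system is implemented on a Unitree Go1 robot dog fitted with a custom gripper and demonstrates autonomous decision-making: it independently determines the most appropriate actions and object interactions to fulfill user-defined tasks.
These tasks do not necessarily contain direct instructions, so the robot must understand and execute them from natural language input and environmental cues.
The paper describes the details of the system, the characteristics of the dataset, and the software architecture.
Key to this work is the robot's ability to navigate using Visual-SLAM, to manipulate and transport objects effectively, and to provide natural language commentary during task execution.
Experimental results highlight the robot's task comprehension and adaptability and point to its potential in real-world applications.
The dataset used to fine-tune the robot dog's behavior generation model is available at: huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset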

Summary (original)

This paper introduces CognitiveDog, a pioneering development of quadruped robot with Large Multi-modal Model (LMM) that is capable of not only communicating with humans verbally but also physically interacting with the environment through object manipulation. The system was realized on Unitree Go1 robot-dog equipped with a custom gripper and demonstrated autonomous decision-making capabilities, independently determining the most appropriate actions and interactions with various objects to fulfill user-defined tasks. These tasks do not necessarily include direct instructions, challenging the robot to comprehend and execute them based on natural language input and environmental cues. The paper delves into the intricacies of this system, dataset characteristics, and the software architecture. Key to this development is the robot’s proficiency in navigating space using Visual-SLAM, effectively manipulating and transporting objects, and providing insightful natural language commentary during task execution. Experimental results highlight the robot’s advanced task comprehension and adaptability, underscoring its potential in real-world applications. The dataset used to fine-tune the robot-dog behavior generation model is provided at the following link: huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset

arXiv information

Authors	Artem Lykov, Mikhail Litvinov, Mikhail Konenkov, Rinat Prochii, Nikita Burtsev, Ali Alridha Abdulkarim, Artem Bazhenov, Vladimir Berman, Dzmitry Tsetserukou
Published	2024-01-17 18:01:24+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
