Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI

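Abstract

Embodied AI is one of the most active research areas in artificial intelligence and robotics, and it can effectively improve the intelligence of real-world agents (e.g., robots) that serve humans.
Scene knowledge is important for an agent to understand its surroundings and make correct decisions in a varied open world.
Currently, a knowledge base for embodied tasks is missing, and most existing work uses general knowledge bases or pre-trained models to enhance an agent's intelligence.
Conventional knowledge bases are sparse, insufficient in capacity, and costly in data collection.
Pre-trained models face knowledge uncertainty and are hard to maintain.
To overcome these challenges of scene knowledge, we propose a scene-driven multimodal knowledge graph (Scene-MMKG) construction method that combines conventional knowledge engineering with large language models.
A unified scene knowledge injection framework is introduced for knowledge representation.
To evaluate the advantages of the proposed method, we instantiate a Scene-MMKG, named ManipMob-MMKG, that considers typical indoor robotic functionalities (manipulation and mobility).
A comparison of characteristics indicates that the instantiated ManipMob-MMKG is broadly superior in data-collection efficiency and knowledge quality.
Experimental results on typical embodied tasks show that knowledge-enhanced methods using the instantiated ManipMob-MMKG can clearly improve performance without complex redesign of model structures.
Our project can be found at https://sites.google.com/view/manipmob-mmkg.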

Abstract (original)

Embodied AI is one of the most popular studies in artificial intelligence and robotics, which can effectively improve the intelligence of real-world agents (i.e. robots) serving human beings. Scene knowledge is important for an agent to understand the surroundings and make correct decisions in the varied open world. Currently, knowledge base for embodied tasks is missing and most existing work use general knowledge base or pre-trained models to enhance the intelligence of an agent. For conventional knowledge base, it is sparse, insufficient in capacity and cost in data collection. For pre-trained models, they face the uncertainty of knowledge and hard maintenance. To overcome the challenges of scene knowledge, we propose a scene-driven multimodal knowledge graph (Scene-MMKG) construction method combining conventional knowledge engineering and large language models. A unified scene knowledge injection framework is introduced for knowledge representation. To evaluate the advantages of our proposed method, we instantiate Scene-MMKG considering typical indoor robotic functionalities (Manipulation and Mobility), named ManipMob-MMKG. Comparisons in characteristics indicate our instantiated ManipMob-MMKG has broad superiority in data-collection efficiency and knowledge quality. Experimental results on typical embodied tasks show that knowledge-enhanced methods using our instantiated ManipMob-MMKG can improve the performance obviously without re-designing model structures complexly. Our project can be found at https://sites.google.com/view/manipmob-mmkg

arXiv information

Authors	Song Yaoxian, Sun Penglei, Liu Haoyu, Li Zhixu, Song Wei, Xiao Yanghua, Zhou Xiaofang
Published	2023-11-07 08:06:27+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
