Semantic Map-based Generation of Navigation Instructions

Summary

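We are interested in generating navigation instructions, either as an end in itself or as training material for robotic navigation tasks.
In this paper, we propose a new approach to navigation instruction generation that frames the problem as an image captioning task with semantic maps as the visual input.
Conventional approaches generate navigation instructions from a sequence of panorama images.
Semantic maps abstract away visual details and fuse the information from multiple panorama images into a single top-down representation, which reduces the computational complexity of processing the input.
We present a benchmark dataset for instruction generation using semantic maps, propose an initial model, and ask human subjects to manually assess the quality of the generated instructions.
Our initial investigations show that using semantic maps instead of a sequence of panorama images for instruction generation is promising, but there is vast scope for improvement.
We release the code for data preparation and model training at https://github.com/chengzu-li/VLGen.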

Summary (original)

We are interested in the generation of navigation instructions, either in their own right or as training material for robotic navigation task. In this paper, we propose a new approach to navigation instruction generation by framing the problem as an image captioning task using semantic maps as visual input. Conventional approaches employ a sequence of panorama images to generate navigation instructions. Semantic maps abstract away from visual details and fuse the information in multiple panorama images into a single top-down representation, thereby reducing computational complexity to process the input. We present a benchmark dataset for instruction generation using semantic maps, propose an initial model and ask human subjects to manually assess the quality of generated instructions. Our initial investigations show promise in using semantic maps for instruction generation instead of a sequence of panorama images, but there is vast scope for improvement. We release the code for data preparation and model training at https://github.com/chengzu-li/VLGen.

arXiv information

Authors	Chengzu Li, Chao Zhang, Simone Teufel, Rama Sanand Doddipatla, Svetlana Stoyanchev
Published	2024-03-28 17:27:44+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
