Trans4Map: Revisiting Holistic Top-down Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers

Summary

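Humans have an innate ability to sense their surroundings: they extract spatial representations from egocentric perception and form an allocentric semantic map through spatial transformation and memory updating.
However, endowing mobile agents with this spatial sensing ability remains challenging for two reasons. (1) Previous convolutional models are limited by their local receptive fields and therefore struggle to capture holistic long-range dependencies during observation.
(2) The excessive computational budget required for success often forces the mapping pipeline to be split into stages, which makes the overall mapping process inefficient.
To address these issues, we propose Trans4Map, an end-to-end, one-stage Transformer-based framework for mapping.
The egocentric-to-allocentric mapping process consists of three steps. (1) An efficient transformer extracts contextual features from a batch of egocentric images.
(2) The proposed Bidirectional Allocentric Memory (BAM) module projects the egocentric features into the allocentric memory.
(3) A map decoder parses the accumulated memory and predicts the top-down semantic segmentation map.
In contrast to such prior approaches, Trans4Map achieves state-of-the-art results on the Matterport3D dataset, using 67.2% fewer parameters while improving mIoU by 3.25% and mBF1 by 4.09%.
Code will be made publicly available at https://github.com/jamycheung/Trans4Map.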
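
The three steps above can be illustrated with a toy data flow. The following is only a sketch under stated assumptions, not the authors' implementation: the function names, the per-pixel `cell_index` lookup, the max-based memory update, and the linear decoder are all hypothetical stand-ins chosen for brevity. The summary does not specify how BAM maps pixels to map cells or how it updates memory, so a plain scatter with an element-wise maximum is used instead.

```python
import numpy as np

def extract_features(image):
    """Stand-in for step (1). A real system would run a transformer backbone;
    here each pixel's feature is just its RGB value scaled to [0, 1]."""
    return image.astype(np.float32) / 255.0

def project_to_memory(memory, features, cell_index):
    """Stand-in for step (2): scatter per-pixel features into top-down cells.

    memory:     (num_cells, channels) allocentric memory, updated in place
    features:   (height, width, channels) egocentric features
    cell_index: (height, width) flat map-cell id per pixel, -1 if off the map
                (assumed to be precomputed; hypothetical input)
    """
    flat_feats = features.reshape(-1, features.shape[-1])
    flat_idx = cell_index.reshape(-1)
    valid = flat_idx >= 0
    # Unbuffered element-wise max, so several pixels may hit the same cell.
    # Assumes non-negative features, since the memory starts at zero.
    np.maximum.at(memory, flat_idx[valid], flat_feats[valid])
    return memory

def decode_map(memory, class_weights, map_shape):
    """Stand-in for step (3): a linear classifier per cell, then argmax."""
    logits = memory @ class_weights
    return logits.argmax(axis=1).reshape(map_shape)

# Toy run: two random "frames" accumulated into one 4x4 top-down map.
rng = np.random.default_rng(0)
map_shape = (4, 4)
num_classes = 5
memory = np.zeros((map_shape[0] * map_shape[1], 3), dtype=np.float32)
for _ in range(2):
    image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
    cell_index = rng.integers(-1, 16, size=(8, 8))  # random placeholder lookup
    project_to_memory(memory, extract_features(image), cell_index)

class_weights = rng.standard_normal((3, num_classes)).astype(np.float32)
semantic_map = decode_map(memory, class_weights, map_shape)
print(semantic_map.shape)  # (4, 4)
```

The point of the sketch is the shape of the pipeline: features from every frame land in one shared memory indexed by map cell, and the decoder runs once on the accumulated memory rather than once per frame.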

Abstract (original)

Humans have an innate ability to sense their surroundings, as they can extract the spatial representation from the egocentric perception and form an allocentric semantic map via spatial transformation and memory updating. However, endowing mobile agents with such a spatial sensing ability is still a challenge, due to two difficulties: (1) the previous convolutional models are limited by the local receptive field, thus, struggling to capture holistic long-range dependencies during observation; (2) the excessive computational budgets required for success, often lead to a separation of the mapping pipeline into stages, resulting the entire mapping process inefficient. To address these issues, we propose an end-to-end one-stage Transformer-based framework for Mapping, termed Trans4Map. Our egocentric-to-allocentric mapping process includes three steps: (1) the efficient transformer extracts the contextual features from a batch of egocentric images; (2) the proposed Bidirectional Allocentric Memory (BAM) module projects egocentric features into the allocentric memory; (3) the map decoder parses the accumulated memory and predicts the top-down semantic segmentation map. In contrast, Trans4Map achieves state-of-the-art results, reducing 67.2% parameters, yet gaining a +3.25% mIoU and a +4.09% mBF1 improvements on the Matterport3D dataset. Code will be made publicly available at https://github.com/jamycheung/Trans4Map.

arXiv information

Authors	Chang Chen, Jiaming Zhang, Kailun Yang, Kunyu Peng, Rainer Stiefelhagen
Published	2022-07-13 14:01:00+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
