CoBEVT: Cooperative Bird’s Eye View Semantic Segmentation with Sparse Transformers

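Abstract

Bird's-eye-view (BEV) semantic segmentation plays an important role in spatial sensing for autonomous driving. Recent work has made significant progress on BEV map understanding, but it is all based on single-agent camera systems, which struggle to handle occlusion and to detect distant objects in complex traffic scenes. Vehicle-to-vehicle (V2V) communication lets autonomous vehicles share sensing information, which can dramatically improve perception performance and range compared with single-agent systems. In this paper, we propose CoBEVT, the first generic multi-agent, multi-camera perception framework that can cooperatively generate BEV map predictions. To efficiently fuse camera features from multi-view and multi-agent data within an underlying Transformer architecture, we design a fused axial attention (FAX) module that captures sparse local and global spatial interactions across views and agents. Extensive experiments on OPV2V, a V2V perception dataset, show that CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation. CoBEVT is also shown to generalize to other tasks, including 1) BEV segmentation with single-agent multi-camera input and 2) 3D object detection with multi-agent LiDAR systems, where it achieves state-of-the-art performance at real-time inference speed.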
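The abstract describes the FAX module only at a high level, as attention over sparse local and global interactions across views and agents. As a rough illustration of what "local" versus "sparse global" grouping can mean on a BEV feature grid, the sketch below partitions token positions in two ways: contiguous windows, and strided grids that span the whole map. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the `(agent, row, col)` token layout, and the `window` and `grid` parameters are hypothetical, and the attention computation itself is omitted.

```python
# Illustrative token grouping only; all names here are hypothetical.
# Assumes height and width are divisible by `window` and by `grid`.

def local_window_groups(agents, height, width, window):
    """Group tokens into non-overlapping window x window blocks.

    Tokens from every agent at nearby positions share a group, so
    attention inside a group would model local interactions.
    """
    groups = {}
    for a in range(agents):
        for r in range(height):
            for c in range(width):
                key = (r // window, c // window)
                groups.setdefault(key, []).append((a, r, c))
    return list(groups.values())


def sparse_grid_groups(agents, height, width, grid):
    """Group tokens sampled on a strided grid x grid lattice.

    Each group holds positions spread across the whole map, so
    attention inside a group would model sparse global interactions.
    """
    stride_r, stride_c = height // grid, width // grid
    groups = {}
    for a in range(agents):
        for r in range(height):
            for c in range(width):
                key = (r % stride_r, c % stride_c)
                groups.setdefault(key, []).append((a, r, c))
    return list(groups.values())


if __name__ == "__main__":
    # Two agents sharing a 4 x 4 BEV grid.
    for group in local_window_groups(2, 4, 4, window=2):
        print("local :", sorted(group))
    for group in sparse_grid_groups(2, 4, 4, grid=2):
        print("global:", sorted(group))
```

Both partitions produce groups of the same size, so the cost of attention within a group stays fixed while the second partition still connects distant positions. That is one common way to keep attention sparse; whether FAX uses exactly this scheme is not stated in the abstract.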

Abstract (original)

Bird’s eye view (BEV) semantic segmentation plays a crucial role in spatial sensing for autonomous driving. Although recent literature has made significant progress on BEV map understanding, they are all based on single-agent camera-based systems which are difficult to handle occlusions and detect distant objects in complex traffic scenes. Vehicle-to-Vehicle (V2V) communication technologies have enabled autonomous vehicles to share sensing information, which can dramatically improve the perception performance and range as compared to single-agent systems. In this paper, we propose CoBEVT, the first generic multi-agent multi-camera perception framework that can cooperatively generate BEV map predictions. To efficiently fuse camera features from multi-view and multi-agent data in an underlying Transformer architecture, we design a fused axial attention or FAX module, which can capture sparsely local and global spatial interactions across views and agents. The extensive experiments on the V2V perception dataset, OPV2V, demonstrate that CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation. Moreover, CoBEVT is shown to be generalizable to other tasks, including 1) BEV segmentation with single-agent multi-camera and 2) 3D object detection with multi-agent LiDAR systems, and achieves state-of-the-art performance with real-time inference speed.

arXiv information

Authors	Runsheng Xu, Zhengzhong Tu, Hao Xiang, Wei Shao, Bolei Zhou, Jiaqi Ma
Published	2022-07-05 17:59:28+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
