Voxelized 3D Feature Aggregation for Multiview Detection

Summary

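Multiview detection combines multiple camera views to reduce occlusion in crowded scenes. State-of-the-art approaches use homography transformations to project multiview features onto the ground plane. However, we find that these 2D transformations do not account for object height, so features along the vertical direction of the same object are unlikely to be projected onto the same ground-plane point, which produces impure ground-plane features. To solve this problem, we propose VFA (voxelized 3D feature aggregation) for feature transformation and aggregation in multiview detection. Specifically, we voxelize the 3D space, project the voxels onto each camera view, and associate 2D features with the projected voxels. This lets us identify and aggregate the 2D features that lie along the same vertical line, greatly reducing projection distortion. In addition, because different kinds of objects (humans vs. cattle) have different shapes on the ground plane, we introduce an oriented Gaussian encoding that matches those shapes, improving both accuracy and efficiency. We run experiments on multiview 2D detection and multiview 3D detection. Results on four datasets, including the newly introduced MultiviewC dataset, show that the system is highly competitive with state-of-the-art approaches. Code and MultiviewC are available at https://github.com/Robert-Mar/VFA.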
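The voxelize, project, and aggregate steps described above can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`make_voxel_centers`, `project_points`, `aggregate_vertical`), the 3x4 pinhole camera matrix `P`, nearest-pixel sampling, and mean pooling along the vertical axis are all choices made here for clarity.

```python
import numpy as np

def make_voxel_centers(x_range, y_range, z_range, step):
    """Return voxel centers as an (X, Y, Z, 3) array of world coordinates."""
    xs = np.arange(x_range[0], x_range[1], step) + step / 2
    ys = np.arange(y_range[0], y_range[1], step) + step / 2
    zs = np.arange(z_range[0], z_range[1], step) + step / 2
    return np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)

def project_points(P, points):
    """Project (..., 3) world points with a 3x4 camera matrix P (assumed pinhole).

    Returns pixel coordinates of shape (..., 2) and depths of shape (...).
    """
    ones = np.ones(points.shape[:-1] + (1,))
    homo = np.concatenate([points, ones], axis=-1)
    uvw = homo @ P.T
    depth = uvw[..., 2]
    # Avoid division by zero for points on the camera plane.
    safe = np.where(np.abs(depth) > 1e-9, depth, 1e-9)
    return uvw[..., :2] / safe[..., None], depth

def aggregate_vertical(feature_map, P, centers):
    """Sample a (C, H, W) feature map at the projected voxel centers and
    average over the vertical (Z) axis of each ground cell.

    Returns ground-plane features of shape (C, X, Y). Voxels that fall
    behind the camera or outside the image contribute nothing.
    """
    _, H, W = feature_map.shape
    uv, depth = project_points(P, centers)
    valid = (
        (depth > 0)
        & (uv[..., 0] >= -0.5) & (uv[..., 0] < W - 0.5)
        & (uv[..., 1] >= -0.5) & (uv[..., 1] < H - 0.5)
    )
    # Nearest-pixel lookup; clipping keeps the indices legal for invalid voxels.
    u = np.clip(np.rint(uv[..., 0]), 0, W - 1).astype(int)
    v = np.clip(np.rint(uv[..., 1]), 0, H - 1).astype(int)
    sampled = feature_map[:, v, u] * valid      # (C, X, Y, Z)
    count = np.maximum(valid.sum(axis=-1), 1)   # (X, Y)
    return sampled.sum(axis=-1) / count
```

With several cameras, one simple option (again an assumption, not the paper's design) is to call `aggregate_vertical` once per view and combine the resulting ground-plane maps, for example by averaging or concatenating them along the channel axis.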

Abstract (original)

Multi-view detection incorporates multiple camera views to alleviate occlusion in crowded scenes, where the state-of-the-art approaches adopt homography transformations to project multi-view features to the ground plane. However, we find that these 2D transformations do not take into account the object’s height, and as a result features along the vertical direction of the same object are likely not projected onto the same ground plane point, leading to impure ground-plane features. To solve this problem, we propose VFA, voxelized 3D feature aggregation, for feature transformation and aggregation in multi-view detection. Specifically, we voxelize the 3D space, project the voxels onto each camera view, and associate 2D features with these projected voxels. This allows us to identify and then aggregate 2D features along the same vertical line, alleviating projection distortions to a large extent. Additionally, because different kinds of objects (human vs. cattle) have different shapes on the ground plane, we introduce the oriented Gaussian encoding to match such shapes, leading to increased accuracy and efficiency. We perform experiments on multiview 2D detection and multiview 3D detection problems. Results on four datasets (including a newly introduced MultiviewC dataset) show that our system is very competitive compared with the state-of-the-art approaches. Code and MultiviewC are released at https://github.com/Robert-Mar/VFA.
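To make the idea of an oriented Gaussian encoding more concrete, here is a minimal sketch of a ground-plane heatmap built from rotated, anisotropic 2D Gaussians. It is an assumption-laden illustration rather than the paper's formulation: the function names, the parameters `sigma_long`, `sigma_short`, and `theta`, and the choice to merge objects with an element-wise maximum are all hypothetical.

```python
import numpy as np

def oriented_gaussian(shape, center, sigma_long, sigma_short, theta):
    """Heatmap of an anisotropic 2D Gaussian rotated by theta (radians).

    shape: (H, W) of the ground-plane grid.
    center: (cx, cy) in grid cells. The peak value at the center is 1.
    """
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    dx = xs - center[0]
    dy = ys - center[1]
    # Rotate grid offsets into the object's own frame.
    along = np.cos(theta) * dx + np.sin(theta) * dy     # along the body axis
    across = -np.sin(theta) * dx + np.cos(theta) * dy   # across the body axis
    return np.exp(-0.5 * ((along / sigma_long) ** 2 + (across / sigma_short) ** 2))

def encode_targets(shape, objects):
    """Combine per-object heatmaps with an element-wise maximum.

    objects: list of dicts with keys center, sigma_long, sigma_short, theta.
    """
    heat = np.zeros(shape)
    for obj in objects:
        heat = np.maximum(heat, oriented_gaussian(shape, **obj))
    return heat
```

Under these assumptions, a roughly circular footprint such as a standing person corresponds to `sigma_long == sigma_short`, while an elongated footprint such as a cow uses a larger `sigma_long` together with the animal's heading as `theta`.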

arXiv information

Authors	Jiahao Ma, Jinguang Tong, Shan Wang, Wei Zhao, Zicheng Duan, Chuong Nguyen
Published	2023-01-04 06:02:15+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
