CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation

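Summary

Applications in augmented reality and robotics often require joint localisation and 6D pose estimation of multiple objects.
However, most algorithms need a separate network trained per object class to achieve their best results.
Analysing all visible objects then requires multiple inference passes, which costs memory and time.
We present CASAPose, a new single-stage architecture that determines 2D-3D correspondences for pose estimation of multiple different objects in an RGB image in a single pass.
It is fast and memory efficient, and achieves high accuracy for multiple objects by using the output of a semantic segmentation decoder as control input to a keypoint recognition decoder via local class-adaptive normalisation.
A new differentiable regression of keypoint locations contributes significantly to closing the domain gap between real test data and synthetic training data more quickly.
Segmentation-aware convolutions and upsampling operations are applied to increase the focus inside the object mask and to reduce mutual interference between occluding objects.
For each added object, the network grows by only one output segmentation map and a negligible number of parameters.
CASAPose outperforms state-of-the-art approaches in challenging multi-object scenes with inter-object occlusion and synthetic training.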

Summary (original)

Applications in the field of augmented reality or robotics often require joint localisation and 6D pose estimation of multiple objects. However, most algorithms need one network per object class to be trained in order to provide the best results. Analysing all visible objects demands multiple inferences, which is memory and time-consuming. We present a new single-stage architecture called CASAPose that determines 2D-3D correspondences for pose estimation of multiple different objects in RGB images in one pass. It is fast and memory efficient, and achieves high accuracy for multiple objects by exploiting the output of a semantic segmentation decoder as control input to a keypoint recognition decoder via local class-adaptive normalisation. Our new differentiable regression of keypoint locations significantly contributes to a faster closing of the domain gap between real test and synthetic training data. We apply segmentation-aware convolutions and upsampling operations to increase the focus inside the object mask and to reduce mutual interference of occluding objects. For each inserted object, the network grows by only one output segmentation map and a negligible number of parameters. We outperform state-of-the-art approaches in challenging multi-object scenes with inter-object occlusion and synthetic training.
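The abstract names local class-adaptive normalisation but does not spell out its form. The following is a minimal sketch of the general idea, assuming a per-class scale and shift that is looked up for each pixel from the segmentation map and applied to a normalised feature channel. The function name `class_adaptive_norm` and the parameters `gamma`, `beta`, and `eps` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from statistics import fmean, pstdev


def class_adaptive_norm(features, seg_map, gamma, beta, eps=1e-5):
    """Normalise one feature channel, then modulate each pixel by its class.

    features: 2D list of floats (a single channel, H x W)
    seg_map:  2D list of ints, one class index per pixel (same H x W)
    gamma:    per-class scale values (hypothetical learned parameters)
    beta:     per-class shift values (hypothetical learned parameters)
    """
    # Channel-wide statistics, independent of the class labels.
    flat = [value for row in features for value in row]
    mean = fmean(flat)
    std = (pstdev(flat) ** 2 + eps) ** 0.5

    # Each pixel picks the scale and shift of the class it belongs to.
    out = []
    for feature_row, class_row in zip(features, seg_map):
        out.append([
            gamma[c] * (value - mean) / std + beta[c]
            for value, c in zip(feature_row, class_row)
        ])
    return out
```

In this sketch, supporting an additional object class only means appending one entry to `gamma` and one to `beta`, which illustrates how a per-object parameter cost can stay small as the abstract describes.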

arXiv information

Authors	Niklas Gard, Anna Hilsmann, Peter Eisert
Published	2022-10-17 13:40:46+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
