M-FUSE: Multi-frame Fusion for Scene Flow Estimation

Abstract

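Recently, neural networks for scene flow estimation have shown impressive results on automotive data such as the KITTI benchmark.
However, despite using sophisticated rigidity assumptions and parametrizations, such networks are typically limited to only two frame pairs, which prevents them from exploiting temporal information.
In our paper, we address this shortcoming by proposing a novel multi-frame approach that considers an additional preceding stereo pair.
To this end, we proceed in two steps. First, building on the recent RAFT-3D approach, we develop an improved two-frame baseline by incorporating an advanced stereo method.
Second, and even more importantly, we exploit the specific modeling concepts of RAFT-3D to propose a U-Net architecture that fuses forward and backward flow estimates and hence allows temporal information to be integrated on demand.
Experiments on the KITTI benchmark show not only that the advantages of the improved baseline and the temporal fusion approach complement each other, but also that the computed scene flow is highly accurate.
More precisely, our approach ranks second overall and first on the even more challenging foreground objects, outperforming the original RAFT-3D method by more than 16% in total.
Code is available at https://github.com/cv-stuttgart/M-FUSE.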

Abstract (original)

Recently, neural network for scene flow estimation show impressive results on automotive data such as the KITTI benchmark. However, despite of using sophisticated rigidity assumptions and parametrizations, such networks are typically limited to only two frame pairs which does not allow them to exploit temporal information. In our paper we address this shortcoming by proposing a novel multi-frame approach that considers an additional preceding stereo pair. To this end, we proceed in two steps: Firstly, building upon the recent RAFT-3D approach, we develop an improved two-frame baseline by incorporating an advanced stereo method. Secondly, and even more importantly, exploiting the specific modeling concepts of RAFT-3D, we propose a U-Net architecture that performs a fusion of forward and backward flow estimates and hence allows to integrate temporal information on demand. Experiments on the KITTI benchmark do not only show that the advantages of the improved baseline and the temporal fusion approach complement each other, they also demonstrate that the computed scene flow is highly accurate. More precisely, our approach ranks second overall and first for the even more challenging foreground objects, in total outperforming the original RAFT-3D method by more than 16%. Code is available at https://github.com/cv-stuttgart/M-FUSE.

arXiv information

Authors	Lukas Mehl, Azin Jahedi, Jenny Schmalfuss, Andrés Bruhn
Published	2022-10-28 09:22:46+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
