Learning Appearance and Motion Cues for Panoptic Tracking

Summary

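Panoptic tracking enables pixel-level scene interpretation of videos by integrating instance tracking into panoptic segmentation.
This gives robots a spatio-temporal understanding of their environment, an essential attribute for operating in dynamic environments.
This paper proposes a new approach to panoptic tracking that simultaneously captures general semantic information and instance-specific appearance and motion features.
Unlike existing methods that overlook dynamic scene attributes, the approach exploits both appearance and motion cues through dedicated network heads.
These interconnected heads employ multi-scale deformable convolutions that reason about scene motion offsets with semantic context, and learn tracking embeddings from motion-enhanced appearance features.
In addition, a new two-step fusion module integrates the outputs of both heads: it first matches instances from the current time step with instances propagated from previous time steps, and then refines the associations using motion-enhanced appearance embeddings, improving robustness in challenging scenarios.
Extensive evaluations of the proposed model on two benchmark datasets show that it achieves state-of-the-art panoptic tracking accuracy and surpasses prior methods in maintaining object identities over time.
To facilitate future research, the code is available at http://panoptictracking.cs.uni-freiburg.de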
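The summary mentions deformable convolutions that reason about motion offsets but gives no internals. As background only, here is a minimal single-channel sketch of the standard deformable convolution formulation, y(p) = Σ_k w_k · x(p + p_k + Δp_k), where each 3x3 tap is shifted by a fractional offset and sampled bilinearly. It is not the authors' code: the names `deform_conv3x3`, `bilinear`, and `zero_offsets` are hypothetical, offsets are passed in rather than predicted by a layer, and a "multi-scale" variant is assumed to mean applying such layers at several feature-map resolutions.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Offset = Tuple[float, float]

# Regular 3x3 sampling grid p_k, listed row by row (index 4 is the center tap).
KERNEL = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(x: Grid, py: float, px: float) -> float:
    """Bilinearly sample x at a fractional (py, px); positions outside the map count as zero."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                total += (1.0 - abs(py - yy)) * (1.0 - abs(px - xx)) * x[yy][xx]
    return total


def zero_offsets(h: int, w: int) -> List[List[List[Offset]]]:
    """All-zero offsets: the deformable conv then reduces to an ordinary 3x3 conv."""
    return [[[(0.0, 0.0)] * 9 for _ in range(w)] for _ in range(h)]


def deform_conv3x3(x: Grid, weights: List[float],
                   offsets: List[List[List[Offset]]]) -> Grid:
    """Single-channel deformable conv: y(p) = sum_k w_k * x(p + p_k + dp_k).

    offsets[i][j][k] is the (dy, dx) shift of tap k at output position (i, j);
    in a real network these would be predicted by another conv layer.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k, (ky, kx) in enumerate(KERNEL):
                dy, dx = offsets[i][j][k]
                acc += weights[k] * bilinear(x, i + ky + dy, j + kx + dx)
            out[i][j] = acc
    return out
```

For example, with input `[[0.0, 0.0, 4.0]]`, all weights zero except the center tap, and a half-pixel horizontal offset at the middle position only, the output is `[[0.0, 2.0, 4.0]]`: the middle value lands halfway between its neighbors because the tap now samples between them.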

Summary (original)

Panoptic tracking enables pixel-level scene interpretation of videos by integrating instance tracking in panoptic segmentation. This provides robots with a spatio-temporal understanding of the environment, an essential attribute for their operation in dynamic environments. In this paper, we propose a novel approach for panoptic tracking that simultaneously captures general semantic information and instance-specific appearance and motion features. Unlike existing methods that overlook dynamic scene attributes, our approach leverages both appearance and motion cues through dedicated network heads. These interconnected heads employ multi-scale deformable convolutions that reason about scene motion offsets with semantic context and motion-enhanced appearance features to learn tracking embeddings. Furthermore, we introduce a novel two-step fusion module that integrates the outputs from both heads by first matching instances from the current time step with propagated instances from previous time steps and subsequently refines associations using motion-enhanced appearance embeddings, improving robustness in challenging scenarios. Extensive evaluations of our proposed model on two benchmark datasets demonstrate that it achieves state-of-the-art performance in panoptic tracking accuracy, surpassing prior methods in maintaining object identities over time. To facilitate future research, we make the code available at http://panoptictracking.cs.uni-freiburg.de
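The abstract describes the two-step fusion module only at a high level. The sketch below is one plausible reading of it, under loudly stated assumptions, and is not the authors' implementation: instances are sets of pixel coordinates, the motion head's output is a hypothetical per-pixel `(dy, dx)` offset map, matching is greedy rather than optimal (Hungarian), step 2 only considers instances left unmatched by step 1, and the thresholds `iou_thr` and `emb_thr` are made-up values. All function names are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Set, Tuple

Pixel = Tuple[int, int]


def propagate_mask(mask: Set[Pixel], offsets: Dict[Pixel, Tuple[int, int]]) -> Set[Pixel]:
    """Warp a previous-frame mask by per-pixel (dy, dx) motion offsets (missing pixels stay put)."""
    moved = set()
    for (y, x) in mask:
        dy, dx = offsets.get((y, x), (0, 0))
        moved.add((y + dy, x + dx))
    return moved


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return dot / norm if norm else 0.0


def greedy_match(scores: Dict[Tuple[Hashable, Hashable], float],
                 threshold: float) -> Dict[Hashable, Hashable]:
    """One-to-one matching: repeatedly take the highest-scoring free pair at or above threshold."""
    matches: Dict[Hashable, Hashable] = {}
    used_prev = set()
    for (cur, prev), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < threshold:
            break
        if cur in matches or prev in used_prev:
            continue
        matches[cur] = prev
        used_prev.add(prev)
    return matches


def two_step_associate(cur_masks, cur_embs, prev_masks, prev_embs, offsets,
                       iou_thr=0.5, emb_thr=0.8):
    """Map current instance ids to previous track ids; unmatched ones would start new tracks."""
    # Step 1: overlap between current masks and motion-propagated previous masks.
    propagated = {p: propagate_mask(m, offsets) for p, m in prev_masks.items()}
    step1 = greedy_match(
        {(c, p): iou(cm, pm) for c, cm in cur_masks.items() for p, pm in propagated.items()},
        iou_thr)
    # Step 2: appearance-embedding similarity for whatever step 1 left unmatched.
    left_cur = [c for c in cur_masks if c not in step1]
    taken = set(step1.values())
    left_prev = [p for p in prev_masks if p not in taken]
    step2 = greedy_match(
        {(c, p): cosine(cur_embs[c], prev_embs[p]) for c in left_cur for p in left_prev},
        emb_thr)
    return {**step1, **step2}
```

The split mirrors the abstract's ordering: motion-propagated overlap handles objects whose position is predictable, while embedding similarity acts as a fallback for cases where overlap alone fails, such as fast motion or brief occlusion.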

arXiv information

Authors	Juana Valeria Hurtado, Sajad Marvi, Rohit Mohan, Abhinav Valada
Published	2025-03-12 09:32:29+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
