Adaptive Perception for Unified Visual Multi-modal Object Tracking

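Abstract

Many recent multi-modal trackers treat RGB as the dominant modality and the other modalities as auxiliary, and they are fine-tuned separately for each multi-modal task.
This imbalance in modality dependence limits a method's ability to dynamically exploit complementary information from each modality in complex scenarios, making it hard to realize the full benefit of multi-modal input.
As a result, a model with unified parameters often underperforms across multi-modal tracking tasks.
To address this issue, we propose APTrack, a novel unified tracker designed for multi-modal adaptive perception.
Unlike previous methods, APTrack explores a unified representation through an equal modeling strategy.
This strategy lets the model adapt dynamically to various modalities and tasks without additional fine-tuning between tasks.
In addition, the tracker integrates an adaptive modality interaction (AMI) module that efficiently bridges cross-modality interactions by generating learnable tokens.
Experiments on five diverse multi-modal datasets (RGBT234, LasHeR, VisEvent, DepthTrack, and VOT-RGBD2022) show that APTrack not only surpasses existing state-of-the-art unified multi-modal trackers but also outperforms trackers designed for specific multi-modal tasks.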

Abstract (original)

Recently, many multi-modal trackers prioritize RGB as the dominant modality, treating other modalities as auxiliary, and fine-tuning separately various multi-modal tasks. This imbalance in modality dependence limits the ability of methods to dynamically utilize complementary information from each modality in complex scenarios, making it challenging to fully perceive the advantages of multi-modal. As a result, a unified parameter model often underperforms in various multi-modal tracking tasks. To address this issue, we propose APTrack, a novel unified tracker designed for multi-modal adaptive perception. Unlike previous methods, APTrack explores a unified representation through an equal modeling strategy. This strategy allows the model to dynamically adapt to various modalities and tasks without requiring additional fine-tuning between different tasks. Moreover, our tracker integrates an adaptive modality interaction (AMI) module that efficiently bridges cross-modality interactions by generating learnable tokens. Experiments conducted on five diverse multi-modal datasets (RGBT234, LasHeR, VisEvent, DepthTrack, and VOT-RGBD2022) demonstrate that APTrack not only surpasses existing state-of-the-art unified multi-modal trackers but also outperforms trackers designed for specific multi-modal tasks.
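
The abstract says only that the AMI module bridges cross-modality interactions by generating learnable tokens; it does not describe the mechanism. As a rough illustration of what token-mediated interaction can look like, the sketch below routes information between two modalities through a small set of bridge tokens using plain scaled dot-product attention. Everything here is an assumption made for illustration: the names `attend` and `token_bridge`, the token counts, the absence of learned projections, and the random initialization are hypothetical and are not taken from APTrack.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values


def token_bridge(rgb_tokens, aux_tokens, bridge_tokens):
    """Hypothetical two-step exchange through a few bridge tokens.

    This is an illustrative assumption, not the AMI module from the paper.
    """
    # Step 1: the bridge tokens gather information from both modalities,
    # which are concatenated and treated symmetrically.
    both = np.concatenate([rgb_tokens, aux_tokens], axis=0)
    summary = bridge_tokens + attend(bridge_tokens, both)
    # Step 2: each modality reads the shared summary back (residual update).
    rgb_out = rgb_tokens + attend(rgb_tokens, summary)
    aux_out = aux_tokens + attend(aux_tokens, summary)
    return rgb_out, aux_out


rng = np.random.default_rng(0)
rgb = rng.normal(size=(16, 8))    # 16 RGB tokens of dimension 8 (made-up sizes)
aux = rng.normal(size=(16, 8))    # 16 auxiliary-modality tokens
bridge = rng.normal(size=(4, 8))  # in a real model these would be trained parameters

rgb_out, aux_out = token_bridge(rgb, aux, bridge)
print(rgb_out.shape, aux_out.shape)
```

With N tokens per modality and K bridge tokens, each attention step in this sketch scores N x K pairs instead of N x N, which is one common reason to route interaction through a few tokens. Whether APTrack obtains its efficiency this way is not stated in the abstract.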

arXiv information

Authors	Xiantao Hu, Bineng Zhong, Qihua Liang, Zhiyi Mo, Liangtao Shi, Ying Tai, Jian Yang
Published	2025-02-10 15:50:26+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
