Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos

Abstract

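Long-tailed recognition and object tracking have each made great advances recently.
The TAO benchmark combines the two into a single task, long-tailed object tracking, to better reflect real-world conditions.
To date, existing solutions have adopted detectors that are robust to long-tailed distributions and produce per-frame results.
They then apply tracking algorithms that link these temporally independent detections into final tracklets.
However, because these approaches do not account for temporal changes in a scene, classification results are inconsistent across a video, which lowers overall performance.
In this paper, we present a set classifier that improves tracklet classification accuracy by aggregating information from the multiple viewpoints contained in a tracklet.
To cope with sparse annotations in videos, we further propose a tracklet augmentation that can maximize data efficiency.
The set classifier can be plugged into existing object trackers and substantially improves long-tailed object tracking performance.
Simply attaching our method to QDTrack on top of ResNet-101 achieves a new state-of-the-art TrackAP_50 of 19.9% and 15.7% on the TAO validation and test sets, respectively.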

Abstract (original)

Recently, both long-tailed recognition and object tracking have made great advances individually. TAO benchmark presented a mixture of the two, long-tailed object tracking, in order to further reflect the aspect of the real-world. To date, existing solutions have adopted detectors showing robustness in long-tailed distributions, which derive per-frame results. Then, they used tracking algorithms that combine the temporally independent detections to finalize tracklets. However, as the approaches did not take temporal changes in scenes into account, inconsistent classification results in videos led to low overall performance. In this paper, we present a set classifier that improves accuracy of classifying tracklets by aggregating information from multiple viewpoints contained in a tracklet. To cope with sparse annotations in videos, we further propose augmentation of tracklets that can maximize data efficiency. The set classifier is plug-and-playable to existing object trackers, and highly improves the performance of long-tailed object tracking. By simply attaching our method to QDTrack on top of ResNet-101, we achieve the new state-of-the-art, 19.9% and 15.7% TrackAP_50 on TAO validation and test sets, respectively.
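The contrast the abstract draws, between classifying each frame independently and classifying a tracklet as a whole, can be illustrated with a minimal sketch. This is not the authors' set classifier: it swaps in plain score averaging as the simplest possible form of aggregation, and the function names, class names, and scores below are all hypothetical.

```python
def per_frame_labels(frame_scores):
    """Pick the top-scoring class independently for every frame."""
    return [max(scores, key=scores.get) for scores in frame_scores]


def aggregate_tracklet_label(frame_scores):
    """Average class scores over all frames of a tracklet, then pick one label.

    Assumption: simple mean pooling stands in for a learned set classifier.
    """
    totals = {}
    for scores in frame_scores:
        for cls, score in scores.items():
            totals[cls] = totals.get(cls, 0.0) + score
    mean_scores = {cls: total / len(frame_scores) for cls, total in totals.items()}
    return max(mean_scores, key=mean_scores.get), mean_scores


# Hypothetical per-frame class scores for one tracklet of three frames.
tracklet = [
    {"dog": 0.60, "wolf": 0.40},
    {"dog": 0.45, "wolf": 0.55},  # an ambiguous viewpoint flips the frame-level label
    {"dog": 0.70, "wolf": 0.30},
]

print(per_frame_labels(tracklet))             # labels disagree across frames
print(aggregate_tracklet_label(tracklet)[0])  # a single label for the whole tracklet
```

In this toy case the middle frame is labeled differently from its neighbors when frames are classified one at a time, while pooling the scores yields one consistent label for the tracklet. A learned set classifier would replace the mean with a trained aggregation over per-frame features.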

arXiv information

Authors	Sukjun Hwang, Miran Heo, Seoung Wug Oh, Seon Joo Kim
Published	2022-06-05 07:51:58+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
