Towards Good Practices for Missing Modality Robust Action Recognition

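Abstract

Standard multi-modal models assume that the same modalities are available during training and inference.
In practice, however, the environment in which a multi-modal model operates may not satisfy this assumption.
As a result, performance degrades drastically when a modality is missing at inference time.
How can we train a model that is robust to missing modalities?
This paper seeks a set of good practices for multi-modal action recognition, with particular interest in situations where some modalities are unavailable at inference time.
First, we study how to regularize the model effectively during training (e.g., with data augmentation).
Second, we investigate fusion methods for robustness to missing modalities and find that transformer-based fusion is more robust to missing modalities than summation or concatenation.
Third, we propose ActionMAE, a simple modular network that learns missing-modality predictive coding by randomly dropping modality features and trying to reconstruct them from the features of the remaining modalities.
Combining these good practices, we build a model that is not only effective for multi-modal action recognition but also robust to missing modalities.
Our model achieves state-of-the-art results on multiple benchmarks and maintains competitive performance even in missing-modality scenarios.
Code is available at https://github.com/sangminwoo/ActionMAE.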
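
The random-drop-and-reconstruct idea described for ActionMAE can be illustrated with a minimal sketch. This is not the authors' implementation (the repository linked above is the reference for that): the modality names `rgb`, `depth`, and `ir`, the helper functions, and the mean-based predictor are all assumptions chosen to keep the example self-contained. A real model would replace the mean with a learned network.

```python
import random


def drop_random_modality(features, rng):
    """Remove one randomly chosen modality; return (remaining, dropped_name)."""
    dropped = rng.choice(sorted(features))
    remaining = {name: vec for name, vec in features.items() if name != dropped}
    return remaining, dropped


def reconstruct_from_remaining(remaining):
    """Placeholder predictor (assumption): element-wise mean of remaining features.

    A trained reconstruction network would go here; the mean only keeps
    the sketch runnable without a deep learning framework.
    """
    vectors = list(remaining.values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Hypothetical per-modality feature vectors for one sample.
features = {
    "rgb": [0.2, 0.4, 0.6],
    "depth": [0.1, 0.5, 0.7],
    "ir": [0.3, 0.3, 0.5],
}

remaining, dropped = drop_random_modality(features, rng)
prediction = reconstruct_from_remaining(remaining)
# Reconstruction loss against the feature that was dropped.
loss = mse(prediction, features[dropped])
print(dropped, round(loss, 4))
```

In a training loop, a loss of this kind would presumably be combined with the classification loss, so that the model learns to fill in whichever modality is absent at inference time.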

Abstract (original)

Standard multi-modal models assume the use of the same modalities in training and inference stages. However, in practice, the environment in which multi-modal models operate may not satisfy such assumption. As such, their performances degrade drastically if any modality is missing in the inference stage. We ask: how can we train a model that is robust to missing modalities? This paper seeks a set of good practices for multi-modal action recognition, with a particular interest in circumstances where some modalities are not available at an inference time. First, we study how to effectively regularize the model during training (e.g., data augmentation). Second, we investigate on fusion methods for robustness to missing modalities: we find that transformer-based fusion shows better robustness for missing modality than summation or concatenation. Third, we propose a simple modular network, ActionMAE, which learns missing modality predictive coding by randomly dropping modality features and tries to reconstruct them with the remaining modality features. Coupling these good practices, we build a model that is not only effective in multi-modal action recognition but also robust to modality missing. Our model achieves the state-of-the-arts on multiple benchmarks and maintains competitive performances even in missing modality scenarios. Codes are available at https://github.com/sangminwoo/ActionMAE.
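
One structural reason the fusion choice can matter is sketched below. The example uses single-query softmax attention pooling as a simplified stand-in for transformer-based fusion; it is an assumption for illustration, not the fusion module used in the paper, and the `query` vector and token values are hypothetical. It shows only how output shape and scale react to a missing modality, not the accuracy results the abstract reports.

```python
import math


def sum_fusion(tokens):
    """Element-wise sum; the output scale changes with the number of tokens."""
    return [sum(col) for col in zip(*tokens)]


def concat_fusion(tokens):
    """Concatenation; the output length changes with the number of tokens."""
    return [x for tok in tokens for x in tok]


def attention_pool(tokens, query):
    """Softmax-weighted average of tokens, scored by dot product with `query`."""
    scores = [sum(q * x for q, x in zip(query, tok)) for tok in tokens]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(tokens[0])
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) / total
            for i in range(dim)]


full = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three modality tokens
partial = full[:2]                           # one modality missing
query = [0.5, 0.5]                           # hypothetical query vector

print(len(concat_fusion(full)), len(concat_fusion(partial)))
print(sum_fusion(full), sum_fusion(partial))
print(attention_pool(full, query), attention_pool(partial, query))
```

Concatenation changes the feature length when a token is absent, and summation changes its magnitude, so a downstream classifier sees inputs unlike those from training. The attention-pooled output keeps the same length and remains a weighted average of the tokens that are present.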

arXiv information

Authors	Sangmin Woo, Sumin Lee, Yeonju Park, Muhammad Adi Nugroho, Changick Kim
Published	2023-03-30 06:35:24+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
