FE-Adapter: Adapting Image-based Emotion Classifiers to Videos

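Summary

Using large pre-trained models for specific tasks has yielded impressive results.
However, fully fine-tuning these ever-larger models is becoming prohibitively resource-intensive.
This has shifted attention toward more parameter-efficient transfer learning, mostly within a single modality.
That approach has limits, particularly in video understanding, where suitable pre-trained models are less common.
To address this, the study introduces a new cross-modality transfer learning approach from images to videos, which the authors call parameter-efficient image-to-video transfer learning.
It presents the Facial-Emotion Adapter (FE-Adapter), designed for efficient fine-tuning on video tasks.
With this adapter, pre-trained image models, which traditionally lack temporal processing capabilities, can analyze dynamic video content efficiently.
Notably, it uses about 15 times fewer parameters than previous methods while improving accuracy.
Experiments in video emotion recognition show that the FE-Adapter can match or even surpass existing fine-tuning and video emotion models in both performance and efficiency.
This result highlights the potential of cross-modality approaches for enhancing AI models, particularly in fields such as video emotion analysis, where demands for efficiency and accuracy keep rising.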
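
To make the idea of parameter-efficient fine-tuning concrete, the sketch below counts trainable parameters for full fine-tuning versus training only small bottleneck adapters on a frozen backbone. The abstract does not describe the FE-Adapter's internals, so the generic bottleneck adapter, the helper names, and the sizes (`D_MODEL`, `NUM_BLOCKS`, `BOTTLENECK`) are all illustrative assumptions; the numbers are not meant to reproduce the roughly 15-fold reduction reported above.

```python
# Rough parameter accounting: full fine-tuning vs. bottleneck adapters.
# All sizes below are hypothetical and NOT taken from the FE-Adapter paper.


def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Number of weights in a dense layer mapping d_in -> d_out."""
    return d_in * d_out + (d_out if bias else 0)


def transformer_block_params(d_model: int, mlp_ratio: int = 4) -> int:
    """Approximate count for one transformer block.

    Counts the four attention projections (Q, K, V, output) and a
    two-layer MLP; layer norms are omitted for simplicity.
    """
    attention = 4 * linear_params(d_model, d_model)
    hidden = mlp_ratio * d_model
    mlp = linear_params(d_model, hidden) + linear_params(hidden, d_model)
    return attention + mlp


def bottleneck_adapter_params(d_model: int, bottleneck: int) -> int:
    """Down-projection plus up-projection of a generic bottleneck adapter."""
    return linear_params(d_model, bottleneck) + linear_params(bottleneck, d_model)


# Assumed, illustrative sizes (a ViT-Base-like backbone).
D_MODEL = 768
NUM_BLOCKS = 12
BOTTLENECK = 64

# Full fine-tuning updates every block; adapter tuning freezes the
# backbone and trains one small adapter per block.
full = NUM_BLOCKS * transformer_block_params(D_MODEL)
adapters = NUM_BLOCKS * bottleneck_adapter_params(D_MODEL, BOTTLENECK)

print(f"full fine-tuning : {full:,} trainable parameters")
print(f"adapters only    : {adapters:,} trainable parameters")
print(f"reduction factor : {full / adapters:.1f}x")
```

In this toy accounting, the adapter count grows roughly linearly with `BOTTLENECK`, which is the basic lever that adapter-style methods rely on.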

Summary (original)

Utilizing large pre-trained models for specific tasks has yielded impressive results. However, fully fine-tuning these increasingly large models is becoming prohibitively resource-intensive. This has led to a focus on more parameter-efficient transfer learning, primarily within the same modality. But this approach has limitations, particularly in video understanding where suitable pre-trained models are less common. Addressing this, our study introduces a novel cross-modality transfer learning approach from images to videos, which we call parameter-efficient image-to-video transfer learning. We present the Facial-Emotion Adapter (FE-Adapter), designed for efficient fine-tuning in video tasks. This adapter allows pre-trained image models, which traditionally lack temporal processing capabilities, to analyze dynamic video content efficiently. Notably, it uses about 15 times fewer parameters than previous methods, while improving accuracy. Our experiments in video emotion recognition demonstrate that the FE-Adapter can match or even surpass existing fine-tuning and video emotion models in both performance and efficiency. This breakthrough highlights the potential for cross-modality approaches in enhancing the capabilities of AI models, particularly in fields like video emotion analysis where the demand for efficiency and accuracy is constantly rising.

arXiv information

Authors	Shreyank N Gowda, Boyan Gao, David A. Clifton
Published	2024-08-05 12:27:28+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
