Exemplar Masking for Multimodal Incremental Learning

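Summary

Multimodal incremental learning must integrate information from multiple modalities while learning new knowledge without forgetting what was learned before.
The task poses numerous challenges, chiefly the large storage footprint of multimodal data in exemplar-based methods and the computational cost of fine-tuning huge multimodal models.
This paper leverages a parameter-efficient tuning scheme to reduce the fine-tuning burden and proposes an exemplar masking framework for replaying old knowledge efficiently.
Specifically, unimportant tokens are masked based on attention weights and the correlation across modalities, which significantly reduces the storage size of each exemplar and so allows more exemplars to be stored in the same memory buffer.
The authors also design a multimodal data augmentation technique to diversify the exemplars used for replaying prior knowledge.
In experiments, the method is evaluated on existing multimodal datasets, and, as a real-world application, the ImageNet-R dataset is extended to a multimodal dataset whose captions are generated by querying multimodal large language models (e.g., InstructBLIP).
Extensive experiments show that, under the same limited memory buffer, the exemplar masking framework is more efficient and more robust to catastrophic forgetting.
Code is available at https://github.com/YiLunLee/Exemplar_Masking_MCIL.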
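
The masking idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `keep_ratio` parameter, the toy patch labels, and the attention scores are all hypothetical, and the sketch ranks tokens by attention only, leaving out the cross-modal correlation and data augmentation that the paper also uses. It only shows why dropping low-attention tokens lets more exemplars fit in a fixed buffer.

```python
from typing import List, Sequence, Tuple


def mask_exemplar(
    tokens: Sequence[str],
    attention: Sequence[float],
    keep_ratio: float = 0.5,
) -> List[Tuple[int, str]]:
    """Keep only the highest-attention tokens as (position, token) pairs.

    Hypothetical helper: the paper's actual selection rule also uses
    cross-modal correlation, which is not modeled here.
    """
    if len(tokens) != len(attention):
        raise ValueError("tokens and attention must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Rank positions by attention score (highest first), then restore
    # the original order so positions remain meaningful on replay.
    ranked = sorted(range(len(tokens)), key=lambda i: attention[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [(i, tokens[i]) for i in kept]


def exemplars_that_fit(buffer_tokens: int, tokens_per_exemplar: int) -> int:
    """Number of exemplars a buffer holds when capacity is counted in tokens."""
    return buffer_tokens // tokens_per_exemplar


# Toy exemplar: eight image patches with made-up attention scores.
patches = ["sky", "sky", "dog_head", "dog_body", "grass", "grass", "ball", "sky"]
scores = [0.02, 0.03, 0.35, 0.30, 0.05, 0.04, 0.20, 0.01]

masked = mask_exemplar(patches, scores, keep_ratio=0.375)
print(masked)  # [(2, 'dog_head'), (3, 'dog_body'), (6, 'ball')]

# Same 800-token buffer, before and after masking.
print(exemplars_that_fit(800, len(patches)))  # 100
print(exemplars_that_fit(800, len(masked)))   # 266
```

Storing positions alongside the kept tokens is one possible design choice: it lets the masked slots be filled with a placeholder when the exemplar is replayed, at the cost of one extra integer per token.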

Abstract (original)

Multimodal incremental learning needs to digest the information from multiple modalities while concurrently learning new knowledge without forgetting the previously learned information. There are numerous challenges for this task, mainly including the larger storage size of multimodal data in exemplar-based methods and the computational requirement of finetuning on huge multimodal models. In this paper, we leverage the parameter-efficient tuning scheme to reduce the burden of fine-tuning and propose the exemplar masking framework to efficiently replay old knowledge. Specifically, the non-important tokens are masked based on the attention weights and the correlation across different modalities, significantly reducing the storage size of an exemplar and consequently saving more exemplars under the same memory buffer. Moreover, we design a multimodal data augmentation technique to diversify exemplars for replaying prior knowledge. In experiments, we not only evaluate our method in existing multimodal datasets but also extend the ImageNet-R dataset to a multimodal dataset as a real-world application, where captions are generated by querying multimodal large language models (e.g., InstructBLIP). Extensive experiments show that our exemplar masking framework is more efficient and robust to catastrophic forgetting under the same limited memory buffer. Code is available at https://github.com/YiLunLee/Exemplar_Masking_MCIL.

arXiv information

Authors	Yi-Lun Lee, Chen-Yu Lee, Wei-Chen Chiu, Yi-Hsuan Tsai
Published	2024-12-12 18:40:20+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
