Modality-Inconsistent Continual Learning of Multimodal Large Language Models

Summary

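In this paper, we introduce Modality-Inconsistent Continual Learning (MICL), a new continual learning scenario for Multimodal Large Language Models (MLLMs) in which tasks involve inconsistent modalities (image, audio, or video) and varying task types (captioning or question answering).
Unlike existing vision-only or modality-incremental settings, MICL combines shifts in modality with shifts in task type, both of which cause catastrophic forgetting.
To address these challenges, we propose MoInCL, which uses a Pseudo Targets Generation Module to mitigate the forgetting caused by task type shifts in previously seen modalities.
It also incorporates Instruction-based Knowledge Distillation to preserve the model's ability to handle previously learned modalities when new ones are introduced.
We benchmark MICL on a total of six tasks and run experiments to validate the effectiveness of the proposed MoInCL.
The experimental results highlight the superiority of MoInCL, showing significant improvements over representative state-of-the-art continual learning baselines.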

Summary (original)

In this paper, we introduce Modality-Inconsistent Continual Learning (MICL), a new continual learning scenario for Multimodal Large Language Models (MLLMs) that involves tasks with inconsistent modalities (image, audio, or video) and varying task types (captioning or question-answering). Unlike existing vision-only or modality-incremental settings, MICL combines modality and task type shifts, both of which drive catastrophic forgetting. To address these challenges, we propose MoInCL, which employs a Pseudo Targets Generation Module to mitigate forgetting caused by task type shifts in previously seen modalities. It also incorporates Instruction-based Knowledge Distillation to preserve the model’s ability to handle previously learned modalities when new ones are introduced. We benchmark MICL using a total of six tasks and conduct experiments to validate the effectiveness of our proposed MoInCL. The experimental results highlight the superiority of MoInCL, showing significant improvements over representative and state-of-the-art continual learning baselines.

arXiv information

Authors	Weiguo Pian, Shijian Deng, Shentong Mo, Yunhui Guo, Yapeng Tian
Published	2024-12-17 16:13:56+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
