Learning Optimal Multimodal Information Bottleneck Representations

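Abstract

Leveraging high-quality joint representations of multimodal data can substantially improve model performance in a wide range of machine-learning applications.
Recent multimodal learning methods based on the multimodal information bottleneck (MIB) principle use regularization to produce an optimal MIB, that is, a representation containing maximal task-relevant information and minimal superfluous information.
However, these methods often set regularization weights ad hoc and overlook imbalances in task-relevant information across modalities, which limits their ability to attain the optimal MIB.
To address this gap, we propose Optimal Multimodal Information Bottleneck (OMIB), a new multimodal learning framework whose optimization objective guarantees that the optimal MIB is achievable by setting the regularization weight within a theoretically derived bound.
OMIB further addresses imbalanced task-relevant information by dynamically adjusting the regularization weight of each modality, which promotes the inclusion of all task-relevant information.
In addition, we establish a solid information-theoretic foundation for OMIB's optimization and implement it under a variational approximation framework for computational efficiency.
Finally, we empirically validate OMIB's theoretical properties on synthetic data and demonstrate its superiority over state-of-the-art benchmark methods on various downstream tasks.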
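
For orientation, the sketch below shows the generic shape of a variational information-bottleneck loss with one regularization weight per modality: a task term plus a weighted KL penalty on each modality's latent posterior. It is an illustrative assumption, not OMIB's actual objective; the abstract does not state the theoretically derived bound or the dynamic weight-adjustment rule, so neither is reproduced here. The names `mib_loss`, `posteriors`, and `betas` are hypothetical.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) per sample, summed over latent dims."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits has shape (n, classes), labels shape (n,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def mib_loss(logits, labels, posteriors, betas):
    """Task loss plus a beta-weighted KL penalty for each modality.

    posteriors: dict mapping modality name -> (mu, logvar) arrays of shape (n, d)
    betas:      dict mapping modality name -> regularization weight
                (hypothetical fixed values; OMIB is described as bounding and
                adapting these, by a rule not given in the abstract)
    """
    task = cross_entropy(logits, labels)
    penalty = sum(
        betas[m] * gaussian_kl(mu, logvar).mean()
        for m, (mu, logvar) in posteriors.items()
    )
    return task + penalty
```

A larger weight for a modality compresses its latent code more aggressively, which is why a single hand-picked weight shared across modalities can discard task-relevant information from the weaker one.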

Abstract (original)

Leveraging high-quality joint representations from multimodal data can greatly enhance model performance in various machine-learning based applications. Recent multimodal learning methods, based on the multimodal information bottleneck (MIB) principle, aim to generate optimal MIB with maximal task-relevant information and minimal superfluous information via regularization. However, these methods often set ad hoc regularization weights and overlook imbalanced task-relevant information across modalities, limiting their ability to achieve optimal MIB. To address this gap, we propose a novel multimodal learning framework, Optimal Multimodal Information Bottleneck (OMIB), whose optimization objective guarantees the achievability of optimal MIB by setting the regularization weight within a theoretically derived bound. OMIB further addresses imbalanced task-relevant information by dynamically adjusting regularization weights per modality, promoting the inclusion of all task-relevant information. Moreover, we establish a solid information-theoretical foundation for OMIB’s optimization and implement it under the variational approximation framework for computational efficiency. Finally, we empirically validate the OMIB’s theoretical properties on synthetic data and demonstrate its superiority over the state-of-the-art benchmark methods in various downstream tasks.

arXiv information

Authors	Qilong Wu, Yiyang Shao, Jun Wang, Xiaobo Sun
Published	2025-05-26 13:48:07+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
