Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

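Abstract

In many machine learning systems that learn jointly from multiple modalities, a central research question is understanding the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not present in either one alone.
We study this challenge of interaction quantification in a semi-supervised setting, using only labeled unimodal data and naturally co-occurring multimodal data (e.g., unlabeled images and captions, or video and corresponding audio) that is time-consuming to label.
Using a precise information-theoretic definition of interactions, our main contribution is deriving lower and upper bounds that quantify the amount of multimodal interaction in this semi-supervised setting.
We propose two lower bounds, one based on the information shared between modalities and the other based on the disagreement between separately trained unimodal classifiers, and we derive an upper bound through connections to approximation algorithms for min-entropy couplings.
We validate these estimated bounds and show that they accurately track the true interactions.
Finally, we show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.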

Abstract (original)

In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not present in either alone. We study this challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data and naturally co-occurring multimodal data (e.g., unlabeled images and captions, video and corresponding audio) but when labeling them is time-consuming. Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds to quantify the amount of multimodal interactions in this semi-supervised setting. We propose two lower bounds: one based on the shared information between modalities and the other based on disagreement between separately trained unimodal classifiers, and derive an upper bound through connections to approximate algorithms for min-entropy couplings. We validate these estimated bounds and show how they accurately track true interactions. Finally, we show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
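
The abstract names two quantities behind the lower bounds: the information shared between modalities and the disagreement between separately trained unimodal classifiers. As a rough illustration only, the sketch below computes empirical versions of those two raw quantities for discrete data. It does not reproduce the paper's bounds, whose formulas the abstract does not state, and the function names `mutual_information` and `disagreement_rate` are hypothetical, chosen for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits from paired discrete samples.

    Assumption: xs and ys are equal-length sequences of hashable values,
    e.g. discretized features of two co-occurring, unlabeled modalities.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def disagreement_rate(preds_a, preds_b):
    """Fraction of paired samples on which two classifiers predict different labels.

    Assumption: preds_a and preds_b are label predictions from two unimodal
    classifiers evaluated on the same co-occurring samples.
    """
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and of equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


if __name__ == "__main__":
    # Toy paired samples from two hypothetical modalities.
    modality_1 = [0, 0, 1, 1]
    modality_2 = [0, 0, 1, 1]
    print(mutual_information(modality_1, modality_2))
    print(disagreement_rate([0, 1, 1, 0], [0, 1, 0, 1]))
```

Counting-based estimates like these only make sense for discrete or discretized inputs with enough samples per cell; continuous features would need a different estimator.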

arXiv information

Authors	Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov
Published	2024-06-13 17:05:54+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
