Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

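Summary

In many machine learning systems that learn jointly from multiple modalities, a central research question is understanding the nature of multimodal interactions: new task-relevant information that emerges when learning from both modalities and was not present in either one alone.
We study this interaction quantification problem in a semi-supervised setting that uses only labeled unimodal data and naturally co-occurring multimodal data (e.g., unlabeled images and captions, or video and its corresponding audio), for cases where labeling the multimodal data is time-consuming.
Using a precise information-theoretic definition of interactions, our main contribution is deriving lower and upper bounds that quantify the amount of multimodal interaction in this semi-supervised setting.
We propose two lower bounds, one based on the amount of information shared between modalities and one based on the disagreement between separately trained unimodal classifiers, and derive an upper bound through connections to approximation algorithms for min-entropy couplings.
We validate these estimated bounds and show that they accurately track the true interactions.
Finally, building on these theoretical results, we explore two semi-supervised multimodal applications: (1) analyzing the relationship between multimodal performance and estimated interactions, and (2) self-supervised learning that embraces disagreement between modalities, going beyond the agreement that is typically enforced.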
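
The two quantities that the lower bounds are built on, shared information between modalities and disagreement between unimodal classifiers, can be illustrated with a minimal sketch. This is not the paper's bound: the abstract does not give the exact expressions, so the code below only computes two simple proxies under stated assumptions. `mutual_information` and `disagreement_rate` are hypothetical helper names, the modalities are assumed to be discrete (or already discretized), and disagreement is measured crudely as the fraction of paired samples on which the two classifiers predict different labels.

```python
import numpy as np


def mutual_information(joint):
    """Mutual information I(X1; X2) in bits from a joint table p(x1, x2).

    Assumption: both modalities are discrete, and `joint` holds
    non-negative counts or probabilities (it is normalized here).
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    p1 = joint.sum(axis=1, keepdims=True)  # marginal of modality 1
    p2 = joint.sum(axis=0, keepdims=True)  # marginal of modality 2
    mask = joint > 0                       # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (p1 @ p2)[mask]
    return float(np.sum(joint[mask] * np.log2(ratio)))


def disagreement_rate(pred1, pred2):
    """Fraction of paired samples where two unimodal classifiers differ.

    Assumption: a hard-label proxy for disagreement, used only for
    illustration; the paper's definition may differ.
    """
    pred1 = np.asarray(pred1)
    pred2 = np.asarray(pred2)
    return float(np.mean(pred1 != pred2))


# Perfectly correlated binary modalities share 1 bit of information.
shared = mutual_information([[0.5, 0.0], [0.0, 0.5]])
# Classifiers that differ on 1 of 4 paired samples disagree 25% of the time.
disagree = disagreement_rate([0, 1, 1, 0], [0, 1, 0, 0])
```

Both proxies need only unlabeled paired data plus unimodal predictions, which matches the semi-supervised setting described above.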

Abstract (original)

In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: the emergence of new task-relevant information during learning from both modalities that was not present in either alone. We study this challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data and naturally co-occurring multimodal data (e.g., unlabeled images and captions, video and corresponding audio) but when labeling them is time-consuming. Using a precise information-theoretic definition of interactions, our key contributions are the derivations of lower and upper bounds to quantify the amount of multimodal interactions in this semi-supervised setting. We propose two lower bounds based on the amount of shared information between modalities and the disagreement between separately trained unimodal classifiers, and derive an upper bound through connections to approximate algorithms for min-entropy couplings. We validate these estimated bounds and show how they accurately track true interactions. Finally, two semi-supervised multimodal applications are explored based on these theoretical results: (1) analyzing the relationship between multimodal performance and estimated interactions, and (2) self-supervised learning that embraces disagreement between modalities beyond agreement as is typically done.
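
To give a sense of what a min-entropy coupling is, the sketch below builds a joint distribution with two prescribed marginals using a well-known greedy heuristic: repeatedly match the largest remaining mass in each marginal. This is an assumption made for illustration; the abstract does not say which approximation algorithm the upper bound relies on, and `greedy_coupling` and `entropy_bits` are hypothetical names rather than the authors' code.

```python
import numpy as np


def greedy_coupling(p, q, tol=1e-12):
    """Greedy heuristic for a low-entropy joint with marginals p and q.

    At each step, the largest remaining mass of p is paired with the
    largest remaining mass of q, and the smaller of the two is placed
    in the corresponding joint cell. Illustrative sketch only.
    """
    p = np.asarray(p, dtype=float).copy()
    q = np.asarray(q, dtype=float).copy()
    joint = np.zeros((p.size, q.size))
    while p.max() > tol and q.max() > tol:
        i, j = int(np.argmax(p)), int(np.argmax(q))
        mass = min(p[i], q[j])
        joint[i, j] += mass
        p[i] -= mass  # at least one of p[i], q[j] becomes exactly zero
        q[j] -= mass
    return joint


def entropy_bits(dist):
    """Shannon entropy in bits of a (possibly multi-dimensional) distribution."""
    d = np.asarray(dist, dtype=float).ravel()
    d = d[d > 0]
    return float(-np.sum(d * np.log2(d)))


p = [0.6, 0.4]
q = [0.7, 0.3]
joint = greedy_coupling(p, q)
coupling_entropy = entropy_bits(joint)
```

The resulting table keeps both marginals fixed while concentrating mass in few cells, so its entropy is never larger than that of the independent coupling of `p` and `q`.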

arXiv information

Authors	Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov
Published	2023-06-07 15:44:53+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
