Enhancing Multimodal Entity and Relation Extraction with Variational Information Bottleneck

Summary

Title: Enhancing Multimodal Entity and Relation Extraction with Variational Information Bottleneck

Summary:
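– This paper studies multimodal named entity recognition (MNER) and multimodal relation extraction (MRE), which are important for analyzing multimedia social platforms.
– The core of MNER and MRE is incorporating evident visual information to enhance textual semantics, which raises two issues that need investigation.
– The first issue is modality noise: task-irrelevant information in each modality can act as noise that misleads the task prediction.
– The second issue is the modality gap: representations from different modalities are inconsistent, which prevents building semantic alignment between text and image.
– To address these issues, the authors propose MMIB, a new method that combines multimodal representation learning with the information bottleneck.
– For the first issue, a refinement-regularizer applies the information bottleneck principle to balance predictive evidence against noisy information, yielding expressive representations for prediction.
– For the second issue, an alignment-regularizer is proposed, in which a mutual-information-based term works contrastively to encourage consistent text-image representations.
– To the authors' knowledge, this is the first work to explore variational IB estimation for MNER and MRE. Experiments show that MMIB achieves state-of-the-art performance on three public benchmarks.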
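
For readers unfamiliar with the information bottleneck, the trade-off between predictive evidence and noisy information can be written in its standard variational form. This is the general formulation from the variational IB literature, not necessarily the exact loss used by MMIB. The symbols are assumptions introduced here: X is the input, Y the task label, Z the learned representation, \beta a trade-off weight, q_\phi a variational classifier, and r a prior over Z.

```latex
% Information bottleneck objective: keep what predicts Y, compress away the rest of X
\max_{\theta}\; I(Z;Y) \;-\; \beta\, I(Z;X)

% Variational bounds that make the two terms tractable
I(Z;Y) \;\ge\; \mathbb{E}_{p(x,y)}\,\mathbb{E}_{p_\theta(z \mid x)}\big[\log q_\phi(y \mid z)\big] + H(Y)

I(Z;X) \;\le\; \mathbb{E}_{p(x)}\Big[\mathrm{KL}\big(p_\theta(z \mid x)\,\|\,r(z)\big)\Big]
```

Since H(Y) does not depend on the model, maximizing the first bound while penalizing the KL term gives a trainable surrogate: a prediction loss plus a \beta-weighted compression penalty.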

Abstract (original)

This paper studies the multimodal named entity recognition (MNER) and multimodal relation extraction (MRE), which are important for multimedia social platform analysis. The core of MNER and MRE lies in incorporating evident visual information to enhance textual semantics, where two issues inherently demand investigations. The first issue is modality-noise, where the task-irrelevant information in each modality may be noises misleading the task prediction. The second issue is modality-gap, where representations from different modalities are inconsistent, preventing from building the semantic alignment between the text and image. To address these issues, we propose a novel method for MNER and MRE by Multi-Modal representation learning with Information Bottleneck (MMIB). For the first issue, a refinement-regularizer probes the information-bottleneck principle to balance the predictive evidence and noisy information, yielding expressive representations for prediction. For the second issue, an alignment-regularizer is proposed, where a mutual information-based item works in a contrastive manner to regularize the consistent text-image representations. To our best knowledge, we are the first to explore variational IB estimation for MNER and MRE. Experiments show that MMIB achieves the state-of-the-art performances on three public benchmarks.
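
As an illustration of how a mutual-information-based term can work in a contrastive manner, the sketch below computes an InfoNCE-style loss over paired text and image embeddings. It is a minimal sketch under stated assumptions, not the paper's alignment-regularizer: the function name `info_nce_loss`, the temperature value, and the random embeddings are all hypothetical. With a batch of N pairs, minimizing this loss L maximizes the lower bound log N - L on the mutual information between the two modalities.

```python
import numpy as np


def info_nce_loss(text_emb, image_emb, temperature=0.1):
    """InfoNCE-style contrastive loss (hypothetical helper, not from the paper).

    Row i of text_emb and row i of image_emb form a positive pair;
    every other row in the batch serves as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = t @ v.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Negative mean log-probability assigned to the true (diagonal) pairs.
    return float(-np.mean(np.diag(log_probs)))


# Toy data: image embeddings that closely track the text embeddings,
# and the same embeddings shifted by one row so every pair is mismatched.
rng = np.random.default_rng(0)
text = rng.normal(size=(8, 16))
image_aligned = text + 0.01 * rng.normal(size=(8, 16))
image_shuffled = np.roll(image_aligned, 1, axis=0)

aligned_loss = info_nce_loss(text, image_aligned)
shuffled_loss = info_nce_loss(text, image_shuffled)
print("aligned:", aligned_loss, "shuffled:", shuffled_loss)
```

In this toy setup the aligned pairs yield a lower loss than the mismatched ones, which is the behavior a contrastive alignment term relies on to pull consistent text-image representations together.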

arXiv information

Authors	Shiyao Cui, Jiangxia Cao, Xin Cong, Jiawei Sheng, Quangang Li, Tingwen Liu, Jinqiao Shi
Published	2023-04-05 09:32:25+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, OpenAI
