The Power of Noise: Toward a Unified Multi-modal Knowledge Graph Representation Framework

Summary

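Advances in multi-modal pre-training highlight the need for a robust multi-modal knowledge graph (MMKG) representation learning framework.
Such a framework is essential for integrating structured knowledge into multi-modal large language models (LLMs) at scale, with the aim of alleviating problems such as knowledge misconceptions and multi-modal hallucinations.
In this work, to evaluate a model's ability to accurately embed entities within an MMKG, we focus on two widely studied tasks: multi-modal knowledge graph completion (MKGC) and multi-modal entity alignment (MMEA).
Building on this foundation, we propose a new method, SNAG, which uses a Transformer-based architecture with modality-level noise masking to robustly integrate multi-modal entity features in KGs.
By incorporating task-specific training objectives for both MKGC and MMEA, our approach achieves SOTA performance across a total of ten datasets (three for MKGC and seven for MMEA), demonstrating its robustness and versatility.
Moreover, SNAG not only works as a standalone model but also enhances other existing methods, yielding stable performance gains.
The code and data are available at https://github.com/zjukg/SNAG.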

Summary (original)

The advancement of Multi-modal Pre-training highlights the necessity for a robust Multi-Modal Knowledge Graph (MMKG) representation learning framework. This framework is crucial for integrating structured knowledge into multi-modal Large Language Models (LLMs) at scale, aiming to alleviate issues like knowledge misconceptions and multi-modal hallucinations. In this work, to evaluate models’ ability to accurately embed entities within MMKGs, we focus on two widely researched tasks: Multi-modal Knowledge Graph Completion (MKGC) and Multi-modal Entity Alignment (MMEA). Building on this foundation, we propose a novel SNAG method that utilizes a Transformer-based architecture equipped with modality-level noise masking for the robust integration of multi-modal entity features in KGs. By incorporating specific training objectives for both MKGC and MMEA, our approach achieves SOTA performance across a total of ten datasets (three for MKGC and seven for MMEA), demonstrating its robustness and versatility. Moreover, SNAG can not only function as a standalone model but also enhance other existing methods, providing stable performance improvements. Our code and data are available at: https://github.com/zjukg/SNAG.
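
The abstract names modality-level noise masking without spelling out the mechanism. As an illustration only, here is a minimal Python sketch of the general idea; it is not SNAG's implementation. It assumes that each entity carries one feature vector per modality (the modality names, the function `mask_modalities_with_noise`, and its parameters `mask_prob` and `noise_std` are all hypothetical), and that during training each modality is independently replaced, as a whole vector, by Gaussian noise with probability `mask_prob` before the per-modality features would be fused by a Transformer layer (not shown).

```python
import random
from typing import Dict, List, Optional

Vector = List[float]


def mask_modalities_with_noise(
    features: Dict[str, Vector],
    mask_prob: float = 0.2,
    noise_std: float = 1.0,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> Dict[str, Vector]:
    """Hypothetical modality-level noise masking for one entity (not SNAG's code).

    Each modality (e.g. "structure", "image", "text") is selected independently
    with probability `mask_prob`; a selected modality's entire feature vector is
    replaced with Gaussian noise N(0, noise_std^2) of the same dimension.
    The input dictionary is never modified.
    """
    if not 0.0 <= mask_prob <= 1.0:
        raise ValueError("mask_prob must be in [0, 1]")
    # At inference time, all modality features pass through unchanged (as copies).
    if not training:
        return {m: list(v) for m, v in features.items()}
    if rng is None:
        rng = random.Random()
    masked: Dict[str, Vector] = {}
    for modality, vec in features.items():
        # One decision per modality: the whole vector is either kept or replaced.
        if rng.random() < mask_prob:
            masked[modality] = [rng.gauss(0.0, noise_std) for _ in vec]
        else:
            masked[modality] = list(vec)
    return masked


# Usage: three hypothetical modality features for a single entity.
entity = {
    "structure": [0.1, 0.2, 0.3],
    "image": [0.5, 0.5, 0.5],
    "text": [1.0, 0.0, 1.0],
}
noisy = mask_modalities_with_noise(entity, mask_prob=0.5, rng=random.Random(0))
print(noisy)
```

Masking whole modalities rather than individual feature dimensions mimics an entity whose image or text is missing or uninformative, which is one plausible way to keep a fusion model from over-relying on any single modality; the masking is switched off at inference time.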

arXiv information

Authors	Zhuo Chen, Yin Fang, Yichi Zhang, Lingbing Guo, Jiaoyan Chen, Huajun Chen, Wen Zhang
Published	2024-03-20 10:02:54+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
