Generative Multimodal Entity Linking

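Summary

Multimodal entity linking (MEL) is the task of mapping mentions with multimodal contexts to their referent entities in a knowledge base (e.g., Wikipedia).
Prior MEL methods mainly focus on designing complex multimodal interaction mechanisms and require fine-tuning all model parameters, which can be prohibitively costly and difficult to scale in the era of large language models (LLMs).
This work proposes GEMEL, a simple yet effective generative multimodal entity linking method that leverages the capabilities LLMs acquire from large-scale pre-training to directly generate target entity names.
The vision and language models are kept frozen, and only a linear layer is trained to enable cross-modality interaction.
To adapt LLMs to the MEL task, GEMEL exploits their emerging in-context learning (ICL) capability by retrieving multimodal instances as demonstrations.
Extensive experiments show that, with only about 0.3% of the model parameters fine-tuned, GEMEL achieves state-of-the-art results on two well-established MEL datasets (accuracy gains of 4.1% on WikiDiverse and 15.4% on WikiMEL).
The approach is compatible with any off-the-shelf language model, paving the way toward an efficient and general solution for using LLMs in the MEL task.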
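
To make the "about 0.3% of the parameters" figure more tangible, the sketch below counts the weights of a single linear mapping layer relative to frozen vision and language models. Every size used here (feature width, hidden size, number of visual prefix tokens, frozen parameter count) is a hypothetical placeholder chosen for illustration, not a value taken from the paper, and the idea of mapping one image feature to several language-model embeddings is likewise an assumption about how such a layer might be shaped.

```python
def linear_layer_params(in_dim: int, out_dim: int, bias: bool = True) -> int:
    """Number of weights in a dense layer mapping in_dim -> out_dim."""
    return in_dim * out_dim + (out_dim if bias else 0)


def trainable_fraction(trainable: int, frozen: int) -> float:
    """Share of all parameters that receive gradient updates."""
    return trainable / (trainable + frozen)


# Hypothetical sizes (assumptions, not taken from the paper):
IMAGE_FEATURE_DIM = 1024        # width of the frozen vision encoder output
LM_HIDDEN_DIM = 4096            # embedding width of the frozen language model
PREFIX_LENGTH = 4               # number of visual prefix tokens fed to the LM
FROZEN_PARAMS = 7_000_000_000   # vision encoder + language model, kept fixed

# The linear layer maps one image feature to PREFIX_LENGTH LM-sized embeddings.
mapper_params = linear_layer_params(IMAGE_FEATURE_DIM, LM_HIDDEN_DIM * PREFIX_LENGTH)
fraction = trainable_fraction(mapper_params, FROZEN_PARAMS)

print(f"trainable parameters: {mapper_params:,}")
print(f"trainable share: {fraction:.2%}")
```

With placeholder sizes of this order, the trainable share stays well below one percent, which is the regime the summary describes; the exact figure depends on the actual models used.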

Summary (original)

Multimodal Entity Linking (MEL) is the task of mapping mentions with multimodal contexts to the referent entities from a knowledge base (e.g., Wikipedia). Prior MEL methods mainly focus on designing complex multimodal interaction mechanisms and require fine-tuning all model parameters, which can be prohibitively costly and difficult to scale in the era of Large Language Models (LLMs). In this work, we propose GEMEL, a simple yet effective Generative Multimodal Entity Linking method, which leverages the capabilities of LLMs from large-scale pre-training to directly generate target entity names. We keep the vision and language model frozen and only train a linear layer to enable cross-modality interactions. To adapt LLMs to the MEL task, we take advantage of the emerging in-context learning (ICL) capability of LLMs by retrieving multimodal instances as demonstrations. Extensive experiments show that with only ~0.3% of the model parameters fine-tuned, GEMEL achieves state-of-the-art results on two well-established MEL datasets (4.1% accuracy gains on WikiDiverse and 15.4% accuracy gains on WikiMEL). Our approach is compatible with any off-the-shelf language model, paving the way towards an efficient and general solution for utilizing LLMs in the MEL task.
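
The abstract's idea of retrieving instances as in-context demonstrations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Example` fields, the cosine-similarity retrieval, the prompt template, and the toy embedding vectors are all hypothetical, and the image side is represented only by a precomputed feature vector.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """A labeled training instance; the fields are illustrative assumptions."""
    context: str            # sentence containing the mention
    mention: str            # surface form to be linked
    entity: str             # gold entity name
    embedding: list[float]  # precomputed multimodal feature vector


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], pool: list[Example], k: int) -> list[Example]:
    """Return the k pool examples most similar to the query vector."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex.embedding), reverse=True)
    return ranked[:k]


def build_prompt(demos: list[Example], context: str, mention: str) -> str:
    """Concatenate solved demonstrations, then the unsolved query (hypothetical template)."""
    blocks = [
        f'{d.context}\nWhich entity does "{d.mention}" refer to? {d.entity}'
        for d in demos
    ]
    blocks.append(f'{context}\nWhich entity does "{mention}" refer to?')
    return "\n\n".join(blocks)


# Toy pool with made-up 3-dimensional embeddings.
pool = [
    Example("Jordan scored 30 points for the Bulls.", "Jordan", "Michael Jordan", [0.9, 0.1, 0.0]),
    Example("Jordan borders Saudi Arabia.", "Jordan", "Jordan", [0.0, 0.2, 0.9]),
    Example("The Bulls won the title in 1996.", "Bulls", "Chicago Bulls", [0.8, 0.3, 0.1]),
]

query_vec = [0.85, 0.2, 0.05]  # stand-in for the query's multimodal embedding
demos = retrieve(query_vec, pool, k=2)
prompt = build_prompt(demos, "Jordan dunks during the All-Star game.", "Jordan")
print(prompt)
```

The resulting prompt ends with the unanswered query, so a generative language model would be expected to continue it with an entity name; how the generated name is constrained to valid knowledge-base entries is outside the scope of this sketch.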

arXiv information

Authors	Senbao Shi, Zhenran Xu, Baotian Hu, Min Zhang
Published	2023-06-22 07:57:19+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google