Distilling Implicit Multimodal Knowledge into Large Language Models for Zero-Resource Dialogue Generation

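Summary

Integrating multimodal knowledge into large language models (LLMs) is a significant advance for dialogue generation.
In zero-resource scenarios, however, incorporating such knowledge effectively remains a substantial challenge because diverse, high-quality dialogue datasets are scarce.
To address this, we propose the Visual Implicit Knowledge Distillation Framework (VIKDF), an approach that leverages implicit multimodal knowledge to enhance LLMs for richer dialogue generation in zero-resource contexts.
VIKDF consists of two main stages. The first, knowledge distillation, uses an Implicit Query Transformer to extract visual implicit knowledge from image-text pairs and encode it into knowledge vectors.
The second, knowledge integration, uses a novel Bidirectional Variational Information Fusion technique to integrate these distilled vectors into LLMs.
This lets the LLMs generate dialogues that are coherent and engaging and that also show a deep understanding of context through implicit multimodal cues, overcoming the limitations of zero-resource scenarios.
Extensive experiments on two dialogue datasets show that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues.
The code is available at https://github.com/zhangbo-nlp/VIKDF.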

Abstract (original)

Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework (VIKDF), an innovative approach aimed at enhancing LLMs for enriched dialogue generation in zero-resource contexts by leveraging implicit multimodal knowledge. VIKDF comprises two main stages: knowledge distillation, using an Implicit Query Transformer to extract and encode visual implicit knowledge from image-text pairs into knowledge vectors; and knowledge integration, employing a novel Bidirectional Variational Information Fusion technique to seamlessly integrate these distilled vectors into LLMs. This enables the LLMs to generate dialogues that are not only coherent and engaging but also exhibit a deep understanding of the context through implicit multimodal cues, effectively overcoming the limitations of zero-resource scenarios. Our extensive experimentation across two dialogue datasets shows that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues. The code is available at https://github.com/zhangbo-nlp/VIKDF.
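The knowledge distillation stage described above encodes visual implicit knowledge into a set of knowledge vectors. The abstract does not specify how the Implicit Query Transformer does this, so the following is only a minimal sketch of one common pattern for producing a fixed number of vectors from a variable number of input features: a small set of query vectors attending over the features with scaled dot-product cross-attention. It is not the authors' implementation. The function names (`softmax`, `cross_attend`), the dimensions, and the random inputs are all hypothetical, and learned projection matrices are omitted for brevity.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    """Single-head scaled dot-product cross-attention, no learned projections.

    Each query yields one output vector: a weighted average of the
    feature vectors, weighted by softmax(query . feature / sqrt(dim)).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs = []
    for q in queries:
        scores = [sum(qi * fi for qi, fi in zip(q, f)) / scale for f in features]
        weights = softmax(scores)
        outputs.append(
            [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        )
    return outputs


# Hypothetical sizes: 4 query vectors, 10 input features, 8 dimensions each.
random.seed(0)
dim, num_queries, num_features = 8, 4, 10
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
features = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_features)]

# One "knowledge vector" per query, regardless of how many features came in.
knowledge_vectors = cross_attend(queries, features)
```

In this sketch the number of output vectors depends only on the number of queries, which is what makes the pattern convenient for handing a fixed-size representation to a language model. In a trained model the queries and projections would be learned parameters rather than random values.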

arXiv information

Authors	Bo Zhang, Hui Ma, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin
Published	2025-02-05 16:54:42+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
