DualEdit: Dual Editing for Knowledge Updating in Vision-Language Models

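Summary

Model editing aims to efficiently update the knowledge of a pre-trained model without time-consuming full retraining.
Existing pioneering editing methods achieve promising results, but they focus mainly on editing single-modal language models (LLMs).
For vision-language models (VLMs), which involve multiple modalities, the role and impact of each modality on editing performance remain largely unexplored.
To address this gap, we investigate how the textual and visual modalities affect model editing and find that (1) textual and visual representations reach peak sensitivity at different layers, reflecting their differing importance;
and (2) editing both modalities updates knowledge efficiently, but at the cost of compromising the model's original capabilities.
Based on these findings, we propose DualEdit, an editor that modifies both the textual and the visual modality at their respective key layers.
In addition, we introduce a gating module within the more sensitive textual modality, so that DualEdit can efficiently update new knowledge while preserving the model's original information.
We evaluate DualEdit across multiple VLM backbones and benchmark datasets and show that it outperforms state-of-the-art VLM editing baselines and adapted LLM editing methods on a range of evaluation metrics.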

Abstract (original)

Model editing aims to efficiently update a pre-trained model’s knowledge without the need for time-consuming full retraining. While existing pioneering editing methods achieve promising results, they primarily focus on editing single-modal language models (LLMs). However, for vision-language models (VLMs), which involve multiple modalities, the role and impact of each modality on editing performance remain largely unexplored. To address this gap, we explore the impact of textual and visual modalities on model editing and find that: (1) textual and visual representations reach peak sensitivity at different layers, reflecting their varying importance; and (2) editing both modalities can efficiently update knowledge, but this comes at the cost of compromising the model’s original capabilities. Based on our findings, we propose DualEdit, an editor that modifies both textual and visual modalities at their respective key layers. Additionally, we introduce a gating module within the more sensitive textual modality, allowing DualEdit to efficiently update new knowledge while preserving the model’s original information. We evaluate DualEdit across multiple VLM backbones and benchmark datasets, demonstrating its superiority over state-of-the-art VLM editing baselines as well as adapted LLM editing methods on different evaluation metrics.
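The abstract does not spell out how the gating module works, so the following is only a minimal sketch of one way a gate could protect unrelated inputs, not the authors' implementation. It assumes a hypothetical stored `key` vector for the edited prompt, a hypothetical learned `delta`, a cosine-similarity test on the last-token state, and an arbitrary `threshold`; none of these names or choices come from the source.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def gated_edit(hidden, key, delta, threshold=0.8):
    """Add `delta` to every token state only if the last-token state resembles `key`.

    hidden: (tokens, dim) hidden states at the chosen text layer
    key:    (dim,) hypothetical representation of the edited prompt (assumption)
    delta:  (dim,) hypothetical learned update (assumption)
    """
    gate = 1.0 if cosine(hidden[-1], key) >= threshold else 0.0
    return hidden + gate * delta, gate

rng = np.random.default_rng(0)
dim = 16
key = rng.normal(size=dim)
delta = rng.normal(size=dim)

# An input related to the edit: its last token is close to the stored key.
related = rng.normal(size=(4, dim))
related[-1] = key + 0.01 * rng.normal(size=dim)

# An unrelated input: its last token points in the opposite direction of the key.
unrelated = rng.normal(size=(4, dim))
unrelated[-1] = -key

edited, gate_on = gated_edit(related, key, delta)
untouched, gate_off = gated_edit(unrelated, key, delta)
```

In this toy setup the gate opens for the related input, so the update is applied, and stays closed for the unrelated one, whose hidden states pass through unchanged. That is the trade-off the abstract describes: updating new knowledge while preserving the model's original behavior elsewhere.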

arXiv information

Authors	Zhiyi Shi,Binjie Wang,Chongjie Si,Yichen Wu,Junsik Kim,Hanspeter Pfister
Published	2025-06-16 16:04:16+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
