Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation

Summary

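This paper introduces Unilogit, a new self-distillation method for machine unlearning in large language models.
Unilogit addresses the challenge of selectively forgetting specific information while preserving overall model utility, which is an important task for complying with data privacy regulations such as the GDPR.
Earlier methods rely on static hyperparameters or on the starting model's outputs. Unilogit instead dynamically adjusts the target logits so that the target token receives a uniform probability, and it uses the current model's outputs to build more accurate self-distillation targets.
This approach removes the need for additional hyperparameters and also improves the model's ability to approximate the golden targets.
Extensive experiments on public benchmarks and an in-house e-commerce dataset show that Unilogit balances the forget and retain objectives better than state-of-the-art methods such as NPO and UnDIAL.
Further analysis shows that Unilogit is robust across a range of scenarios, which underscores its practical applicability and its effectiveness for machine unlearning.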
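The summary states the goal of a uniform probability for the target token, but it does not give the formula. The following is a hedged sketch and not the paper's own equation. Assume a vocabulary of size V and current-model logits z_1, ..., z_V at one position. Assume a forget-set token t, and that only z_t is replaced by a new value z'_t while every other logit stays as it is. This notation and assumption are ours. Solving for the z'_t that makes the target token's softmax probability exactly 1/V gives:

```latex
\frac{e^{z'_t}}{e^{z'_t} + \sum_{j \neq t} e^{z_j}} = \frac{1}{V}
\quad\Longrightarrow\quad
(V-1)\, e^{z'_t} = \sum_{j \neq t} e^{z_j}
\quad\Longrightarrow\quad
z'_t = \log \sum_{j \neq t} e^{z_j} - \log (V-1)
```

Under this assumption, each non-target token j receives probability ((V-1)/V) * e^{z_j} / sum_{k != t} e^{z_k}. The remaining tokens therefore keep the same relative proportions they have in the current model. Because z comes from the current model rather than the starting model, the target would be recomputed at every training step.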

Summary (original)

This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task in compliance with data privacy regulations like GDPR. Unlike prior methods that rely on static hyperparameters or starting model outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model’s outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also enhances the model’s ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit’s superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit’s robustness across various scenarios, highlighting its practical applicability and effectiveness in achieving efficacious machine unlearning.
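The abstract describes the mechanism only at a high level. The NumPy sketch below illustrates one plausible reading and is not the authors' implementation. It rests on three assumptions:

- Only the forget token's logit is rewritten, so that its probability becomes 1/V.
- All other logits are taken from the current model unchanged.
- The student is trained with a forward KL divergence from this target distribution to its own output distribution.

The helpers `uniform_target_logits` and `self_distillation_loss` are hypothetical names. In an autodiff framework, the target distribution would be treated as a constant, with gradients stopped through it.

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = z - z.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def uniform_target_logits(logits: np.ndarray, target_ids: np.ndarray) -> np.ndarray:
    """Hypothetical helper. logits: [T, V] current-model logits; target_ids: [T]
    forget-set token ids. Returns a copy in which, at each position, only the
    target token's logit is replaced so that its softmax probability is exactly 1/V.
    Keeping the other logits unchanged is an assumption of this sketch."""
    T, V = logits.shape
    rows = np.arange(T)
    # Exclude the target token, then compute log(sum_{j != t} exp(z_j)) stably.
    masked = logits.copy()
    masked[rows, target_ids] = -np.inf
    m = masked.max(axis=-1)
    lse_others = m + np.log(np.exp(masked - m[:, None]).sum(axis=-1))
    # Closed form: z'_t = logsumexp_{j != t}(z_j) - log(V - 1).
    adjusted = logits.copy()
    adjusted[rows, target_ids] = lse_others - np.log(V - 1)
    return adjusted


def self_distillation_loss(student_logits: np.ndarray, target_ids: np.ndarray) -> float:
    """Mean over positions of KL(q || p). p is the student's distribution and q is
    the uniform-target distribution built from the student's own current logits
    (assumed: forward KL, target treated as a constant)."""
    q_log = log_softmax(uniform_target_logits(student_logits, target_ids))
    p_log = log_softmax(student_logits)
    q = np.exp(q_log)
    return float((q * (q_log - p_log)).sum(axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 10))   # 4 positions, toy vocabulary of 10
    targets = np.array([1, 3, 5, 7])    # tokens to forget at each position
    q = np.exp(log_softmax(uniform_target_logits(logits, targets)))
    # Each entry should equal 1/V = 0.1 up to floating-point error.
    print(q[np.arange(4), targets])
    print(self_distillation_loss(logits, targets))
```

This design choice makes the target a function of the current logits alone, so no extra temperature or weighting hyperparameter appears. That fits the abstract's claim, although the paper's exact loss and weighting may differ.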

arXiv information

Authors	Stefan Vasilev, Christian Herold, Baohao Liao, Seyyed Hadi Hashemi, Shahram Khadivi, Christof Monz
Published	2025-05-09 13:19:09+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google