Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models

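Summary

Large language models (LLMs) deployed in real-world settings increasingly need to unlearn sensitive, outdated, or proprietary information.
Existing unlearning methods usually treat forgetting and retention as a regularized trade-off and combine both objectives into a single scalarized loss.
This often causes unstable optimization and degraded performance on retained data, especially under aggressive forgetting.
The authors propose a new formulation of LLM unlearning as a constrained optimization problem. Forgetting is enforced through a novel logit-margin flattening loss that explicitly drives the output distribution toward uniformity on a designated forget set, while retention is preserved through a hard constraint on a separate retain set.
Compared with entropy-based objectives, the loss is softmax-free, numerically stable, and keeps non-vanishing gradients, which enables more efficient and robust optimization.
The constrained problem is solved with a scalable primal-dual algorithm that exposes the trade-off between forgetting and retention through the dynamics of the dual variable.
Evaluations on the TOFU and MUSE benchmarks across diverse LLM architectures show that the approach consistently matches or exceeds state-of-the-art baselines, effectively removing targeted information while preserving downstream utility.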

Summary (original)

Large Language Models (LLMs) deployed in real-world settings increasingly face the need to unlearn sensitive, outdated, or proprietary information. Existing unlearning methods typically formulate forgetting and retention as a regularized trade-off, combining both objectives into a single scalarized loss. This often leads to unstable optimization and degraded performance on retained data, especially under aggressive forgetting. We propose a new formulation of LLM unlearning as a constrained optimization problem: forgetting is enforced via a novel logit-margin flattening loss that explicitly drives the output distribution toward uniformity on a designated forget set, while retention is preserved through a hard constraint on a separate retain set. Compared to entropy-based objectives, our loss is softmax-free, numerically stable, and maintains non-vanishing gradients, enabling more efficient and robust optimization. We solve the constrained problem using a scalable primal-dual algorithm that exposes the trade-off between forgetting and retention through the dynamics of the dual variable. Evaluations on the TOFU and MUSE benchmarks across diverse LLM architectures demonstrate that our approach consistently matches or exceeds state-of-the-art baselines, effectively removing targeted information while preserving downstream utility.
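The abstract does not give the loss or the update rules, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes a hypothetical margin loss (largest logit minus the mean logit, which is zero exactly when the logits are equal), a cross-entropy retain constraint with an assumed tolerance `eps`, a two-feature linear model, and a standard primal descent / projected dual ascent loop. All function names, step sizes, and data points are illustrative assumptions.

```python
import math

def margin_flattening_loss(logits):
    """Hypothetical stand-in for the paper's loss: max logit minus mean logit.
    Softmax-free, and zero exactly when all logits are equal (uniform output)."""
    return max(logits) - sum(logits) / len(logits)

def margin_flattening_grad(logits):
    """Subgradient w.r.t. the logits: +1 on the argmax, -1/V on every entry."""
    v = len(logits)
    top = logits.index(max(logits))
    return [(1.0 if i == top else 0.0) - 1.0 / v for i in range(v)]

def cross_entropy(logits, label):
    """Numerically stable cross-entropy of one example (assumed retain loss)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def cross_entropy_grad(logits, label):
    """Gradient w.r.t. the logits: softmax(logits) minus the one-hot label."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total - (1.0 if i == label else 0.0) for i, e in enumerate(exps)]

def forward(weights, x):
    """Toy linear model: logits = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def dual_step(lam, retain_loss, eps, lr_dual):
    """Projected dual ascent: the multiplier grows while the constraint
    retain_loss <= eps is violated and is clipped at zero otherwise."""
    return max(0.0, lam + lr_dual * (retain_loss - eps))

def primal_dual_unlearn(weights, x_forget, x_retain, y_retain,
                        eps=0.05, lr=0.05, lr_dual=0.5, steps=300):
    weights = [row[:] for row in weights]  # leave the caller's weights intact
    lam = 0.0
    for _ in range(steps):
        z_f = forward(weights, x_forget)
        z_r = forward(weights, x_retain)
        g_f = margin_flattening_grad(z_f)
        g_r = cross_entropy_grad(z_r, y_retain)
        # Primal descent on the Lagrangian: forget_loss + lam * (retain_loss - eps).
        for i, row in enumerate(weights):
            for j in range(len(row)):
                row[j] -= lr * (g_f[i] * x_forget[j] + lam * g_r[i] * x_retain[j])
        # Dual ascent using the retain loss measured before the primal step.
        lam = dual_step(lam, cross_entropy(z_r, y_retain), eps, lr_dual)
    return weights, lam

# Illustrative data: both examples start out confidently predicting class 0.
W0 = [[4.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
X_FORGET, X_RETAIN, Y_RETAIN = [1.0, 0.0], [0.6, 0.8], 0

W1, lam = primal_dual_unlearn(W0, X_FORGET, X_RETAIN, Y_RETAIN)
margin_before = margin_flattening_loss(forward(W0, X_FORGET))
margin_after = margin_flattening_loss(forward(W1, X_FORGET))
retain_after = cross_entropy(forward(W1, X_RETAIN), Y_RETAIN)
print(margin_before, margin_after, retain_after, lam)
```

In this sketch the multiplier `lam` plays the role the abstract assigns to the dual variable: it rises only while the retain loss exceeds `eps`, so its trajectory indicates how strongly forgetting and retention conflict at each step. The margin subgradient has constant magnitude regardless of how confident the model is, which is one plausible reading of the "non-vanishing gradients" claim.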

arXiv information

Authors	Taha Entesari, Arman Hatami, Rinat Khaziev, Anil Ramakrishna, Mahyar Fazlyab
Published	2025-06-05 17:55:23+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
