Is Free Self-Alignment Possible?

Summary

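Aligning pretrained language models (LMs) often requires large-scale preference data and substantial computational resources.
These costs become even more prohibitive for multi-objective or pluralistic alignment.
Is this really necessary?
Can efficient alignment be performed using only the model's internal capabilities, with no additional training?
To answer this question, we propose AlignEZ, a new approach that leverages (1) self-generated preference data and (2) representation editing to achieve cost-effective, efficient alignment.
By operating directly on learned representations, AlignEZ targets different behavioral aspects independently, without the overhead of traditional alignment methods.
Our experiments show that this cost-efficient procedure improves performance across diverse tasks, even when starting from a strong base model: by up to 19.9% on general alignment and 1.9% on challenging mathematical reasoning tasks.
AlignEZ can also align a model to multiple objectives at once, giving fine-grained control over multiple preference axes.
Finally, we show that AlignEZ can accelerate more expensive alignment procedures (such as DPO), even when ground-truth preference data is in limited supply.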
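As a rough illustration of what "representation editing" driven by self-generated preference data can look like, the sketch below computes a difference-of-means direction between toy "preferred" and "dispreferred" hidden vectors and shifts a hidden state along it. This is a generic steering-vector sketch under stated assumptions, not the AlignEZ procedure itself, which the abstract does not spell out. The function names, the toy vectors, and the `strength` parameter are all hypothetical.

```python
# Illustrative sketch only: a generic difference-of-means edit,
# NOT the AlignEZ algorithm. All names here are hypothetical.

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def preference_direction(preferred, dispreferred):
    """Unit vector pointing from the dispreferred mean to the preferred mean."""
    diff = [p - d for p, d in zip(mean_vector(preferred), mean_vector(dispreferred))]
    norm = dot(diff, diff) ** 0.5
    if norm == 0.0:
        raise ValueError("preferred and dispreferred means coincide")
    return [x / norm for x in diff]


def edit_representation(hidden, direction, strength=1.0):
    """Shift a hidden vector along the unit preference direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional vectors standing in for hidden states of
# self-generated preferred / dispreferred responses (assumed data).
preferred = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
dispreferred = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]

direction = preference_direction(preferred, dispreferred)
edited = edit_representation([0.5, 0.5, 0.5], direction, strength=2.0)
```

In this toy setup the two groups differ only in the third coordinate, so the edit moves the hidden vector along that axis and leaves the others unchanged. No weights are updated, which is the sense in which such an edit needs no additional training.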

Summary (original)

Aligning pretrained language models (LMs) often requires large-scale preference data and substantial computational resources. These costs become even more prohibitive for multi-objective or pluralistic alignment. Is this truly necessary? Can we perform efficient alignment using only internal model capabilities, and without additional training? To answer this question, we propose AlignEZ, a novel approach that leverages (1) self-generated preference data and (2) representation editing to achieve cost-effective, efficient alignment. By operating directly on learned representations, AlignEZ independently targets different behavioral aspects without the overhead of traditional alignment methods. Our experiments reveal that this cost-efficient procedure improves performance across diverse tasks: up to 19.9% on general alignment and 1.9% on challenging mathematical reasoning tasks, even when starting from a strong base model. AlignEZ can also align models to multiple objectives simultaneously, granting fine-grained control over multiple preference axes. Finally, we show that AlignEZ can accelerate more expensive alignment procedures–such as DPO–even under limited availability of ground-truth preference data.

arXiv information

Authors	Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala
Published	2025-02-21 14:54:05+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
