Adding Alignment Control to Language Models

Summary

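Post-training alignment has become an increasingly important factor in making language models (LMs) more usable.
However, the desired strength of alignment varies with individual preferences.
This paper proposes a method for building alignment control into a single model, referred to as CLM.
The approach adds one identity layer before the initial layers and performs preference learning only on that layer, mapping unaligned input token embeddings into the aligned space.
Experimental results show that this efficient fine-tuning method performs comparably to full fine-tuning.
During inference, the input embeddings are processed through both the aligned and the unaligned layers, and the two results are merged using an interpolation coefficient.
By controlling this coefficient, alignment exhibits clear interpolation and extrapolation behavior.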
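
The inference-time merge described above can be illustrated with a small sketch. It is not the authors' implementation: the summary does not give the merge formula, so the blend rule `(1 - lam) * unaligned + lam * aligned`, the bias-free linear layer, the toy weights, and every name here (`identity_matrix`, `apply_layer`, `merge_embeddings`, `lam`) are assumptions chosen for illustration. Under those assumptions, `lam` between 0 and 1 interpolates between the two paths and `lam` above 1 extrapolates past the aligned one.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def identity_matrix(dim: int) -> Matrix:
    """Weights of a layer that initially returns its input unchanged."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def apply_layer(weights: Matrix, x: Vector) -> Vector:
    """Apply a bias-free linear layer: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def merge_embeddings(x: Vector, aligned_weights: Matrix, lam: float) -> Vector:
    """Blend the unaligned and aligned paths with coefficient lam.

    Assumed rule (not taken from the paper): out = (1 - lam) * x + lam * W x.
    """
    unaligned = x  # assumption: the unaligned path leaves the embedding as-is
    aligned = apply_layer(aligned_weights, x)
    return [(1.0 - lam) * u + lam * a for u, a in zip(unaligned, aligned)]


# Toy values: a 2-d embedding and hypothetical weights standing in for the
# added layer after preference learning.
x = [1.0, 2.0]
trained = [[1.0, 0.5], [0.0, 2.0]]

untrained_out = merge_embeddings(x, identity_matrix(2), lam=1.0)  # [1.0, 2.0]
unaligned_out = merge_embeddings(x, trained, lam=0.0)             # [1.0, 2.0]
aligned_out = merge_embeddings(x, trained, lam=1.0)               # [2.0, 4.0]
halfway_out = merge_embeddings(x, trained, lam=0.5)               # [1.5, 3.0]
extrapolated_out = merge_embeddings(x, trained, lam=2.0)          # [3.0, 6.0]
```

Because the added layer starts as the identity, the untrained model behaves exactly like the base model for any `lam`; only the learned deviation from identity is scaled by the coefficient.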

Summary (original)

Post-training alignment has increasingly become a crucial factor in enhancing the usability of language models (LMs). However, the strength of alignment varies depending on individual preferences. This paper proposes a method to incorporate alignment control into a single model, referred to as CLM. This approach adds one identity layer preceding the initial layers and performs preference learning only on this layer to map unaligned input token embeddings into the aligned space. Experimental results demonstrate that this efficient fine-tuning method performs comparable to full fine-tuning. During inference, the input embeddings are processed through the aligned and unaligned layers, which are then merged through the interpolation coefficient. By controlling this parameter, the alignment exhibits a clear interpolation and extrapolation phenomenon.

arXiv information

Authors	Wenhong Zhu, Weinan Zhang, Rui Wang
Published	2025-03-07 15:13:32+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
