Diverse, not Short: A Length-Controlled Self-Learning Framework for Improving Response Diversity of Language Models

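Summary

Diverse language model responses are essential for creative generation, open-ended tasks, and self-improvement training.
We show that common diversity metrics, and even the reward models used for preference optimization, systematically bias models toward shorter outputs, which limits expressiveness.
To address this, we introduce Diverse, not Short (Diverse-NS), a length-controlled self-learning framework that improves response diversity while maintaining length parity.
By generating and filtering preference data that balances diversity, quality, and length, Diverse-NS enables effective training with only 3,000 preference pairs.
Applied to LLaMA-3.1-8B and the Olmo-2 family, Diverse-NS substantially improves lexical and semantic diversity.
Diversity improves consistently, with either a minor reduction or a gain in response quality, on four creative generation tasks: Divergent Associations, Persona Generation, Alternate Uses, and Creative Writing.
Surprisingly, experiments with the Olmo-2 model family (7B and 13B) show that smaller models such as Olmo-2-7B can serve as effective "diversity teachers" for larger models.
By explicitly addressing length bias, the method efficiently pushes models toward more diverse and expressive outputs.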

Summary (original)

Diverse language model responses are crucial for creative generation, open-ended tasks, and self-improvement training. We show that common diversity metrics, and even reward models used for preference optimization, systematically bias models toward shorter outputs, limiting expressiveness. To address this, we introduce Diverse, not Short (Diverse-NS), a length-controlled self-learning framework that improves response diversity while maintaining length parity. By generating and filtering preference data that balances diversity, quality, and length, Diverse-NS enables effective training using only 3,000 preference pairs. Applied to LLaMA-3.1-8B and the Olmo-2 family, Diverse-NS substantially enhances lexical and semantic diversity. We show consistent improvement in diversity with minor reduction or gains in response quality on four creative generation tasks: Divergent Associations, Persona Generation, Alternate Uses, and Creative Writing. Surprisingly, experiments with the Olmo-2 model family (7B, and 13B) show that smaller models like Olmo-2-7B can serve as effective ‘diversity teachers’ for larger models. By explicitly addressing length bias, our method efficiently pushes models toward more diverse and expressive outputs.
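
The abstract describes filtering preference data so that diversity, quality, and length are balanced, but it does not give the exact metrics or thresholds. The following is only a minimal sketch of that idea under stated assumptions: type-token ratio stands in for the diversity metric, `quality` is a hypothetical precomputed score, and `length_tol` and `min_quality` are illustrative parameters that do not come from the paper.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    text: str
    quality: float  # hypothetical quality score in [0, 1]; not from the paper


def n_tokens(text: str) -> int:
    """Length in whitespace-separated tokens."""
    return len(text.split())


def type_token_ratio(text: str) -> float:
    """Unique tokens divided by total tokens (a simple lexical diversity proxy)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def select_pair(
    candidates: List[Candidate],
    length_tol: float = 0.2,   # assumed: max relative length difference
    min_quality: float = 0.5,  # assumed: quality floor for the chosen response
) -> Optional[Tuple[Candidate, Candidate]]:
    """Return (chosen, rejected) with the largest diversity gap, or None.

    A pair is kept only if the chosen response is at least as good as the
    rejected one and both have roughly the same length.
    """
    best: Optional[Tuple[Candidate, Candidate]] = None
    best_gap = 0.0
    for chosen, rejected in permutations(candidates, 2):
        # Quality check: do not trade quality away for diversity.
        if chosen.quality < min_quality or chosen.quality < rejected.quality:
            continue
        # Length parity check: skip pairs whose lengths differ too much.
        len_c, len_r = n_tokens(chosen.text), n_tokens(rejected.text)
        if abs(len_c - len_r) > length_tol * max(len_c, len_r):
            continue
        # Diversity check: keep the pair with the largest positive gap.
        gap = type_token_ratio(chosen.text) - type_token_ratio(rejected.text)
        if gap > best_gap:
            best, best_gap = (chosen, rejected), gap
    return best
```

In this sketch a two-word response scores a perfect type-token ratio simply because it is short, which mirrors the length bias the abstract points out. The length check is what stops such a response from being preferred over a much longer one.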

arXiv information

Authors	Vijeta Deshpande, Debasmita Ghose, John D. Patterson, Roger Beaty, Anna Rumshisky
Published	2025-05-26 17:21:01+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
