Entropy-Based Adaptive Weighting for Self-Training

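Summary

The mathematical problem-solving capabilities of large language models have become a focus of research, with growing interest in leveraging self-generated reasoning paths as a promising way to refine and enhance these models.
These paths capture step-by-step logical processes while requiring only the correct answer for supervision.
Self-training methods have been shown to be effective on reasoning tasks while eliminating the need for external models and manual annotation.
However, how best to use self-generated data for model training remains an open challenge.
In this work, we propose Entropy-Based Adaptive Weighting for Self-Training (EAST), an adaptive weighting strategy designed to prioritize uncertain data during self-training.
Specifically, EAST employs a mapping function with a tunable parameter that controls the sharpness of the weighting, assigning higher weights to data on which the model exhibits greater uncertainty.
This approach steers the model toward more informative and challenging examples, thereby improving its reasoning ability.
We evaluate the approach on the GSM8K and MATH benchmarks.
Empirical results show that, while the vanilla method yields virtually no improvement (0%) on MATH, EAST achieves about a 1% gain over the backbone model.
On GSM8K, EAST achieves a further 1-2% performance boost over the vanilla method.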
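
The summary names neither the uncertainty estimator nor the exact mapping function, so the following is only a minimal Python sketch under stated assumptions. It assumes a question's uncertainty is the Shannon entropy of the final answers across several sampled reasoning paths. A hypothetical power mapping w ∝ (H + eps)^a, rescaled to mean 1, stands in for EAST's mapping function, with `a` playing the role of the tunable sharpness parameter. The function names `answer_entropy`, `entropy_weights`, and `weighted_loss` are illustrative and do not come from the paper.

```python
import math
from collections import Counter
from typing import Sequence


def answer_entropy(final_answers: Sequence[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of final answers
    extracted from several sampled reasoning paths for one question.
    Assumption: this is the uncertainty signal; EAST's exact estimator may differ."""
    n = len(final_answers)
    counts = Counter(final_answers)
    # sum of p * log(1/p); written this way so a unanimous answer gives +0.0
    return sum((c / n) * math.log(n / c) for c in counts.values())


def entropy_weights(entropies: Sequence[float], a: float = 1.0, eps: float = 1e-8) -> list[float]:
    """Hypothetical mapping: w_i proportional to (H_i + eps) ** a, rescaled to mean 1.
    a > 0 favours uncertain (high-entropy) questions, a larger a gives a sharper
    weighting, and a = 0 yields uniform weights."""
    raw = [(h + eps) ** a for h in entropies]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def weighted_loss(per_example_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-example training losses (assumed here to be the loss on
    a retained, correct self-generated path), each scaled by its question's weight."""
    return sum(w * l for w, l in zip(weights, per_example_losses)) / len(per_example_losses)


# Usage: q1's four samples agree; q2's samples split evenly between two answers.
samples = {
    "q1": ["42", "42", "42", "42"],
    "q2": ["7", "7", "9", "9"],
}
entropies = [answer_entropy(answers) for answers in samples.values()]
weights = entropy_weights(entropies, a=1.0)
print(entropies, weights)  # the uncertain question q2 ends up with almost all of the weight
```

With a = 0 every weight is 1, which corresponds to unweighted (vanilla) self-training. Raising `a` concentrates the training signal on questions whose sampled answers disagree.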

Summary (original)

The mathematical problem-solving capabilities of large language models have become a focal point of research, with growing interest in leveraging self-generated reasoning paths as a promising way to refine and enhance these models. These paths capture step-by-step logical processes while requiring only the correct answer for supervision. The self-training method has been shown to be effective in reasoning tasks while eliminating the need for external models and manual annotations. However, optimizing the use of self-generated data for model training remains an open challenge. In this work, we propose Entropy-Based Adaptive Weighting for Self-Training (EAST), an adaptive weighting strategy designed to prioritize uncertain data during self-training. Specifically, EAST employs a mapping function with a tunable parameter that controls the sharpness of the weighting, assigning higher weights to data where the model exhibits greater uncertainty. This approach guides the model to focus on more informative and challenging examples, thereby enhancing its reasoning ability. We evaluate our approach on GSM8K and MATH benchmarks. Empirical results show that, while the vanilla method yields virtually no improvement (0%) on MATH, EAST achieves around a 1% gain over the backbone model. On GSM8K, EAST attains a further 1-2% performance boost compared to the vanilla method.

arXiv information

Authors	Xiaoxuan Wang, Yihe Deng, Mingyu Derek Ma, Wei Wang
Published	2025-03-31 10:04:35+00:00
arXiv page	arxiv_id (pdf)

Source, services used

arxiv.jp, Google
