Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models

Abstract

Requiring a Large Language Model to generate intermediary reasoning steps has been shown to be an effective way of boosting performance. In fact, it has been found that instruction tuning on these intermediary reasoning steps improves model performance. In this work, we present a novel method of further improving performance by requiring models to compare multiple reasoning chains before generating a solution in a single inference step. We call this method Divergent CoT (DCoT). We find that instruction tuning on DCoT datasets boosts the performance of even smaller, and therefore more accessible, LLMs. Through a rigorous set of experiments spanning a wide range of tasks that require various reasoning types, we show that fine-tuning on DCoT consistently improves performance over the CoT baseline across model families and scales (1.3B to 70B). Through a combination of empirical and manual evaluation, we additionally show that these performance gains stem from models generating multiple divergent reasoning chains in a single inference step, indicative of the enabling of self-correction in language models. Our code and data are publicly available at https://github.com/UKPLab/arxiv2024-divergent-cot.
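The abstract describes DCoT as having the model produce several reasoning chains and then a solution within a single inference step. As a rough illustration only, the sketch below shows one way such a fine-tuning instance might be serialized into a prompt and a target string. The class `DCoTExample`, the functions `format_dcot_prompt` and `format_dcot_target`, the bracketed field labels, and the toy arithmetic question are all assumptions made for this example. They are not taken from the authors' repository, whose actual data format may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DCoTExample:
    """Hypothetical container for one DCoT-style training instance."""
    question: str
    chains: List[str]   # alternative reasoning chains for the same question
    final_answer: str


def format_dcot_prompt(question: str, k: int) -> str:
    """Build the model input: the question plus the number of chains requested."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return f"[Question] {question.strip()}\n[Number of chains] {k}"


def format_dcot_target(example: DCoTExample) -> str:
    """Serialize all reasoning chains and the final answer into one target string.

    Because every chain sits in the same output sequence, the model generates
    them in a single inference step before committing to an answer.
    """
    if not example.chains:
        raise ValueError("at least one reasoning chain is required")
    parts = [
        f"[Chain {i}] {chain.strip()}"
        for i, chain in enumerate(example.chains, start=1)
    ]
    parts.append(f"[Final answer] {example.final_answer.strip()}")
    return "\n".join(parts)


if __name__ == "__main__":
    # Toy instance invented for illustration; not drawn from the paper's data.
    example = DCoTExample(
        question="What is 12 * 4?",
        chains=[
            "12 * 4 = 10 * 4 + 2 * 4 = 40 + 8 = 48.",
            "12 * 4 = 12 * 2 * 2 = 24 * 2 = 48.",
        ],
        final_answer="48",
    )
    print(format_dcot_prompt(example.question, k=len(example.chains)))
    print(format_dcot_target(example))
```

Putting every chain in one target sequence, rather than sampling several separate completions, is what distinguishes this layout from running a standard CoT model multiple times: later chains can be conditioned on earlier ones, which is the setting in which the abstract reports self-correction.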

arXiv information

Authors	Haritz Puerto, Tilek Chubakov, Xiaodan Zhu, Harish Tayyar Madabushi, Iryna Gurevych
Published	2024-07-03 15:01:18+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
