PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning

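Summary

Large language models (LLMs) excel at a wide range of natural language processing tasks, but their huge size and the inaccessibility of their parameters make practical deployment difficult.
Previous studies have tried to distill task-specific abilities from LLMs into smaller models using data synthesis and chain-of-thought (CoT) fine-tuning.
However, synthetic CoT data often contains faulty reasoning, which degrades distillation quality, especially for reasoning capabilities.
This work proposes Program-aided Distillation (PaD), which introduces reasoning programs to suppress errors in the distilled data and thereby improves distillation quality on reasoning tasks.
In PaD, reasoning programs replace CoT, which allows synthetic data to be checked for errors automatically.
In addition, through error injection and further training, the small distilled model can iteratively self-refine its reasoning.
PaD also performs a step-wise beam search with step-by-step verification to obtain more accurate reasoning chains.
PaD is evaluated on arithmetic reasoning, symbolic reasoning, and general ability.
Experimental results show that small models trained with PaD can not only outperform certain LLMs (e.g., LLaMA-1 13B) but also achieve strong improvements over baselines with significantly fewer parameters and less data.
The source code is publicly available at https://github.com/Xuekai-Zhu/pad.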
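
The automated error checking mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `run_program` and `keep_sample`, the convention that each synthetic program stores its result in a variable called `answer`, and the sample data are all assumptions made for illustration. The idea is only that a reasoning program, unlike free-form CoT text, can be executed and its output compared with the gold label, so faulty samples can be dropped before fine-tuning.

```python
from typing import Optional


def run_program(source: str) -> Optional[float]:
    """Execute a candidate reasoning program; return its `answer` value, or None on failure.

    Assumption: the program stores its final result in a variable named `answer`.
    """
    scope: dict = {}
    try:
        # Empty builtins limit what the program can call; this is NOT a real sandbox.
        exec(source, {"__builtins__": {}}, scope)
    except Exception:
        return None  # syntax or runtime error: the sample is unusable
    value = scope.get("answer")
    return float(value) if isinstance(value, (int, float)) else None


def keep_sample(source: str, gold: float, tol: float = 1e-6) -> bool:
    """Keep a synthetic sample only if it runs and reproduces the gold answer."""
    result = run_program(source)
    return result is not None and abs(result - gold) <= tol


# Hypothetical synthetic samples for one arithmetic question (gold answer: 18).
candidates = [
    ("eggs = 16\neaten = 3\nbaked = 4\nanswer = (eggs - eaten - baked) * 2", 18.0),
    ("eggs = 16\neaten = 3\nanswer = (eggs - eaten) * 2", 18.0),  # faulty reasoning
    ("answer = (16 - 3", 18.0),  # does not parse
]

filtered = [src for src, gold in candidates if keep_sample(src, gold)]
print(len(filtered))
```

Only the first candidate survives the filter: the second runs but yields a wrong answer, and the third fails to parse. Running untrusted generated code this way is unsafe outside a demo; a real pipeline would need proper isolation.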

Abstract (original)

While large language models (LLMs) excel in various natural language processing tasks, their huge size and the inaccessibility of parameters present challenges for practical deployment. Previous studies try to distill task-specific ability from LLMs to smaller models, using data synthesis and chain-of-thought (CoT) fine-tuning. However, synthetic CoT data often contains faulty reasoning, which deteriorates the quality of distillation, especially in reasoning capabilities. In this work, we propose Program-aided Distillation (PaD), which introduces reasoning programs to suppress the errors in distilled data, and thus achieves better distillation quality for reasoning tasks. In PaD, we utilize the reasoning program to substitute the CoT, allowing automated error checking of synthetic data. Further, through error injecting and further training, the small distilling model could iteratively self-refine the reasoning. Moreover, we conduct a step-wise beam search by step-by-step verifying to acquire more exact reasoning chains. We evaluate PaD on arithmetic reasoning, symbolic reasoning, and general ability. Experimental results demonstrate that smaller models using PaD can not only outperform certain LLMs (e.g., LLaMA-1 13B) but also achieve strong improvement over baselines with a significantly smaller scale of parameters and data. The source code is publicly available at https://github.com/Xuekai-Zhu/pad.

arXiv information

Authors	Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xinwei Long, Zhouhan Lin, Bowen Zhou
Published	2024-03-20 08:37:42+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
