LLM Post-Training: A Deep Dive into Reasoning Large Language Models

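Summary

Large Language Models (LLMs) have transformed the natural language processing landscape and enabled a diverse range of applications.
Pretraining on vast web-scale data has laid the foundation for these models, but the research community is increasingly shifting its focus to post-training techniques to achieve further breakthroughs.
Pretraining provides a broad linguistic foundation, while post-training methods let LLMs refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
Fine-tuning, reinforcement learning, and test-time scaling have emerged as critical strategies for optimizing LLM performance, ensuring robustness, and improving adaptability across a variety of real-world tasks.
This survey offers a systematic exploration of post-training methodologies, analyzing their role in refining LLMs beyond pretraining and addressing key challenges such as catastrophic forgetting, reward hacking, and inference-time trade-offs.
It highlights emerging directions in model alignment, scalable adaptation, and inference-time reasoning, and outlines future research directions.
The authors also provide a public repository to continually track developments in this fast-evolving field: https://github.com/mbzuai-oryx/Awesome-LLM-Post-training.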

Abstract (original)

Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Pretraining on vast web-scale data has laid the foundation for these models, yet the research community is now increasingly shifting focus toward post-training techniques to achieve further breakthroughs. While pretraining provides a broad linguistic foundation, post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations. Fine-tuning, reinforcement learning, and test-time scaling have emerged as critical strategies for optimizing LLMs performance, ensuring robustness, and improving adaptability across various real-world tasks. This survey provides a systematic exploration of post-training methodologies, analyzing their role in refining LLMs beyond pretraining, addressing key challenges such as catastrophic forgetting, reward hacking, and inference-time trade-offs. We highlight emerging directions in model alignment, scalable adaptation, and inference-time reasoning, and outline future research directions. We also provide a public repository to continually track developments in this fast-evolving field: https://github.com/mbzuai-oryx/Awesome-LLM-Post-training.

arXiv information

Authors	Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H. S. Torr, Fahad Shahbaz Khan, Salman Khan
Published	2025-03-24 09:34:38+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
