DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Abstract

ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.

arXiv information

Authors	Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He
Published	2023-08-02 18:49:57+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
