Scaling Granite Code Models to 128K Context

Summary

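This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens.
To scale the context length of the Granite 3B/8B code models from 2K/4K to 128K tokens, the authors use lightweight continual pretraining that gradually increases the RoPE base frequency, combined with repository-level file packing and length-upsampled long-context data.
They also release instruction-tuned models with long-context support, derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs.
Compared with the original short-context Granite code models, the long-context models achieve significant improvements on long-context tasks without noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval).
All long-context Granite code models are released under the Apache 2.0 license for both research and commercial use.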
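
The core of the recipe is raising the RoPE base frequency as the target context grows. The sketch below is not the authors' code; it uses the standard rotary-embedding frequencies theta_i = base^(-2i/d) to illustrate the usual intuition for why a larger base helps: it stretches the longest rotation wavelength so that it comfortably exceeds the target context length. The head dimension of 128 and the staged (context, base) values in `STAGES` are hypothetical; the abstract does not state the schedule that was used.

```python
import math


def rope_inv_freq(head_dim: int, base: float) -> list[float]:
    """Rotation frequency theta_i = base ** (-2i / head_dim) for each of the head_dim // 2 dimension pairs."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, base: float) -> float:
    """Number of positions the slowest-rotating pair needs to complete one full turn (2*pi)."""
    slowest = rope_inv_freq(head_dim, base)[-1]
    return 2 * math.pi / slowest


def rotate_pair(x0: float, x1: float, position: int, theta: float) -> tuple[float, float]:
    """Apply the RoPE rotation to one dimension pair at a given token position."""
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


# Hypothetical staged schedule of (target context length, RoPE base).
# These values are for illustration only and are not taken from the paper.
STAGES = [(8_192, 100_000), (32_768, 1_000_000), (131_072, 10_000_000)]

if __name__ == "__main__":
    for ctx, base in STAGES:
        wl = longest_wavelength(128, base)
        print(f"context {ctx:>7,}: base {base:>12,}, longest wavelength ~{wl:,.0f} positions")
```

Each step up in the base lengthens the slowest wavelength while the rotation itself still preserves vector norms. How many stages to use and which base values to pick is a design choice the abstract leaves open.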

Summary (original)

This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling the context length of Granite 3B/8B code models from 2K/4K to 128K consists of lightweight continual pretraining that gradually increases the RoPE base frequency, using repository-level file packing and length-upsampled long-context data. Additionally, we release instruction-tuned models with long-context support, which are derived by further finetuning the long-context base models on a mix of permissively licensed short- and long-context instruction-response pairs. Compared to the original short-context Granite code models, our long-context models achieve significant improvements on long-context tasks without any noticeable performance degradation on regular code completion benchmarks (e.g., HumanEval). We release all our long-context Granite code models under an Apache 2.0 license for both research and commercial use.
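
The abstract names two data-side ingredients, repository-level file packing and length-upsampled long-context data, without giving details. A minimal sketch of one plausible reading: group source files by repository, concatenate each repository into a single long document, then sample long documents more often than short ones. The `FILE_SEP` separator, the path-sorted file order, the path header before each file, the character-count length proxy, and the fixed `upsample` weight are all hypothetical choices, not the authors' pipeline.

```python
import random
from collections import defaultdict

# Hypothetical separator placed between files of the same repository.
FILE_SEP = "\n<|file_sep|>\n"


def pack_by_repository(files, sep=FILE_SEP):
    """Concatenate every file of a repository into one long document.

    files: iterable of (repo_name, path, text) tuples.
    Files are sorted by path here; this ordering is an assumption.
    """
    by_repo = defaultdict(list)
    for repo, path, text in files:
        by_repo[repo].append((path, text))
    docs = {}
    for repo, entries in by_repo.items():
        entries.sort(key=lambda entry: entry[0])
        # Prefix each file with its path (hypothetical choice) so file boundaries stay visible.
        docs[repo] = sep.join(f"{path}\n{text}" for path, text in entries)
    return docs


def length_upsampled_sample(docs, k, long_threshold, upsample=4.0, seed=0):
    """Draw k repository documents with replacement, weighting long ones more heavily.

    Documents with at least long_threshold characters (a stand-in for tokens)
    get weight `upsample`; all others get weight 1.0.
    """
    rng = random.Random(seed)
    names = list(docs)
    weights = [upsample if len(docs[name]) >= long_threshold else 1.0 for name in names]
    return rng.choices(names, weights=weights, k=k)


if __name__ == "__main__":
    files = [
        ("repo_a", "src/util.py", "def helper():\n    return 1\n"),
        ("repo_a", "src/main.py", "from util import helper\nprint(helper())\n"),
        ("repo_b", "README.md", "tiny"),
    ]
    docs = pack_by_repository(files)
    picks = length_upsampled_sample(docs, k=1000, long_threshold=40)
    print({name: picks.count(name) for name in docs})
```

Upweighting long documents keeps enough long sequences in the training mix without discarding short ones; the real thresholds and ratios would have to come from the paper itself.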

arXiv information

Authors	Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda
Published	2024-07-18 17:46:02+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
