Llemma: An Open Language Model For Mathematics

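Summary

We introduce Llemma, a large language model for mathematics.
We continue pretraining Code Llama on Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, to produce Llemma.
On the MATH benchmark, Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis.
In addition, Llemma is capable of tool use and formal theorem proving without any further finetuning.
We openly release all artifacts, including the 7 billion and 34 billion parameter models, Proof-Pile-2, and the code to reproduce our experiments.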

Summary (original)

We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.

arXiv information

Authors	Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck
Published	2023-10-16 17:54:07+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
