TACO: Topics in Algorithmic COde generation dataset

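Summary

TACO is an open-source, large-scale code generation dataset focused on algorithmic topics, designed to provide a more challenging training set and evaluation benchmark for code generation models.
It contains competition-level programming problems that are more challenging, intended to strengthen or evaluate problem understanding and reasoning in real-world programming scenarios.
The training and test sets contain 25,433 and 1,000 coding problems, respectively, along with up to 1.55 million diverse solutions.
In addition, each TACO problem carries several fine-grained labels, such as task topic, algorithm, programming skill, and difficulty level, giving a more precise reference for training and evaluating code generation models.
The dataset and evaluation scripts are available on the Hugging Face Hub (https://huggingface.co/datasets/BAAI/TACO) and GitHub (https://github.com/FlagOpen/TACO).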
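
As an illustration of how the fine-grained labels might be used, here is a minimal Python sketch that counts and filters problems by a label field. It makes several assumptions that the abstract does not state: that each problem is a dict-like row, that the difficulty label lives in a field named "difficulty", and that values such as "EASY" or "HARD" are used. The helper names count_by_label, filter_by_label, and load_taco_split are hypothetical. The loader relies on the Hugging Face `datasets` library and the BAAI/TACO repository named above; check the dataset card for the actual schema and any extra loading arguments.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Set


def count_by_label(problems: Iterable[Mapping], label: str = "difficulty") -> Counter:
    """Count problems per value of a label field.

    The default field name "difficulty" is an assumption, not taken from the abstract.
    Rows that lack the field are counted under "UNKNOWN".
    """
    return Counter(p.get(label, "UNKNOWN") for p in problems)


def filter_by_label(problems: Iterable[Mapping], label: str, wanted: Set[str]) -> List[Mapping]:
    """Keep only the problems whose label value is in `wanted`."""
    return [p for p in problems if p.get(label) in wanted]


def load_taco_split(split: str = "test"):
    """Load one TACO split from the Hugging Face Hub.

    Requires `pip install datasets` and network access. Depending on the
    installed `datasets` version, additional arguments may be needed.
    """
    from datasets import load_dataset  # lazy import so the helpers above work offline

    return load_dataset("BAAI/TACO", split=split)


# Hypothetical rows standing in for real dataset records.
sample_problems = [
    {"name": "problem-a", "difficulty": "EASY"},
    {"name": "problem-b", "difficulty": "HARD"},
    {"name": "problem-c", "difficulty": "EASY"},
]

print(count_by_label(sample_problems))
print([p["name"] for p in filter_by_label(sample_problems, "difficulty", {"HARD"})])
```

With real data, the same helpers could be applied to the rows returned by load_taco_split("test"), for example to build an evaluation subset restricted to the hardest problems.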

Summary (original)

We introduce TACO, an open-source, large-scale code generation dataset, with a focus on the optics of algorithms, designed to provide a more challenging training dataset and evaluation benchmark in the field of code generation models. TACO includes competition-level programming questions that are more challenging, to enhance or evaluate problem understanding and reasoning abilities in real-world programming scenarios. There are 25433 and 1000 coding problems in training and test set, as well as up to 1.55 million diverse solution answers. Moreover, each TACO problem includes several fine-grained labels such as task topics, algorithms, programming skills, and difficulty levels, providing a more precise reference for the training and evaluation of code generation models. The dataset and evaluation scripts are available on Hugging Face Hub (https://huggingface.co/datasets/BAAI/TACO) and Github (https://github.com/FlagOpen/TACO).

arXiv information

Authors	Rongao Li, Jie Fu, Bo-Wen Zhang, Tao Huang, Zhihong Sun, Chen Lyu, Guang Liu, Zhi Jin, Ge Li
Published	2023-12-22 17:25:42+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
