Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training

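Abstract

OpenAI's GPT-4 technical report suggests that a model's performance on specific tasks can be predicted before training, although it leaves the methodology unspecified.
Such prediction is crucial for optimizing resource allocation and for ensuring that the data is aligned with the target tasks.
To achieve this vision, we focus on predicting performance on closed-book question answering (CBQA) tasks, which are closely tied to the pre-training data and to knowledge retention.
We address three major challenges: 1) mastering the entire pre-training process, especially data construction;
2) evaluating a model's knowledge retention; and
3) predicting task-specific knowledge retention using only information available prior to training.
To tackle these challenges, we pre-train three large language models (1.6B, 7B, and 13B) using 560k dollars and 520k GPU hours.
We analyze the pre-training data with knowledge triples and assess knowledge retention using established methods.
In addition, we introduce the SMI metric, an information-theoretic measure that quantifies the relationship between pre-training data, model size, and task-specific knowledge retention.
Our experiments reveal a strong linear correlation ($\text{R}^2 > 0.84$) between the SMI metric and model accuracy on CBQA tasks across models of varying sizes (1.1B, 1.6B, 7B, and 13B).
The dataset, model, and code are available at https://github.com/yuhui1038/SMI.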
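
As a rough illustration of what a linear-correlation claim such as $\text{R}^2 > 0.84$ measures, the sketch below fits a least-squares line to a few (metric, accuracy) pairs and computes the coefficient of determination. The numbers are invented placeholders, not results from the paper, and the helper names `fit_line` and `r_squared` are hypothetical. The abstract does not define how SMI is computed, so the metric values here only stand in for SMI scores.

```python
# Hypothetical illustration: the (metric, accuracy) pairs below are made up
# and are NOT results from the paper. The metric values merely stand in for
# SMI scores, whose definition is not given in the abstract.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Placeholder data: one (metric value, CBQA accuracy) pair per model/task.
metric = [0.10, 0.25, 0.40, 0.55, 0.70]
accuracy = [0.12, 0.22, 0.37, 0.45, 0.61]

slope, intercept = fit_line(metric, accuracy)
r2 = r_squared(metric, accuracy, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")
```

An $\text{R}^2$ close to 1 means that a straight line in the metric explains most of the variance in accuracy, which is the sense in which a pre-training metric can serve as a predictor of task performance.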

Abstract (original)

The GPT-4 technical report from OpenAI suggests that model performance on specific tasks can be predicted prior to training, though methodologies remain unspecified. This approach is crucial for optimizing resource allocation and ensuring data alignment with target tasks. To achieve this vision, we focus on predicting performance on Closed-book Question Answering (CBQA) tasks, which are closely tied to pre-training data and knowledge retention. We address three major challenges: 1) mastering the entire pre-training process, especially data construction; 2) evaluating a model’s knowledge retention; and 3) predicting task-specific knowledge retention using only information available prior to training. To tackle these challenges, we pre-train three large language models (i.e., 1.6B, 7B, and 13B) using 560k dollars and 520k GPU hours. We analyze the pre-training data with knowledge triples and assess knowledge retention using established methods. Additionally, we introduce the SMI metric, an information-theoretic measure that quantifies the relationship between pre-training data, model size, and task-specific knowledge retention. Our experiments reveal a strong linear correlation ($\text{R}^2 > 0.84$) between the SMI metric and the model’s accuracy on CBQA tasks across models of varying sizes (i.e., 1.1B, 1.6B, 7B, and 13B). The dataset, model, and code are available at https://github.com/yuhui1038/SMI.

arXiv information

Authors	Changhao Jiang, Ming Zhang, Junjie Ye, Xiaoran Fan, Yifei Cao, Jiajun Sun, Zhiheng Xi, Shihan Dou, Yi Dong, Yujiong Shen, Jingqi Tong, Zhen Wang, Tao Liang, Zhihui Fei, Mingyang Wan, Guojun Ma, Qi Zhang, Tao Gui, Xuanjing Huang
Published	2025-02-06 13:23:53+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
