CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay

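Summary

Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability.
However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC).
In this paper, we approach ARC as a programming-by-examples problem and introduce a novel, scalable method for language model self-improvement called Code Iteration (CodeIt).
Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay.
By relabeling the goal of an episode (i.e., the target program output given an input) to the realized output produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis.
Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, together with pre-training and data augmentation, leads to successful inter-task generalization.
CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset.
Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines.
Our code is available at https://github.com/Qualcomm-AI-research/codeit.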

Summary (original)

Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the target program output given input) to the realized output produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines. Our code is available at https://github.com/Qualcomm-AI-research/codeit .
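
The loop the abstract describes (sample programs, relabel each episode's goal with the output the program actually produced, then learn from a prioritized replay buffer) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is in the linked repository. The toy grid DSL (`PRIMITIVES`, `run`), the `Experience` record, `cell_match`, `sample_and_relabel` and `prioritized_sample` are all hypothetical names; a uniform random sampler stands in for the language model; and using the fraction of matching cells as the replay priority is an assumption, not something the abstract specifies.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]

# Hypothetical toy DSL standing in for a real ARC DSL: each primitive maps a grid to a grid.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: g,
    "flip_h": lambda g: tuple(row[::-1] for row in g),
    "flip_v": lambda g: g[::-1],
    "transpose": lambda g: tuple(zip(*g)),
}


def run(program: List[str], grid: Grid) -> Grid:
    """Execute a program, i.e. apply its primitives left to right."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


@dataclass
class Experience:
    program: List[str]
    inputs: List[Grid]
    outputs: List[Grid]  # realized outputs: the relabeled goal
    priority: float


def cell_match(a: Grid, b: Grid) -> float:
    """Fraction of equal cells; 0.0 when the grid shapes differ."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        return 0.0
    total = sum(len(r) for r in a)
    same = sum(x == y for r, s in zip(a, b) for x, y in zip(r, s))
    return same / total if total else 1.0


def sample_and_relabel(inputs: List[Grid], targets: List[Grid], rng: random.Random,
                       n_samples: int = 8, max_len: int = 3) -> List[Experience]:
    """Sample programs and relabel each episode's goal with what the program produced.

    A uniform random policy stands in for the language model (assumption).
    """
    names = list(PRIMITIVES)
    experiences = []
    for _ in range(n_samples):
        program = [rng.choice(names) for _ in range(rng.randint(1, max_len))]
        realized = [run(program, g) for g in inputs]
        # Hindsight relabeling: the realized outputs become the target, so even a
        # program that misses the original task yields a correct (program, I/O) example.
        # Assumed priority: mean fraction of target cells the realized outputs match.
        score = sum(cell_match(r, t) for r, t in zip(realized, targets)) / len(targets)
        experiences.append(Experience(program, list(inputs), realized, score))
    return experiences


def prioritized_sample(buffer: List[Experience], k: int, rng: random.Random,
                       eps: float = 0.01) -> List[Experience]:
    """Draw k experiences with probability proportional to priority + eps."""
    weights = [e.priority + eps for e in buffer]
    return rng.choices(buffer, weights=weights, k=k)


# Usage on one toy task whose hidden solution is a horizontal flip.
rng = random.Random(0)
inputs = [((1, 2), (3, 4)), ((5, 6, 7),)]
targets = [((2, 1), (4, 3)), ((7, 6, 5),)]
buffer = sample_and_relabel(inputs, targets, rng, n_samples=16)
batch = prioritized_sample(buffer, k=4, rng=rng)
solved = any(e.outputs == targets for e in buffer)  # did any sample hit the real goal?
# A full loop would now fine-tune the policy on `batch` and repeat.
```

The benefit of relabeling shows up in `buffer`: every entry is a correct input/output pair for its own program, even when `solved` is False, so the learner always receives valid training data despite the sparse reward on the original task.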

arXiv information

Authors	Natasha Butt, Blazej Manczak, Auke Wiggers, Corrado Rainone, David W. Zhang, Michaël Defferrard, Taco Cohen
Published	2024-07-01 10:03:33+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
