Code Simulation Challenges for Large Language Models

Summary

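We investigate the extent to which large language models (LLMs) can simulate the execution of computer code and algorithms.
We begin with straight-line programs and show that current LLMs perform poorly even on such simple programs, with performance degrading rapidly as the code gets longer.
We then investigate the ability of LLMs to simulate programs that contain critical paths and redundant instructions.
We also go beyond straight-line program simulation with sorting algorithms and nested loops, and show that the computational complexity of a routine directly affects an LLM's ability to simulate its execution.
We observe that LLMs execute instructions sequentially and with a low error margin only for short programs or standard procedures.
LLMs' code simulation is in tension with their pattern recognition and memorisation capabilities. For tasks where memorisation is detrimental, we propose a novel prompting method that simulates code execution line by line.
Empirically, our new Chain of Simulation (CoSm) method improves on the standard Chain of Thought prompting approach by avoiding the pitfalls of memorisation.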

Summary (original)

We investigate the extent to which Large Language Models (LLMs) can simulate the execution of computer code and algorithms. We begin by looking at straight-line programs, and show that current LLMs demonstrate poor performance even with such simple programs; performance rapidly degrades with the length of code. We then investigate the ability of LLMs to simulate programs that contain critical paths and redundant instructions. We also go beyond straight-line program simulation with sorting algorithms and nested loops, and we show that the computational complexity of a routine directly affects the ability of an LLM to simulate its execution. We observe that LLMs execute instructions sequentially and with a low error margin only for short programs or standard procedures. LLMs’ code simulation is in tension with their pattern recognition and memorisation capabilities: on tasks where memorisation is detrimental, we propose a novel prompting method to simulate code execution line by line. Empirically, our new Chain of Simulation (CoSm) method improves on the standard Chain of Thought prompting approach by avoiding the pitfalls of memorisation.
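
To make the abstract's terms concrete, here is a minimal sketch of what "simulating a straight-line program line by line" and "redundant instructions" can mean in practice. The sample program and the helper names `trace` and `critical_lines` are assumptions made for illustration; they are not taken from the paper and do not reproduce its benchmarks or its CoSm prompt. The sketch only handles single-target assignments with no branches or loops.

```python
import ast

# Hypothetical straight-line program: assignments only, no branches or loops.
PROGRAM = """\
a = 3
b = a + 4
c = 10
d = b * 2
c = c - 1
"""


def trace(program: str) -> list[dict]:
    """Execute one line at a time and snapshot the variable state after each."""
    state: dict = {}
    snapshots = []
    for line in program.splitlines():
        # Builtins are disabled because the sample program needs none.
        exec(line, {"__builtins__": {}}, state)
        snapshots.append(dict(state))
    return snapshots


def critical_lines(program: str, target: str) -> list[int]:
    """Return the 1-based line numbers the final value of `target` depends on.

    Walks the program backwards, tracking which variables are still needed.
    Lines that are never picked up are redundant with respect to `target`.
    """
    lines = program.splitlines()
    needed = {target}
    keep = []
    for number in range(len(lines), 0, -1):
        assign = ast.parse(lines[number - 1]).body[0]
        name = assign.targets[0].id
        if name in needed:
            needed.discard(name)
            # Every variable read on the right-hand side becomes needed.
            needed |= {n.id for n in ast.walk(assign.value) if isinstance(n, ast.Name)}
            keep.append(number)
    return sorted(keep)


if __name__ == "__main__":
    for number, snapshot in enumerate(trace(PROGRAM), start=1):
        print(f"after line {number}: {snapshot}")
    print("lines needed for d:", critical_lines(PROGRAM, "d"))
```

The per-line snapshots are the kind of ground truth a model's step-by-step answer could be compared against, and the lines missing from `critical_lines` are the ones a simulator could skip without changing the value of the target variable.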

arXiv information

Authors	Emanuele La Malfa, Christoph Weinhuber, Orazio Torre, Fangru Lin, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge
Published	2024-01-17 09:23:59+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
