General Reasoning Requires Learning to Reason from the Get-go

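Summary

Large language models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI).
However, their ability to reason adaptively and robustly, the hallmark of artificial general intelligence (AGI), remains fragile.
Although LLMs appear to succeed at commonsense reasoning, programming, and mathematics, they struggle to generalize algorithmic understanding to novel contexts.
Experiments with algorithmic tasks in esoteric programming languages show that LLM reasoning overfits to the training data, which limits its transferability.
We hypothesize that the core issue underlying this limited transferability is the coupling of reasoning and knowledge in LLMs.
To move from AUI to AGI, we propose disentangling knowledge and reasoning along three key directions: (1) pretraining to reason with RL from scratch, as an alternative to the widely used next-token prediction pretraining; (2) using a curriculum of synthetic tasks to ease the learning of a reasoning prior for RL that can then be transferred to natural language tasks; and (3) learning more generalizable reasoning functions with a small context window, to reduce the exploitation of spurious correlations between tokens.
Such a reasoning system, coupled with a trained retrieval system and a large external memory bank serving as a knowledge store, can overcome several limitations of existing architectures.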

Summary (original)

Large Language Models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI). However, their ability to reason adaptively and robustly, the hallmarks of artificial general intelligence (AGI), remains fragile. While LLMs seemingly succeed in commonsense reasoning, programming, and mathematics, they struggle to generalize algorithmic understanding across novel contexts. Our experiments with algorithmic tasks in esoteric programming languages reveal that LLM reasoning overfits to the training data and is limited in its transferability. We hypothesize that the core issue underlying such limited transferability is the coupling of reasoning and knowledge in LLMs. To transition from AUI to AGI, we propose disentangling knowledge and reasoning through three key directions: (1) pretraining to reason using RL from scratch as an alternative to the widely used next-token prediction pretraining, (2) using a curriculum of synthetic tasks to ease the learning of a reasoning prior for RL that can then be transferred to natural language tasks, and (3) learning more generalizable reasoning functions using a small context window to reduce exploiting spurious correlations between tokens. Such a reasoning system coupled with a trained retrieval system and a large external memory bank as a knowledge store can overcome several limitations of existing architectures at learning to reason in novel scenarios.
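
The last sentence of the abstract describes a reasoner with a small context window that draws its knowledge from an external memory bank through a retrieval system. The following is only a toy sketch of that data flow, not the authors' implementation: the names `MemoryBank`, `retrieve`, and `build_context` are hypothetical, the token-overlap score stands in for a trained retriever, and the whitespace tokenizer stands in for a real one.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (assumption: stand-in for a real tokenizer)."""
    return text.lower().split()


@dataclass
class MemoryBank:
    """Hypothetical external knowledge store: a flat list of text facts."""
    facts: List[str] = field(default_factory=list)

    def add(self, fact: str) -> None:
        self.facts.append(fact)

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        """Return the k facts sharing the most tokens with the query.

        Assumption: a trained retriever would replace this hand-written score.
        """
        query_counts = Counter(tokenize(query))

        def overlap(fact: str) -> int:
            # Size of the multiset intersection between query and fact tokens.
            return sum((query_counts & Counter(tokenize(fact))).values())

        return sorted(self.facts, key=overlap, reverse=True)[:k]


def build_context(query: str, memory: MemoryBank, window: int = 12, k: int = 1) -> List[str]:
    """Assemble the reasoner's input and truncate it to a small token window."""
    tokens: List[str] = []
    for fact in memory.retrieve(query, k):
        tokens.extend(tokenize(fact))
    tokens.extend(tokenize(query))
    # Keep only the last `window` tokens, so the query itself always survives.
    return tokens[-window:]


memory = MemoryBank()
memory.add("paris is the capital of france")
memory.add("befunge is a two-dimensional esoteric language")
context = build_context("what is the capital of france", memory, window=8)
```

In this sketch the knowledge lives entirely in `memory`, while whatever model consumes `context` only ever sees a few tokens at a time; that separation is the point being illustrated, under the stated assumptions.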

arXiv information

Authors	Seungwook Han, Jyothish Pari, Samuel J. Gershman, Pulkit Agrawal
Published	2025-02-26 18:51:12+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
