Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning

Summary

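Human cognition exhibits systematic compositionality: the algebraic ability to generate infinitely many novel combinations from a finite set of learned components. This ability is key to understanding and reasoning about complex logic.
In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning.
Specifically, we construct a new dataset, MathTrap, by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8K.
Because problems with logical flaws are very rare in the real world, these represent "unseen" cases for LLMs.
Solving them requires a model to systematically compose (1) the mathematical knowledge involved in the original problem with (2) the knowledge related to the introduced trap.
Our experiments show that although LLMs possess both components of the required knowledge, they do not spontaneously combine them to handle these novel cases.
We explore several methods to mitigate this deficiency, including natural-language prompts, few-shot demonstrations, and fine-tuning.
In addition, we test the recently released OpenAI o1 model and find that human-like "slow thinking" helps improve the compositionality of LLMs.
Overall, systematic compositionality remains an open challenge for large language models.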
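A hypothetical illustration of what "composing" the two kinds of knowledge means (the example is not taken from MathTrap): the original problem asks for the area of a triangle with sides 3, 4 and 5, and a trap variant changes the sides to 1, 2 and 5, which cannot form a triangle. Answering the variant correctly requires combining Heron's formula (knowledge from the original problem) with the triangle inequality (knowledge related to the trap). The function names below are assumptions chosen for the sketch.

```python
import math
from typing import Optional


def triangle_inequality_holds(a: float, b: float, c: float) -> bool:
    """Trap-related knowledge: three positive lengths form a non-degenerate
    triangle only if each side is shorter than the sum of the other two."""
    return a > 0 and b > 0 and c > 0 and a + b > c and a + c > b and b + c > a


def heron_area(a: float, b: float, c: float) -> float:
    """Original-problem knowledge: Heron's formula, applied mechanically."""
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))


def composed_area(a: float, b: float, c: float) -> Optional[float]:
    """Composes both pieces of knowledge: validate the premise, then compute.
    Returns None to signal that the problem is ill-posed."""
    if not triangle_inequality_holds(a, b, c):
        return None
    return heron_area(a, b, c)


# Original problem: a valid 3-4-5 right triangle.
print(composed_area(3, 4, 5))  # 6.0

# Trap variant: 1 + 2 < 5, so no such triangle exists.
print(composed_area(1, 2, 5))  # None

# Heron's formula alone cannot handle the trap variant:
# s = 4 and s(s-a)(s-b)(s-c) = 4 * 3 * 2 * (-1) = -24, so math.sqrt raises ValueError.
try:
    heron_area(1, 2, 5)
except ValueError as err:
    print("ill-posed:", err)
```

Each helper on its own is easy; the point of the trap is that the check in `composed_area` has to be applied without being asked for, which is the spontaneous composition the summary says LLMs tend not to perform.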

Summary (original)

Human cognition exhibits systematic compositionality, the algebraic ability to generate infinite novel combinations from finite learned components, which is the key to understanding and reasoning about complex logic. In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset MathTrap by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8K. Since problems with logical flaws are quite rare in the real world, these represent "unseen" cases to LLMs. Solving these requires the models to systematically compose (1) the mathematical knowledge involved in the original problems with (2) knowledge related to the introduced traps. Our experiments show that while LLMs possess both components of requisite knowledge, they do not spontaneously combine them to handle these novel cases. We explore several methods to mitigate this deficiency, such as natural language prompts, few-shot demonstrations, and fine-tuning. Additionally, we test the recently released OpenAI o1 model and find that human-like "slow thinking" helps improve the compositionality of LLMs. Overall, systematic compositionality remains an open challenge for large language models.
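Two of the mitigation strategies named in the abstract, natural-language prompts and few-shot demonstrations, differ mainly in what is placed in front of the question. The sketch assembles both kinds of prompt. The hint wording, the Question/Answer layout, the example problems, and the `Demo`, `TRAP_HINT` and `build_prompt` names are all hypothetical and are not the prompts used in the paper; fine-tuning is not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Demo:
    """One worked example for few-shot prompting (hypothetical format)."""
    question: str
    answer: str


# Hypothetical natural-language hint; not the wording used in the paper.
TRAP_HINT = (
    "Before solving, check whether the conditions in the problem are consistent. "
    "If they contradict each other or the problem cannot be solved, say so and explain why."
)


def build_prompt(question: str, hint: bool = False,
                 demos: Optional[List[Demo]] = None) -> str:
    """Assemble a prompt: optional hint, then optional few-shot demos, then the question."""
    parts: List[str] = []
    if hint:
        parts.append(TRAP_HINT)
    for demo in demos or []:
        parts.append(f"Question: {demo.question}\nAnswer: {demo.answer}")
    # The target question is left open so the model completes the answer.
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


# A worked trap example used as a demonstration (hypothetical).
trap_demo = Demo(
    question="A triangle has sides of length 1, 2, and 5. What is its area?",
    answer="No such triangle exists: 1 + 2 < 5 violates the triangle inequality.",
)

question = ("A right triangle has legs of length 3 and 4 and a hypotenuse of length 6. "
            "What is its area?")

baseline = build_prompt(question)                     # no mitigation
hinted = build_prompt(question, hint=True)            # natural-language prompt
few_shot = build_prompt(question, demos=[trap_demo])  # few-shot demonstration
print(few_shot)
```

The target question is itself a trap, since 3² + 4² = 25, not 6² = 36. Sending each variant to the same model and checking whether the contradiction is flagged is one way to compare the strategies; how the paper actually scores responses is not described in this summary.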

arXiv information

Authors	Jun Zhao, Jingqi Tong, Yurong Mou, Ming Zhang, Qi Zhang, Xuanjing Huang
Published	2024-10-10 14:38:37+00:00
arXiv page	arxiv_id (pdf)

Source and services used

arxiv.jp, Google
