Do Large Language Models Excel in Complex Logical Reasoning with Formal Language?

Abstract

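Large Language Models (LLMs) have been shown to achieve breakthrough performance on complex logical reasoning tasks.
Nevertheless, most existing research focuses on using formal language to guide LLMs toward reliable reasoning paths, while systematic evaluation of these capabilities remains limited.
This paper aims to conduct a comprehensive evaluation of LLMs across a variety of logical reasoning problems that make use of formal languages.
Along three dimensions, namely the spectrum of LLMs, the taxonomy of tasks, and the format of trajectories, our key findings are as follows.
1) Thinking models significantly outperform Instruct models, especially when formal language is employed.
2) All LLMs show limited inductive reasoning ability, regardless of whether they use a formal language.
3) Data in PoT format achieves the best generalization performance across other languages.
In addition, we curate formal-related training data to further enhance small language models. Experimental results indicate that a simple rejected fine-tuning method better enables LLMs to generalize across formal languages and achieve the best overall performance.
Our code and reports are available at https://github.com/jiangjin1999/FormalEval.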

Abstract (original)

Large Language Models (LLMs) have been shown to achieve breakthrough performance on complex logical reasoning tasks. Nevertheless, most existing research focuses on employing formal language to guide LLMs to derive reliable reasoning paths, while systematic evaluations of these capabilities are still limited. In this paper, we aim to conduct a comprehensive evaluation of LLMs across various logical reasoning problems utilizing formal languages. From the perspective of three dimensions, i.e., spectrum of LLMs, taxonomy of tasks, and format of trajectories, our key findings are: 1) Thinking models significantly outperform Instruct models, especially when formal language is employed; 2) All LLMs exhibit limitations in inductive reasoning capability, irrespective of whether they use a formal language; 3) Data with PoT format achieves the best generalization performance across other languages. Additionally, we also curate the formal-relative training data to further enhance the small language models, and the experimental results indicate that a simple rejected fine-tuning method can better enable LLMs to generalize across formal languages and achieve the best overall performance. Our codes and reports are available at https://github.com/jiangjin1999/FormalEval.

arXiv information

Authors	Jin Jiang, Jianing Wang, Yuchen Yan, Yang Liu, Jianhua Zhu, Mengdi Zhang, Xunliang Cai, Liangcai Gao
Published	2025-05-22 17:57:23+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
