When Do Program-of-Thoughts Work for Reasoning?

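Abstract

The reasoning capabilities of large language models (LLMs) play a pivotal role in embodied artificial intelligence.
Effective methods exist, such as program-of-thought prompting, which uses a programming language to tackle complex reasoning tasks, but the specific impact of code data on improving reasoning ability remains under-explored.
To address this gap, we propose the complexity-impacted reasoning score (CIRS), which combines structural and logical attributes to measure the correlation between code and reasoning ability.
Specifically, we use the abstract syntax tree to encode structural information, and we compute logical complexity from the difficulty and the cyclomatic complexity.
Through an empirical analysis, we find that LLMs cannot learn or understand code data at every level of complexity.
An optimal level of complexity is critical for improving reasoning ability with program-aided prompting.
We then design an auto-synthesizing and stratifying algorithm and apply it to instruction generation for mathematical reasoning and to code data filtering for code generation tasks.
Extensive results demonstrate the effectiveness of the proposed approach.
The code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.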
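
The abstract names two ingredients of CIRS: structural information taken from the abstract syntax tree, and logical complexity based on difficulty and cyclomatic complexity. The following is a minimal Python sketch of how such measurements could be taken with the standard `ast` module. It is not the paper's definition: the feature set, the branch-counting approximation of cyclomatic complexity, and the function `toy_complexity_score` (including its product formula) are illustrative assumptions, and the difficulty term is omitted because the abstract does not define it.

```python
import ast

# Node types treated as decision points. This is an approximation of
# cyclomatic complexity chosen for illustration, not the paper's definition.
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.IfExp,
    ast.ExceptHandler, ast.comprehension,
)


def ast_depth(node):
    """Return the depth of the AST rooted at `node` (a leaf has depth 1)."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return 1
    return 1 + max(ast_depth(child) for child in children)


def structural_features(source):
    """Collect simple structural statistics from the AST of `source`."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    return {
        "node_count": len(nodes),
        "node_types": len({type(n).__name__ for n in nodes}),
        "depth": ast_depth(tree),
    }


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths.
            decisions += len(node.values) - 1
    return decisions + 1


def toy_complexity_score(source):
    """Hypothetical combination of structure and logic (assumed formula)."""
    return structural_features(source)["depth"] * cyclomatic_complexity(source)


if __name__ == "__main__":
    simple = "x = 1 + 2\nprint(x)"
    branched = (
        "total = 0\n"
        "for i in range(10):\n"
        "    if i % 2 == 0 and i > 2:\n"
        "        total += i\n"
        "print(total)"
    )
    for name, src in [("simple", simple), ("branched", branched)]:
        print(name, structural_features(src), cyclomatic_complexity(src))
```

Scores of this kind could then be used to sort code samples into complexity bands, which is the role the abstract assigns to stratification. The thresholds for such bands are not given in the abstract.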

Abstract (original)

The reasoning capabilities of Large Language Models (LLMs) play a pivotal role in the realm of embodied artificial intelligence. Although there are effective methods like program-of-thought prompting for LLMs which uses programming language to tackle complex reasoning tasks, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap, we propose complexity-impacted reasoning score (CIRS), which combines structural and logical attributes, to measure the correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity by considering the difficulty and the cyclomatic complexity. Through an empirical analysis, we find not all code data of complexity can be learned or understood by LLMs. Optimal level of complexity is critical to the improvement of reasoning abilities by program-aided prompting. Then we design an auto-synthesizing and stratifying algorithm, and apply it to instruction generation for mathematical reasoning and code data filtering for code generation tasks. Extensive results demonstrates the effectiveness of our proposed approach. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.

arXiv information

Authors	Zhen Bi, Ningyu Zhang, Yinuo Jiang, Shumin Deng, Guozhou Zheng, Huajun Chen
Published	2023-09-08 02:31:35+00:00

