A Language-Agent Approach to Formal Theorem-Proving

要約

外部環境と対話するためにコンテキスト内学習が可能な大規模言語モデル (LLM) を使用する言語エージェントは、タスクを制御するための有望なアプローチとして最近登場しました。
形式的定理証明に対する最初の言語エージェントアプローチを紹介します。
私たちの手法である COPRA は、ステートフルバックトラッキング検索のポリシーの一部として大容量のブラックボックス LLM (GPT-4) を使用します。
検索中に、ポリシーは証明戦術を選択し、外部データベースから補題と定義を取得できます。
選択された各戦術は基礎となる証明フレームワークで実行され、実行フィードバックは次のポリシー呼び出しのプロンプトを構築するために使用されます。
また、検索では、履歴から選択された情報が追跡され、それを使用して幻覚や不必要な LLM クエリが削減されます。
Lean の miniF2F ベンチマークと Compcert プロジェクトの一連の Coq タスクで COPRA の実装を評価します。
これらのベンチマークでは、COPRA は、正しい証明を迅速に見つけるという点で、GPT-4 のワンショット呼び出しや、証明データに基づいて微調整された最先端のモデルよりも大幅に優れています。
コードとデータは https://github.com/trishullab/copra で入手できます。

要約(オリジナル)

Language agents, which use a large language model (LLM) capable of in-context learning to interact with an external environment, have recently emerged as a promising approach to control tasks. We present the first language-agent approach to formal theorem-proving. Our method, COPRA, uses a high-capacity, black-box LLM (GPT-4) as part of a policy for a stateful backtracking search. During the search, the policy can select proof tactics and retrieve lemmas and definitions from an external database. Each selected tactic is executed in the underlying proof framework, and the execution feedback is used to build the prompt for the next policy invocation. The search also tracks selected information from its history and uses it to reduce hallucinations and unnecessary LLM queries. We evaluate our implementation of COPRA on the miniF2F benchmark for Lean and a set of Coq tasks from the Compcert project. On these benchmarks, COPRA significantly outperforms one-shot invocations of GPT-4, as well as state-of-the-art models fine-tuned on proof data, at finding correct proofs quickly. Our code and data are available at https://github.com/trishullab/copra.

arxiv情報

著者	Amitayush Thakur,Yeming Wen,Swarat Chaudhuri
発行日	2023-12-13 13:18:47+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

A Language-Agent Approach to Formal Theorem-Proving

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー