Models That Prove Their Own Correctness

要約

関心のある特定の入力に対する学習済みモデルの正確性をどのように信頼できるでしょうか?
モデルの精度は通常、入力の分布に対して \emph{平均して} 測定され、固定入力については保証されません。
この論文では、この問題に対する理論的に根拠のある解決策を提案します。それは、対話型証明を介して検証アルゴリズム $V$ に対する出力の正しさを証明する *自己証明モデル* をトレーニングすることです。
自己証明モデルは、ランダムな入力に対して高い確率でモデルが正しい出力を生成し、 \emph{そして} $V\!$ に対してその正しさを首尾よく証明することを満たします。
$V$ の *健全性* プロパティは、*すべての* 入力について、どのモデルも誤った出力の正しさを $V$ に納得させることができないことを保証します。
したがって、自己証明モデルはその出力のほとんどが正しいことを証明しますが、(任意のモデルの) *すべて* 正しくない出力は $V$ によって検出されます。
私たちは自己証明モデルを学習するための一般的な方法を考案し、特定の仮定の下で収束限界を証明します。
理論的枠組みと結果は、2 つの整数の最大公約数 (GCD) を計算する算術機能に関する実験によって補完されます。
私たちの学習方法は、GCD を計算し、*そして*その答えの正しさを証明する自己証明変換器をトレーニングするために使用されます。

要約(オリジナル)

How can we trust the correctness of a learned model on a particular input of interest? Model accuracy is typically measured \emph{on average} over a distribution of inputs, giving no guarantee for any fixed input. This paper proposes a theoretically-founded solution to this problem: to train *Self-Proving models* that prove the correctness of their output to a verification algorithm $V$ via an Interactive Proof. Self-Proving models satisfy that, with high probability over a random input, the model generates a correct output \emph{and} successfully proves its correctness to $V\!$. The *soundness* property of $V$ guarantees that, for *every* input, no model can convince $V$ of the correctness of an incorrect output. Thus, a Self-Proving model proves correctness of most of its outputs, while *all* incorrect outputs (of any model) are detected by $V$. We devise a generic method for learning Self-Proving models, and we prove convergence bounds under certain assumptions. The theoretical framework and results are complemented by experiments on an arithmetic capability: computing the greatest common divisor (GCD) of two integers. Our learning method is used to train a Self-Proving transformer that computes the GCD *and* proves the correctness of its answer.

arxiv情報

著者	Noga Amit,Shafi Goldwasser,Orr Paradise,Guy Rothblum
発行日	2024-05-24 17:10:08+00:00
arxivサイト	arxiv_id(pdf)

提供元, 利用サービス

arxiv.jp, Google

Models That Prove Their Own Correctness

要約

要約(オリジナル)

arxiv情報

提供元, 利用サービス

最近の投稿

最近のコメント

アーカイブ

カテゴリー