Learning Horn Envelopes via Queries from Large Language Models

Abstract

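We investigate an approach for extracting knowledge from trained neural networks, based on Angluin's exact learning model with membership and equivalence queries to an oracle.
In this approach, the oracle is a trained neural network.
We consider Angluin's classical algorithm for learning Horn theories and study the changes needed to make it applicable to learning from neural networks.
In particular, we have to take into account that a trained neural network may not behave as a Horn oracle, meaning that its underlying target theory may not be Horn.
We propose a new algorithm that aims to extract the "tightest Horn approximation" of the target theory. It is guaranteed to terminate in exponential time in the worst case, and in polynomial time if the target has polynomially many non-Horn examples.
To showcase the applicability of the approach, we run experiments on pre-trained language models and extract rules that expose occupation-based gender biases.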

Abstract (original)

We investigate an approach for extracting knowledge from trained neural networks based on Angluin’s exact learning model with membership and equivalence queries to an oracle. In this approach, the oracle is a trained neural network. We consider Angluin’s classical algorithm for learning Horn theories and study the necessary changes to make it applicable to learn from neural networks. In particular, we have to consider that trained neural networks may not behave as Horn oracles, meaning that their underlying target theory may not be Horn. We propose a new algorithm that aims at extracting the ‘tightest Horn approximation’ of the target theory and that is guaranteed to terminate in exponential time (in the worst case) and in polynomial time if the target has polynomially many non-Horn examples. To showcase the applicability of the approach, we perform experiments on pre-trained language models and extract rules that expose occupation-based gender biases.
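
To make the notion of a "tightest Horn approximation" concrete, the sketch below relies on a standard fact about propositional logic: a theory is Horn exactly when its set of models is closed under intersection, so the models of the tightest Horn over-approximation are the intersection closure of the target's models. This is a brute-force illustration only, not the query-based algorithm the paper proposes. The names `VARIABLES`, `target`, `all_models`, and `horn_envelope_models` are hypothetical, and the toy target theory c AND (a OR b) is an assumption chosen for the example; a real oracle would be a trained neural network rather than a hand-written function.

```python
from itertools import product

VARIABLES = ("a", "b", "c")  # hypothetical propositional variables


def target(true_vars):
    """Hypothetical membership oracle for the non-Horn theory c AND (a OR b)."""
    return "c" in true_vars and ("a" in true_vars or "b" in true_vars)


def all_models(oracle, variables):
    """Ask the oracle about every assignment (exponential in len(variables))."""
    models = set()
    for bits in product((False, True), repeat=len(variables)):
        # Represent an assignment by the set of variables it makes true.
        assignment = frozenset(v for v, bit in zip(variables, bits) if bit)
        if oracle(assignment):
            models.add(assignment)
    return models


def horn_envelope_models(models):
    """Close a set of models under pairwise intersection."""
    closed = set(models)
    frontier = list(closed)
    while frontier:
        current = frontier.pop()
        for other in list(closed):
            meet = current & other
            if meet not in closed:
                # A new model forced by intersection closure; revisit it later.
                closed.add(meet)
                frontier.append(meet)
    return closed


MODELS = all_models(target, VARIABLES)
ENVELOPE = horn_envelope_models(MODELS)
# Assignments the target rejects but every Horn over-approximation must accept.
EXTRA_MODELS = ENVELOPE - MODELS
```

In this toy case the closure adds the single assignment in which only c is true, so the envelope is the Horn theory "c". Assignments of this kind are in the spirit of what the abstract calls non-Horn examples, although the paper's exact definition may differ.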

arXiv information

Authors	Sophie Blum, Raoul Koudijs, Ana Ozaki, Samia Touileb
Published	2023-09-13 11:49:29+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
