CAVE: Controllable Authorship Verification Explanations

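Summary

Authorship verification (AV), the task of deciding whether two documents share the same author, is essential in many sensitive real-world applications.
Because AV is often used in proprietary domains that require a private, offline model, state-of-the-art online models such as ChatGPT are undesirable.
Current offline models, however, have low downstream utility because of their low accuracy and scalability (e.g., traditional stylometric AV systems) and their lack of accessible post-hoc explanations.
This work takes a first step toward addressing these challenges with CAVE (Controllable Authorship Verification Explanations), a trained, offline Llama-3-8B model. CAVE generates free-text AV explanations that are controlled to be (1) structured (decomposable into sub-explanations in terms of relevant linguistic features) and (2) easily verified for explanation-label consistency (via intermediate labels in the sub-explanations).
The authors first engineer a prompt that generates silver training data from a state-of-the-art teacher model in the desired CAVE output format.
They then filter this data and distill it into a pretrained Llama-3-8B, their carefully selected student model.
Results on three difficult AV datasets (IMDb62, Blog-Auth, and Fanfiction) show that CAVE generates high-quality explanations (as measured by automatic and human evaluation) and achieves competitive task accuracy.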

Abstract (original)

Authorship Verification (AV) (do two documents have the same author?) is essential in many sensitive real-life applications. AV is often used in proprietary domains that require a private, offline model, making SOTA online models like ChatGPT undesirable. Current offline models however have lower downstream utility due to low accuracy/scalability (eg: traditional stylometry AV systems) and lack of accessible post-hoc explanations. In this work, we take the first step to address the above challenges with our trained, offline Llama-3-8B model CAVE (Controllable Authorship Verification Explanations): CAVE generates free-text AV explanations that are controlled to be (1) structured (can be decomposed into sub-explanations in terms of relevant linguistic features), and (2) easily verified for explanation-label consistency (via intermediate labels in sub-explanations). We first engineer a prompt that can generate silver training data from a SOTA teacher model in the desired CAVE output format. We then filter and distill this data into a pretrained Llama-3-8B, our carefully selected student model. Results on three difficult AV datasets IMDb62, Blog-Auth, and Fanfiction show that CAVE generates high quality explanations (as measured by automatic and human evaluation) as well as competitive task accuracies.
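
The explanation-label consistency check mentioned in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-line-per-feature format `feature: rationale [LABEL]`, the label set `YES`/`NO`/`MAYBE`, the majority-vote aggregation rule, and the helper names `parse_explanation`, `implied_label`, `is_consistent`, and `filter_silver`.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical sub-explanation format (an assumption, not the paper's format):
#   "<linguistic feature>: <free-text rationale> [YES|NO|MAYBE]"
SUB_EXPLANATION = re.compile(
    r"^(?P<feature>[^:]+):\s*(?P<text>.*?)\s*\[(?P<label>YES|NO|MAYBE)\]\s*$"
)


@dataclass
class SubExplanation:
    feature: str  # linguistic feature the sub-explanation is about
    text: str     # free-text rationale
    label: str    # intermediate label: YES, NO, or MAYBE


def parse_explanation(raw: str) -> List[SubExplanation]:
    """Split a structured explanation into sub-explanations, skipping malformed lines."""
    subs = []
    for line in raw.strip().splitlines():
        match = SUB_EXPLANATION.match(line.strip())
        if match:
            subs.append(
                SubExplanation(match["feature"].strip(), match["text"], match["label"])
            )
    return subs


def implied_label(subs: List[SubExplanation]) -> Optional[str]:
    """Majority vote over intermediate labels (assumed rule); None on a tie."""
    yes = sum(s.label == "YES" for s in subs)
    no = sum(s.label == "NO" for s in subs)
    if yes == no:
        return None
    return "YES" if yes > no else "NO"


def is_consistent(raw: str, final_label: str) -> bool:
    """True when the intermediate labels imply the stated final label."""
    subs = parse_explanation(raw)
    return bool(subs) and implied_label(subs) == final_label


def filter_silver(samples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only samples whose explanation agrees with their final label."""
    return [s for s in samples if is_consistent(s["explanation"], s["label"])]


example = """punctuation: both texts rely on ellipses [YES]
tone: both are informal and sarcastic [YES]
vocabulary: text 2 uses far more technical jargon [NO]"""

samples = [
    {"explanation": example, "label": "YES"},  # agrees with the majority vote
    {"explanation": example, "label": "NO"},   # contradicts it, so it is dropped
]
kept = filter_silver(samples)
```

In this sketch, a check like `is_consistent` could serve both as a filter on teacher-generated silver data and as a post-hoc sanity check on the student model's outputs. Whether the paper aggregates intermediate labels by majority vote or by some other rule is not stated in the abstract.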

arXiv information

Authors	Sahana Ramnath, Kartik Pandey, Elizabeth Boschee, Xiang Ren
Published	2024-09-05 06:44:24+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
