Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts?

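Summary

Authorship verification is the task of deciding whether two distinct writing samples share the same author, and it is usually concerned with attributing written text.
This paper investigates the attribution of transcribed speech, which raises new challenges.
The main challenge is that many stylistic features, such as punctuation and capitalization, are not informative in this setting.
On the other hand, transcribed speech exhibits other patterns, such as filler words and backchannels (e.g., "um", "uh-huh"), that may be characteristic of individual speakers.
We propose a new benchmark for speaker attribution focused on human-transcribed conversational speech transcripts.
To limit spurious associations between speakers and topics, we construct verification trials of varying difficulty using both conversation prompts and speakers who take part in the same conversation.
We establish the state of the art on this new benchmark by comparing a suite of neural and non-neural baselines, and find that written-text attribution models perform surprisingly well in certain settings but markedly worse as the conversational topic is increasingly controlled.
We also analyze how transcription style affects performance and show that fine-tuning on speech transcripts can improve performance.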

Abstract (original)

Authorship verification is the task of determining if two distinct writing samples share the same author and is typically concerned with the attribution of written text. In this paper, we explore the attribution of transcribed speech, which poses novel challenges. The main challenge is that many stylistic features, such as punctuation and capitalization, are not informative in this setting. On the other hand, transcribed speech exhibits other patterns, such as filler words and backchannels (e.g., ‘um’, ‘uh-huh’), which may be characteristic of different speakers. We propose a new benchmark for speaker attribution focused on human-transcribed conversational speech transcripts. To limit spurious associations of speakers with topic, we employ both conversation prompts and speakers participating in the same conversation to construct verification trials of varying difficulties. We establish the state of the art on this new benchmark by comparing a suite of neural and non-neural baselines, finding that although written text attribution models achieve surprisingly good performance in certain settings, they perform markedly worse as conversational topic is increasingly controlled. We present analyses of the impact of transcription style on performance as well as the ability of fine-tuning on speech transcripts to improve performance.
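The trial construction described in the abstract can be illustrated with a small sketch. This is not the authors' code: the `Sample` record, the difficulty labels (`unrestricted`, `same_prompt`, `same_conversation`), and the rule that positive pairs come from different conversations are all assumptions made for illustration. The idea is only that a different-speaker pair becomes harder to separate by topic when both samples come from the same prompt, and harder still when they come from the same conversation.

```python
from itertools import combinations
from typing import NamedTuple


class Sample(NamedTuple):
    # Hypothetical record layout; the benchmark's real schema may differ.
    speaker: str
    conversation: str
    prompt: str
    text: str


def negative_difficulty(a: Sample, b: Sample) -> str:
    """Label a different-speaker pair by how much topic is controlled."""
    if a.speaker == b.speaker:
        raise ValueError("negative trials need two different speakers")
    if a.conversation == b.conversation:
        return "same_conversation"  # both speakers talked to each other
    if a.prompt == b.prompt:
        return "same_prompt"        # same assigned topic, separate conversations
    return "unrestricted"           # topic is free to differ


def build_trials(samples):
    """Return (sample_a, sample_b, same_speaker_label, difficulty) tuples."""
    trials = []
    for a, b in combinations(samples, 2):
        if a.speaker == b.speaker:
            # Assumption: positives pair one speaker across two conversations,
            # so a shared conversation cannot give the answer away.
            if a.conversation != b.conversation:
                trials.append((a, b, 1, "positive"))
        else:
            trials.append((a, b, 0, negative_difficulty(a, b)))
    return trials
```

Grouping the resulting trials by the difficulty label would let one report verification performance separately for each level of topic control, which is the kind of comparison the abstract describes.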

arXiv information

Authors	Cristina Aggazzotti, Nicholas Andrews, Elizabeth Allyn Smith
Published	2025-05-16 15:04:56+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
