LLM-based speaker diarization correction: A generalizable approach

Summary

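Speaker diarization is needed to interpret conversations transcribed with automatic speech recognition (ASR) tools.
Despite substantial progress in diarization methods, diarization accuracy remains a problem.
Here, we investigate the use of large language models (LLMs) for diarization correction as a post-processing step.
The LLMs were fine-tuned on the Fisher corpus, a large dataset of transcribed conversations.
The models' ability to improve diarization accuracy was measured on a holdout set from the Fisher corpus and on an independent dataset.
We report that fine-tuned LLMs can markedly improve diarization accuracy.
However, model performance is limited to transcripts produced with the same ASR tool as the transcripts used for fine-tuning, which restricts generalizability.
To address this constraint, an ensemble model was built by combining the weights of three separate models, each fine-tuned on transcripts from a different ASR tool.
The ensemble model performed better overall than each of the ASR-specific models, suggesting that a generalizable, ASR-agnostic approach may be achievable.
The weights of these models are publicly available on HuggingFace at https://huggingface.co/bklynhlth.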

Summary (original)

Speaker diarization is necessary for interpreting conversations transcribed using automated speech recognition (ASR) tools. Despite significant developments in diarization methods, diarization accuracy remains an issue. Here, we investigate the use of large language models (LLMs) for diarization correction as a post-processing step. LLMs were fine-tuned using the Fisher corpus, a large dataset of transcribed conversations. The ability of the models to improve diarization accuracy in a holdout dataset from the Fisher corpus as well as an independent dataset was measured. We report that fine-tuned LLMs can markedly improve diarization accuracy. However, model performance is constrained to transcripts produced using the same ASR tool as the transcripts used for fine-tuning, limiting generalizability. To address this constraint, an ensemble model was developed by combining weights from three separate models, each fine-tuned using transcripts from a different ASR tool. The ensemble model demonstrated better overall performance than each of the ASR-specific models, suggesting that a generalizable and ASR-agnostic approach may be achievable. We have made the weights of these models publicly available on HuggingFace at https://huggingface.co/bklynhlth.
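
The abstract says the ensemble was built by combining weights from three ASR-specific models but does not say how the weights were combined. As a rough illustration only, the sketch below assumes the simplest possible scheme, a uniform average of parameters that share the same name and shape. The function `average_weights`, the parameter names, and the toy values are all hypothetical and are not taken from the paper or from the released checkpoints.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_weights(models: List[Weights]) -> Weights:
    """Uniformly average parameters across models (an assumed merge scheme)."""
    if not models:
        raise ValueError("need at least one model")
    names = set(models[0])
    for model in models[1:]:
        if set(model) != names:
            raise ValueError("models must share the same parameter names")
    merged: Weights = {}
    for name in models[0]:
        tensors = [model[name] for model in models]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Average element by element across the models.
        merged[name] = [sum(vals) / len(vals) for vals in zip(*tensors)]
    return merged


# Hypothetical checkpoints, one per ASR tool used for fine-tuning.
asr_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
asr_b = {"layer.weight": [2.0, 4.0], "layer.bias": [3.0]}
asr_c = {"layer.weight": [3.0, 6.0], "layer.bias": [6.0]}

ensemble = average_weights([asr_a, asr_b, asr_c])
print(ensemble)
```

Uniform averaging only makes sense when all models were fine-tuned from the same base checkpoint, so that their parameters line up one to one. Other merge schemes weight or prune the per-model changes instead of averaging them equally.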

arXiv information

Authors	Georgios Efstathiadis, Vijay Yadav, Anzar Abbas
Published	2025-03-17 13:34:07+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
