Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling

Summary

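A shortage of labeled data is a common challenge in speech classification, particularly for tasks that require extensive subjective assessment, such as cognitive state classification.
This work proposes a semi-supervised learning (SSL) framework built on a new multi-view pseudo-labeling method, which uses both acoustic and linguistic characteristics to select the most confident data for training the classification model.
On the acoustic side, unlabeled data are compared with labeled data using the Frechet audio distance, computed from embeddings produced by multiple audio encoders.
On the linguistic side, large language models are prompted to revise automatic speech recognition transcriptions and to predict labels, guided by the authors' proposed task-specific knowledge.
Data are treated as high-confidence when the pseudo-labels from both views agree, and as low-confidence when they disagree.
A bimodal classifier is then trained to label the low-confidence data iteratively until a predefined criterion is met.
The framework is evaluated on emotion recognition and dementia detection tasks.
The experiments show that, using only 30% of the labeled data, the method achieves performance competitive with fully supervised learning and significantly outperforms two selected baselines.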
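
The acoustic view rests on the Frechet distance between two Gaussians fitted to embedding sets, which in general is ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)). The sketch below is a simplified illustration, not the authors' implementation: it assumes diagonal covariances (so the trace term reduces to a per-dimension sum), a single encoder, and a hypothetical rule that assigns the class whose labeled embeddings are closest. The function names and the class-assignment rule are assumptions; the abstract does not say how distances from multiple encoders are combined.

```python
import math

def gaussian_stats(embeddings):
    """Per-dimension mean and population variance of equal-length vectors."""
    dims = list(zip(*embeddings))
    means = [sum(d) / len(d) for d in dims]
    variances = [sum((x - m) ** 2 for x in d) / len(d) for d, m in zip(dims, means)]
    return means, variances

def frechet_distance_diag(a, b):
    """Frechet distance between Gaussians fitted to a and b.

    Assumes diagonal covariances, a simplification of the full formula:
    sum((mu_a - mu_b)^2) + sum((sigma_a - sigma_b)^2).
    """
    mu_a, var_a = gaussian_stats(a)
    mu_b, var_b = gaussian_stats(b)
    mean_term = sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
    cov_term = sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(var_a, var_b))
    return mean_term + cov_term

def nearest_class(unlabeled_embeddings, labeled_by_class):
    """Hypothetical rule: pick the class with the smallest distance."""
    return min(
        labeled_by_class,
        key=lambda c: frechet_distance_diag(unlabeled_embeddings, labeled_by_class[c]),
    )
```

Here each input is a list of embedding vectors, for example frame-level embeddings of one utterance (an assumption). A full-covariance version would need a matrix square root, which this sketch deliberately avoids.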

Abstract (original)

The lack of labeled data is a common challenge in speech classification tasks, particularly those requiring extensive subjective assessment, such as cognitive state classification. In this work, we propose a Semi-Supervised Learning (SSL) framework, introducing a novel multi-view pseudo-labeling method that leverages both acoustic and linguistic characteristics to select the most confident data for training the classification model. Acoustically, unlabeled data are compared to labeled data using the Frechet audio distance, calculated from embeddings generated by multiple audio encoders. Linguistically, large language models are prompted to revise automatic speech recognition transcriptions and predict labels based on our proposed task-specific knowledge. High-confidence data are identified when pseudo-labels from both sources align, while mismatches are treated as low-confidence data. A bimodal classifier is then trained to iteratively label the low-confidence data until a predefined criterion is met. We evaluate our SSL framework on emotion recognition and dementia detection tasks. Experimental results demonstrate that our method achieves competitive performance compared to fully supervised learning using only 30% of the labeled data and significantly outperforms two selected baselines.
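
The agreement rule and the iterative labeling step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's code: `fit` and `predict` are hypothetical callables standing in for the bimodal classifier, and the confidence `threshold`, the `max_rounds` cap, and the stop-when-no-progress rule are placeholders for the "predefined criterion", which the abstract does not specify.

```python
def split_by_agreement(acoustic_labels, linguistic_labels):
    """Split utterance ids into high-confidence (views agree) and low-confidence."""
    high, low = {}, []
    for utt_id, a_label in acoustic_labels.items():
        if linguistic_labels.get(utt_id) == a_label:
            high[utt_id] = a_label
        else:
            low.append(utt_id)
    return high, low

def iterative_labeling(train_set, low_conf_ids, fit, predict,
                       threshold=0.9, max_rounds=5):
    """Retrain and relabel low-confidence items until a stop rule fires.

    threshold and max_rounds are assumed values, not taken from the paper.
    """
    train_set = dict(train_set)      # copy so the caller's dict is untouched
    remaining = list(low_conf_ids)
    for _ in range(max_rounds):
        if not remaining:
            break
        model = fit(train_set)       # hypothetical: trains the classifier
        still_low = []
        for utt_id in remaining:
            label, confidence = predict(model, utt_id)
            if confidence >= threshold:
                train_set[utt_id] = label
            else:
                still_low.append(utt_id)
        if len(still_low) == len(remaining):
            break                    # no item was promoted this round
        remaining = still_low
    return train_set, remaining
```

In use, the `high` dictionary returned by `split_by_agreement` would be merged with the labeled data to form the initial `train_set`, and the `low` list would be passed as `low_conf_ids`.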

arXiv information

Authors	Yuanchao Li, Zixing Zhang, Jing Han, Peter Bell, Catherine Lai
Published	2024-09-27 11:16:35+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
