Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation

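Abstract

Current speech-LLMs show limited ability to combine contextual reasoning with paralinguistic understanding, mainly because of a lack of question-answer (QA) datasets that cover both aspects.
We propose a new framework for generating datasets from in-the-wild speech data that integrates contextual reasoning with paralinguistic information.
It consists of data condensation of in-the-wild speech based on pseudo paralinguistic labels, followed by LLM-based Contextual Paralinguistic QA (CPQA) generation.
Its effectiveness is validated by a strong correlation between evaluations of the Qwen2-Audio-7B-Instruct model on a dataset created by the framework and on a human-generated CPQA dataset.
The results also reveal the speech-LLM's limitations in handling empathetic reasoning tasks, underscoring the need for such datasets and for more robust models.
The proposed framework is the first of its kind and has the potential to help train more robust speech-LLMs with paralinguistic reasoning capabilities.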

Abstract (original)

Current speech-LLMs exhibit limited capability in contextual reasoning alongside paralinguistic understanding, primarily due to the lack of Question-Answer (QA) datasets that cover both aspects. We propose a novel framework for dataset generation from in-the-wild speech data, that integrates contextual reasoning with paralinguistic information. It consists of a pseudo paralinguistic label-based data condensation of in-the-wild speech and LLM-based Contextual Paralinguistic QA (CPQA) generation. The effectiveness is validated by a strong correlation in evaluations of the Qwen2-Audio-7B-Instruct model on a dataset created by our framework and human-generated CPQA dataset. The results also reveal the speech-LLM’s limitations in handling empathetic reasoning tasks, highlighting the need for such datasets and more robust models. The proposed framework is first of its kind and has potential in training more robust speech-LLMs with paralinguistic reasoning capabilities.
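
The abstract names two stages, pseudo-label-based data condensation and LLM-based CPQA generation, without giving details. The following is only a minimal sketch of how such a pipeline might be shaped, not the authors' implementation. Everything in it is an assumption made for illustration: the `Utterance` fields, the confidence threshold, the per-label cap, and the prompt wording are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    audio_id: str
    transcript: str
    pseudo_label: str  # hypothetical: e.g. an emotion tag from some classifier
    confidence: float  # hypothetical: classifier confidence in [0, 1]


def condense(utterances, min_confidence=0.8, per_label_cap=2):
    """Keep confident utterances, capped per pseudo label (assumed criteria)."""
    buckets = defaultdict(list)
    for utt in utterances:
        if utt.confidence >= min_confidence:
            buckets[utt.pseudo_label].append(utt)
    kept = []
    for label in sorted(buckets):
        # Prefer the most confident utterances within each label.
        ranked = sorted(buckets[label], key=lambda u: u.confidence, reverse=True)
        kept.extend(ranked[:per_label_cap])
    return kept


def build_qa_prompt(utt):
    """Build a text prompt for an LLM to write one QA pair (wording is hypothetical)."""
    return (
        "Transcript: " + utt.transcript + "\n"
        "Paralinguistic label: " + utt.pseudo_label + "\n"
        "Write one question and answer that require both the content "
        "and the speaker's paralinguistic state."
    )


pool = [
    Utterance("a1", "I told you this twice already.", "angry", 0.95),
    Utterance("a2", "Stop calling me.", "angry", 0.90),
    Utterance("a3", "This is ridiculous.", "angry", 0.85),
    Utterance("a4", "The meeting is at noon.", "neutral", 0.60),
    Utterance("a5", "I miss how things were.", "sad", 0.88),
]
subset = condense(pool)
prompts = [build_qa_prompt(u) for u in subset]
```

With this toy pool, the low-confidence utterance is dropped and the over-represented label is capped, so the LLM would only be prompted on a smaller, more balanced subset. The actual selection criteria and prompts used in the paper may differ.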

arXiv information

Authors	Qiongqiong Wang, Hardik B. Sailor, Tianchi Liu, Ai Ti Aw
Published	2025-05-19 16:47:46+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
