AQUALLM: Audio Question Answering Data Generation Using Large Language Models

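Summary

Audio Question Answering (AQA) is a task in which a machine analyzes both an audio signal and a natural language question to produce an accurate natural language answer.
The accuracy of an AQA system depends heavily on having high-quality, diverse, and extensive AQA datasets.
While developing accurate and efficient AQA models has received notable attention, creating high-quality, diverse, and extensive datasets for this task has not.
To address this gap, this work makes several contributions.
We introduce a scalable AQA data generation pipeline, called the AQUALLM framework, that relies on Large Language Models (LLMs).
The framework uses existing audio-caption annotations and incorporates state-of-the-art LLMs to generate large, high-quality AQA datasets.
We also present three extensive, high-quality benchmark datasets for AQA, which contribute to the progress of AQA research.
AQA models trained on the proposed datasets set superior benchmarks compared to the existing state of the art.
Moreover, models trained on our datasets show better generalizability than models trained on human-annotated AQA data.
Code and datasets will be accessible on GitHub (https://github.com/swarupbehera/AQUALLM).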

Summary (original)

Audio Question Answering (AQA) constitutes a pivotal task in which machines analyze both audio signals and natural language questions to produce precise natural language answers. The significance of possessing high-quality, diverse, and extensive AQA datasets cannot be overstated when aiming for the precision of an AQA system. While there has been notable focus on developing accurate and efficient AQA models, the creation of high-quality, diverse, and extensive datasets for the specific task at hand has not garnered considerable attention. To address this challenge, this work makes several contributions. We introduce a scalable AQA data generation pipeline, denoted as the AQUALLM framework, which relies on Large Language Models (LLMs). This framework utilizes existing audio-caption annotations and incorporates state-of-the-art LLMs to generate expansive, high-quality AQA datasets. Additionally, we present three extensive and high-quality benchmark datasets for AQA, contributing significantly to the progression of AQA research. AQA models trained on the proposed datasets set superior benchmarks compared to the existing state-of-the-art. Moreover, models trained on our datasets demonstrate enhanced generalizability when compared to models trained using human-annotated AQA data. Code and datasets will be accessible on GitHub (https://github.com/swarupbehera/AQUALLM).
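
The abstract describes the pipeline only at a high level: existing audio-caption annotations go in, an LLM is prompted, and question-answer pairs come out. The following is a minimal sketch of that general idea, not the AQUALLM implementation. The prompt wording, the JSON output format, the record fields (`audio_id`, `caption`, `question`, `answer`), and every function name are assumptions made for illustration, and the LLM is passed in as a plain callable so that no particular API is implied.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt; the actual AQUALLM prompts are not given in the abstract.
PROMPT_TEMPLATE = (
    "You are given a caption describing an audio clip.\n"
    "Caption: {caption}\n"
    "Write {n} question-answer pairs that can be answered from the audio.\n"
    'Return a JSON list of objects with the keys "question" and "answer".'
)


def build_prompt(caption: str, n: int = 2) -> str:
    """Fill the (assumed) prompt template with one caption."""
    return PROMPT_TEMPLATE.format(caption=caption, n=n)


def parse_qa_pairs(raw: str) -> List[Dict[str, str]]:
    """Parse the LLM reply, keeping only well-formed, non-empty pairs."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed reply: drop it rather than guess
    if not isinstance(items, list):
        return []
    pairs = []
    for item in items:
        if not isinstance(item, dict):
            continue
        question = str(item.get("question", "")).strip()
        answer = str(item.get("answer", "")).strip()
        if question and answer:
            pairs.append({"question": question, "answer": answer})
    return pairs


def generate_aqa_records(
    annotations: List[Dict[str, str]],
    llm: Callable[[str], str],
    n: int = 2,
) -> List[Dict[str, str]]:
    """Turn audio-caption annotations into (audio_id, question, answer) records."""
    records = []
    for ann in annotations:
        raw = llm(build_prompt(ann["caption"], n))
        for pair in parse_qa_pairs(raw):
            records.append({"audio_id": ann["audio_id"], **pair})
    return records


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns one valid and one empty pair."""
    return json.dumps([
        {"question": "What animal is heard?", "answer": "a dog"},
        {"question": "", "answer": "unused"},
    ])


if __name__ == "__main__":
    demo = [{"audio_id": "clip_001", "caption": "A dog barks while rain falls."}]
    for record in generate_aqa_records(demo, fake_llm):
        print(record)
```

Passing the model in as a callable keeps the sketch runnable offline and makes the filtering step easy to test; a real pipeline would replace `fake_llm` with a call to whichever LLM is in use and would likely add further quality checks on the generated pairs.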

arXiv information

Authors	Swarup Ranjan Behera, Krishna Mohan Injeti, Jaya Sai Kiran Patibandla, Praveen Kumar Pokala, Balakrishna Reddy Pailla
Published	2023-12-28 20:01:27+00:00
arXiv site	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
