Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

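Summary

Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when given well-formulated instructions.
However, existing studies in speech processing focus mainly on limited or specific tasks.
Moreover, the lack of standardized benchmarks hinders fair comparison across different approaches.
We therefore present Dynamic-SUPERB, a benchmark designed for building universal speech models that leverage instruction tuning to perform multiple tasks in a zero-shot fashion.
To cover diverse speech tasks comprehensively and make full use of instruction tuning, we invite the community to collaborate and contribute, so that the benchmark can grow dynamically.
To start, Dynamic-SUPERB features 55 evaluation instances formed by combining 33 tasks and 22 datasets.
These span a broad range of dimensions, providing a comprehensive platform for evaluation.
We also propose several approaches for establishing benchmark baselines.
These include the use of speech models, text language models, and a multimodal encoder.
Evaluation results show that these baselines perform reasonably on seen tasks but struggle with unseen ones.
We release all materials to the public and welcome researchers to collaborate on the project and advance the field together.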

Summary (original)

Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions. However, existing studies in speech processing primarily focus on limited or specific tasks. Moreover, the lack of standardized benchmarks hinders a fair comparison across different approaches. Thus, we present Dynamic-SUPERB, a benchmark designed for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion. To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark. To initiate, Dynamic-SUPERB features 55 evaluation instances by combining 33 tasks and 22 datasets. This spans a broad spectrum of dimensions, providing a comprehensive platform for evaluation. Additionally, we propose several approaches to establish benchmark baselines. These include the utilization of speech models, text language models, and the multimodal encoder. Evaluation results indicate that while these baselines perform reasonably on seen tasks, they struggle with unseen ones. We release all materials to the public and welcome researchers to collaborate on the project, advancing technologies in the field together.
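The abstract describes evaluation instances built from task and dataset combinations, with results reported separately for seen and unseen tasks. The sketch below illustrates that idea under stated assumptions. It is hypothetical: the `EvalInstance` record, its field names, and the `accuracy_by_split` helper are illustrative inventions, not Dynamic-SUPERB's actual data format or scoring code. Each instance pairs one task with one dataset and a natural-language instruction. Accuracy is averaged per instance and then grouped by whether the task was seen during instruction tuning.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalInstance:
    """Hypothetical record: one (task, dataset) pairing plus its instruction."""
    task: str
    dataset: str
    instruction: str
    seen: bool  # assumption: True if this task type was used during instruction tuning


def accuracy_by_split(instances, predictions, labels):
    """Average per-instance accuracy separately over seen and unseen tasks.

    predictions and labels map each EvalInstance to a list of answers,
    one entry per test example in that instance.
    """
    scores = defaultdict(list)
    for inst in instances:
        gold = labels[inst]
        if not gold:
            continue  # skip instances with no labeled examples
        correct = sum(p == g for p, g in zip(predictions[inst], gold))
        scores["seen" if inst.seen else "unseen"].append(correct / len(gold))
    # Macro-average: every instance contributes equally to its split's score
    return {split: sum(vals) / len(vals) for split, vals in scores.items()}


# Hypothetical placeholder names, not real benchmark tasks or datasets
seen_inst = EvalInstance("task_a", "dataset_1", "Answer yes or no: ...", seen=True)
unseen_inst = EvalInstance("task_b", "dataset_2", "Choose a or b: ...", seen=False)

result = accuracy_by_split(
    [seen_inst, unseen_inst],
    predictions={seen_inst: ["yes", "no", "yes"], unseen_inst: ["a", "b"]},
    labels={seen_inst: ["yes", "yes", "yes"], unseen_inst: ["a", "a"]},
)
print(result)
```

Averaging per instance before grouping keeps one large dataset from dominating its split. This is a design assumption made for the sketch, not something the abstract specifies.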

arXiv information

Authors	Chien-yu Huang,Ke-Han Lu,Shih-Heng Wang,Chi-Yuan Hsiao,Chun-Yi Kuan,Haibin Wu,Siddhant Arora,Kai-Wei Chang,Jiatong Shi,Yifan Peng,Roshan Sharma,Shinji Watanabe,Bhiksha Ramakrishnan,Shady Shehata,Hung-yi Lee
Published	2024-03-22 15:25:04+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
