Synthia’s Melody: A Benchmark Framework for Unsupervised Domain Adaptation in Audio

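Summary

Deep learning for vision and natural language has advanced considerably, but unsupervised domain adaptation in audio remains relatively unexplored.
We attribute this in part to the lack of a suitable benchmark dataset.
To address this gap, we introduce Synthia’s Melody, a new audio data generation framework that can simulate an unlimited variety of 4-second melodies with user-specified confounding structures characterised by musical key, timbre, and loudness.
Unlike existing datasets collected under observational settings, Synthia’s Melody is free of unobserved biases, which ensures that experiments are reproducible and comparable.
To demonstrate its utility, we generate two types of distribution shift (domain shift and sample selection bias) and evaluate the performance of acoustic deep learning models under these shifts.
Our evaluation shows that Synthia’s Melody provides a robust testbed for examining how susceptible these models are to varying levels of distribution shift.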

Abstract (original)

Despite significant advancements in deep learning for vision and natural language, unsupervised domain adaptation in audio remains relatively unexplored. We attribute this in part to the lack of an appropriate benchmark dataset. To address this gap, we present Synthia’s Melody, a novel audio data generation framework capable of simulating an infinite variety of 4-second melodies with user-specified confounding structures characterised by musical keys, timbre, and loudness. Unlike existing datasets collected under observational settings, Synthia’s Melody is free of unobserved biases, ensuring the reproducibility and comparability of experiments. To showcase its utility, we generate two types of distribution shift, domain shift and sample selection bias, and evaluate the performance of acoustic deep learning models under these shifts. Our evaluations reveal that Synthia’s Melody provides a robust testbed for examining the susceptibility of these models to varying levels of distribution shift.
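
To make the idea of a user-specified confounding structure concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: the function name `sample_melody`, the choice of key mode (major or minor) as the label, the sine and square timbres, the 8 kHz sample rate, and the eight-note layout are all assumptions made for illustration. The only details taken from the abstract are the 4-second duration and the use of key, timbre, and loudness as controllable factors.

```python
import math
import random

SAMPLE_RATE = 8000      # Hz; assumed, kept low so pure Python stays fast
DURATION_S = 4.0        # the abstract specifies 4-second melodies
NOTES_PER_MELODY = 8    # assumed: eight half-second notes

# Semitone offsets of the major and natural minor scales.
SCALES = {"major": [0, 2, 4, 5, 7, 9, 11], "minor": [0, 2, 3, 5, 7, 8, 10]}


def waveform(timbre, phase):
    """Return one sample of a unit-amplitude oscillator at the given phase (radians)."""
    s = math.sin(phase)
    if timbre == "sine":
        return s
    return 1.0 if s >= 0 else -1.0  # square wave


def sample_melody(key, p_confound, loudness=0.5, rng=random):
    """Draw one melody; timbre matches the key with probability p_confound.

    Hypothetical pairing: major goes with sine, minor goes with square.
    """
    aligned = "sine" if key == "major" else "square"
    other = "square" if aligned == "sine" else "sine"
    timbre = aligned if rng.random() < p_confound else other

    samples_per_note = int(SAMPLE_RATE * DURATION_S / NOTES_PER_MELODY)
    audio = []
    for _ in range(NOTES_PER_MELODY):
        semitone = rng.choice(SCALES[key])
        freq = 220.0 * 2 ** (semitone / 12)  # root fixed at A3 for simplicity
        for n in range(samples_per_note):
            phase = 2 * math.pi * freq * n / SAMPLE_RATE
            audio.append(loudness * waveform(timbre, phase))
    return audio, timbre
```

Under these assumptions, drawing a training set with `p_confound=0.9` and a test set with `p_confound=0.1` changes how strongly timbre predicts key between the two sets. That is the kind of controlled shift a model relying on timbre as a shortcut would be sensitive to.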

arXiv information

Authors	Chia-Hsin Lin, Charles Jones, Björn W. Schuller, Harry Coppock
Published	2023-09-26 15:46:06+00:00

