Rethinking Emotion Bias in Music via Frechet Audio Distance

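Summary

The subjective nature of music emotion introduces inherent bias into both recognition and generation, especially when relying on a single audio encoder, emotion classifier, or evaluation metric.
In this work, we study Music Emotion Recognition (MER) and Emotional Music Generation (EMG), using diverse audio encoders together with the Frechet Audio Distance (FAD), a reference-free evaluation metric.
Our study begins with a benchmark evaluation of MER, highlighting the limitations of using a single audio encoder and the disparities observed across different measurements.
We then propose assessing MER performance using FAD from multiple encoders to obtain a more objective measure of music emotion.
Furthermore, we introduce an enhanced EMG approach designed to improve both the variation and the prominence of generated music emotion, thereby enhancing realism.
Additionally, we compare our EMG model against two baseline models to investigate the realism disparities between the emotions conveyed in real and synthetic music.
Experimental results underscore the emotion bias problem in both MER and EMG and demonstrate the potential of using FAD and diverse audio encoders to evaluate music emotion objectively.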

Abstract (original)

The subjective nature of music emotion introduces inherent bias in both recognition and generation, especially when relying on a single audio encoder, emotion classifier, or evaluation metric. In this work, we conduct a study on Music Emotion Recognition (MER) and Emotional Music Generation (EMG), employing diverse audio encoders alongside the Frechet Audio Distance (FAD), a reference-free evaluation metric. Our study begins with a benchmark evaluation of MER, highlighting the limitations associated with using a single audio encoder and the disparities observed across different measurements. We then propose assessing MER performance using FAD from multiple encoders to provide a more objective measure of music emotion. Furthermore, we introduce an enhanced EMG approach designed to improve both the variation and prominence of generated music emotion, thus enhancing realism. Additionally, we investigate the realism disparities between the emotions conveyed in real and synthetic music, comparing our EMG model against two baseline models. Experimental results underscore the emotion bias problem in both MER and EMG and demonstrate the potential of using FAD and diverse audio encoders to evaluate music emotion objectively.
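
The abstract does not spell out how FAD is computed, so the following is only a minimal sketch of the standard Frechet distance between Gaussians fitted to two sets of audio embeddings, evaluated once per encoder. It is not the authors' implementation. The function names, the dictionary layout, and the idea of passing precomputed embeddings are assumptions made for illustration; extracting embeddings from a real audio encoder is left out.

```python
import numpy as np


def frechet_audio_distance(emb_a, emb_b):
    """Frechet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (num_clips, embedding_dim).
    """
    mu_a = emb_a.mean(axis=0)
    mu_b = emb_b.mean(axis=0)
    # atleast_2d keeps the covariance a matrix even for 1-D embeddings.
    cov_a = np.atleast_2d(np.cov(emb_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(emb_b, rowvar=False))

    diff = mu_a - mu_b
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise, hence the clip).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


def fad_per_encoder(embeddings_by_encoder):
    """Compute one FAD score per encoder (hypothetical helper).

    embeddings_by_encoder maps an encoder name to a pair
    (embeddings_of_set_a, embeddings_of_set_b) produced by that encoder.
    """
    return {
        name: frechet_audio_distance(emb_a, emb_b)
        for name, (emb_a, emb_b) in embeddings_by_encoder.items()
    }
```

Reporting the scores side by side, rather than trusting one encoder, is the point the abstract makes: a gap that appears under every encoder is less likely to be an artifact of a single embedding space.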

arXiv information

Authors	Yuanchao Li, Azalea Gui, Dimitra Emmanouilidou, Hannes Gamper
Published	2024-09-27 11:28:04+00:00
arXiv page	arxiv_id(pdf)

Sources and services used

arxiv.jp, Google
