Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound

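Summary

Quantifying audio aesthetics remains a complex challenge in audio processing, mainly because it is subjective and shaped by human perception and cultural context.
Traditional methods often rely on human listeners for evaluation, which leads to inconsistent results and high resource demands.
This paper addresses the growing need for automated systems that can predict audio aesthetics without human intervention.
Such systems matter for applications such as data filtering, pseudo-labeling of large datasets, and evaluation of generative audio models, especially as those models become more sophisticated.
In this work, we introduce a new approach to audio aesthetic evaluation by proposing new annotation guidelines that decompose human listening perspectives into four distinct axes.
We develop and train no-reference, per-item prediction models that provide a more nuanced assessment of audio quality.
Our models are evaluated against human mean opinion scores (MOS) and existing methods, and show comparable or superior performance.
This research not only advances the field of audio aesthetics but also provides open-source models and datasets to support future work and benchmarking.
We release our code and pre-trained models at https://github.com/facebookresearch/audiobox-aesthetics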

Abstract (original)

The quantification of audio aesthetics remains a complex challenge in audio processing, primarily due to its subjective nature, which is influenced by human perception and cultural context. Traditional methods often depend on human listeners for evaluation, leading to inconsistencies and high resource demands. This paper addresses the growing need for automated systems capable of predicting audio aesthetics without human intervention. Such systems are crucial for applications like data filtering, pseudo-labeling large datasets, and evaluating generative audio models, especially as these models become more sophisticated. In this work, we introduce a novel approach to audio aesthetic evaluation by proposing new annotation guidelines that decompose human listening perspectives into four distinct axes. We develop and train no-reference, per-item prediction models that offer a more nuanced assessment of audio quality. Our models are evaluated against human mean opinion scores (MOS) and existing methods, demonstrating comparable or superior performance. This research not only advances the field of audio aesthetics but also provides open-source models and datasets to facilitate future work and benchmarking. We release our code and pre-trained model at: https://github.com/facebookresearch/audiobox-aesthetics
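The abstract mentions evaluating the models against human mean opinion scores (MOS) but does not say here which agreement measure is used. As an illustration only, the sketch below computes Pearson and Spearman correlation between predicted scores and human MOS for a single axis, which is one common way to compare an automatic predictor with listener ratings. The score lists and function names are hypothetical and do not come from the paper or its repository.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-item scores for one axis (not data from the paper).
human_mos = [4.2, 3.1, 2.5, 4.8, 3.9]
predicted = [4.0, 3.3, 2.1, 4.6, 3.5]

print("Pearson:", round(pearson(human_mos, predicted), 3))
print("Spearman:", round(spearman(human_mos, predicted), 3))
```

Pearson measures how linearly the predictions track the ratings, while Spearman only checks whether items are ranked in the same order, so reporting both gives a fuller picture when the predictor's scale differs from the listeners' scale.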

arXiv information

Authors	Andros Tjandra, Yi-Chiao Wu, Baishan Guo, John Hoffman, Brian Ellis, Apoorv Vyas, Bowen Shi, Sanyuan Chen, Matt Le, Nick Zacharov, Carleigh Wood, Ann Lee, Wei-Ning Hsu
Published	2025-02-07 18:15:57+00:00
arXiv page	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
