Evaluation of Deep Audio Representations for Hearables

Summary

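Effectively steering hearable devices requires understanding the acoustic environment around the user.
In the computational analysis of sound scenes, foundation models have emerged as the state of the art, producing high-performance, robust, multi-purpose audio representations.
We introduce and release Deep Evaluation of Audio Representations (DEAR), the first dataset and benchmark for evaluating how well foundation models capture the acoustic properties essential for hearables.
The dataset contains 1,158 audio tracks, each 30 seconds long, created by spatially mixing proprietary monologues with commercial, high-quality recordings of everyday acoustic scenes.
Our benchmark comprises eight tasks that assess the general context, speech sources, and technical acoustic properties of the audio scenes.
Through an evaluation of four general-purpose audio representation models, we show that the BEATs model significantly outperforms its counterparts.
This superiority underscores the advantage of models trained on diverse audio collections and confirms their applicability to a wide range of auditory tasks, including encoding the environment properties needed for hearable steering.
The DEAR dataset and associated code are available at https://dear-dataset.github.io.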

Summary (original)

Effectively steering hearable devices requires understanding the acoustic environment around the user. In the computational analysis of sound scenes, foundation models have emerged as the state of the art to produce high-performance, robust, multi-purpose audio representations. We introduce and release Deep Evaluation of Audio Representations (DEAR), the first dataset and benchmark to evaluate the efficacy of foundation models in capturing essential acoustic properties for hearables. The dataset includes 1,158 audio tracks, each 30 seconds long, created by spatially mixing proprietary monologues with commercial, high-quality recordings of everyday acoustic scenes. Our benchmark encompasses eight tasks that assess the general context, speech sources, and technical acoustic properties of the audio scenes. Through our evaluation of four general-purpose audio representation models, we demonstrate that the BEATs model significantly surpasses its counterparts. This superiority underscores the advantage of models trained on diverse audio collections, confirming their applicability to a wide array of auditory tasks, including encoding the environment properties necessary for hearable steering. The DEAR dataset and associated code are available at https://dear-dataset.github.io.

arXiv information

Authors	Fabian Gröger, Pascal Baumann, Ludovic Amruthalingam, Laurent Simon, Ruksana Giurda, Simone Lionetti
Published	2025-02-10 16:51:11+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
