Perceptual Musical Features for Interpretable Audio Tagging

Summary

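In the era of music streaming platforms, the task of automatically tagging music audio has attracted considerable attention, and researchers have been developing methods to improve performance metrics on standard datasets. Most recent approaches rely on deep neural networks, which perform impressively but are opaque, making it hard to explain the output they produce for a given input. Interpretability has been emphasized in other fields such as medicine, but it has not received attention in music-related tasks. In this study, we explored the relevance of interpretability in the context of automatic music tagging. We built a workflow that combines three information extraction techniques: a) leveraging symbolic knowledge, b) using auxiliary deep neural networks, and c) applying signal processing to extract perceptual features from audio files. These features were then used to train an interpretable machine learning model for tag prediction. We ran experiments on two datasets, MTG-Jamendo and GTZAN. Our method outperformed the baseline models on both tasks and, in some cases, was competitive with the current state of the art. We conclude that there are use cases where the value of interpretability outweighs the loss in performance.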

Abstract (original)

In the age of music streaming platforms, the task of automatically tagging music audio has garnered significant attention, driving researchers to devise methods aimed at enhancing performance metrics on standard datasets. Most recent approaches rely on deep neural networks, which, despite their impressive performance, possess opacity, making it challenging to elucidate their output for a given input. While the issue of interpretability has been emphasized in other fields like medicine, it has not received attention in music-related tasks. In this study, we explored the relevance of interpretability in the context of automatic music tagging. We constructed a workflow that incorporates three different information extraction techniques: a) leveraging symbolic knowledge, b) utilizing auxiliary deep neural networks, and c) employing signal processing to extract perceptual features from audio files. These features were subsequently used to train an interpretable machine-learning model for tag prediction. We conducted experiments on two datasets, namely the MTG-Jamendo dataset and the GTZAN dataset. Our method surpassed the performance of baseline models in both tasks and, in certain instances, demonstrated competitiveness with the current state-of-the-art. We conclude that there are use cases where the deterioration in performance is outweighed by the value of interpretability.
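The general idea of feeding human-readable audio features into an interpretable model can be illustrated with a toy sketch. This is not the authors' workflow: the abstract does not say which features or which model they used, so everything here is an assumption made for illustration. Zero-crossing rate and RMS energy stand in for the perceptual features, a one-rule decision stump stands in for the interpretable model, the clips are synthetic noisy sine tones, and the tags "bright" and "dark" are hypothetical.

```python
import math
import random


def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    changes = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return changes / (len(signal) - 1)


def rms_energy(signal):
    """Root-mean-square amplitude, a rough loudness proxy."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def extract_features(signal):
    """Map a waveform to a small dictionary of named features."""
    return {
        "zero_crossing_rate": zero_crossing_rate(signal),
        "rms_energy": rms_energy(signal),
    }


def make_clip(freq_hz, amplitude, rng, sample_rate=8000, n_samples=2000):
    """Synthetic stand-in for an audio clip: a sine tone plus light noise."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        + rng.gauss(0, 0.01)
        for t in range(n_samples)
    ]


def fit_stump(rows, tags):
    """Pick the single feature/threshold rule with the most training hits."""
    tag_a, tag_b = sorted(set(tags))
    best = None
    for name in rows[0]:
        values = sorted({row[name] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for above, below in ((tag_a, tag_b), (tag_b, tag_a)):
                hits = sum(
                    (above if row[name] > threshold else below) == tag
                    for row, tag in zip(rows, tags)
                )
                if best is None or hits > best["hits"]:
                    best = {"feature": name, "threshold": threshold,
                            "above": above, "below": below, "hits": hits}
    return best


def predict(stump, row):
    """Apply the learned rule to one feature dictionary."""
    return stump["above"] if row[stump["feature"]] > stump["threshold"] else stump["below"]


# Toy training set: high-pitched tones tagged "bright", low-pitched "dark".
rng = random.Random(0)
rows, tags = [], []
for _ in range(20):
    rows.append(extract_features(make_clip(rng.uniform(1500, 3000), rng.uniform(0.3, 0.8), rng)))
    tags.append("bright")
    rows.append(extract_features(make_clip(rng.uniform(100, 400), rng.uniform(0.3, 0.8), rng)))
    tags.append("dark")

stump = fit_stump(rows, tags)
# The whole model is one readable rule, which is the point of the exercise.
print(f"if {stump['feature']} > {stump['threshold']:.3f} "
      f"then '{stump['above']}' else '{stump['below']}'")
```

Because the fitted model is a single threshold on a named feature, a listener can check the reason behind any predicted tag directly, which is the kind of transparency the abstract weighs against raw accuracy.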

arXiv information

Authors	Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos, Giorgos Stamou
Published	2024-01-04 15:09:37+00:00
arXiv page	arxiv_id(pdf)

Source and services used

arxiv.jp, DeepL
