A Mamba-based Network for Semi-supervised Singing Melody Extraction Using Confidence Binary Regularization

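Summary

Singing melody extraction (SME) is a key task in music information retrieval.
Existing methods, however, face several limitations. First, prior models use transformers to capture contextual dependencies, which requires quadratic computation and makes inference inefficient.
Second, prior work typically relies on frequency-supervised methods to estimate the fundamental frequency (F0), ignoring the fact that musical performance is actually based on notes.
Third, transformers typically need large amounts of labeled data to reach their best performance, but the SME task lacks sufficient annotated data.
To address these issues, the paper proposes SpectMamba, a Mamba-based network for semi-supervised singing melody extraction using confidence binary regularization.
Specifically, it first introduces Vision Mamba to achieve linear computational complexity.
It then proposes a novel note-F0 decoder that lets the model better mimic musical performance.
Finally, to alleviate the scarcity of labeled data, it introduces a confidence binary regularization (CBR) module that leverages unlabeled data by maximizing the probability of the correct classes.
The proposed method is evaluated on several public datasets, and the experiments demonstrate its effectiveness.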
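
The efficiency argument in the summary can be illustrated with a toy example. The sketch below is not SpectMamba or Vision Mamba; it is a plain, non-selective scalar state-space recurrence, with hypothetical parameters `a`, `b`, and `c`, showing why a recurrent scan touches each time step once (linear in sequence length), whereas self-attention compares every pair of time steps (quadratic).

```python
def ssm_scan(x, a=0.5, b=1.0, c=1.0):
    """Toy linear state-space scan: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    Assumption: scalar state and fixed (input-independent) parameters.
    Mamba itself uses input-dependent parameters and a vector state.
    """
    h = 0.0
    y = []
    for x_t in x:            # one pass over the sequence: O(T) work
        h = a * h + b * x_t  # update the hidden state
        y.append(c * h)      # read out the output for this step
    return y


def attention_pair_count(seq_len):
    """Number of query-key pairs that full self-attention scores: O(T^2)."""
    return seq_len * seq_len


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0]))
    print(attention_pair_count(1000))
```

Doubling the sequence length doubles the work of the scan but quadruples the number of attention pairs, which is the inefficiency the summary attributes to transformer-based models.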

Abstract (original)

Singing melody extraction (SME) is a key task in the field of music information retrieval. However, existing methods face several limitations. First, prior models use transformers to capture contextual dependencies, which requires quadratic computation and results in low efficiency at inference. Second, prior works typically rely on frequency-supervised methods to estimate the fundamental frequency (f0), which ignores that musical performance is actually based on notes. Third, transformers typically require large amounts of labeled data to achieve optimal performance, but the SME task lacks sufficient annotated data. To address these issues, in this paper, we propose a mamba-based network, called SpectMamba, for semi-supervised singing melody extraction using confidence binary regularization. In particular, we begin by introducing vision mamba to achieve linear computational complexity. Then, we propose a novel note-f0 decoder that allows the model to better mimic musical performance. Further, to alleviate the scarcity of labeled data, we introduce a confidence binary regularization (CBR) module to leverage unlabeled data by maximizing the probability of the correct classes. The proposed method is evaluated on several public datasets, and the conducted experiments demonstrate the effectiveness of our proposed method.
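
The abstract's point that performance is based on notes rather than raw frequencies can be made concrete with the standard equal-temperament mapping between note numbers and F0. The sketch below is only that mapping, not the paper's note-f0 decoder, whose design is not described here. It assumes MIDI note numbering with A4 = note 69 = 440 Hz, and the function names are hypothetical.

```python
import math

A4_HZ = 440.0    # assumed reference tuning
A4_MIDI = 69     # MIDI note number of A4


def midi_to_hz(note):
    """Frequency in Hz of a (possibly fractional) MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def hz_to_midi(freq_hz):
    """Fractional MIDI note number of a frequency in Hz."""
    return A4_MIDI + 12.0 * math.log2(freq_hz / A4_HZ)


def quantize_f0_to_note(freq_hz):
    """Snap an F0 estimate to the frequency of the nearest semitone."""
    return midi_to_hz(round(hz_to_midi(freq_hz)))


if __name__ == "__main__":
    # A slightly sharp A4 maps back onto the note A4.
    print(round(hz_to_midi(445.0)), quantize_f0_to_note(445.0))
```

A frequency-only target treats 440 Hz and 445 Hz as different classes, while a note-level view groups them under the same semitone.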

arXiv information

Authors	Xiaoliang He, Kangjie Dong, Jingkai Cao, Shuai Yu, Wei Li, Yi Yu
Published	2025-05-13 15:43:35+00:00
arXiv site	arxiv_id(pdf)

Source and services used

arxiv.jp, Google
