NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics

Summary

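Large language models (LLMs) prompted with text and audio represent the state of the art in a variety of auditory tasks, including speech, music, and general audio, and exhibit emergent abilities on unseen tasks.
However, these capabilities have not yet been sufficiently demonstrated in bioacoustics tasks, such as detecting animal vocalizations in large-scale recordings, classifying rare and endangered species, and labeling context and behavior. These tasks are important for conservation, biodiversity monitoring, and the study of animal behavior.
In this work, we introduce NatureLM-audio, the first audio-language foundation model designed specifically for bioacoustics.
Our carefully curated training dataset consists of text-audio pairs spanning diverse bioacoustics, speech, and music data, and is designed to address the challenges posed by the limited annotated datasets available in the field.
We demonstrate successful transfer of learned representations from music and speech to bioacoustics, and our model shows promising generalization to unseen taxa and tasks.
Importantly, we test NatureLM-audio on a new benchmark (BEANS-Zero), where it establishes a new state of the art (SotA) on several bioacoustics tasks, including zero-shot classification of unseen species.
To advance bioacoustics research, we are open-sourcing not only the code for training the model but also the code for generating the training and benchmark data.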

Summary (original)

Large language models (LLMs) prompted with text and audio represent the state of the art in various auditory tasks, including speech, music, and general audio, showing emergent abilities on unseen tasks. However, these capabilities have yet to be fully demonstrated in bioacoustics tasks, such as detecting animal vocalizations in large recordings, classifying rare and endangered species, and labeling context and behavior – tasks that are crucial for conservation, biodiversity monitoring, and the study of animal behavior. In this work, we present NatureLM-audio, the first audio-language foundation model specifically designed for bioacoustics. Our carefully curated training dataset comprises text-audio pairs spanning a diverse range of bioacoustics, speech, and music data, designed to address the challenges posed by limited annotated datasets in the field. We demonstrate successful transfer of learned representations from music and speech to bioacoustics, and our model shows promising generalization to unseen taxa and tasks. Importantly, we test NatureLM-audio on a novel benchmark (BEANS-Zero) and it sets the new state of the art (SotA) on several bioacoustics tasks, including zero-shot classification of unseen species. To advance bioacoustics research, we also open-source the code for generating training and benchmark data, as well as for training the model.

arXiv information

Authors	David Robinson, Marius Miron, Masato Hagiwara, Olivier Pietquin
Published	2024-11-11 18:01:45+00:00
arXiv site	arxiv_id(pdf)

Source, services used

arxiv.jp, Google
