2025-12-25
animal2vec and MeerKAT: A self‐supervised transformer for rare‐event raw audio input and a large‐scale reference dataset for bioacoustics
Publication
Methods in Ecology and Evolution, Volume 2025
Bioacoustic research, vital for promoting conservation and understanding animal behaviour and ecology, faces a monumental challenge: analysing vast datasets where animal vocalizations are rare. While deep learning techniques are becoming standard, adapting them to bioacoustics remains difficult. We address this challenge with animal2vec, an interpretable large transformer model and a self-supervised training scheme tailored for sparse and unbalanced bioacoustic data. It learns from unlabelled audio and then refines its understanding with labelled data. Furthermore, we introduce and publicly release MeerKAT: Meerkat Kalahari Audio Transcripts, a dataset of meerkat (Suricata suricatta) vocalizations with millisecond-resolution annotations, the largest labelled dataset on a non-human terrestrial mammal currently available. Our model sets a baseline on the MeerKAT corpus, outperforming other transformer models, and improves on existing methods on the publicly available NIPS4Bplus birdsong dataset. Moreover, animal2vec performs well even with limited labelled data (few-shot learning). animal2vec and MeerKAT provide a new reference point for bioacoustic research, enabling scientists to analyse large amounts of data even with scarce ground truth information.
| Additional Metadata | |
|---|---|
| DOI | doi.org/10.1111/2041-210x.70218 |
| Journal | Methods in Ecology and Evolution |
| License | Released under the CC-BY 4.0 (“Attribution 4.0 International”) License |
| Organisation | Staff publications |

J.C. Schäfer‐Zimmermann, V. Demartsev, B. Averly, K.L. Dhanjal‐Adams, M. Duteil, G. Gall, … A. Strandburg‐Peshkin. (2025). animal2vec and MeerKAT: A self‐supervised transformer for rare‐event raw audio input and a large‐scale reference dataset for bioacoustics. Methods in Ecology and Evolution, 2025. doi:10.1111/2041-210x.70218