Efficient Speech Detection in Environmental Audio Using Acoustic Recognition and Knowledge Distillation

Priebe, Drew; Ghani, Burooj; Stowell, Dan

doi:10.3390/s24072046

2024-03-22

Sensors, Volume 24, Issue 7, p. 2046

The ongoing biodiversity crisis, driven by factors such as land-use change and global warming, emphasizes the need for effective ecological monitoring methods. Acoustic monitoring of biodiversity has emerged as an important monitoring tool. Detecting human voices in soundscape monitoring projects is useful both for analyzing human disturbance and for privacy filtering. Despite significant strides in deep learning in recent years, the deployment of large neural networks on compact devices poses challenges due to memory and latency constraints. Our approach focuses on leveraging knowledge distillation techniques to design efficient, lightweight student models for speech detection in bioacoustics. In particular, we employed the MobileNetV3-Small-Pi model to create compact yet effective student architectures to compare against the larger EcoVAD teacher model, a well-regarded voice detection architecture in eco-acoustic monitoring. The comparative analysis included examining various configurations of the MobileNetV3-Small-Pi-derived student models to identify optimal performance. Additionally, a thorough evaluation of different distillation techniques was conducted to ascertain the most effective method for model selection. Our findings revealed that the distilled models exhibited comparable performance to the EcoVAD teacher model, indicating a promising approach to overcoming computational barriers for real-time ecological monitoring.

Additional Metadata
Keywords	passive acoustic monitoring, eco-acoustics, deep learning, knowledge distillation, bioacoustics, classification, transfer learning, speech detection
Persistent URL	doi.org/10.3390/s24072046
Journal	Sensors
Rights	Released under the CC-BY 4.0 ("Attribution 4.0 International") License
Organisation	Staff publications
Citation	Priebe, D., Ghani, B., & Stowell, D. (2024). Efficient Speech Detection in Environmental Audio Using Acoustic Recognition and Knowledge Distillation. Sensors, 24(7), 2046. doi:10.3390/s24072046
