Christian Bergler
Deep Learning Applied to Animal Linguistics
Even today, our understanding of animal communication remains very limited. Scientists are still far from identifying statistically relevant, animal-specific, and recurring linguistic patterns. Yet such patterns, combined with the associated situation-specific behavioral observations, represent an indispensable basis for decoding animal communication. Deriving statistically significant communicative and behavioral hypotheses requires sufficiently large audiovisual data volumes that cover the animal-specific communicative and behavioral repertoire in a representative, natural, and realistic manner. Passive audiovisual monitoring techniques are therefore increasingly deployed: because the recording is performed unobtrusively, disruptive factors are minimized while the probability of observing the entire inventory of natural communicative and behavioral paradigms in adequate numbers is maximized. Nevertheless, constraints on time and human resources prevent scientists from efficiently processing large-scale, noise-heavy data archives containing massive amounts of hidden audiovisual information, and thus from deriving a bigger picture of animal linguistics. A deep and detailed analysis of such data, yielding faithful real-world representations, therefore requires the support of machine-based, data-driven algorithms.

In the scope of this doctoral thesis, a hybrid approach between machine (deep) learning and animal bioacoustics is presented, applying a wide variety of novel algorithms to analyze large-scale, noise-heavy, audiovisual, and animal-specific data repositories, in order to provide completely new insights into the field of animal linguistics. Because of its complex social, communicative, and cognitive abilities, the largest member of the dolphin family, the killer whale (Orcinus orca), was chosen as the target species and prototype for this study. One of the largest animal-specific bioacoustic archives, the Orchive, recorded by OrcaLab in northern British Columbia, serves as the major data foundation, further extended by additional acoustic and behavioral material collected during project-internal fieldwork expeditions along the west coast of Canada in 2017, 2018, 2019, and 2022.

A broad spectrum of publicly available, deep learning-based algorithms is presented, originally developed on killer whales but also transferable to other vocalizing animal species, addressing the following essential acoustic and image-related biological research questions:

(1) signal segmentation – robust, efficient, and fully automated detection of killer whale sound types (a minimal sketch of this paradigm follows the list),
(2) sound denoising – signal enhancement of diverse killer whale vocalizations,
(3) call type identification – supervised, semi-supervised, and unsupervised deep architectures to recognize killer whale vocal paradigms,
(4) sound type separation – signal segregation of overlapping killer whale vocalizations,
(5) individual recognition – an image-based deep learning framework to identify individual killer whales,
(6) sound source localization – underwater identification of vocalizing killer whale individuals,
(7) signal generation – artificial yet representative killer whale signal production, and
(8) animal independence – adaptation and generalization of the developed killer whale-related deep learning concepts to other species-specific bioacoustic data volumes.
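To make research question (1) more concrete, the following is a minimal, hypothetical sketch of spectrogram-based, sliding-window call detection, the general paradigm behind deep learning sound-type detectors. The CallDetector architecture, the detect helper, the sample rate, window length, and all other parameter values are illustrative assumptions and do not reproduce the detection pipeline actually developed in this thesis.

```python
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 44100   # assumed hydrophone recording rate
WINDOW_SEC = 1.28     # assumed analysis window length
HOP_SEC = 0.64        # 50% overlap between consecutive windows

class CallDetector(nn.Module):
    """Toy binary spectrogram classifier: call vs. background noise."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),   # global pooling over time and frequency
        )
        self.classifier = nn.Linear(32, 1)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        h = self.features(spec).flatten(1)
        return torch.sigmoid(self.classifier(h)).squeeze(1)

def detect(waveform: torch.Tensor, model: nn.Module) -> list[tuple[float, float, float]]:
    """Slide a fixed window over the recording; return (start_s, end_s, score) per window."""
    to_mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=SAMPLE_RATE, n_fft=2048, hop_length=512, n_mels=64)
    win, hop = int(WINDOW_SEC * SAMPLE_RATE), int(HOP_SEC * SAMPLE_RATE)
    segments = []
    model.eval()
    with torch.no_grad():
        for start in range(0, waveform.shape[-1] - win + 1, hop):
            chunk = waveform[..., start:start + win]
            spec = to_mel(chunk).log1p().unsqueeze(0)  # (1, 1, n_mels, frames)
            score = model(spec).item()
            segments.append((start / SAMPLE_RATE, (start + win) / SAMPLE_RATE, score))
    return segments

if __name__ == "__main__":
    detector = CallDetector()                 # untrained; weights are placeholders
    audio = torch.randn(1, SAMPLE_RATE * 10)  # stand-in for a 10 s hydrophone recording
    for start, end, score in detect(audio, detector):
        print(f"{start:6.2f}-{end:6.2f} s  P(call) = {score:.3f}")
```

In a real pipeline, such a classifier would first be trained on labeled call and noise spectrogram excerpts, and consecutive high-scoring windows would then be merged into continuous call segments.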
All of the novel, publicly available machine (deep) learning frameworks demonstrate promising results and provide previously unavailable analysis techniques, facilitating a more profound interpretation of massive, animal-specific, audiovisual data volumes. Together, they build the imperative foundation to significantly advance not only the communicative and behavioral understanding of killer whales, but the entire research field of animal bioacoustics and linguistics.