Killer Whale Sound Source Localization Using Deep Learning

Type: MA thesis

Status: finished

Date: May 1, 2021 - November 2, 2021

Supervisors: Elmar Nöth

1 Introduction
Sound source localization (SSL) is not a new field, and much has been done in the analytical domain using multiple microphones, exploiting the distance between the microphones as well as the Time Delay of Arrival (TDOA) to extract position information [1]. Recent advances in machine learning and deep learning, together with the increasing availability of powerful hardware, have opened up new pathways to solving SSL and Sound Event Detection (SED) tasks. These methods are of particular interest due to their reported robustness to noise and their performance in comparison to conventional methods [2]. Most applications of SSL appear to involve tracking humans; relatively little has been done with a focus on other animals, and even less in natural environments as opposed to closed rooms. This project aims to use deep learning SSL methods to locate orcas from calls received by an 8-microphone array towed by a boat, as presented in [3].
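For reference, the conventional TDOA approach mentioned above can be sketched for a single microphone pair: the delay is read off the peak of the cross-correlation and, under a far-field (plane-wave) assumption, converted into an angle of arrival via theta = arcsin(c * tau / d). This is a minimal NumPy illustration of the analytical baseline, not the deep learning method of this project; the function names, the nominal sound speed of 1500 m/s, the 1 m spacing, and the synthetic test signal are all assumptions chosen for the example.

```python
import numpy as np

def estimate_tdoa(x1, x2, fs):
    """Delay of x2 relative to x1 in seconds, from the cross-correlation peak.

    A positive value means the wavefront reached microphone 1 before microphone 2.
    """
    corr = np.correlate(x2, x1, mode="full")
    lags = np.arange(-(len(x1) - 1), len(x2))  # lag in samples for each entry of corr
    return lags[np.argmax(corr)] / fs

def far_field_angle(tdoa, spacing, c=1500.0):
    """Angle of arrival in degrees from broadside, assuming a plane wave and a nominal sound speed c."""
    ratio = np.clip(c * tdoa / spacing, -1.0, 1.0)  # noise can push |ratio| slightly above 1
    return float(np.degrees(np.arcsin(ratio)))

# Synthetic check: broadband noise arriving 10 samples later at microphone 2.
fs = 48_000
rng = np.random.default_rng(0)
s = rng.standard_normal(4_800)
x1 = s
x2 = np.concatenate([np.zeros(10), s[:-10]])
tau = estimate_tdoa(x1, x2, fs)              # 10 / 48000 s
angle = far_field_angle(tau, spacing=1.0)    # about 18.2 degrees for c = 1500 m/s, d = 1 m
```

With an 8-microphone array, such pairwise delays would be combined across several pairs; a learned model instead maps the multichannel input to a position directly.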
2 Problem Description
The localization of orcas based on their emitted calls presents several problems, not the least of which is determining the actual distance of an orca from a given position. While several methods, such as [4], can accurately locate a sound source in relation to a unit sphere around the microphones (that is, estimate its direction of arrival), depth perception, meaning the estimation of distance, is a separate problem. This problem is compounded by the fact that one cannot assume that two different orcas produce calls at the same amplitude. Therefore, any distance estimate carries a varying degree of uncertainty.
This audio depth perception problem is further compounded by the fact that sound waves travel at different speeds in water depending on circumstances such as temperature and salinity, and their propagation is further affected by other objects present in the water. In nature, these are variables that are often completely outside the observer's control [5]. In addition to these environmental factors, which make audio depth perception difficult, a further issue arises with orcas: the animals can be several hundred meters away and their calls will still reach the microphones with enough intensity to be registered as calls of interest. This increases the viable range and adds further uncertainty.
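To illustrate how strongly the sound speed depends on the water conditions, the sketch below uses the simplified empirical formula commonly attributed to Medwin (1975), c = 1449.2 + 4.6T - 0.055T^2 + 0.00029T^3 + (1.34 - 0.010T)(S - 35) + 0.016z, with temperature T in °C, salinity S in parts per thousand, and depth z in metres (intended roughly for 0-35 °C, 0-45 ppt, and 0-1000 m). The temperatures chosen are arbitrary examples.

```python
def sound_speed_medwin(temp_c, salinity_ppt, depth_m):
    """Approximate speed of sound in sea water (m/s), simplified Medwin formula."""
    t = temp_c
    return (1449.2 + 4.6 * t - 0.055 * t ** 2 + 0.00029 * t ** 3
            + (1.34 - 0.010 * t) * (salinity_ppt - 35.0) + 0.016 * depth_m)

# Arbitrary example: surface water at 10 °C versus 20 °C (salinity 35 ppt).
c_cold = sound_speed_medwin(10.0, 35.0, 0.0)   # about 1490 m/s
c_warm = sound_speed_medwin(20.0, 35.0, 0.0)   # about 1521.5 m/s

# Range computed as speed * travel time, assuming 10 °C when the water is actually at 20 °C,
# is scaled by this factor.
range_scale = c_cold / c_warm                  # about 0.979, i.e. roughly a 2 % underestimate
```

Any range computed as speed times travel time with a fixed assumed speed inherits a relative error equal to the speed mismatch, on top of the source-level ambiguity.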
An additional problem encountered in the field is that, while it can be easy to see an orca if it happens to be near the surface, it is another matter entirely to attribute the produced calls to the particular individual that created them.
Finally, what is needed in an observation setting is a tool that can quickly and accurately associate the calls produced by an animal with the location where they were produced. If this assignment is not fast or accurate enough, the orca may well have moved on, and the location information is then neither useful nor accurate.
Master’s Thesis Proposal
A Deep Learning Toolkit for Killer Whale Localization Based On Emitted Calls Alexander Barnhill
3 Goals
The goal of this project is therefore to develop a toolkit that functions in conjunction with the ORCA-SPOT toolkit [3]; that is, when ORCA-SPOT has determined that a call of interest has been produced, the location of the call producer will be determined quickly enough to be used during active observation. This means that the location information should be produced quickly enough, and with enough accuracy in both time and location, for researchers to say with relative certainty in which direction the observed orcas are currently located.
In addition, the distance of the animal should be determined as accurately as possible, by gathering enough information to say with relative certainty, depending on the intensity of the received signal, how far the producer of the signal is from the point of reception.
Time permitting, the tool should also allow for a finer analysis of received calls, possibly including the association of received calls with particular animals.
4 Project Plan
In order to accomplish these goals, a multi-step approach will be undertaken. First, simulated data from PAMGuard will be processed in order to simulate orcas at random positions. The benefit here is that real orca samples can be used while the dataset can be expanded as much as desired, yielding a dataset that is not only large but also representative, containing a wide variety of signals, positions, and amplitudes. This dataset will then be further processed by adding varying amounts of noise to the samples in order to increase the robustness of the toolkit.
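PAMGuard's own simulation tools are not reproduced here. As a rough, hypothetical stand-in that shows the idea of placing a call at a random position and then adding noise, the NumPy sketch below delays and attenuates a mono recording per microphone according to its distance from the source, then adds white Gaussian noise at a chosen signal-to-noise ratio. The array geometry, sound speed, position and SNR ranges, and the toy chirp are placeholder assumptions; real augmentation would more likely mix in recorded ocean or boat noise than white noise.

```python
import numpy as np

def simulate_array(call, fs, source_pos, mic_pos, c=1500.0):
    """Mono call -> (n_mics, n_samples) array with per-microphone delay and 1/r attenuation.

    Simplifications: constant sound speed, free field (no reflections), integer-sample delays.
    """
    dists = np.linalg.norm(mic_pos - source_pos, axis=1)       # metres
    delays = np.round(dists / c * fs).astype(int)              # samples
    gains = 1.0 / np.maximum(dists, 1.0)                       # spherical spreading, 1 m reference
    out = np.zeros((len(mic_pos), delays.max() + len(call)))
    for m, (d, g) in enumerate(zip(delays, gains)):
        out[m, d:d + len(call)] = g * call
    return out

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that the SNR over the whole array equals snr_db."""
    noise_power = np.mean(signal ** 2) / 10 ** (snr_db / 10)
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

# Example with placeholder values: a toy chirp instead of a real call, and a
# hypothetical 8-element line array with 1 m spacing along the x axis.
rng = np.random.default_rng(0)
fs = 48_000
t = np.arange(int(0.5 * fs)) / fs
call = np.sin(2 * np.pi * (1000 + 2000 * t) * t)
mic_pos = np.column_stack([np.arange(8.0), np.zeros(8), np.zeros(8)])
source_pos = rng.uniform([-500.0, -500.0, 0.0], [500.0, 500.0, 50.0])  # random position, metres
clean = simulate_array(call, fs, source_pos, mic_pos)
noisy = add_noise(clean, snr_db=rng.uniform(0.0, 20.0), rng=rng)
```

Because the true source position is known for every simulated sample, each pair (noisy, source_pos) can serve directly as a labelled training example.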
After the dataset is produced, various network architectures will be tried in order to find one that:
a) reliably and accurately determines the location of the source of the sample, and
b) is small enough to provide results quickly at inference time, enabling the localization of orcas in real time.
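Requirement b) can be checked with a simple measurement such as the real-time factor: the average time needed to process one input window divided by the duration of audio in that window, where values below 1 mean the model keeps up with incoming audio. The sketch below assumes the network is wrapped in a Python callable; `dummy_model`, the 2 s window length, and the run count are hypothetical placeholders.

```python
import time
import numpy as np

def real_time_factor(infer, window, fs, n_runs=20):
    """Mean inference time for one window divided by the window's audio duration.

    Values below 1 mean the model keeps up with incoming audio (on this machine).
    """
    infer(window)                                  # warm-up call, not timed
    start = time.perf_counter()
    for _ in range(n_runs):
        infer(window)
    seconds_per_window = (time.perf_counter() - start) / n_runs
    return seconds_per_window / (window.shape[-1] / fs)

# Placeholder "model": a spectrum average standing in for a trained network.
def dummy_model(x):
    return np.abs(np.fft.rfft(x, axis=-1)).mean()

fs = 48_000
window = np.zeros((8, 2 * fs))                     # hypothetical 2 s, 8-channel input window
rtf = real_time_factor(dummy_model, window, fs)
```

A real-time factor well below 1 leaves headroom for the detection stage and data transfer running alongside the localizer.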
During experimentation with architectures, methods for estimating the distance of the orcas will also be tested. These include segmenting the distance measurements into discrete areas and then applying a Gaussian model to these areas, in the hope of achieving a reasonable estimate of the orca's distance based on the training data as well as the amplitude of the received signal.
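One possible reading of this distance estimation, given here only as a hypothetical sketch: training distances are grouped into discrete bins, a Gaussian over received level (in dB) is fitted per bin, and a new call receives a posterior probability for each bin via Bayes' rule, so the output is a distribution over distance areas rather than a single number. The bin edges, the synthetic data model (spherical spreading plus a randomly varying source level), and all numeric values are assumptions made for illustration.

```python
import numpy as np

def fit_bin_gaussians(distances, levels_db, bin_edges):
    """Fit one Gaussian over received level (dB) per distance bin.

    Returns (means, stds, priors); every bin must contain at least two training samples.
    Distances beyond the outer edges are assigned to the first or last bin.
    """
    bins = np.digitize(distances, bin_edges[1:-1])     # bin index 0 .. len(bin_edges) - 2
    n_bins = len(bin_edges) - 1
    means = np.array([levels_db[bins == b].mean() for b in range(n_bins)])
    stds = np.array([levels_db[bins == b].std(ddof=1) for b in range(n_bins)])
    priors = np.bincount(bins, minlength=n_bins) / len(bins)
    return means, stds, priors

def bin_posterior(level_db, means, stds, priors):
    """Posterior probability of each distance bin given one received level (Bayes' rule)."""
    log_post = -0.5 * ((level_db - means) / stds) ** 2 - np.log(stds) + np.log(priors)
    post = np.exp(log_post - log_post.max())           # subtract max for numerical stability
    return post / post.sum()

# Synthetic training data (placeholder values): uniform distances, a source level that
# varies between animals, spherical spreading, and measurement noise.
rng = np.random.default_rng(0)
distances = rng.uniform(10.0, 1000.0, 5000)
source_levels = rng.normal(155.0, 4.0, 5000)
levels = source_levels - 20 * np.log10(distances) + rng.normal(0.0, 1.0, 5000)
edges = np.array([0.0, 100.0, 300.0, 1000.0])          # hypothetical distance areas, metres
means, stds, priors = fit_bin_gaussians(distances, levels, edges)
posterior = bin_posterior(125.0, means, stds, priors)
```

A wide spread of source levels between animals shows up directly as wide, overlapping per-bin Gaussians, which is one way the uncertainty discussed in Section 2 would surface in such a model.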
This toolkit will then be integrated with ORCA-SPOT so that it continuously accepts samples from the ORCA-SPOT toolkit and localizes those samples that ORCA-SPOT has deemed interesting. The toolkit will have to process samples containing varying numbers of signals, as well as samples in which the signal of interest occurs at varying points within the sample.
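Because the signal of interest can occur anywhere within a received sample, one simple way to hand detections to a localization network is to cut a fixed-length multichannel window centred on each detected segment. The sketch below assumes, hypothetically, that detections arrive as (start, end) times in seconds relative to a buffered multichannel recording; ORCA-SPOT's actual output format is not assumed, and the function name `windows_for_detections` is illustrative only.

```python
import numpy as np

def windows_for_detections(audio, fs, detections, win_s=2.0):
    """Cut one fixed-length multichannel window centred on each detection.

    audio: (n_channels, n_samples) buffer; detections: list of (start_s, end_s) tuples whose
    centres lie inside the buffer. Windows that reach past the buffer edges are zero-padded.
    """
    win = int(round(win_s * fs))
    padded = np.pad(audio, ((0, 0), (win, win)))          # zeros on both sides of each channel
    windows = []
    for start_s, end_s in detections:
        centre = int(round((start_s + end_s) / 2 * fs)) + win   # shift by the left padding
        lo = centre - win // 2
        windows.append(padded[:, lo:lo + win])
    if not windows:
        return np.zeros((0, audio.shape[0], win))
    return np.stack(windows)

# Example: a 10 s, 8-channel buffer and three hypothetical detections at different positions.
fs = 16_000
audio = np.random.default_rng(1).standard_normal((8, 10 * fs))
detections = [(0.1, 0.4), (4.0, 5.5), (9.8, 10.0)]
batch = windows_for_detections(audio, fs, detections)      # shape (3, 8, 32000)
```

Several detections in one buffer simply yield several windows, which can then be batched for inference.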
4.1 Proposed Schedule
1. Preparation and generation of data for training: 2 weeks
2. Investigation of architectures with respect to performance, including experimentation and testing: 2 months
3. Analysis and implementation of depth estimation: 1 month
4. Integration with ORCA-SPOT: 1 month
5. Summary of results: 1 month
References
[1] X. Bian, Gregory D. Abowd, and James M. Rehg. Using sound source localization to monitor and infer
activities in the home. 2004.
[2] Nelson Yalta, Kazuhiro Nakadai, and Tetsuya Ogata. Sound source localization using deep learning
models. Journal of Robotics and Mechatronics, 29:37-48, 2017.
[3] Christian Bergler, Hendrik Schröter, Rachael Xi Cheng, Volker Barth, Michael Weber, Elmar Nöth,
Heribert Hofer, and Andreas Maier. ORCA-SPOT: An automatic killer whale sound detection toolkit using
deep learning. Scientific Reports, 9(1):10997, 2019.
[4] Sharath Adavanne, Archontis Politis, Joonas Nikunen, and Tuomas Virtanen. Sound event localization
and detection of overlapping sources using convolutional recurrent neural networks. CoRR,
abs/1807.00129, 2018.
[5] Jens Blauert. Spatial hearing: the psychophysics of human sound localization. 2001.