Index

Modelling context transitions in picture descriptions of Alzheimer's patients

Deep keyword recognition in speech exercises for aphasia patients

Solar Cell Aging Prediction using Deep Learning Image2Image Translation

Cracks in solar cells, caused during production or assembly, can considerably affect the degradation of a cell in the field [1]. Predicting the impact of these cracks improves quality control and helps to cope with degradation throughout the lifetime of a solar cell.

Although information about photovoltaic module degradation has been available since the early 1970s, the prediction of different types of degradation is still poorly studied [2].

This thesis aims to develop a new approach to predicting the aging of solar cells using Deep Learning, given an initial electroluminescence (EL) measurement of the cell. The data used in this thesis consists of two measurements of 94 modules taken at different points in time. We will

  1. use this dataset to train an unpaired Image2Image approach (e.g., CycleGAN [3]) to assess whether the network is capable of learning the relationship between initial and aged measurements from unpaired data alone (a minimal sketch of this setup is given after this list)
  2. extend the approach in 1. to incorporate the additional information available from using pairs of initial and aged measurements.
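
The following is a minimal sketch of the unpaired setup in 1.: two toy generators are trained with a least-squares adversarial loss and a cycle-consistency term in the spirit of CycleGAN [3]. The tiny convolutional networks, the loss weight, and the restriction to a single translation direction are simplifying assumptions; the models in [3] use deeper ResNet generators and PatchGAN discriminators.

    import torch
    import torch.nn as nn

    def conv_net(in_ch, out_ch):
        # toy stand-in for the ResNet generators / PatchGAN discriminators of [3]
        return nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1),
        )

    G_i2a = conv_net(1, 1)   # generator: initial EL measurement -> aged EL measurement
    G_a2i = conv_net(1, 1)   # generator: aged EL measurement -> initial EL measurement
    D_aged = nn.Sequential(conv_net(1, 1), nn.AdaptiveAvgPool2d(1))  # critic for aged images

    opt_g = torch.optim.Adam(list(G_i2a.parameters()) + list(G_a2i.parameters()), lr=2e-4)

    def generator_step(x_init, lambda_cyc=10.0):
        # one update for the initial->aged direction; the symmetric aged->initial
        # branch and the discriminator update are analogous and omitted here
        fake_aged = G_i2a(x_init)
        rec_init = G_a2i(fake_aged)
        pred = D_aged(fake_aged)
        adv = nn.functional.mse_loss(pred, torch.ones_like(pred))  # least-squares GAN loss
        cyc = nn.functional.l1_loss(rec_init, x_init)              # cycle consistency
        loss = adv + lambda_cyc * cyc
        opt_g.zero_grad()
        loss.backward()
        opt_g.step()
        return loss.item()

    # usage with random stand-in EL crops (batch of 4 single-channel 64x64 patches)
    print(generator_step(torch.randn(4, 1, 64, 64)))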

Since the initial and aged measurements are not registered exactly, we aim to design a custom loss function in 2. that is invariant to small registration mismatches but enforces consistency between cracks in generated and real aged measurements. We want to assess whether the weakly supervised crack segmentation by Mayr et al. [4] can be used for that purpose. To this end, we plan to enforce consistency between the coarse segmentation maps of generated and real aged measurements. This can be seen as an extension of the common combination of an adversarial loss with an L1/L2 distance between the fake and the real target image [3].
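
A minimal sketch of this registration-tolerant loss is given below. The network crack_seg is a placeholder standing in for a frozen, weakly supervised crack segmentation model in the spirit of [4], and the pooling window k that controls how much misregistration is tolerated is an assumption.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # placeholder for a frozen, weakly supervised crack segmentation network [4]
    crack_seg = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Sigmoid())
    for p in crack_seg.parameters():
        p.requires_grad_(False)          # the segmenter only provides supervision

    def crack_consistency_loss(fake_aged, real_aged, k=8):
        # max-pooling with window k turns the per-pixel crack maps into coarse
        # occupancy grids, so cracks only have to appear in roughly the same region
        # to agree; this makes the loss tolerant to small registration mismatches
        seg_fake = F.max_pool2d(crack_seg(fake_aged), kernel_size=k)
        seg_real = F.max_pool2d(crack_seg(real_aged), kernel_size=k)
        return F.l1_loss(seg_fake, seg_real)

    # usage on random stand-in EL crops
    print(crack_consistency_loss(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)))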

The purpose of the L1/L2 distance in CycleGAN can be seen as enforcing consistency between the input and the output of the generative network. Since our custom loss compares the generated fake cell to the real aged cell, this input/output consistency can possibly be ensured without taking the L1/L2 distance between generator input and output into account. Apart from combining the two common losses with our custom loss, we therefore want to evaluate whether better results can be obtained by replacing the L1/L2 distance altogether.
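
Continuing the previous sketch (it reuses crack_consistency_loss from there), the combined generator objective for the paired setting could be composed as follows; the loss weights are assumptions to be tuned, and setting lambda_l1 = 0.0 yields the variant in which the pixel-wise distance is replaced entirely.

    import torch
    import torch.nn.functional as F

    def paired_generator_loss(fake_aged, real_aged, d_pred,
                              lambda_seg=5.0, lambda_l1=10.0):
        # d_pred: discriminator output for the generated (fake) aged measurement
        adv = F.mse_loss(d_pred, torch.ones_like(d_pred))   # fool the critic
        seg = crack_consistency_loss(fake_aged, real_aged)  # coarse crack agreement (see above)
        l1 = F.l1_loss(fake_aged, real_aged)                # optional pixel-wise distance
        return adv + lambda_seg * seg + lambda_l1 * l1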

A prototype of this network will be implemented in PyTorch, based on the implementations of [3] and [4].

References:

[1] Quintana, M.A., King, D.L., McMahon, T.J., Osterwald, C.R., 2002. Commonly observed degradation in field-aged photovoltaic modules. In: Proc. 29th IEEE Photovoltaic Specialists Conference, pp. 1436–1439.
[2] Ndiaye, A., Charki, A., Kobi, A., Kébé, C.M., Ndiaye, P.A. and Sambou, V., 2013. Degradations of silicon photovoltaic modules: A literature review. Solar Energy, 96, pp. 140–151.
[3] Zhu, J.-Y., Park, T., Isola, P. and Efros, A.A., 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proc. IEEE International Conference on Computer Vision (ICCV), pp. 2223–2232.
[4] Mayr, M., Hoffmann, M., Maier, A. and Christlein, V., 2019. Weakly supervised segmentation of cracks on solar cells using normalized Lp norm. In: Proc. IEEE International Conference on Image Processing (ICIP), pp. 1885–1889.

Predicting Hearing Aid Fittings Based on Audiometric and Subject-Related Data: A Machine Learning Approach

Hearing aids (HA) are configured to the wearer’s individual needs, which may vary greatly from user to user. Currently, it is common practice that the initial HA gain settings are based on generic fitting formulas that link a user’s pure-tone hearing thresholds to amplification characteristics. Subsequently, a time-consuming fine-tuning process follows, in which a hearing care professional (HCP) adjusts the HA settings to the user’s individual demands. An advanced, more personalized gain prescription procedure could support HCPs by reducing the fine-tuning effort and could facilitate over-the-counter HAs. We propose a machine learning-based prediction of HA gains to minimize the subsequent fine-tuning effort. The data-driven approach takes audiometric and personal variables into account, such as age, gender, and the user’s acoustical environment.

A random forest regression model was trained on real-world HA fittings from the Connexx database (fitting software provided by Sivantos GmbH). Three months of data from Connexx version 9.1.0.364 were used. A data cleaning framework was implemented to extract a representative data set based on a list of machine learning and audiological criteria. These criteria include, for instance, using only ‘informative’ HCPs, i.e., those who perform fine-tuning for at least some of their patients, perform diagnostics beyond air conduction audiograms, and make use of new technologies and special features. The resulting training data comprised 20,000 HA fittings, and 10-fold cross-validation was used to train and validate the random forest.
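
A minimal sketch of the training and evaluation setup with scikit-learn is given below. The feature columns, gain targets, forest size, and the synthetic stand-in data are placeholders and do not reflect the actual Connexx schema or the model used in this work.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold, cross_val_score

    rng = np.random.default_rng(0)
    n = 1000                                        # synthetic stand-in for the cleaned fittings
    X = np.column_stack([
        rng.integers(18, 95, n),                    # age
        rng.integers(0, 2, n),                      # gender (encoded)
        rng.uniform(0, 100, (n, 6)),                # pure-tone thresholds at 6 frequencies
    ])
    y = rng.uniform(0, 60, (n, 4))                  # prescribed gains for 4 frequency bands

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    print(f"10-fold CV MAE: {-scores.mean():.2f} dB")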

Deep Learning-based Spectral Noise Reduction for Hearing Aids

The great success of deep learning-based noise reduction algorithms makes it desirable to also use them in hearing aid applications. However, neural networks are both computationally and memory intensive, which makes them challenging to deploy on embedded systems with limited hardware resources. Thus, in this work, we propose an efficient deep learning-based noise reduction method for hearing aid applications. Compared to a previous study, in which a fully-connected neural network was used to estimate Wiener filter gains, we focus on Recurrent Neural Networks (RNNs). Additionally, convolutional layers were integrated. The neural networks were trained to predict real-valued Wiener filter gains to denoise the noisy speech spectrum. Since normalizing the input of the neural network is essential, various normalization methods that allow low-cost real-time processing were analyzed. The presented methods were tested and evaluated on unseen noise and speech data. In comparison to the previous study, the computational complexity and the memory requirements of the neural network were reduced by a factor of more than 400 and the complexity of the normalization method by a factor of over 200, while even achieving higher denoising quality.
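
The core idea can be sketched as follows: a small GRU maps normalized noisy magnitude frames to one real-valued gain per frequency bin, which is applied multiplicatively to the noisy spectrum. The layer sizes and the simple log plus mean/variance normalization shown here are placeholder assumptions, not the configurations evaluated in this work.

    import torch
    import torch.nn as nn

    class GainRNN(nn.Module):
        def __init__(self, n_bins=257, hidden=128):
            super().__init__()
            self.gru = nn.GRU(n_bins, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_bins)

        def forward(self, noisy_mag):                     # (batch, frames, bins)
            x = torch.log1p(noisy_mag)                    # compressive input normalization
            x = (x - x.mean(dim=(1, 2), keepdim=True)) / (x.std(dim=(1, 2), keepdim=True) + 1e-5)
            h, _ = self.gru(x)
            gains = torch.sigmoid(self.out(h))            # Wiener-like gains in [0, 1]
            return gains * noisy_mag                      # denoised magnitude spectrum

    # usage: one utterance with 100 STFT frames and 257 frequency bins
    enhanced = GainRNN()(torch.rand(1, 100, 257))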

Multi-Task Learning for Speech Enhancement and Phoneme Recognition

For speech intelligibility, consonants are of fundamental importance. Unfortunately, when reducing noise in speech, consonants are often degraded as well, while vowels are easier to preserve or enhance. To improve the detection and enhancement of consonants, we want to use multi-task learning to reduce the noise in the signal and, at the same time, detect phonemes (the smallest acoustic units in speech).
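
A minimal sketch of the multi-task idea is given below: a shared recurrent encoder over noisy spectral frames feeds two heads, one predicting denoising gains and one predicting a phoneme class per frame, trained with a weighted sum of both losses. All layer sizes, the number of phoneme classes, and the loss weighting are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiTaskNet(nn.Module):
        def __init__(self, n_bins=257, n_phonemes=40, hidden=128):
            super().__init__()
            self.encoder = nn.GRU(n_bins, hidden, batch_first=True)  # shared representation
            self.gain_head = nn.Linear(hidden, n_bins)               # speech enhancement head
            self.phone_head = nn.Linear(hidden, n_phonemes)          # phoneme recognition head

        def forward(self, noisy_mag):                                # (batch, frames, bins)
            h, _ = self.encoder(noisy_mag)
            gains = torch.sigmoid(self.gain_head(h))
            return gains * noisy_mag, self.phone_head(h)

    def multi_task_loss(enhanced, clean, phone_logits, phone_targets, alpha=0.5):
        # weighted sum of the two task losses; alpha balances enhancement vs. recognition
        mse = F.mse_loss(enhanced, clean)
        ce = F.cross_entropy(phone_logits.transpose(1, 2), phone_targets)
        return alpha * mse + (1 - alpha) * ce

    # usage with random stand-in data: 2 utterances, 100 frames, 257 bins, 40 phonemes
    net = MultiTaskNet()
    enhanced, logits = net(torch.rand(2, 100, 257))
    print(multi_task_loss(enhanced, torch.rand(2, 100, 257), logits, torch.randint(0, 40, (2, 100))))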

Development of a deep learning-based phoneme recognizer for noisy speech

For speech intelligibility, consonants are of fundamental importance. Thus, the assumption can be made that automatic phoneme recognition, and especially consonant recognition, correlates well with human speech intelligibility. In noisy environments, however, speech and especially consonants may be degraded.

In this project, we want to study the effect of noise on speech intelligibility. To this end, we train a neural network to recognize phonemes on the TIMIT dataset. We will add different noise types at different noise levels to the speech signal and study their effect on the recognition rate.
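
The noise mixing step could be sketched as follows, assuming mono speech and noise signals at the same sampling rate and an SNR defined on average signal power.

    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        # scale the noise so that the mixture has the requested signal-to-noise ratio
        noise = np.resize(noise, speech.shape)      # repeat/crop the noise to the speech length
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2)
        scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
        return speech + scale * noise

    # usage with synthetic data: 1 s of "speech" and white noise at 16 kHz, mixed at 5 dB SNR
    fs = 16000
    speech = np.sin(2 * np.pi * 200 * np.arange(fs) / fs)
    noisy = mix_at_snr(speech, np.random.randn(fs), snr_db=5)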

This project requires no prior knowledge of deep learning, although some may be beneficial. Basic signal processing concepts such as the sampling theorem and the FFT should be familiar.

Development of a Pre-Processing/Simulation Framework for Multi-Channel Audio Signals

The goal of this thesis is to develop a framework that simulates multi-channel audio signals in a 2D/3D environment for hearing aids. For this purpose, existing head-related transfer functions (HRTFs) will be used to simulate source directions and hearing aid microphone characteristics. Furthermore, source movement as well as microphone movement and rotation will be implemented. The latter is mandatory for hearing aids, since head rotation in particular may change the relative directions of the different sources significantly. The framework will be able to simulate multiple speakers as well as multiple noise sources. To compute a clean speech target, a provided reference beamformer will be applied to the target speech only, neglecting noise and non-target speakers. Optionally, an opening angle that defines the target directions can be used to extract the clean speech targets. As a second optional aspect, the room environment, including absorption and reverberation, will be simulated; for this, a reference implementation can be used.
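
The core simulation step could be sketched as follows: each source is convolved with one impulse response per hearing aid microphone, and the spatialized signals are summed into a multi-channel mixture. The random impulse responses and signals below are placeholders for measured HRTF/microphone data.

    import numpy as np
    from scipy.signal import fftconvolve

    def spatialize(source, hrir):
        # convolve a mono source with one impulse response per microphone -> (n_mics, samples)
        return np.stack([fftconvolve(source, hrir[m]) for m in range(hrir.shape[0])])

    def simulate_scene(sources, hrirs, n_mics, length):
        # sum the spatialized target, interferers and noise into a multi-channel mixture
        mix = np.zeros((n_mics, length))
        for signal, hrir in zip(sources, hrirs):
            mix += spatialize(signal, hrir)[:, :length]
        return mix

    # usage: one target and one noise source, 4 hearing aid microphones, 1 s at 16 kHz
    fs, n_mics, length = 16000, 4, 16000
    target = np.sin(2 * np.pi * 300 * np.arange(length) / fs)
    noise = 0.1 * np.random.randn(length)
    hrirs = [0.05 * np.random.randn(n_mics, 256) for _ in range(2)]
    mixture = simulate_scene([target, noise], hrirs, n_mics, length)
    clean_target = spatialize(target, hrirs[0])[:, :length]   # target-only reference signal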

Weakly-supervised segmentation of defects on solar cells

Using Deep Learning for segmentation of localized defects on SiC wafers