Name: Speech and Language Group Colloquium
Start: 2025-08-06
End: 2025-08-06
Location: Virtual

Our research focuses on modeling speech and language patterns using machine and deep learning methods. We develop spoken dialogue systems, enhance speech, and process out-of-vocabulary words. We analyze prosodic features such as accents and phrase boundaries, and automatically recognize emotion-related states using multi-modal data, including facial expressions, gestures, and physiological parameters. We also recognize user focus in human-machine interactions and analyze pathological speech from children with cleft lip and palate or patients with speech and language disorders. Additionally, our work extends to analyzing animal speech (e.g., such as the one from orcas) aiming to interpret communication patterns in zoos and the wild. For the natural language processing field, we develop and apply methods like Large Language Models (LLMs), topic modeling, and part-of-speech tagging, with applications in both medical and industrial domains. We also leverage LLMs and deep learning for advanced speech and language understanding, addressing ethical AI, text summarization, and question/answering systems.

Projects

A multimodal approach for automatic generation of radiology reports using chest X-ray images, clinical free-text, and spoken commands.

Advancements in Artificial Intelligence (AI) methods have enabled the development of Large Language Models (LLMs) capable of generating information from user instructions and supporting various tasks in education, research, healthcare, and others. AI has also impacted the field of medical imaging with several deep learning models capable of achieving expert-level performance across different tasks, e.g., detection, segmentation, and assisted clinical diagnosis. In addition, open-source Automatic Speech Recognition (ASR) systems can be incorporated as modules in AI-based systems. This proposed funded project aims to combine LLMs, medical imaging, and speech recognition using AI methods to generate high-quality radiology reports from chest X-ray images.

→ More information

TAPAS: Training Network on Automatic Processing of PAthological Speech

There are an increasing number of people across Europe with debilitating speech pathologies (e.g., due to stroke, Parkinson's, etc). These groups face communication problems that can lead to social exclusion. They are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but which is not robust to atypical speech. TAPAS is a Horizon 2020 Marie Skłodowska-Curie Actions Innovative Training Network European Training Network (MSCA-ITN-ETN) project that aims to transform the well being of these people.

TAPAS adopts an inter-disciplinary and multi-sectorial approach. The consortium includes clinical practitioners, academic researchers and industrial partners, with expertise spanning speech engineering, linguistics and clinical science. All members have expertise in some element of pathological speech. This rich network will train a new generation of 15 researchers, equipping them with the skills and resources necessary for lasting success.

→ More information

DysarTrain: Development of a digital therapy tool as an exercise supplement for speech disorders and facial paralysis

Dysarthrien sind neurologisch bedingte, erworbene Störungen des Sprechens. Dabei sind vor allem die Koordination und Ausführung der Sprechbewegungen, aber auch die Mimik betroffen. Besonders häufig tritt eine Dysarthrie nach einem Schlaganfall, Schädel-Hirn-Trauma oder bei neurologischen Erkrankungen wie Parkinson auf.

Ähnlich wie in allen Sprechtherapien erfordert auch die Behandlung der Dysarthrie ein intensives Training. Anhaltende Effekte der Dysarthrie-Therapie stellen sich deshalb nur …

→ More information

Modelling the progression of neurological diseases

Develop speech technology that can allow unobtrusive monitoring of many kinds of neurological diseases. The state of a patient can degrade slowly between medical check-ups. We want to track the state of a patient unobtrusively without the feeling of constant supervision. At the same time the privacy of the patient has to be respected. We will concentrate on PD and thus on acoustic cues of changes. The algorithms should run on a smartphone, track acoustic changes during regular phone conversations…

→ More information

DeepAL: Deep Learning Applied to Animal Linguistics

Deep Learning applied to animal linguistics in particular the analysis of underwater audio recordings of marine animals (killer whales):

The project includes the automatic segmentation of killer whale signals in noise-heavy and large underwater bioacoustic archives as well as a subsequent call type identification/classification in order to derive linguistic elements/patterns. In combination with the recorded situational video footage those patterns should help to decode the killer whale language.

→ More information

Deep Learning based Noise Reduction for Hearing Aids

Reduction of unwanted environmental noises is an important feature of today’s hearing aids, which is why noise reduction is nowadays included in almost every commercially available device. The majority of these algorithms, however, is restricted to the reduction of stationary noises. Due to the large number of different background noises in daily situations, it is hard to heuristically cover the complete solution space of noise reduction schemes. Deep learning-based algorithms pose a possible so…

→ More information

Speech-Driven Generative AI for Personalized Cine-MRI Sequence Generation in Clinical Speech Therapy

Generative AI has revolutionized the ability to simulate and generate information from learned distributions, with impactful applications in fields such as natural language processing and computer vision. This project will utilize generative models for two key medical applications: (1) simulation of patient speech stimuli: AI will simulate vocal tract movements of patients undergoing therapy for tongue cancer or cleft lip and palate conditions.

→ More information

Participating Scientists

Colloquium time table

06 Aug

3:00 pm - 5:00 pm

Speech and Language Group Colloquium

Wednesday, Virtual

EVENT DETAIL

08 Aug

10:30 am - 12:30 pm

Animal Speech Group Colloquium

Friday, Virtual

EVENT DETAIL

No event found!

Publications

Schröter H., Rosenkranz T., Escalante-B AN., Maier A.:
DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement
INTERSPEECH (Dublin, Ireland, August 20, 2023 - August 24, 2023)
In: INTERSPEECH 2023 2023
Open Access: https://arxiv.org/abs/2305.08227
BibTeX: Download
Perez Toro PA., Arias Vergara T., Braun F., Hönig F., Tobón-Quintero CA., Aguillón D., Lopera F., Hincapié-Henao L., Schuster M., Riedhammer K., Maier A., Nöth E., Orozco Arroyave JR.:
Automatic Assessment of Alzheimer's across Three Languages Using Speech and Language Features
24th International Speech Communication Association, Interspeech 2023 (Dublin, IRL, August 20, 2023 - August 24, 2023)
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023
DOI: 10.21437/Interspeech.2023-2079
BibTeX: Download
Hauer C., Nöth E., Barnhill A., Maier A., Guthunz J., Hofer H., Cheng RX., Barth V., Bergler C.:
ORCA-SPY: Killer Whale Sound Source Simulation and Detection, Classification and Localization in PAMGuard Utilizing Integrated Deep Learning Based Segmentation
In: Scientific Reports UNDER REVIEW (2023)
ISSN: 2045-2322
BibTeX: Download

Next Events:

06 Aug

3:00 pm - 5:00 pm

Speech and Language Group Colloquium

Wednesday, Virtual

EVENT DETAIL

08 Aug

10:30 am - 12:30 pm

Animal Speech Group Colloquium

Friday, Virtual

EVENT DETAIL

No event found!

Speech Processing and Language Understanding

Projects Projects

A multimodal approach for automatic generation of radiology reports using chest X-ray images, clinical free-text, and spoken commands. A multimodal approach for automatic generation of radiology reports using chest X-ray images, clinical free-text, and spoken commands.

TAPAS: Training Network on Automatic Processing of PAthological Speech TAPAS: Training Network on Automatic Processing of PAthological Speech

DysarTrain: Development of a digital therapy tool as an exercise supplement for speech disorders and facial paralysis DysarTrain: Development of a digital therapy tool as an exercise supplement for speech disorders and facial paralysis

Modelling the progression of neurological diseases Modelling the progression of neurological diseases

DeepAL: Deep Learning Applied to Animal Linguistics DeepAL: Deep Learning Applied to Animal Linguistics

Deep Learning based Noise Reduction for Hearing Aids Deep Learning based Noise Reduction for Hearing Aids

Speech-Driven Generative AI for Personalized Cine-MRI Sequence Generation in Clinical Speech Therapy Speech-Driven Generative AI for Personalized Cine-MRI Sequence Generation in Clinical Speech Therapy

Participating Scientists Participating Scientists

Colloquium time table Colloquium time table

Publications Publications

Next Events:

Office hours

Projects

A multimodal approach for automatic generation of radiology reports using chest X-ray images, clinical free-text, and spoken commands.

TAPAS: Training Network on Automatic Processing of PAthological Speech

DysarTrain: Development of a digital therapy tool as an exercise supplement for speech disorders and facial paralysis

Modelling the progression of neurological diseases

DeepAL: Deep Learning Applied to Animal Linguistics

Deep Learning based Noise Reduction for Hearing Aids

Speech-Driven Generative AI for Personalized Cine-MRI Sequence Generation in Clinical Speech Therapy

Participating Scientists

Colloquium time table

Publications