Dr.-Ing. Tomás Arias Vergara

Dr.-Ing. Tomás Arias Vergara, M. Sc.

Lehrstuhl für Informatik 5 (Mustererkennung)
Chair of Computer Science 5 (Pattern Recognition)

Room: 10.134
Martensstr. 3
91058 Erlangen

I received a B.S. in Electronics Engineering from the University of Antioquia (UdeA, Colombia) in 2014, a Master of Science degree from the same institution in 2017, and a Ph.D. in a joint program between the UdeA and the FAU in 2022. Since 2015, my research has focused on speech processing and machine learning methods for the analysis of pathological speech signals resulting from neurological (e.g., Parkinson’s disease), structural (e.g., cleft lip and palate in children), and perceptual (e.g., hearing loss) disorders. I have also investigated the effect of natural aging on speech, helped develop Android-based applications for collecting and analyzing data from Parkinson’s disease patients and from adults and children with hearing loss, and researched automatic methods for analyzing high-speed videoendoscopy data of people with voice disorders.

Projects

2024

  • A multimodal approach for automatic generation of radiology reports using chest X-ray images, clinical free-text, and spoken commands.

    (FAU Funds)

    Term: January 15, 2024 - January 14, 2025

    Advancements in Artificial Intelligence (AI) methods have enabled the development of Large Language Models (LLMs) capable of generating information from user instructions and supporting various tasks in education, research, healthcare, and others. AI has also impacted the field of medical imaging with several deep learning models capable of achieving expert-level performance across different tasks, e.g., detection, segmentation, and assisted clinical diagnosis. In addition, open-source Automatic Speech Recognition (ASR) systems can be incorporated as modules in AI-based systems. This proposed funded project aims to combine LLMs, medical imaging, and speech recognition using AI methods to generate high-quality radiology reports from chest X-ray images.

2017

  • Training Network on Automatic Processing of PAthological Speech

    (Third Party Funds Group – Overall project)

    Term: November 1, 2017 - October 31, 2021
    Funding source: Innovative Training Networks (ITN)
    URL: https://www.tapas-etn-eu.org/

    There is an increasing number of people across Europe with debilitating speech pathologies (e.g., due to stroke or Parkinson's disease). These groups face communication problems that can lead to social exclusion. They are now being further marginalised by a new wave of speech technology that is increasingly woven into everyday life but which is not robust to atypical speech. TAPAS is a Horizon 2020 Marie Skłodowska-Curie Actions Innovative Training Network European Training Network (MSCA-ITN-ETN) project that aims to transform the well-being of these people.
    The TAPAS work programme targets three key research problems:
    (a) Detection: We will develop speech processing techniques for early detection of conditions that impact on speech production. The outcomes will be cheap and non-invasive diagnostic tools that provide early warning of the onset of progressive conditions such as Alzheimer's and Parkinson's.
    (b) Therapy: We will use newly-emerging speech processing techniques to produce automated speech therapy tools. These tools will make therapy more accessible and more individually targeted. Better therapy can increase the chances of recovering intelligible speech after traumatic events such as a stroke or oral surgery.
    (c) Assisted Living: We will re-design current speech technology so that it works well for people with speech impairments and also helps in making informed clinical choices. People with speech impairments often have other co-occurring conditions making them reliant on carers. Speech-driven tools for assisted-living are a way to allow such people to live more independently.
    TAPAS adopts an inter-disciplinary and multi-sectorial approach. The consortium includes clinical practitioners, academic researchers and industrial partners, with expertise spanning speech engineering, linguistics and clinical science. All members have expertise in some element of pathological speech. This rich network will train a new generation of 15 researchers, equipping them with the skills and resources necessary for lasting success.

Thesis Supervision

Type | Title | Status
MA thesis | Enhancing Lithium-Ion Batteries Safety | running
MA thesis | Generation of Region-guided Clinical Text Reports from Chest X-Ray Images Using LLMs | running
MA thesis | Stammering Identification using Large Language Models | running
MA thesis | Master Thesis – Annotation by Speech in Radiology | running
MA thesis | Investigating Liquidity Forecasting with Point-Based and Probabilistic Models to Enhance Financial Business Operations | running
MA thesis | Enhancing SBOM Creation with Large Language Models | running
MA thesis | Normalization of Sensor and Smartphone Gait Signals of Parkinson’s Disease Patients Using Deep Learning | running
MA thesis | Signal-Specific Fault Detection in Vehicle Controller Area Network using Deep Learning | running
MA thesis | Distillation Knowledge of Large Language Models for Automotive HMI Applications | running
BA thesis | Automatic Speech Recognition at Phoneme and Word-Level To Analyze Parkinson’s Disease | running
MA thesis | Evaluating the Impact of Acoustic Conditions on Pathological Speech Data Analysis | running
MA thesis | Large Language Models for Knowledge Management in Engineering Projects | running
MA thesis | Identification of failure detection patterns in log files of Computer Tomography systems | running
Project | TSI Challenge Summer 2024: Heat & Water Demand Forecasting | finished
MA thesis | Text Generation in Alzheimer’s Disease | running
MA thesis | Improving Text Summarization through Guided Decoding of Language Models | running
MA thesis | Spoken Language Identification for Hearing Aids | finished
MA thesis | Understanding Odor Descriptors through Advanced NLP Models and Semantic Scores | finished
Project | Generation of Clinical Text Reports from Chest X-Ray Images | finished
MA thesis | Cross-Dataset Phonological Speech Analysis of Children with Cleft Lip and Palate | finished
Project | Automatic recognition of Bavarian dialects | finished
MA thesis | Large Language Model for Generation of Structured Medical Report from X-ray Transcriptions | finished
MA thesis | Natural Language Text Generation for Symbolic Descriptions Using Language Models | finished
MA thesis | Development of a deep learning approach to detect faulty axial bearing components after assembly using acoustic signals | finished
MA thesis | Edge-AI: Self-sensing backpressure estimation in piezoelectric micropumps using machine learning methods on a limited hardware | finished
BA thesis | CoachLea: An Android Application to evaluate the progress of speaking and hearing abilities of children with Cochlear Implant | finished
BA thesis | CITA: An Android-based Application to Evaluate the Speech of Cochlear Implant Users | finished