Index
Comparative Analysis of Different Deep Learning Models for Whole Body Segmentation
Federated Learning for 3D camera-based weight & height estimation
Project SENSATION: Sidewalk Environment Detection System for Assistive NavigaTION
In the project entitled Sidewalk Environment Detection System for Assistive NavigaTION (hereinafter referred to as SENSATION), our research team is meticulously advancing the development of the components of SENSATION. The primary objective of this venture is to enhance the mobility capabilities of blind or visually impaired persons (BVIPs) by ensuring safer and more efficient navigation on pedestrian pathways.
For the implementation phase, a specialized prototype was engineered: a chest bag equipped with an NVIDIA Jetson Nano serving as the core computational unit. This device integrates several sensors and actuators, including, but not limited to, tactile feedback mechanisms (vibration motors) for direction indication, optical sensors (webcam) for environmental data acquisition, wireless communication modules (Wi-Fi antenna) for internet connectivity, and geospatial positioning units (GPS sensors) for real-time location tracking.
Despite the promising preliminary design of the prototype, several technical challenges remain that demand investigation. These challenges are described as follows:
Sidewalk segmentation for direction estimation
To determine the location of a BVIP on the pedestrian pathway, our algorithms must segment the sidewalk as accurately as possible. To facilitate this, we continuously refine our proprietary dataset tailored to sidewalk segmentation, and we are exploring a variety of Deep Learning methodologies to enhance the accuracy of this segmentation. The primary objective in this topic is to refine our sidewalk segmentation pipeline and to comprehensively evaluate its performance using metrics such as mean Intersection over Union (mIoU) and per-class precision for both sidewalks and roads. Additionally, we employ Active Learning techniques to further analyze our dataset, aiming to gain deeper insight into its characteristics.
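For evaluation, the following is a minimal sketch of how per-class IoU and mIoU could be computed from a confusion matrix; the three-class layout (background, sidewalk, road) and the random label maps are illustrative assumptions, not our actual dataset.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix
    from integer label maps (rows = ground truth, cols = prediction)."""
    mask = (target >= 0) & (target < num_classes)
    idx = num_classes * target[mask] + pred[mask]
    return np.bincount(idx.ravel(), minlength=num_classes ** 2).reshape(
        num_classes, num_classes)

def iou_per_class(cm):
    """IoU_c = TP_c / (TP_c + FP_c + FN_c) for every class c."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    return tp / np.maximum(tp + fp + fn, 1)

# Hypothetical class layout: 0 = background, 1 = sidewalk, 2 = road.
rng = np.random.default_rng(0)
gt = rng.integers(0, 3, size=(512, 512))
pred = rng.integers(0, 3, size=(512, 512))

ious = iou_per_class(confusion_matrix(pred, gt, num_classes=3))
print(f"sidewalk IoU={ious[1]:.3f}  road IoU={ious[2]:.3f}  mIoU={ious.mean():.3f}")
```

In practice the confusion matrix would be accumulated over all validation images before the per-class IoUs are averaged.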
Distance estimation for obstacle avoidance
To convey information to a BVIP regarding an impediment on the pedestrian pathway, obstacles such as bicycles, e-scooters, or automobiles must first be identified via image segmentation techniques. After this identification, it is crucial to determine the distance to these detected objects. The SENSATION system employs a monocular camera to capture the surrounding environment of the pathway. In this line of research, we are studying various algorithms tailored for depth estimation to determine the proximity of these impediments. The calculated distances are then conveyed to the BVIP through tactile or auditory feedback mechanisms. A prominent challenge in this work lies in achieving precise distance measurements, given the constraint of relying solely on a monocular camera.
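As one possible starting point, here is a minimal sketch of monocular depth inference with the publicly available MiDaS model from torch.hub; the model variant, input file, and region of interest are assumptions. Note that MiDaS predicts relative inverse depth, so converting to metric distances, the core challenge named above, requires an additional calibration step.

```python
import cv2
import torch

# Model variant is an assumption; MiDaS_small trades accuracy for speed,
# which matters on an embedded device such as the Jetson Nano.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("sidewalk_frame.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))          # relative inverse depth
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False).squeeze().numpy()

# Larger values = closer. The detection box below is hypothetical; mapping
# these relative values to metres is the open calibration problem above.
obstacle = depth[200:300, 250:400]
print("relative proximity of obstacle:", float(obstacle.mean()))
```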
Drift correction to improve orientation of a BVIP
While navigating pedestrian pathways, a BVIP may occasionally lose orientation with respect to the sidewalk. To address this, it is essential to devise a detection system capable of promptly identifying a BVIP's deviation from the intended sidewalk. For the detection of such drifts, we employ Deep Learning algorithms that leverage optical flow or depth maps. The primary objective in this topic is to conceptualize and develop a drift correction mechanism utilizing either optical flow or depth maps to enhance a BVIP's sidewalk orientation.
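As a first cue, here is a minimal sketch of a drift signal derived from dense Farneback optical flow between consecutive frames; the threshold and the assumption that sustained lateral flow indicates veering are illustrative, not the project's final design.

```python
import cv2
import numpy as np

def lateral_drift(prev_frame, next_frame, threshold=1.5):
    """Estimate lateral drift from dense Farneback optical flow.

    Returns the mean horizontal flow in pixels per frame; a sustained
    non-zero value suggests the wearer is veering off the sidewalk.
    The threshold is a hypothetical tuning parameter."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    mean_dx = float(np.mean(flow[..., 0]))  # horizontal flow component
    return mean_dx, abs(mean_dx) > threshold

# Usage on two consecutive webcam frames (hypothetical variable names):
# dx, is_drifting = lateral_drift(frame_t, frame_t_plus_1)
```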
Environmental information by image captioning
To augment a BVIP’s comprehension of their surrounding environment, descriptive captions derived from environmental observations are beneficial. Examples of such captions include: “Traffic light located on your right,” “Staircase descending with a total of 5 steps,” and “Vehicle parked obstructing the sidewalk.”
In this topic, we are examining Deep Learning algorithms that possess the capacity to generate such descriptive annotations. Concurrently, we are refining our caption generation pipeline to ascertain the spectrum of captions that can be formulated to enhance the mobility and spatial understanding of a BVIP.
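As an illustration of the kind of off-the-shelf building block under study, here is a minimal captioning sketch with the publicly available BLIP checkpoint from the Hugging Face hub; the model choice and decoding parameters are assumptions, not our final caption generation pipeline.

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Checkpoint choice is an assumption; any captioning model would do here.
name = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(name)
model = BlipForConditionalGeneration.from_pretrained(name)

image = Image.open("sidewalk_scene.jpg").convert("RGB")  # hypothetical frame
inputs = processor(images=image, return_tensors="pt")

# Beam search tends to yield more fluent captions than greedy decoding.
out = model.generate(**inputs, num_beams=4, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```

Such generic captions would still need domain adaptation to produce the BVIP-oriented phrasings listed above.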
If you are interested in one of the above topics, please send your request to: hakan.calim@fau.de
For the development of the solutions, experience with implementing neural networks in Python with PyTorch or TensorFlow will be beneficial.
Large Language Model for Generation of Structured Medical Report from X-ray Transcriptions
Motivation
Large language models (LLMs) have found applications throughout natural language processing. In recent years, LLMs have exhibited significant advances in abstractive question answering, enabling them to understand questions in context, much as humans do, and to generate contextually appropriate answers rather than relying solely on exact word matches. This potential extends to medicine, where LLMs can play a crucial role in generating well-structured medical reports; achieving this goal necessitates meticulous fine-tuning. Abstractive question answering, often referred to as generative question answering, generates answers under length constraints using decoding techniques such as beam search. Ideally, the language model should also possess few-shot learning capabilities for downstream tasks. The goal is to generate a structured medical report based on the medical diagnosis from X-ray images.
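To make the beam-search idea concrete, here is a minimal sketch of conditional generation with a pre-trained T5 checkpoint; the checkpoint, prompt prefix, and generation parameters are illustrative assumptions, and a fine-tuned model would replace them in practice.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Checkpoint and prompt prefix are illustrative; the project model would
# be fine-tuned on paired (standard report, structured report) data.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

report = "X-ray shows mucosal thickening of the left maxillary sinus."
inputs = tokenizer("summarize: " + report, return_tensors="pt")

# Beam search keeps the num_beams most probable partial sequences at each
# decoding step instead of committing greedily to one token at a time.
outputs = model.generate(**inputs, num_beams=4, max_length=64,
                         early_stopping=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```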
Background
The dataset comprises two columns: standard reports and structured reports. The model's objective is to generate the structured report from the standard report as context. Leading transformer models such as RoBERTa [1], BART [2], XLNet [3], and T5 [4] excel at generative (abstractive) question answering across multiple languages. These models offer various configurations with different parameter counts, each with unique strengths; some excel at downstream tasks through zero-shot or few-shot learning. For instance, instruction-tuned models like Flan-T5 can effectively handle over 1,000 additional downstream tasks. Fine-tuning these models on the specialized sinusitis dataset is therefore essential. The core transformer pipeline for processing sentences includes positional encoding, multi-head attention for computing attention scores with respect to other parts of the sentence, residual connections, normalization layers, and feed-forward layers. Practical implementations of these models and their tokenizers are readily accessible through the Hugging Face hub, and model accuracy can be further improved using ensemble methods.
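As a sketch of how such fine-tuning could be wired together, the following uses the Hugging Face Seq2SeqTrainer on the two-column structure described above; the column names, checkpoint, toy examples, and hyperparameters are assumptions for illustration.

```python
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

# Hypothetical two-column toy data: standard report -> structured report.
data = Dataset.from_dict({
    "standard":   ["X-ray shows mucosal thickening of the maxillary sinus."],
    "structured": ["Findings: mucosal thickening. Location: maxillary sinus."],
})

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

def preprocess(batch):
    enc = tokenizer(batch["standard"], truncation=True, max_length=512)
    enc["labels"] = tokenizer(text_target=batch["structured"],
                              truncation=True, max_length=256)["input_ids"]
    return enc

tokenized = data.map(preprocess, batched=True, remove_columns=data.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="structured-report-t5",
                                  num_train_epochs=3,
                                  per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```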
Research Objective
In summary, this research aims to automatically convert medical diagnoses from X-ray transcriptions into structured reports using LLMs. The aims of this project are:
- Use data augmentation techniques to fine-tune pre-trained LLMs with low-resource data.
- Investigate the suitability of different LLMs, e.g., T5, to create structured medical reports.
- Evaluate the proposed approach with open-source radiology reports.
References
[1] S. Ravichandiran, Getting Started with Google BERT: Build and train state-of-the-art natural language processing models using BERT. Packt Publishing Ltd., 2021.
[2] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” arXiv preprint arXiv:1910.13461, 2019.
[3] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, “Xlnet: Generalized autoregressive pretraining for language understanding,” arXiv preprint arXiv:1906.08237, 2019.
[4] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” arXiv preprint arXiv:1910.10683, 2019.
Style Transfer of High-resolution Photos to Artworks
Analysis of Recorded Single-Channel Patch-Clamp Timeseries Using Neural Networks Trained with Simulated Data
Investigating the Benefits of Combining CNNs and Transformer Architectures for Rail Domain Perception Tasks
Eye Tracking and Pupillometry for Cognitive Load Estimation in Tele-Robotic Surgery
Inferring the cognitive load of a surgeon during robotic surgery is important to ensure safe and effective outcomes for patients, as high cognitive load can lead to errors and impair performance in robot command. This information about cognitive load can also be used in training to improve user skill.
One approach to estimating cognitive load is to utilize eye gaze and pupillometry measurements, which have already been demonstrated as a potential solution to this problem: pupil diameter has been shown to be related to task difficulty [1–3].
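For intuition, here is a minimal sketch of a baseline-relative pupil dilation signal, one of the simplest load proxies underlying more refined indices such as the low/high index of pupillary activity [1, 2]; the sampling rate, baseline window, and smoothing length are illustrative assumptions.

```python
import numpy as np

def pupil_load_proxy(diameter_mm, fs=120, baseline_s=5.0, smooth_s=0.5):
    """Baseline-relative pupil dilation as a crude cognitive-load proxy.

    diameter_mm: 1-D array of pupil diameters from the eye tracker.
    fs: sampling rate in Hz (assumed; device-dependent).
    Returns the percent change from a resting baseline, box-smoothed.
    """
    baseline = np.median(diameter_mm[: int(baseline_s * fs)])
    rel = 100.0 * (diameter_mm - baseline) / baseline
    win = max(1, int(smooth_s * fs))
    return np.convolve(rel, np.ones(win) / win, mode="same")

# Synthetic example: 30 s at 120 Hz with a dilation step mid-task.
rng = np.random.default_rng(0)
t = np.arange(0, 30, 1 / 120)
diameter = 3.0 + 0.2 * (t > 15) + 0.02 * rng.standard_normal(t.size)
print(f"peak dilation vs. baseline: {pupil_load_proxy(diameter).max():.1f} %")
```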
In the scope of this work, eye gaze and pupillometry measurements, together with tool information, will be used to infer user skill and proficiency in robot command. To this end, the eye tracker must be calibrated to the da Vinci robot vision pipeline with a SPAAM-type calibration [4, 5], and tool tracking methods for robotic surgery must be developed.
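For context, here is a minimal sketch of the least-squares step at the heart of a SPAAM-type calibration [4, 5]: from 3D points and the 2D screen positions the user aligned them with, a 3x4 projection matrix is recovered via the direct linear transform. The correspondences below are synthetic stand-ins for alignment data.

```python
import numpy as np

def spaam_projection(points_3d, points_2d):
    """Recover a 3x4 projection matrix P (up to scale) from 3D-2D
    correspondences via the direct linear transform: each pair gives
    two linear equations in the 12 entries of P, from
    u = (p1 . X) / (p3 . X) and v = (p2 . X) / (p3 . X)."""
    rows = []
    for (x, y, z), (u, v) in zip(points_3d, points_2d):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z, -u])
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z, -v])
    # Least-squares solution: right singular vector associated with the
    # smallest singular value of the stacked equation system.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)

# Synthetic check: project random points with a known P, then recover it.
rng = np.random.default_rng(0)
P_true = rng.standard_normal((3, 4))
pts3d = rng.standard_normal((20, 3))
homog = np.c_[pts3d, np.ones(20)] @ P_true.T
pts2d = homog[:, :2] / homog[:, 2:3]

P_est = spaam_projection(pts3d, pts2d)
reproj = np.c_[pts3d, np.ones(20)] @ P_est.T
reproj = reproj[:, :2] / reproj[:, 2:3]
print("max reprojection error:", np.abs(reproj - pts2d).max())
```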
References:
[1] Andrew T. Duchowski, Krzysztof Krejtz, Nina A. Gehrer, Tanya Bafna, and Per Bækgaard. The low/high index of pupillary activity. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI '20, pages 1–12, New York, NY, USA, 2020. Association for Computing Machinery.
[2] Andrew T. Duchowski, Krzysztof Krejtz, Izabela Krejtz, Cezary Biele, Anna Niedzielska, Peter Kiefer, Martin Raubal, and Ioannis Giannopoulos. The index of pupillary activity: Measuring cognitive load vis-à-vis task difficulty with pupil oscillation. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18, pages 1–13, New York, NY, USA, 2018. Association for Computing Machinery.
[3] Krzysztof Krejtz, Andrew T. Duchowski, Anna Niedzielska, Cezary Biele, and Izabela Krejtz. Eye tracking cognitive load using pupil diameter and microsaccades with fixed gaze. PLOS ONE, 13(9):1–23, 09 2018.
[4] Kenneth R. Moser, Mohammed Safayet Arefin, and J. Edward Swan. Impact of alignment point distance and posture on SPAAM calibration of optical see-through head-mounted displays. In 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 21–30, 2018.
[5] Mihran Tuceryan, Yakup Genc, and Nassir Navab. Single-Point Active Alignment Method (SPAAM) for Optical See-Through HMD Calibration for Augmented Reality. Presence: Teleoperators and Virtual Environments, 11(3):259–276, 06 2002.