Index
Dilemma Zone Prediction with Floating Car Data using Machine Learning Approaches
Multipath detection in GNSS signals measured in a position sensor using a pattern recognition approach with neural networks
Comparative Analysis of Different Deep Learning Models for Whole Body Segmentation
Large Language Model for Generation of Structured Medical Report from X-ray Transcriptions
Motivation
Large language models (LLMs) have found wide application in natural language processing. In recent years, LLMs have made significant advances in abstractive question answering: they can understand a question in context, much like humans, and generate contextually appropriate answers rather than relying solely on exact word matches. This potential extends to medicine, where LLMs can play a crucial role in generating well-structured medical reports, a goal that requires careful fine-tuning. Abstractive question answering, often called generative question answering, produces free-form answers, typically constrained in length, using decoding techniques such as beam search. Ideally, the language model should possess few-shot learning capabilities for downstream tasks. The goal of this work is to generate a structured medical report from the medical diagnosis in X-ray transcriptions.
Background
The dataset comprises two columns: standard reports and structured reports. The model’s objective is to generate structured reports from the standard reports as context. Leading transformer models such as RoBERTa [1], BART [2], XLNet [3], and T5 [4] excel in generative (abstractive) question answering across multiple languages. These models come in various configurations with different parameter counts, each with unique strengths; some handle downstream tasks well through zero-shot or few-shot learning. For instance, instruction-tuned models such as Flan-T5 were fine-tuned on more than 1,000 additional downstream tasks. Fine-tuning these models on the specialized sinusitis dataset is therefore essential. The core pipeline for processing a sentence within a transformer model comprises positional encoding, multi-head attention for computing attention scores with respect to other parts of the sentence, residual connections, normalization layers, and feed-forward layers. Practical implementations of these models and their tokenizers are readily accessible through the Hugging Face hub. Model accuracy can also be improved using ensemble methods.
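To illustrate the beam search decoding mentioned above, here is a minimal, self-contained sketch. The "model" is a hand-written next-token probability table (an illustrative assumption, not a real LLM); a real system would score tokens with the fine-tuned transformer instead.

```python
# Toy beam search over a hypothetical next-token model. The probability
# table below is invented for illustration only.
import math

# Hypothetical next-token distributions: prefix (tuple) -> {token: prob}.
PROBS = {
    (): {"no": 0.6, "mild": 0.4},
    ("no",): {"sinusitis": 0.9, "<eos>": 0.1},
    ("mild",): {"sinusitis": 0.7, "opacification": 0.3},
    ("no", "sinusitis"): {"<eos>": 1.0},
    ("mild", "sinusitis"): {"<eos>": 1.0},
    ("mild", "opacification"): {"<eos>": 1.0},
}

def beam_search(beam_width=2, max_len=3):
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == "<eos>":
                candidates.append((seq, score))  # finished hypothesis
                continue
            for tok, p in PROBS.get(seq, {"<eos>": 1.0}).items():
                candidates.append((seq + (tok,), score + math.log(p)))
        # Keep only the `beam_width` highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

best_seq, best_score = beam_search()[0]
print(best_seq)  # most probable sequence under the toy model
```

Note how the greedy-looking first step keeps both "no" and "mild" alive; beam search avoids committing to a single prefix too early, which is why it is preferred over greedy decoding for report generation.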
Research Objective
In summary, this research aims to automatically convert medical diagnoses from X-ray transcriptions into structured reports using LLMs. The aims of this project are:
- Use data augmentation techniques to fine-tune pre-trained LLMs with low-resource data.
- Investigate the suitability of different LLMs, e.g., T5, to create structured medical reports.
- Evaluate the proposed approach with open-source radiology reports.
References
[1] S. Ravichandiran, Getting Started with Google BERT: Build and train state-of-the-art natural language processing models using BERT. Packt Publishing Ltd., 2021.
[2] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” arXiv preprint arXiv:1910.13461, 2019.
[3] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, “Xlnet: Generalized autoregressive pretraining for language understanding,” arXiv preprint arXiv:1906.08237, 2019.
[4] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” arXiv preprint arXiv:1910.10683, 2019.
Style Transfer of High-resolution Photos to Artworks
The aim of this thesis is the generation of artistic images, such as paintings, prints, or drawings, from high-resolution photos while preserving the fine details of the foreground objects. Style transfer methods based on diffusion models and classical CNNs shall be implemented for this task.
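As a minimal sketch of the classical CNN route, the following computes the Gram-matrix style loss at the heart of Gatys-style transfer. The feature maps here are plain nested lists with invented values; in an actual PyTorch implementation they would be activation tensors from a pretrained network such as VGG.

```python
# Sketch of the Gram-matrix style loss used in classical CNN style transfer.
# Feature maps are illustrative nested lists, not real network activations.

def gram_matrix(features):
    """features: list of C channels, each a flat list of H*W activations."""
    c, n = len(features), len(features[0])
    return [[sum(features[i][k] * features[j][k] for k in range(n)) / n
             for j in range(c)] for i in range(c)]

def style_loss(feat_a, feat_b):
    """Mean squared difference between the Gram matrices of two feature maps."""
    ga, gb = gram_matrix(feat_a), gram_matrix(feat_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(c) for j in range(c)) / c ** 2

# Identical feature maps yield zero style loss.
photo_feats = [[1.0, 2.0, 3.0], [0.5, 0.0, 1.5]]
print(style_loss(photo_feats, photo_feats))  # 0.0
```

The Gram matrix discards spatial layout and keeps only channel correlations, which is why matching it transfers texture and style while a separate content loss preserves the photo's structure.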
Requirements:
- Implementation in PyTorch
- Experience and theoretical knowledge in the area of Deep Learning, Pattern Recognition, Computer Vision, and Image Processing
Start: October 2023
In case you are interested in the topic, then please send me (@ Aline) an email with your CV and transcript of records.
Investigating the benefits of combining CNNs and transformer architecture for rail domain perception task
Eye Tracking and Pupillometry for Cognitive Load Estimation in Tele-Robotic Surgery
Inferring the cognitive load of a surgeon during robotic surgery is important to ensure safe and effective outcomes for patients, as high cognitive load can lead to errors and impact performance in robot command. This information about cognitive load can also be used in training to improve user skill.
One approach to estimating cognitive load is to use eye gaze and pupillometry measurements, which have already been demonstrated as a potential solution to this problem: the pupil diameter has been shown to be related to task difficulty [1–3].
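As a simple illustration of the pupil-diameter relationship, the sketch below computes baseline-relative pupil dilation. The sample values, baseline window, and trace are illustrative assumptions; the cited works use more robust oscillation-based indices such as the IPA/LHIPA rather than raw percent change.

```python
# Hedged sketch: baseline-relative pupil dilation, one simple way to relate
# pupil diameter to task difficulty. All numbers below are invented.

def relative_dilation(diameters_mm, baseline_samples=5):
    """Percent change of each sample relative to a resting baseline."""
    baseline = sum(diameters_mm[:baseline_samples]) / baseline_samples
    return [100.0 * (d - baseline) / baseline for d in diameters_mm]

# Simulated trace: resting pupil around 3 mm, dilating to 3.6 mm under load.
trace = [3.0, 3.0, 3.1, 2.9, 3.0, 3.3, 3.5, 3.6, 3.6, 3.5]
dilation = relative_dilation(trace)
print(max(dilation))  # peak task-evoked dilation in percent
```

Normalizing against a per-user resting baseline matters because absolute pupil size varies across individuals and lighting conditions.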
In the scope of this work, eye gaze and pupillometry measurements together with tool information will be used to infer user skill and proficiency in robot command. To this end, the eye tracker must be calibrated to the da Vinci robot vision pipeline with a SPAAM-type calibration [4, 5], and tool tracking methods for robotic surgery must be developed.
References:
[1] Andrew T. Duchowski, Krzysztof Krejtz, Nina A. Gehrer, Tanya Bafna, and Per Bækgaard. The low/high index of pupillary activity. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20, pages 1–12, New York, NY, USA, 2020. Association for Computing Machinery.
[2] Andrew T. Duchowski, Krzysztof Krejtz, Izabela Krejtz, Cezary Biele, Anna Niedzielska, Peter Kiefer, Martin Raubal, and Ioannis Giannopoulos. The index of pupillary activity: Measuring cognitive load vis-à-vis task difficulty with pupil oscillation. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, pages 1–13, New York, NY, USA, 2018. Association for Computing Machinery.
[3] Krzysztof Krejtz, Andrew T. Duchowski, Anna Niedzielska, Cezary Biele, and Izabela Krejtz. Eye tracking cognitive load using pupil diameter and microsaccades with fixed gaze. PLOS ONE, 13(9):1–23, 09 2018.
[4] Kenneth R. Moser, Mohammed Safayet Arefin, and J. Edward Swan. Impact of alignment point distance and posture on SPAAM calibration of optical see-through head-mounted displays. In 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), pages 21–30, 2018.
[5] Mihran Tuceryan, Yakup Genc, and Nassir Navab. Single-Point Active Alignment Method (SPAAM) for Optical See-Through HMD Calibration for Augmented Reality. Presence: Teleoperators and Virtual Environments, 11(3):259–276, 06 2002.
Human motor intention decoding from neuroimaging data with explainable feature importance maps
Natural Language Text Generation for Symbolic Descriptions Using Language Models
In today’s automated robotics industry, there is a growing need for efficient and automated methods of generating text descriptions for actuator and sensor variables, functions, and properties, hereafter collectively referred to as symbolic descriptions. Symbolic descriptions are used to document the functionality of robotic systems, to model the functionality of robots, and to communicate that functionality to human operators. The current manual process of writing text descriptions for them is time-consuming, labor-intensive, and inconsistent.
This research proposes to develop an automated text generation system for symbolic descriptions in robotics using open-source pre-trained language models such as GPT-2 [1], LLaMA [2], and MPT. The choice of model will be based on factors including, but not limited to, performance, suitability for the downstream task, cost of training and inference, and interpretability of the results the models produce. The implemented model will be able to generate text descriptions in English and German while preserving the structure of the original text descriptions. The system will follow a fine-tuning approach and be trained on a dataset of symbolic descriptions paired with their corresponding text descriptions.
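One concrete step in the fine-tuning approach is mapping each symbolic description to a (prompt, target) training pair. The sketch below assumes a hypothetical record schema (field names like "symbol", "unit", and the prompt template are invented for illustration); the actual dataset format may differ.

```python
# Hedged sketch: turning a symbolic-description record into a fine-tuning
# pair. The schema and template here are illustrative assumptions.

def to_training_pair(record, lang="en"):
    """Map one symbolic-description record to a (prompt, target) pair."""
    language = "German" if lang == "de" else "English"
    prompt = (f"Describe the {record['kind']} variable '{record['symbol']}' "
              f"(unit: {record['unit']}) in {language}:")
    return prompt, record["description"][lang]

record = {
    "symbol": "axis1_vel",        # hypothetical variable name
    "kind": "actuator",
    "unit": "rad/s",
    "description": {
        "en": "Angular velocity of joint axis 1.",
        "de": "Winkelgeschwindigkeit der Gelenkachse 1.",
    },
}
prompt, target = to_training_pair(record, lang="de")
print(prompt)
print(target)
```

Keeping the language as an explicit field of the prompt lets a single fine-tuned model serve both the English and German outputs required above.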
The expected outcomes of this research are:
- Fine-tune a language model for automated text generation for symbolic descriptions in robotics.
- Demonstrating its effectiveness in generating accurate and contextually relevant text descriptions, and evaluating the model’s performance using perplexity, ROUGE, and BLEU scores.
- A conceptual lifecycle consideration of the training pipeline, highlighting scalability, maintenance, adaptability, and reusability aspects.
- Interpreting the reasoning behind the model, which is a constraint of this research. This is an active research field; established techniques such as attention visualization [3] and feature attribution [4] could be used for this purpose, potentially supported by experiment-tracking tools such as Weights & Biases.
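Among the evaluation metrics listed above, ROUGE-L can be implemented directly from its definition as the longest common subsequence (LCS) of the generated and reference token sequences. The sketch below is a from-scratch illustration with invented example sentences; production evaluation would use an established metrics package.

```python
# Hedged sketch of ROUGE-L F1, computed from the definition (LCS over
# whitespace tokens). Example sentences are illustrative only.

def lcs_length(a, b):
    """Classic dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1],
                                                               dp[i + 1][j])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("the sensor reports angular velocity",
                 "the sensor reports the angular velocity"))
```

Unlike BLEU's fixed n-grams, the LCS rewards in-order matches of any length, which suits generated descriptions whose wording may legitimately differ from the reference.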
The proposed system is still under development, and several future work directions could be explored. These include:
- Expanding the system to support other languages.
- Improving the system’s ability to generate text descriptions that are consistent with the original text descriptions.
This research has the potential to contribute significantly to the field of automated text generation for robotics. The proposed system could be used in a variety of applications, such as documentation, functionality modeling, and communication, and the results will be of interest to researchers and practitioners in these domains.
References:
[1] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners.
[2] Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., & Lample, G. (2023). LLaMA: Open and Efficient Foundation Language Models. ArXiv, abs/2302.13971.
[3] Yeh, C., Chen, Y., Wu, A., Chen, C., Viégas, F., & Wattenberg, M. (2023). AttentionViz: A Global View of Transformer Attention. ArXiv, abs/2305.03210.
[4] Zhou, Y., Booth, S., Ribeiro, M., & Shah, J.A. (2021). Do Feature Attribution Methods Correctly Attribute Features? ArXiv, abs/2104.14403.