Large Language Model for Generation of Structured Medical Report from X-ray Transcriptions


Large language models (LLMs) are now widely used across natural language processing. In recent years they have advanced considerably in abstractive question answering: they interpret a question in context, much as a human would, and generate a contextually appropriate answer rather than relying solely on exact word matches. This capability extends to medicine, where LLMs can play a crucial role in generating well-structured medical reports, although achieving this requires careful fine-tuning. Abstractive question answering, often called generative question answering, produces an answer under a constraint on word count and typically relies on a decoding technique such as beam search. Ideally, the language model should also possess few-shot learning capabilities for downstream tasks. The goal of this work is to generate a structured medical report from the medical diagnosis recorded in an X-ray transcription.
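To make the decoding step concrete, the following is a minimal sketch of beam search with a length limit. It is an illustration only, not the decoder used in this work: the `beam_search` function, the `toy_model` scoring function, and the three-entry probability table are hypothetical stand-ins for a real language model's next-token distribution.

```python
import math

def beam_search(step_log_probs, beam_width=2, max_len=3, eos="</s>"):
    """Return the highest-scoring (sequence, log-probability) pair.

    step_log_probs(prefix) must return a dict mapping each candidate
    next token to its log-probability given the prefix (a tuple).
    max_len plays the role of the word-count constraint.
    """
    beams = [((), 0.0)]   # hypotheses that can still be extended
    finished = []         # hypotheses that already ended with eos
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, logp in step_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + logp))
        # Keep only the beam_width best extensions at this step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by the length limit
    return max(finished, key=lambda c: c[1])

# Hypothetical next-token probabilities, invented for illustration.
TOY = {
    (): {"no": 0.6, "mild": 0.4},
    ("no",): {"finding": 0.5, "change": 0.5},
    ("mild",): {"sinusitis": 0.9, "</s>": 0.1},
}

def toy_model(prefix):
    probs = TOY.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

best, score = beam_search(toy_model, beam_width=2, max_len=3)
greedy, _ = beam_search(toy_model, beam_width=1, max_len=3)
print(best)    # ('mild', 'sinusitis', '</s>')
print(greedy)  # ('no', 'finding', '</s>')
```

With a beam width of 1 the search is greedy and commits to the locally best first token, ending with a sequence of probability 0.30. With a width of 2 it keeps the second-best first token alive and recovers a sequence of probability 0.36, which is why beam search is commonly preferred for generation.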


The dataset comprises two columns: standard reports and structured reports. The model’s objective is to generate the structured report from the standard report, which serves as the context. Leading transformer models such as RoBERTa [1], BART [2], XLNet [3], and T5 [4] perform well in generative (abstractive) question answering across multiple languages. Each is available in several configurations that differ in parameter count, and each has its own strengths. Some perform well on downstream tasks through zero-shot or few-shot learning; Flan-T5, for instance, has been fine-tuned on more than 1,000 additional tasks. Structuring radiology reports remains a specialized task, however, so fine-tuning these models on a specialized sinusitis dataset is essential. The core pipeline for processing a sentence within a transformer model consists of positional encoding, multi-head attention (which computes attention scores between each token and the other parts of the sentence), residual connections, normalization layers, and feed-forward layers. Practical implementations of these models and their tokenizers are readily accessible through the Hugging Face hub. Model accuracy can also be improved using ensemble methods.
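The transformer pipeline described above can be sketched in plain Python as follows. This is a simplified illustration under several assumptions: it uses a single attention head with identity query, key, and value projections instead of learned multi-head projections, sinusoidal positional encoding and post-layer normalization as in the original Transformer design, and arbitrary fixed weights. All function names and numbers are hypothetical, and production models such as T5 differ in details.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product attention (one head, identity projections)."""
    d = len(x[0])
    out = []
    for q in x:
        # Attention scores of this token against every token in the sentence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

def layer_norm(row, eps=1e-5):
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def feed_forward(row, w1, w2):
    """Two linear maps with a ReLU in between (biases omitted)."""
    hidden = [max(0.0, sum(r * w for r, w in zip(row, col))) for col in w1]
    return [sum(h * w for h, w in zip(hidden, col)) for col in w2]

def encoder_block(x, w1, w2):
    attn = self_attention(x)
    # Residual connection followed by normalization.
    x = [layer_norm([a + b for a, b in zip(xr, ar)]) for xr, ar in zip(x, attn)]
    ff = [feed_forward(r, w1, w2) for r in x]
    return [layer_norm([a + b for a, b in zip(xr, fr)]) for xr, fr in zip(x, ff)]

# Three toy token embeddings of width 4 (values invented for illustration).
embeddings = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.1, 0.9]]
pe = positional_encoding(len(embeddings), 4)
x = [[e + p for e, p in zip(er, pr)] for er, pr in zip(embeddings, pe)]
w1 = [[0.01 * (i + j) for j in range(4)] for i in range(8)]  # 4 -> 8
w2 = [[0.01 * (i - j) for j in range(8)] for i in range(4)]  # 8 -> 4
output = encoder_block(x, w1, w2)
print(len(output), len(output[0]))  # 3 4
```

The block keeps the input shape (one vector per token), which is what allows such blocks to be stacked. In practice these layers would not be written by hand; pre-trained implementations are loaded from the Hugging Face hub.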

Research Objective

In summary, this research aims to automatically convert medical diagnoses from X-ray transcriptions into structured reports using LLMs. The specific objectives are:

  • Use data augmentation techniques to fine-tune pre-trained LLMs with low-resource data.
  • Investigate the suitability of different LLMs, e.g., T5, to create structured medical reports.
  • Evaluate the proposed approach with open-source radiology reports.


[1] S. Ravichandiran, Getting Started with Google BERT: Build and train state-of-the-art natural language processing models using BERT. Packt Publishing Ltd., 2021.
[2] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” arXiv preprint arXiv:1910.13461, 2019.
[3] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” arXiv preprint arXiv:1906.08237, 2019.
[4] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, “Exploring the limits of transfer learning with a unified text-to-text transformer,” arXiv preprint arXiv:1910.10683, 2019.