
Incorporating Time Series Information into Glacier Segmentation and Front Detection using U-Nets in Combination with LSTMs and Multi-Task Learning

This thesis aims at integrating time series information into the static segmentation of glaciers and their calving fronts in synthetic aperture radar (SAR) image sequences. U-Nets have recently been shown to provide promising results for glacier (front) segmentation using SAR imagery [1]. However, this approach only incorporates the spatial information of a single image. The temporal information of complete image sequences, each showing one glacier at different points in time, has not been addressed thus far. To fill this gap, two approaches shall be investigated:

  • Approach 1: using Long Short-Term Memory (LSTM) layers in the U-Net architecture.
    Recurrent Neural Networks such as LSTMs are designed so that information from previous
    inputs in a sequence can be stored in a memory and used to improve the prediction for
    the current input. The combination of structured LSTMs and Fully Convolutional Networks
    (FCNs) showed promising results for joint 4D segmentation of longitudinal MRI [2]. In [3], a
    U-Net was successfully combined with a bi-directional convolutional LSTM for aortic image
    sequence segmentation, outperforming a plain U-Net in segmentation accuracy. In this thesis,
    the combination of LSTMs and U-Nets will be tested for glacier segmentation and calving
    front detection in SAR image sequences. Moreover, the use of simple recurrent layers (RNNs),
    Gated Recurrent Units (GRUs) and bi-directional LSTMs instead of plain LSTMs shall be
    investigated as well.
  • Approach 2: Multi-Task Learning (MTL). As the region to be segmented for calving front
    detection is only a small part of the image, this task suffers from severe class imbalance. To
    improve its performance, an MTL approach shall be implemented that jointly trains glacier
    segmentation and calving front detection. Performance enhancements of U-Nets have been
    observed using stacking [4] and shared encoder networks [5, 6]. In this thesis, both MTL
    techniques shall be tested using U-Nets in combination with LSTMs (see Approach 1).

The resulting models will be compared quantitatively and qualitatively with the state of the art and shall be implemented in Keras.
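To make the starting point of Approach 1 concrete, the following is a minimal, untested Keras sketch of how a convolutional LSTM could be placed at the bottleneck of a small U-Net-style encoder-decoder; the sequence length, layer sizes, and single-channel SAR input are illustrative assumptions, not the final architecture.

```python
from tensorflow.keras import layers, Model

def conv_lstm_unet(seq_len=5, size=256, base=16):
    """Sketch: tiny U-Net-style model with a ConvLSTM2D bottleneck.

    Input: a sequence of SAR images (seq_len, H, W, 1); output: a binary
    glacier mask for the last frame. All sizes are placeholders.
    """
    inp = layers.Input((seq_len, size, size, 1))
    # Encode every frame independently with shared weights.
    enc = layers.TimeDistributed(
        layers.Conv2D(base, 3, padding="same", activation="relu"))(inp)
    x = layers.TimeDistributed(layers.MaxPooling2D(2))(enc)
    # Temporal fusion across the sequence; wrapping this layer in
    # layers.Bidirectional would give the bi-directional variant.
    x = layers.ConvLSTM2D(2 * base, 3, padding="same", return_sequences=False)(x)
    # Decode, reusing the last frame's encoder features as a skip connection.
    x = layers.UpSampling2D(2)(x)
    last_skip = layers.Lambda(lambda t: t[:, -1])(enc)
    x = layers.Concatenate()([x, last_skip])
    x = layers.Conv2D(base, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # glacier vs. background
    return Model(inp, out)

model = conv_lstm_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```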

 

[1] Zhang et al. “Automatically delineating the calving front of Jakobshavn Isbræ from multitemporal
TerraSAR-X images: a deep learning approach.” The Cryosphere 13, no. 6 (2019): 1729-1741.

[2] Gao et al. “Fully convolutional structured LSTM networks for joint 4D medical image segmentation.”
In: IEEE 15th International Symposium on Biomedical Imaging, Washington, DC, 2018, IEEE, pp.
1104-1108.

[3] Bai et al. “Recurrent Neural Networks for Aortic Image Sequence Segmentation with Sparse Annotations.”
In Alejandro F. Frangi, Julia A. Schnabel, Christos Davatzikos, Carlos Alberola-López, Gabor Fichtinger
(Eds.): Medical Image Computing and Computer Assisted Intervention – MICCAI, 2018, pp. 586-594.

[4] Sun et al. “Stacked U-Nets with Multi-Output for Road Extraction.” In: CVPR Workshops, Salt Lake
City, 2018, pp. 202-206.

[5] Ke et al. “Learning to segment microscopy images with lazy labels.” In: ECCV Workshop on BioImage
Computing, 2020.

[6] Lee et al. “Multi-Task Learning U-Net for Single-Channel Speech Enhancement and Mask-Based Voice
Activity Detection.” Applied Sciences 10, no. 9 (2020): p. 3230.

torchsense – a PyTorch-based Compressed Sensing reconstruction framework for dynamic MRI

In this master's thesis, a novel deep learning-based reconstruction method specifically tailored to cardiac radial cine MRI image sequences is investigated. Despite the many advantages of state-of-the-art unrolled networks, their applicability is limited by the integration of the forward operator into the scheme, which poses a computational challenge in dynamic non-Cartesian MRI. The novelty of our algorithm lies in decoupling regularization and data-consistency enforcement into two separate steps that can be combined into an end-to-end reconstruction scheme; this reduces the number of forward-operator evaluations and thereby offers more flexibility. In contrast to unrolled networks, the regularization step will be performed by a lightweight denoising CNN, in some cases leading to a closed-form solution of the data-consistency step.

Utilizing this flexibility (e.g., variable network length at test time), we will seek to increase the undersampling ratio of the k-space, thereby allowing a higher temporal resolution with an existing acquisition scheme.
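As a rough illustration of the decoupling, the following PyTorch sketch alternates a learned denoising (regularization) step with a gradient-based data-consistency step; the forward operator `A`, its adjoint `AH`, and all hyperparameters are placeholders, and for Cartesian sampling the data-consistency step could be replaced by its closed-form solution.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Lightweight residual denoising CNN used as the regularization step."""
    def __init__(self, ch=2):  # 2 channels: real/imaginary part of the image
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)

def reconstruct(y, A, AH, denoiser, n_iter=8, lam=1.0, step=0.5):
    """Alternate denoising and a gradient step on ||A x - y||^2 + lam ||x - z||^2.

    A / AH: (non-Cartesian) forward operator and its adjoint. They are
    evaluated only once per iteration, which keeps the scheme cheap compared
    to fully unrolled networks; n_iter can be varied at test time.
    """
    x = AH(y)                        # zero-filled initial guess
    for _ in range(n_iter):
        z = denoiser(x)              # regularization (prior) step
        grad = AH(A(x) - y) + lam * (x - z)
        x = x - step * grad          # data-consistency step
    return x
```

The variable iteration count is exactly the test-time flexibility that shall be exploited to push the undersampling ratio.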

Automatic Segmentation of the Whole Heart

Congenital diseases (CD) are defects that are present in newborn babies. Neural tube defects, craniofacial
anomalies, and congenital heart diseases (CHD) are some of them, and among these, congenital heart
diseases are the most common type of anomaly, affecting 4 to 50 per 1000 infants depending on
differences in demographic characteristics and experimental conditions [1].
Medical image segmentation is one of the most important parts of planning the treatment
of patients with CHD. Image segmentation techniques aim to detect boundaries within a 2D or 3D
image and partition the image into meaningful parts based on pixel-level information (e.g., intensity
values) and spatial information (e.g., anatomical knowledge) [3]. However, manually segmenting a single 3D
medical image can take several hours. In addition, the complexity of the images and the fact that
understanding them requires medical expertise make them costly to annotate, which makes an
automatic segmentation framework crucial.
Previously, an interactive segmentation method was proposed for this purpose [2]. This master's
thesis aims to reduce the manual interaction of the users by investigating different machine learning
approaches to find a highly accurate model that could potentially replace the interactive solution.
The thesis has to comprise the following work items:
• Literature overview of state-of-the-art segmentation methods, particularly deep learning methods, for 3D medical images.
• Implementation and training of different deep learning segmentation models.
• Evaluation of trained models based on the Dice score (see the sketch below) and comparison to previous interactive approaches.
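The sketch below gives a minimal NumPy reference implementation of the Dice score for binary masks; it is meant only to fix the evaluation metric, not any particular framework.

```python
import numpy as np

def dice_score(pred, target, eps=1e-6):
    """Dice coefficient between two binary masks of identical shape.

    eps avoids division by zero when both masks are empty; for multi-class
    segmentation, average this score over the foreground labels.
    """
    pred = np.asarray(pred, dtype=bool).ravel()
    target = np.asarray(target, dtype=bool).ravel()
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```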
References
[1] Manuel Giraldo-Grueso, Ignacio Zarante, Alejandro Mejía-Grueso, and Gloria Gracia. Risk factors
for congenital heart disease: A case-control study. Revista Colombiana de Cardiología, 27(4):324–329, 2020.
[2] Danielle F Pace. Image segmentation for highly variable anatomy: applications to congenital heart
disease. PhD thesis, Massachusetts Institute of Technology, 2020.
[3] Felix Renard, Soulaimane Guedria, Noel De Palma, and Nicolas Vuillerme. Variability and reproducibility
in deep learning for medical image segmentation. Scientific Reports, 10(1):1–16, 2020.

Automatic Bird Individual Recognition in Multi-Channel Recording Scenarios

Problem background:
At the Max Planck Institute for Ornithology in Radolfzell, several birds are equipped with
backpacks to record their calls. However, each backpack records not only the sound of the bird
carrying it but also that of the birds in its surroundings; as a result, the scientists receive several
non-synchronized audio tracks with bird calls. The biologists have to match the calls to the
individual birds manually, which is time-consuming and error-prone.
Goal of the thesis:
The goal of this thesis is to implement a Python framework that can assign the calls to the
corresponding birds.
Since the intensity of a call decreases rapidly with distance, the loudest recording of a call can
be matched to the bird carrying that recorder. Moreover, a bird's call appears earlier on
its own recording device than on the other devices.
To assign the remaining calls to the other birds, the soundtracks must be compared by
overlaying the audio signals. For this purpose, the audio signals have to be modified first:
since different devices are used for capturing the data and the recordings cannot be started
at the same time, a time offset between the recordings occurs. In addition, a linear distortion
appears because the devices record at slightly different sampling rates.
To remove these inconsistencies, similar characteristics must be found in the audio signals, and
the audio tracks then have to be shifted and processed until these characteristics align.
There are several methods to extract such characteristics, whereby the most precise
methods require human assistance [1]. However, there are also automated approaches in which
the audio track is scanned for periodic signal parameters such as pitch or spectral flatness.
Effective features are essential for the removal of the distortion, as is the ability of the
algorithm to distinguish between minor similarities of the characteristics [2].
The framework will be implemented in Python. It should process the given audio tracks and
recognize and reject disturbed channels.
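As an illustration of the alignment step, the following sketch estimates the constant time offset between two recordings via the peak of their cross-correlation; it assumes equal sampling rates, so the drift caused by mismatched device clocks would additionally have to be handled, e.g. by resampling one track.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_offset(ref, other, sample_rate):
    """Estimate the time offset (in seconds) between two mono recordings.

    Assumes both tracks share at least one common acoustic event (e.g. a
    loud call) and were sampled at the same rate.
    """
    # Normalize so that loudness differences between devices matter less.
    ref = (ref - ref.mean()) / (ref.std() + 1e-12)
    other = (other - other.mean()) / (other.std() + 1e-12)
    corr = correlate(other, ref, mode="full")
    lags = correlation_lags(len(other), len(ref), mode="full")
    return lags[np.argmax(corr)] / sample_rate  # lag with best alignment
```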
References:
[1] Brett G. Crockett, Michael J. Smithers. Method for time aligning audio signals using
characterizations based on auditory events, 2002
[2] Jürgen Herre, Eric Allamanche, Oliver Hellmuth. Robust matching of audio signals using
spectral flatness features, 2002

Height Estimation for Patient Tables from Computed Tomography Data

Image Segmentation via Transformers

The recent surge of Transformers began after they outperformed previously known state-of-the-art
approaches such as long short-term memory and gated recurrent neural networks in sequence
modelling and transduction problems such as language modelling and machine translation. Transformers
avoid recurrence and instead rely entirely on an attention mechanism to draw global dependencies
between input and output [1]. Furthermore, Transformers are now being incorporated and tested
in computer vision tasks such as classification [2], detection [3], segmentation [4] and
generative adversarial networks (GANs) [5] by treating image patches as sequences.
The Transformer architecture was successfully used for object detection, eliminating many
hand-designed components, such as non-maximum suppression or anchor generation,
that explicitly encode prior knowledge about the task [3]. Subsequently, it was extended to panoptic
segmentation. However, Transformers used for segmentation did not exploit the sequence potential
alone but typically still relied on some form of Convolutional Neural Network (CNN) alongside.
Jiang et al. have proposed a pure Transformer-based model in a GAN setting (TransGAN)
for image generation, demonstrating the possibility of dropping CNNs from GANs [5].
In this work, the idea of using image patches as a sequence input into a Transformer model without
CNNs is carried out for segmentation tasks.
The thesis consists of the following milestones:

  • Modifying the TransGAN discriminator and generator into an encoder and a decoder, respectively,
    for segmentation (a patch-based sketch follows this list).
  • Evaluating performance on the Cityscapes dataset [6].
  • Further experiments and improvements regarding learning and network architecture.

The implementation should be done in PyTorch Lightning.
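To make the patch-as-token idea concrete, the following PyTorch sketch (a hypothetical baseline, not the TransGAN architecture itself) flattens an image into a sequence of patches, processes them with a standard Transformer encoder, and projects each token back to per-pixel class logits:

```python
import torch
import torch.nn as nn

class PatchSegmenter(nn.Module):
    """CNN-free segmentation sketch: patch tokens in, per-pixel logits out."""
    def __init__(self, img=256, patch=16, dim=256, n_classes=19, depth=6, heads=8):
        super().__init__()
        self.patch, self.grid = patch, img // patch
        self.embed = nn.Linear(3 * patch * patch, dim)       # project flat patches
        self.pos = nn.Parameter(torch.zeros(1, self.grid ** 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, n_classes * patch * patch)  # logits per patch

    def forward(self, x):                                    # x: (B, 3, H, W)
        B, _, H, W = x.shape
        p, g = self.patch, self.grid
        # (B, 3, H, W) -> (B, g*g, 3*p*p): one token per image patch
        t = x.unfold(2, p, p).unfold(3, p, p)                # (B, 3, g, g, p, p)
        t = t.permute(0, 2, 3, 1, 4, 5).reshape(B, g * g, -1)
        z = self.encoder(self.embed(t) + self.pos)
        out = self.head(z).view(B, g, g, -1, p, p)           # unpack patch logits
        return out.permute(0, 3, 1, 4, 2, 5).reshape(B, -1, H, W)

logits = PatchSegmenter()(torch.randn(1, 3, 256, 256))       # (1, 19, 256, 256)
```

The 19 output classes match the Cityscapes label set used for evaluation; patch size and model depth are arbitrary placeholders.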

 

[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser,
and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on
Neural Information Processing Systems, pages 6000–6010, 2017.

[2] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner,
Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16×16
words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.

[3] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey
Zagoruyko. End-to-end object detection with transformers. In European Conference on Computer Vision,
pages 213–229. Springer, 2020.

[4] Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng
Feng, Tao Xiang, Philip HS Torr, et al. Rethinking semantic segmentation from a sequence-to-sequence
perspective with transformers. arXiv preprint arXiv:2012.15840, 2020.

[5] Yifan Jiang, Shiyu Chang, and Zhangyang Wang. Transgan: Two transformers can make one strong gan.
arXiv preprint arXiv:2102.07074, 2021.

[6] Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson,
Uwe Franke, Stefan Roth, and Bernt Schiele. The cityscapes dataset for semantic urban scene understanding.
In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Manifold Forests

Random Forests for Manifold Learning

 

Description: There are many different methods for manifold learning, such as Locally Linear Embedding, MDS, ISOMAP or Laplacian Eigenmaps. All of them use some type of local neighborhood to approximate the relationship of the data locally, and then find a lower-dimensional representation which preserves this local relationship. One method to learn a partitioning of the feature space is to train a density forest on the data [1]. In this project, the goal is to implement a Manifold Forest algorithm that finds a 1-D signal of length N in a series of N input images by learning a density forest on the data and afterwards applying Laplacian Eigenmaps. Existing frameworks, such as [2], [3], or [4], can be used as the forest implementation. The Laplacian Eigenmaps algorithm is already implemented and can be integrated.
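As a sketch of this pipeline, the snippet below uses scikit-learn's RandomTreesEmbedding [4] as a stand-in for a density forest and a spectral embedding as the Laplacian Eigenmaps step; the leaf-co-occurrence affinity is one plausible choice among several.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.manifold import SpectralEmbedding

def manifold_forest_1d(images, n_trees=100):
    """Recover a 1-D signal (one value per image) from N input images.

    Two images are considered similar if many trees route them to the same
    leaf; Laplacian Eigenmaps on this affinity yields the 1-D embedding.
    """
    X = np.asarray(images, dtype=float).reshape(len(images), -1)
    forest = RandomTreesEmbedding(n_estimators=n_trees, random_state=0).fit(X)
    leaves = forest.apply(X)                       # (N, n_trees) leaf indices
    # affinity[i, j] = fraction of trees that put i and j into the same leaf
    affinity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    emb = SpectralEmbedding(n_components=1, affinity="precomputed")
    return emb.fit_transform(affinity).ravel()
```

In the project itself, the already implemented Laplacian Eigenmaps code would replace the SpectralEmbedding call.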

The concept of Manifold Forests is also introduced in the FAU lecture Pattern Analysis by Christian Riess; candidates who have already attended this lecture are preferred.

This project is intended for students wanting to do a 5 ECTS module such as a research internship, starting now or as soon as possible. The project will be implemented in Python.

 

References:

[1]: Criminisi, A., Shotton, J., & Konukoglu, E. (2012). Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. Foundations and Trends® in Computer Graphics and Vision, 7(2–3), 81–227. ; https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CriminisiForests_FoundTrends_2011.pdf

[2]: https://github.com/CyrilWendl/SIE-Master

[3]: https://github.com/ksanjeevan/randomforest-density-python

[4]: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomTreesEmbedding.html#sklearn.ensemble.RandomTreesEmbedding

 

Learning Multi-Catheter Reconstructions for Interstitial Breast Brachytherapy

Thesis Description

Female breast cancer accounted for 355,000 new cases among all types of cancer in the EU-27 countries in 2020. In Germany alone, approximately 69,000 new cases are diagnosed each year [1]. During the past four decades, breast conserving surgery (BCS) after lumpectomy in combination with radiotherapy (RT) has become widely accepted, as this treatment technique reduces both a patient's emotional and psychological trauma due to a superior aesthetic outcome [2]. The standard technique of delivering RT after BCS is whole breast irradiation (WBI), where a patient's entire breast is irradiated up to a total dose of 40 to 50 Gray (Gy). BCS with adjuvant WBI yields evident equivalence in terms of local tumor control compared to mastectomy, where the entire breast is amputated. However, approximately 50 % of early breast cancer patients still undergo mastectomy in order to omit either RT altogether or 5 to 7 weeks of treatment time [3].

In contrast to external breast irradiation, accelerated partial breast irradiation (APBI) is an emerging standalone post-operative alternative treatment option in brachytherapy [4]. One valid strategy of applying APBI is multi-catheter interstitial brachytherapy (iBT). Thereby, up to 30 highly flexible plastic catheters are implanted into a patient's breast in order to precisely and locally damage the tumor by guiding a radioactive source through the tissue. In iBT, the radioactive dose is delivered using a high dose rate (HDR) technique, where the prescribed dose is administered at a rate of 12 Gy/h by a single source within minutes [5]. This is performed by an afterloading system connected to the catheters via transfer tubes [4, 6]. Sole APBI is not only intended to drastically reduce treatment times to only 4 to 5 days but also to decrease the radiation exposure of adjacent organs at risk (OAR) such as the lung, the skin and, in particular, the heart [7].

After implantation, catheter traces are manually reconstructed based on an acquired computed tomography (CT) image for treatment planning and for determining the implant geometry. In the acquired CT of the patient's breast, physicians precisely define the target volume depending on the tumor's size and location [6]. During treatment planning, the implanted plastic catheters are manually reconstructed slice by slice, which takes approximately 45 % of the whole treatment time [8]. Along each catheter trajectory, dwell positions (DPs) connecting the points in the slices as well as dwell times (DTs) are defined. DPs determine positions where the radioactive source stops for a certain DT, thus irradiating surrounding tumor tissue. Active DPs and DTs are defined at the location of the target volume to optimally deliver the prescribed radioactive dose [9]. As treatment plan dosimetry and DP positioning are directly related, accurate and fast catheter trace reconstruction is crucial [4].

However, the manual reconstruction of up to 30 catheter tubes is a time-consuming process. Kallis et al. state that manual reconstructions take on average 139 ± 47 seconds (s) per catheter. They also observed an interobserver variability of 0.6 ± 0.35 millimeters (mm) in terms of mean Euclidean distance between two experienced medical physicists and the autoreconstruction approach proposed by [8], thus yielding reproducible and reliable reconstructions [6]. Similar findings were reported by Milickovic et al. in 2001 [10]. The insufficient amount of ground truth catheter trace positions as well as blurry CT image quality make it hard to reliably and accurately reconstruct DPs. Hence, further research into automated reconstruction approaches is warranted [10].

In the last 20 years, mainly two different catheter auto-reconstruction approaches have been proposed. Both techniques aim to minimize the error of implant geometries and thus improve dose coverage as well as drastically reduce reconstruction times. Milickovic et al. developed an automated catheter reconstruction algorithm based on analyzing post-implant CT data [8, 10]. However, as stated by Kallis et al., CT-based treatment planning in multi-catheter iBT highly depends on image quality. Due to patient movement, artifacts, and acquisition noise, automatically extracted DPs have to be corrected by manual intervention, which increases reconstruction times [6]. As introduced by Zhou et al. in 2013, electromagnetic tracking (EMT) became a promising alternative to CT-based auto-reconstruction [11]. Further analysis has shown that EMT is applicable to iBT, as this technique of localizing dwell positions offers sparse, precise, and sufficiently accurate dose calculations [12]. Reducing uncertainties, including measurement noise, has been investigated by postprocessing the sensor data with particle filters; in that work, a mean error of 2.3 mm between the clinically approved plan and the reconstructed DPs was reported [13]. Although tracking multi-catheter positions in iBT based on EMT yields fast results independent of imaging artifacts, the performance of EMT systems depends heavily on the system configuration, e.g. the distance between the CT table and the patient bed. The error increases drastically from approximately 1 to 4 mm when decreasing the table/bed distance [12].

In recent years, deep learning (DL) has shown to be a powerful technique for tackling a variety of computer vision tasks, including medical image analysis. DL-based approaches offer highly competitive results in terms of accuracy and efficiency [14, 15]. Deep neural network (DNN) architectures are able to represent high-dimensional non-linear spaces and are thus well suited for the task of automatically reconstructing multi-catheter traces in iBT. Built upon an elegant way of designing DNN architectures, the so-called Fully Convolutional Networks (FCNs) [16], the U-Net architecture has proven to be well suited for image segmentation tasks, as this model's output has the same shape as its input [17]. Çiçek et al. developed an extended version of the U-Net in which all 2D operations are replaced with their 3D counterparts; this topological modification enables volumetric semantic segmentation [18]. In this Master's thesis, a deep learning-based multi-catheter reconstruction method for iBT is presented, investigated, and evaluated using real-world breast cancer data from the radiation clinic in Erlangen, Germany. To the best of our knowledge, this is the first approach introducing an artificial intelligence-based multi-catheter reconstruction algorithm in breast brachytherapy.
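To illustrate the intended model family, the snippet below shows a heavily condensed (single-level) 3D U-Net in PyTorch; the framework choice, channel counts, and single-channel CT input are assumptions for illustration, not the final design.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    """Two 3x3x3 convolutions with batch normalization and ReLU [18]."""
    return nn.Sequential(
        nn.Conv3d(cin, cout, 3, padding=1), nn.BatchNorm3d(cout), nn.ReLU(),
        nn.Conv3d(cout, cout, 3, padding=1), nn.BatchNorm3d(cout), nn.ReLU(),
    )

class UNet3D(nn.Module):
    """Minimal one-level 3D U-Net: CT volume in, catheter probability map out."""
    def __init__(self, base=16):
        super().__init__()
        self.enc = conv_block(1, base)
        self.down = nn.MaxPool3d(2)
        self.bottom = conv_block(base, 2 * base)
        self.up = nn.ConvTranspose3d(2 * base, base, 2, stride=2)
        self.dec = conv_block(2 * base, base)
        self.out = nn.Conv3d(base, 1, 1)

    def forward(self, x):                  # x: (B, 1, D, H, W), sides even
        e = self.enc(x)
        b = self.bottom(self.down(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))  # skip connection
        return torch.sigmoid(self.out(d))  # voxel-wise catheter probability

probs = UNet3D()(torch.randn(1, 1, 32, 64, 64))
```

Catheter trajectories and dwell positions would then be extracted from such a voxel-wise probability map in a post-processing step.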

References

  1. Jacques Ferlay et al. Global cancer observatory: Cancer today. https://gco.iarc.fr/today. Accessed: 2021-03-22.
  2. Csaba Polgár et al. High-dose-rate brachytherapy alone versus whole breast radiotherapy with or without tumor bed boost after breast-conserving surgery: Seven-year results of a comparative study. International journal of radiation oncology, biology, physics, 60:1173–81, 12 2004.
  3. Vratislav Strnad et al. 5-year results of accelerated partial breast irradiation using sole interstitial multicatheter brachytherapy versus whole-breast irradiation with boost after breast-conserving surgery for low-risk invasive and in-situ carcinoma of the female breast: a randomised, phase 3, non-inferiority trial. The Lancet, 387(10015):229–238, 2016.
  4. Vratislav Strnad, R. Pötter, G. Kovács, and T. Block. Practical Handbook of Brachytherapy. UNI-MED Science. UNI-MED-Verlag, 2010.
  5. Daniela Kauer-Dorner and Daniel Berger. The role of brachytherapy in the treatment of breast cancer. Breast Care, 13, 05 2018.
  6. Karoline Kallis et al. Impact of inter- and intra-observer variabilities of catheter reconstruction on multicatheter interstitial brachytherapy of breast cancer patients. Radiotherapy and Oncology, 135:25–32, 06 2019.
  7. Vratislav Strnad et al. Estro-acrop guideline: Interstitial multi-catheter breast brachytherapy as accelerated partial breast irradiation alone or as boost – gec-estro breast cancer working group practical recommendations. Radiotherapy and Oncology, 128, 04 2018.
  8. Milickovic et al. Catheter autoreconstruction in computed tomography based brachytherapy treatment planning. Medical Physics, 27(5):1047–1057, 2000.
  9. Cheng B. Saw, Leroy J. Korb, Brenda Darnell, K.V. Krishna, and Dennis Ulewicz. Independent technique of verifying high-dose rate (hdr) brachytherapy treatment plans. International Journal of Radiation Oncology*Biology*Physics, 40(3):747–750, 1998.
  10. Natasa Milickovic, Dimos Baltas, and Nikolaos Zamboglou. Automatic reconstruction of catheters in CT based brachytherapy treatment planning. In ISPA 2001. Proceedings of the 2nd International Symposium on Image and Signal Processing and Analysis. In conjunction with 23rd International Conference on Information Technology Interfaces, pages 202–206, 2001.
  11. Jun Zhou, Evelyn Sebastian, Victor Mangona, and Di Yan. Real-time catheter tracking for high-dose-rate prostate brachytherapy using an electromagnetic 3d-guidance device: A preliminary performance study. Medical Physics, 40(2):021716, 2013.
  12. Markus Kellermeier, Jens Herbolzheimer, Stephan Kreppner, Michael Lotter, Vratislav Strnad, and Christoph Bert. Electromagnetic tracking (emt) technology for improved treatment quality assurance in interstitial brachytherapy. Journal of Applied Clinical Medical Physics, 18:211–222, 01 2017.
  13. Theresa Ida Götz et al. A tool to automatically analyze electromagnetic tracking data from high dose rate brachytherapy of breast cancer patients. PLOS ONE, 12(9):1–31, 09 2017.
  14. Florian Kordon et al. Multi-task localization and segmentation for x-ray guided planning in knee surgery. In Dinggang Shen, Tianming Liu, Terry M. Peters, Lawrence H. Staib, Caroline Essert, Sean Zhou, Pew-Thian Yap, and Ali Khan, editors, Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, pages 622–630, Cham, 2019. Springer International Publishing.
  15. Florian Kordon, Ruxandra Lasowski, Benedict Swartman, Jochen Franke, Peter Fischer, and Holger Kunze. Improved x-ray bone segmentation by normalization and augmentation strategies. In Heinz Handels, Thomas M. Deserno, Andreas Maier, Klaus Hermann Maier-Hein, Christoph Palm, and Thomas Tolxdorff, editors, Bildverarbeitung für die Medizin 2019, pages 104–109, Wiesbaden, 2019. Springer Fachmedien Wiesbaden.
  16. Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. CoRR, abs/1411.4038, 2014.
  17. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241, Cham, 2015. Springer International Publishing.
  18. Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger. 3D U-Net: Learning dense volumetric segmentation from sparse annotation. CoRR, abs/1606.06650, 2016.

Abnormality detection on musculoskeletal radiographs

Thesis Description

The primary objective of this thesis project is to develop an algorithm that can determine whether a musculoskeletal X-ray study is normal or abnormal. For this purpose, we only consider X-rays of the upper extremities including the shoulder, humerus, elbow, forearm, wrist, hand, and finger. By abnormalities we consider fractures, hardware, degenerative joint diseases, lesions, subluxations, and other deviations from the standard structural composition and morphology. Given an X-ray image as an input, the devised algorithm should output a labeled image which indicates the presence or absence of an abnormality.  Such a system could be used to enhance the confidence of the radiologist or prioritize subsequent analysis and treatment options.

The task of determining abnormality on musculoskeletal radiographs is particularly critical, since more than 1.7 billion people around the globe are affected by musculoskeletal conditions [12]. Since a radiograph is the cheapest, most readily available, and usually first measure to detect musculoskeletal abnormalities, automatic detection and localization of such potential abnormalities enables a faster initial diagnosis, saves valuable time for physicians, and reduces the number of subsequent diagnostic procedures required for the patient. This will also reduce the work pressure and fatigue of radiologists [10], which is caused by the overwhelming number of X-ray studies they have to read every day [11].

In this project we will use a large public data set called ‘MURA-v1.1’ published by Stanford Machine Learning Group of Stanford University [1]. The data set consists of 14,863 studies from 12,173 patients with a total of 40,561 multi-view radiographic images. Board-certified radiologists from Stanford Hospital manually labeled the radiographs as normal or abnormal. Out of 14,863 studies 9,045 are normal and 5,818 are abnormal.

The project is structured into three parts. First, a learning-based classification algorithm is used to predict whether a radiograph is normal or abnormal [1,2]. Second, anatomical information derived from the dataset's annotation is incorporated to additionally predict the anatomical origin of the radiograph [3,4,6,7,8]. In a last step, the abnormality is localized and visualized by incorporating the results from the previous steps in combination with targeted feature space analysis. All components should then be combined into a framework capable of predicting, localizing, and visualizing musculoskeletal abnormalities. Algorithmic development is based on recent advances in deep learning, building upon the DenseNet [9] and ResNet [13] neural network architectures. A main aspect of the work is the conception and implementation of a strategy for integrating additional anatomical information. It shall also be analyzed to what extent this information can support and improve the classification of abnormal and normal radiographs. Prior work on multi-task/multi-label optimization is investigated and examined for applicability to this project's task [3,4,5,6,7]. The project is fixed to a six-month timeline and will be concluded by a detailed project report. Technical implementation of the prototype will be performed within the PyTorch environment for the Python programming language.
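For the first part, a plausible starting point is a DenseNet backbone with a single abnormality logit, sketched below in PyTorch; DenseNet-169 follows the MURA baseline [1], while the pretraining choice and input size are assumptions (torchvision >= 0.13 API).

```python
import torch
import torch.nn as nn
from torchvision import models

class AbnormalityNet(nn.Module):
    """Binary normal/abnormal classifier on a DenseNet-169 backbone [1, 9]."""
    def __init__(self, pretrained=True):
        super().__init__()
        self.backbone = models.densenet169(weights="DEFAULT" if pretrained else None)
        # Replace the 1000-class ImageNet head with a single logit.
        in_features = self.backbone.classifier.in_features
        self.backbone.classifier = nn.Linear(in_features, 1)

    def forward(self, x):               # x: (B, 3, 224, 224) radiograph views
        return self.backbone(x)         # abnormality logit; sigmoid gives p

model = AbnormalityNet()
loss_fn = nn.BCEWithLogitsLoss()        # per-image loss; since MURA labels whole
logit = model(torch.randn(2, 3, 224, 224))  # studies, view logits are aggregated per study
```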

 

References

  1. Rajpurkar P., Irvin J., Bagul A., Ding D., Duan T., Mehta H., Yang B., Zhu K., Laird D., Ball R., Langlotz C., Shpanskaya K., Lungren M., Ng A., “MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs.” 1st Conference on Medical Imaging with Deep Learning (MIDL 2018)
  2. Guendel S., Grbic S., Georgescu B., Zhou K., Ludwig R., Meier A., “Learning to recognize abnormalities in chest x-rays with location aware dense networks.” arXiv preprint arXiv:1803.04565, 2018
  3. Guendel S., Ghesu F., Grbic S., Gibson E., Georgescu B., Maier A., “Multi-task Learning for Chest X-ray Abnormality Classification on Noisy Labels.” arXiv preprint arXiv:1905.06362, 2019
  4. Yang X., Zeng Z., Yeo S.Y., Tan C., Tey H.L., Su Y., “A novel multi-task deep learning model for skin lesion segmentation and classification.” arXiv preprint arXiv:1703.01025, 2017
  5. Vesal S., Ravikumar N., Maier A., “A Multi-task Framework for Skin Lesion Detection and Segmentation.” arXiv preprint arXiv:1808.01676, 2018
  6. Kendall A., Gal Y., Cipolla R., “Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics.” arXiv preprint arXiv:1705.07115, 2017
  7. Vandenhende S., De Brabandere B., Van Gool L., “Branched Multi-Task Networks: Deciding What Layers To Share.” arXiv preprint arXiv:1904.02920, 2019
  8. Berlin L., “Liability of interpreting too many radiographs.” American Journal of Roentgenology, 175(1):17–22, 2000
  9. Huang G., Liu Z., Weinberger K.Q., and van der Maaten L., “Densely connected convolutional networks.” arXiv preprint arXiv:1608.06993, 2016
  10. Lu Y., Zhao S., Chu P.W., and Arenson R.L., “An update survey of academic radiologists’ clinical productivity.” Journal of the American College of Radiology, 5(7):817–826, 2008
  11. Nakajima Y., Yamada K., Imamura K., and Kobayashi K., “Radiologist supply and workload: international comparison.” Radiation Medicine, 26(8):455–465, 2008
  12. URL http://www.boneandjointburden.org/2014-report
  13. He K., Zhang X., Ren S., Sun J., “Deep Residual Learning for Image Recognition.” arXiv preprint arXiv:1512.03385, 2015

 

Localization and Standard Plane Regression of Vertebral Bodies in Intra-Operative CBCT Volumes

Thesis Description

Spinal fractures account for 4.3 % of all bone fractures and occur in 30–50 % of people above the age of 50 [1, 2]. These fractures often remain unnoticed and do not cause immediate issues for the affected. However, this poses a direct risk: untreated damage increases the probability and severity of future fractures, which lead to pain, reduced quality of life, or increased mortality [1, 3]. Moreover, about 10 % of vertebral fractures are directly linked to injuries of the spinal cord [2]. Spinal cord damage results in a wide range of disorders, including movement restrictions, loss of sensitivity, autoimmune diseases, or even total paralysis. Treatment highly depends on the phase and progression of the damage; therefore, early detection improves the chances of a positive outcome [4]. Computed Tomography (CT) represents the gold standard for the diagnosis of bone fractures. However, especially for smaller clinics, its use in the intraoperative environment is precluded by high costs, restricted access to the patient, and substantial space requirements [5]. To allow image acquisition without these pitfalls in the intraoperative suite, mobile CT systems have become the clinical standard. These systems acquire 2D projections during a 190° rotation around the patient, which are used to reconstruct a 3D volume [6].
An essential task in image-guided surgery is the generation of so-called standard planes. These standard planes are used to obtain a standardized view of an anatomical structure showing its key features [7]. This facilitates the evaluation process and reduces the risk of overlooking damage. The standard plane regression is currently done manually by the reader. Although this step serves as a normalization procedure, it is physician-dependent and leaves room for mistakes [8]. Furthermore, the physician cannot adjust the image output and operate on the patient at the same time; the adjustment has to be done first, and the surgery can only start afterwards, which increases the overall surgery duration. To speed up this process, standardize it, and make it less physician-dependent, an automation of both vertebral body detection and standard plane regression for cone beam computed tomography (CBCT) volumes is needed.
To the best of our knowledge, there are no publications on this exact topic yet. However, there are many possible building blocks for an approach to automate standard plane regression. One possibility is to use segmentation for this task. Shi et al. proposed a two-step algorithm to localize and segment vertebral bodies in CT images using a combination of a 2D and a 3D U-Net [9]. Thomas et al. proposed an assistance system for ankle evaluation: a sliding-window approach in combination with two consecutive 3D U-Nets is used on CBCT volumes to segment the ankles and regress the standard planes for each one [7]. Another idea is to determine the standard plane parameters based on a preceding bounding box prediction of the respective object. For that purpose, Jaeger et al. introduced the Retina U-Net, a fusion of the RetinaNet one-stage detector with the U-Net architecture, which is used to predict bounding boxes of lesions in lung CT images [10]. In real-time object detection, the gold standard is the You Only Look Once (YOLO) algorithm, which processes the whole input in one step; multiple layers are used to predict each object's surrounding bounding box as well as its class affiliation [11].
This thesis aims to design a framework for the localization of vertebral bodies followed by a standard plane regression in intra-operative CBCT volumes based on deep learning algorithms. Both the detection and the plane regression should be fast and accurate at the same time. Therefore, existing algorithms are utilized and compared against one another. The U-Net architecture, the Retina U-Net idea, as well as the YOLO algorithm will be analyzed to realize the task at hand. In detail, the thesis will comprise the following work items (a plane-sampling sketch follows the list):
  • Literature overview of state-of-the-art object detection
  • Characterization of standard planes for vertebral bodies
  • Implementation of a deep learning based method
  • Overview and explanation of the algorithms used
  • Quantitative evaluation on real-world data
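One simple way to make the standard plane regression target concrete is to parametrize each plane by a center point and two in-plane direction vectors and to resample it from the volume; the NumPy/SciPy sketch below (names and parametrization are illustrative assumptions) does exactly that and could serve both for generating training targets and for inspecting predictions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def sample_plane(volume, center, u, v, size=128, spacing=1.0):
    """Resample a 2D standard plane from a 3D CBCT volume.

    center: plane center in voxel coordinates (z, y, x);
    u, v: in-plane direction vectors, e.g. regressed by a network.
    """
    u = np.asarray(u, float) / np.linalg.norm(u)
    v = np.asarray(v, float) / np.linalg.norm(v)
    r = (np.arange(size) - size / 2) * spacing
    gu, gv = np.meshgrid(r, r, indexing="ij")
    # voxel coordinates of every plane pixel: center + a*u + b*v
    coords = (np.asarray(center, float)[:, None, None]
              + gu[None] * u[:, None, None] + gv[None] * v[:, None, None])
    return map_coordinates(volume, coords, order=1)  # trilinear interpolation

plane = sample_plane(np.random.rand(64, 128, 128),
                     center=(32, 64, 64), u=(0, 1, 0), v=(0, 0, 1))
```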

 

References

[1] Ghada Ballane et al. Worldwide prevalence and incidence of osteoporotic vertebral fractures. Osteoporosis International, 28(5):1531–1542, 2017.
[2] Zhao Wen Zong et al. Chinese expert consensus on the treatment of modern combat-related spinal injuries. Military Medical Research, 6(1), 2019.
[3] Neil Binkley et al. Lateral vertebral assessment: A valuable technique to detect clinically significant vertebral fractures. Osteoporosis International, 16(12):1513–1518, 2005.
[4] Katari Venkatesh et al. Spinal cord injury: pathophysiology, treatment strategies, associated challenges, and future implications. Cell and Tissue Research, 377(2):125–151, 2019.
[5] Stefan Wirth et al. C-arm-based mobile computed tomography: a comparison with established imaging on the basis of simulated treatments of talus neck fractures in a cadaveric study. Computer Aided Surgery, 9(1-2):27–38, 2004.
[6] Jan Von Recum et al. Die intraoperative 3D-C-Bogen-Anwendung. Unfallchirurg, 115(3):196–201, 2012.
[7] Sarina Thomas et al. Computer-assisted contralateral side comparison of the ankle joint using flat panel technology. Technical report, 2021.
[8] Lisa Kausch et al. Toward automatic C-arm positioning for standard projections in orthopedic surgery. International Journal of Computer Assisted Radiology and Surgery, 15(7):1095–1105, 2020.
[9] Dejun Shi et al. Automatic Localization and Segmentation of Vertebral Bodies in 3D CT Volumes with Deep Learning. ISICDM 2018: Proceedings of the 2nd International Symposium on Image Computing and Digital Medicine, pages 42–46, 2018.
[10] Paul Jaeger et al. Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection. Technical report, 2018.
[11] Joseph Redmon et al. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016.