Anatomical Landmark Detection for Pancreatic Vessels in Computed Tomography

Thesis Description

Pancreatic cancer remains one of the most lethal forms of cancer, with a five-year survival rate of approximately
6% [1]. Consequently, clinically motivated visualizations of the vessel system around the pancreas are indispensable
for guiding interventions and supporting early diagnosis. For this purpose, computed tomography (CT) is
a non-invasive modality that allows rapid, reliable and accurate visualization of the vasculature around
the pancreas [2]. Prior to such investigations, however, the vessels of interest must be detected manually, which
is cumbersome. This work therefore focuses on the automated detection of anatomical landmarks of pancreatic vessels
in CT. Once the start and end point of each vessel have been detected, an existing path tracing
algorithm can be initialized to determine the vessel's pathway, which facilitates visual guidance for physicians.

Existing anatomical landmark detection models predominantly rely on fully convolutional neural
networks (FCNNs). There are two main CNN-based approaches for anatomical landmark detection. First, the
landmark coordinates can be regressed directly from the input image. Recent approaches achieve high-performing
landmark detection by combining YOLO-based object detectors and ResNets for
hierarchical coordinate regression [3, 4]. However, this involves a complex image-to-coordinate mapping that
exhibits limited performance, particularly when dealing with volumetric medical image data [5]. Moreover, the
presented methods work either on 2D data or on sliced 3D data, which fails to capture spatial context in
all three dimensions. Second, the landmarks can be retrieved from predicted segmentation heatmaps, which are
subsequently post-processed to coordinates [5, 6]. This approach harnesses the
success of CNN-based image-to-image models in medical image segmentation. Ronneberger et al. [7] laid
the foundation for numerous U-shaped segmentation models, which are also used for anatomical landmark detection
[6]. In 2021, Isensee et al. [8] introduced nnU-Net, which is suitable as a baseline model due to its strong
performance and automatic configuration. Additionally, Baumgartner et al. [9] extended the segmentation
framework nnU-Net and presented nnDetection, a framework specialized for medical object detection.
However, despite the excellent performance of FCNN-based models, they fail to learn explicit global semantic
information owing to the intrinsic locality of convolutions. Vision Transformers (ViTs) can therefore
be employed to better capture long-range dependencies and resolve ambiguities in the anatomical landmark
detection task. Tang et al. [10] introduce Swin-UNETR, a ViT-based segmentation architecture
that uses Swin-Transformer modules [11] for 3D medical image data. Accordingly, the comparison of ViT and
CNN approaches, including direct coordinate regression as well as the segmentation of landmark heatmaps, for
the detection of vessels around the pancreas is of principal importance for this work.
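
The heatmap-based approach described above can be illustrated with a minimal sketch. It assumes that each landmark is encoded as a 3D Gaussian centered on its voxel position and that the coordinate is recovered by taking the global maximum of the heatmap. The function names, the volume shape, the landmark position and the value of sigma are hypothetical and chosen for illustration only; the post-processing used in the cited works may differ.

```python
import numpy as np


def gaussian_heatmap(shape, center, sigma=2.0):
    """Return a 3D Gaussian heatmap of the given shape peaking at `center` (z, y, x)."""
    zz, yy, xx = np.meshgrid(
        np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]), indexing="ij"
    )
    sq_dist = (zz - center[0]) ** 2 + (yy - center[1]) ** 2 + (xx - center[2]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def heatmap_to_coordinate(heatmap):
    """Post-process a heatmap to a voxel coordinate by taking its global maximum."""
    flat_index = int(np.argmax(heatmap))
    return tuple(int(i) for i in np.unravel_index(flat_index, heatmap.shape))


# Hypothetical landmark (e.g. the start point of a vessel) in voxel indices (z, y, x).
landmark = (5, 12, 20)
# In training, such a heatmap would serve as the regression target of the network.
target = gaussian_heatmap(shape=(16, 32, 32), center=landmark, sigma=2.0)
# At inference, the same decoding step would be applied to the predicted heatmap.
predicted = heatmap_to_coordinate(target)
```

Taking the global maximum is the simplest possible decoding step. Alternatives such as a center-of-mass estimate around the peak could yield sub-voxel precision, which is one of the post-processing choices this thesis may examine.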

To conclude, the overall goal of this thesis is to find a robust anatomical landmark detection model for the
start and end points of pancreatic blood vessels. First, at least two state-of-the-art segmentation models are
implemented and evaluated for the given landmark detection task. Based on the literature review, the U-shaped
FCNN approaches nnU-Net and nnDetection as well as Swin-UNETR, a promising ViT-based
model, are investigated. Optionally, a direct coordinate regression model is implemented (e.g., YARLA [3]).
Then, all models are fine-tuned and extended to best solve the pancreatic vessel detection problem. This
could involve integrating prior knowledge of the vasculature, enhancing the pre- and/or post-processing,
or adapting the model architectures themselves.

Summary:
1. Preprocess volumetric 3D CT data and vessel centerline annotations
2. Investigate appropriate landmark detection models for volumetric 3D medical imaging data
3. Implement and evaluate baseline models from the literature, including CNNs, ViTs and coordinate regressors
(a) nnU-Net / nnDetection: U-Net-based segmentation frameworks
(b) Swin-UNETR: Swin-Transformer for image encoding, CNN-based decoder
(c) YARLA: YOLO + ResNet approach for anatomical landmark regression
4. Improve the baseline models by
(a) Incorporating prior knowledge (spatial configuration context, segmentation maps, ...)
(b) Combining the pre-trained Swin-UNETR encoder with the Swin-Unet [12] decoder
(c) Improving the post-processing of segmented heatmaps
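
Steps 1 and 3 of the summary can be sketched as follows, under two assumptions that the description above does not fix: a vessel centerline annotation is available as an ordered list of voxel coordinates, and detection quality is measured by the Euclidean distance in millimetres between predicted and annotated landmark. The helper names, the example coordinates and the voxel spacing are hypothetical.

```python
import math


def centerline_endpoints(centerline):
    """Return (start, end) of an ordered centerline given as a list of (z, y, x) voxels."""
    if len(centerline) < 2:
        raise ValueError("a centerline needs at least two points")
    return centerline[0], centerline[-1]


def localization_error_mm(predicted, reference, spacing):
    """Euclidean distance in millimetres between two voxel coordinates."""
    return math.sqrt(
        sum(((p - r) * s) ** 2 for p, r, s in zip(predicted, reference, spacing))
    )


# Hypothetical annotation and prediction; all values are illustrative only.
centerline = [(10, 40, 52), (11, 41, 53), (12, 43, 55), (13, 46, 58)]
spacing = (2.0, 1.0, 1.0)  # assumed voxel size in mm along (z, y, x)
start, end = centerline_endpoints(centerline)
error = localization_error_mm((10, 43, 56), start, spacing)
```

Scaling by the voxel spacing before computing the distance keeps errors comparable across CT volumes with different, possibly anisotropic, resolutions.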

References

[1] Milena Ilic and Irena Ilic. Epidemiology of pancreatic cancer. World Journal of Gastroenterology,
22:9694–9705, 2016.
[2] Vinit Baliyan, Khalid Shaqdan, Sandeep Hedgire, and Brian Ghoshhajra. Vascular computed tomography
angiography technique and indications. Cardiovascular Diagnosis and Therapy, 9, 8 2019.
[3] Alexander Tack, Bernhard Preim, and Stefan Zachow. Fully automated assessment of knee alignment from
full-leg X-rays employing a "YOLOv4 and ResNet landmark regression algorithm" (YARLA): Data from
the osteoarthritis initiative. Computer Methods and Programs in Biomedicine, 205, 2021.
[4] Mohammed A. Al-Masni, Woo-Ram Kim, Eung Yeop Kim, Young Noh, and Dong-Hyun Kim. A two
cascaded network integrating regional-based YOLO and 3D-CNN for cerebral microbleeds detection. International
Conferences of the IEEE Engineering in Medicine and Biology Society, 42:1055–1058, 6 2020.
[5] Christian Payer, Darko Štern, Horst Bischof, and Martin Urschler. Integrating spatial configuration into
heatmap regression based CNNs for landmark localization. Medical Image Analysis, 54:207–219, 5 2019.
[6] Heqin Zhu, Qingsong Yao, Li Xiao, and S. Kevin Zhou. You only learn once: Universal anatomical landmark
detection. Medical Image Computing and Computer Assisted Intervention-MICCAI, 24:85–95, 9 2021.
[7] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image
segmentation. Medical Image Computing and Computer-Assisted Intervention-MICCAI, 18:234–241, 10
2015.
[8] Fabian Isensee, Paul F. Jaeger, Simon A.A. Kohl, Jens Petersen, and Klaus H. Maier-Hein. nnU-Net: A
self-configuring method for deep learning-based biomedical image segmentation. Nature Methods, 18:203–211,
2 2021.
[9] Michael Baumgartner, Paul F. Jaeger, Fabian Isensee, and Klaus H. Maier-Hein. nnDetection: A
self-configuring method for medical object detection. Medical Image Computing and Computer-Assisted
Intervention-MICCAI, pages 530–539, 6 2021.
[10] Yucheng Tang, Dong Yang, Wenqi Li, et al. Self-supervised pre-training of swin transformers
for 3D medical image analysis. Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), pages 20730–20740, 2022.
[11] Ze Liu, Yutong Lin, Yue Cao, et al. Swin transformer: Hierarchical vision transformer using shifted
windows. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages
10012–10022, 2021.
[12] Hu Cao, Yueyue Wang, Joy Chen, et al. Swin-unet: Unet-like pure transformer for medical image segmentation.
European Conference on Computer Vision, 17:205–218, 10 2022.