Binary Mask Generation for Killer Whale Vocalizations

Beyond Visuals: A Research Odyssey to Redefine Diagnostic Metrics for Virtual Contrast-Enhanced MRI Images


Geometry-Aware Key-Point / Object Detection and Pose-Estimation

A wide range of emerging applications creates an increasing demand for reliable and accurate object detection and pose estimation using machine-learning-based systems. This is particularly the case for autonomous systems such as autonomous vehicles and robots, but also in the context of augmented reality [1]. These applications require detecting and localizing objects in real time and in varied environments, including cluttered scenes and objects with similar appearances.

However, traditional object detection and pose estimation methods often detect and locate objects only partially in such challenging situations, leading to inaccurate and unreliable results [2]. This is where geometry-aware key-point / object detection and pose estimation comes in: it explicitly incorporates additional geometric information into these tasks to improve their accuracy and robustness. In object detection, the goal is to identify the presence and location of objects within an image or video. Pose estimation, on the other hand, refers to estimating the position and orientation of objects in 3D space based on 2D images. By encoding human domain knowledge in the form of geometric constraints, we aim to exploit the expertise of domain experts to build more robust and accurate solutions while simultaneously reducing the labeling effort associated with training data-driven solutions for novel applications.

There are various approaches to incorporating geometric information into object detection and pose estimation. One common approach is geometry-aware convolutional neural networks (Geo-CNNs) [4], which explicitly build geometric information into the model architecture. Another is geometry-aware scene graph generation [5], which uses a graph-based representation to model the geometric relationships between objects in a scene. The most suitable approach depends on the task at hand, the variability of object shape and orientation, and the complexity of the scene. An assessment of existing methods according to these requirements is part of the literature review accompanying the proposed work. Afterwards, the central task of the thesis is either the adaptation of an existing method or the design and implementation of a novel approach, together with the corresponding evaluation.
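To make the role of such geometric constraints concrete, the following minimal sketch scores 2D keypoint detections of a planar object by their reprojection error under a candidate 6D pose. It is not part of the proposed method; the camera intrinsics, object dimensions, pose, and all names are illustrative assumptions.

```python
import numpy as np

def project(points_3d, R, t, K):
    """Project 3D points into the image with a pinhole camera model."""
    cam = R @ points_3d.T + t[:, None]   # 3xN points in the camera frame
    uv = K @ cam                         # homogeneous image coordinates
    return (uv[:2] / uv[2]).T            # Nx2 pixel coordinates

# Hypothetical planar object model (e.g. four corners of a pallet face, metres).
model = np.array([[0, 0, 0], [1.2, 0, 0], [1.2, 0.15, 0], [0, 0.15, 0]], float)

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], float)  # assumed intrinsics
R = np.eye(3)                                                   # assumed candidate pose
t = np.array([0.0, 0.0, 3.0])

gt_2d = project(model, R, t, K)
# Simulated noisy keypoint detections standing in for a detector's output.
detections = gt_2d + np.random.default_rng(0).normal(0, 2.0, gt_2d.shape)

# Geometric consistency score: mean reprojection error of the detections
# against the model projected under the candidate pose.
error = np.linalg.norm(detections - gt_2d, axis=1).mean()
print(f"mean reprojection error: {error:.2f} px")
```

A penalty of this form could, for instance, be used to reject detections that are inconsistent with the known planar geometry of the object, or be added as a loss term during training.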

Evaluation will be performed on an industrial object detection use case with high requirements on robustness and performance: the detection of pallets in the context of an autonomous pallet-unloading application. The plan is to start from a public data set [7] and afterwards transfer the results to our own use case and data, which have already been at least partially collected. The thesis shall be carried out within a period of six months, including the literature review.

[1] Realtime 3D Object Detection for Automated Driving Using Stereo Vision and Semantic Information
[2] Viewpoint-Independent Object Class Detection using 3D Feature Maps
[3] Unsupervised 3D Pose Estimation With Geometric Self-Supervision
[4] Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN
[5] A Comprehensive Survey of Scene Graphs: Generation and Application
[6] From Points to Parts
[7] GitHub – tum-fml/loco: Home of LOCO, the first scene understanding dataset for logistics.
[8] Nothing But Geometric Constraints
[9] DeepIM: Deep Iterative Matching for 6D Pose Estimation

A Deep Learning-Based Approach to Analyze Speech of Children with Cleft Lip and Palate

Object Detection of more than 100,000 Industrial Parts

Deep Learning-based Balloon Marker Detection from Angiography Data

Thesis Description
Coronary Artery Disease (CAD) is one of the most predominant contributors to cardiovascular disease,
which stands as the major cause of death globally [1]. Usually, it manifests as narrowed or blocked
arteries caused by plaque buildup, a condition known as atherosclerosis. Percutaneous Coronary In-
tervention (PCI) is a frequently used treatment for CAD in which the narrowed arteries are widened.
A typical type of PCI is revascularization using angioplasty with a stent [2]. Generally, as part of this
treatment, a thin flexible tube is inserted into the femoral artery through the groin. Once the tip is
properly positioned in the blockage site, a balloon which is surrounded by a stent graft is inflated to
compress the plaque against the arterial walls. After the procedure is completed and the balloon is
removed, the stent keeps the artery open and supports the blood flow.
Real-time 2D X-ray projections serve as the guidance method of choice for catheter-based interventions
like PCI to help the physicians visually determine the position and extent of the stents [3]. However,
visualizing stents in conventional X-ray images is challenging because of their low radio-opacity. Hence,
digital stent enhancement (DSE) methods have been developed to enhance the stent visibility in X-ray
image sequences [4]. Angioplasty balloons often incorporate two highly radio-opaque markers [5]. De-
tecting and tracking these markers enables DSE methods by registering all frames within the sequence
followed by a mean intensity projection [6]. Therefore, an accurate and robust detection of the stent
markers is a crucial component of all DSE methods. This task is usually performed automatically
using machine learning (ML). An additional challenge in this domain is that, due to frame rates of
up to 30 frames per second, the marker detection requires high computational efficiency.
Possible ML techniques for this task include landmark detection and object detection approaches.
These play a crucial role in the computer vision area, particularly in the field of medical image pro-
cessing [7]. They facilitate the identification of anatomical features, recognition and precise localization
of pathological conditions, and the accurate delineation of structures of interest within medical im-
ages [8]. A few of these methodologies have been employed for balloon marker detection and stent
localization [9–15]. Some rely on conventional ML techniques such as template-based matching [14]
and adaptive thresholding [9]. On the other hand, the continuous progress in deep
learning offers promising potential for further advancements in stent localization. U-net, a widely
adopted CNN architecture in medical image segmentation, has been employed as the backbone of
several state-of-the-art models to segment the catheter shaft [12] or generate marker heatmaps by
treating each landmark as a 2D Gaussian distribution [15]. Although these approaches have improved
stent visualization significantly, they primarily focus on detecting and tracking single balloon marker
pairs in each frame. To address the challenge of stabilizing multiple stents, some researchers have
used object detection methods for guidewire endpoint localization, employing an extended variant of
the Faster R-CNN model [10, 11]. However, detecting objects with common architectures
like R-CNN family models is computationally demanding. Since real-time performance is crucial for
providing instant information about the position of stents to physicians, faster models like YOLO [16]
are needed to accelerate the process. To this end, a model based on YOLOv3 has been proposed that
meets the requirement for real-time guidewire detection and endpoint localization [13].
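The heatmap-based point regression formulation mentioned above can be illustrated with a minimal sketch; the frame size, marker positions, and the Gaussian width are purely illustrative assumptions, not values from the cited works.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=3.0):
    """Render one landmark as a 2D Gaussian on a target heatmap.

    shape: (height, width) of the heatmap, center: (x, y) marker position.
    """
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# Two hypothetical balloon markers in a 64x64 frame; a network regressing this
# target would recover the marker positions from the heatmap peaks.
target = np.maximum(gaussian_heatmap((64, 64), (20, 30)),
                    gaussian_heatmap((64, 64), (45, 12)))

peak = np.unravel_index(np.argmax(target), target.shape)  # (y, x) of one marker
print(peak)
```

At inference time, a non-maximum suppression over the predicted heatmap would yield both marker coordinates rather than only the global peak.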
This thesis aims to investigate a set of research questions with respect to real-time models for detecting
multiple balloon markers in fluoroscopic images. Firstly, the most promising approaches from the
literature need to be compared, including the most recent developments in the field, namely the
object detection networks of the YOLO family [17] as well as heatmap-based point regression
approaches.
Secondly, the pre-processing of the fluoroscopic images can vary to a large extent, as multiple
algorithms with a plethora of parameters can be altered. It is therefore of high interest to evaluate
what influence different pre-processing parameterizations have on the performance of a marker detector.
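Such an evaluation could be organized as a simple parameter grid; the pipeline below (intensity clipping followed by gamma correction) and all parameter values are illustrative assumptions, not the pre-processing actually used clinically.

```python
import itertools
import numpy as np

def preprocess(frame, gamma, clip):
    """Hypothetical pre-processing: intensity clipping, then gamma correction."""
    f = np.clip(frame, *clip)
    f = (f - clip[0]) / (clip[1] - clip[0])  # normalize to [0, 1]
    return f ** gamma

# Parameter grid whose variants would each be scored with the marker detector.
gammas = [0.5, 1.0, 2.0]
clips = [(0.0, 1.0), (0.1, 0.9)]

frame = np.random.default_rng(0).uniform(0, 1, (64, 64))  # stand-in fluoroscopy frame
variants = {(g, c): preprocess(frame, g, c)
            for g, c in itertools.product(gammas, clips)}
print(f"{len(variants)} pre-processing variants to score")
```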
Finally, training a robust network requires a large collection of data covering a wide variation of
potential physical influences. To mitigate the need to collect large amounts of clinical data, it shall
be evaluated whether the use of suitable phantom data is sufficient.
The thesis will comprise the following work items:
- Literature overview of state-of-the-art automated landmark and object detection approaches
- Balloon marker detection
  - Data annotation and preprocessing
  - Train a network of the YOLO family on phantom data
  - Train a U-net model for heatmap regression
  - Possibly: post-processing
- Analysis of the deep learning models
  - Evaluate the effect of various pre-processing parameterizations on the marker detector
  - Evaluate and compare the performance of the implemented algorithms on both phantom and clinical data
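For the evaluation items above, per-marker metrics might be computed as in the following minimal sketch, which assumes predicted and annotated markers have already been matched one-to-one; the tolerance and the coordinate values are illustrative.

```python
import numpy as np

def marker_metrics(pred, gt, tol_px=3.0):
    """Per-marker Euclidean error and hit rate within a pixel tolerance.

    pred, gt: (N, 2) arrays of predicted / annotated marker coordinates,
    assumed to be matched one-to-one beforehand.
    """
    err = np.linalg.norm(pred - gt, axis=1)
    return {"mean_err_px": float(err.mean()),
            "hit_rate": float((err <= tol_px).mean())}

# Illustrative annotations and detections for a handful of markers.
gt = np.array([[120.0, 80.0], [130.0, 95.0], [60.0, 200.0]])
pred = np.array([[121.5, 79.0], [128.0, 96.0], [70.0, 205.0]])

print(marker_metrics(pred, gt))
```

Reporting both metrics separately on phantom and clinical data would directly address the phantom-sufficiency question raised above.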

[1] Gregory A Roth, George A Mensah, Catherine O Johnson, Giovanni Addolorato, Enrico Ammi-
rati, et al. Global burden of cardiovascular diseases and risk factors, 1990–2019: update from the
GBD 2019 study. Journal of the American College of Cardiology, 76(25):2982–3021, 2020.
[2] Javaid Iqbal, Julian Gunn, and Patrick W Serruys. Coronary stents: historical development,
current status and future directions. British Medical Bulletin, 106(1), 2013.
[3] Ardit Ramadani, Mai Bui, Thomas Wendler, Heribert Schunkert, Peter Ewert, et al. A survey of
catheter tracking concepts and methodologies. Medical Image Analysis, page 102584, 2022.
[4] Vincent Bismuth, Régis Vaillant, François Funck, Niels Guillard, and Laurent Najman. A com-
prehensive study of stent visualization enhancement in X-ray images by image processing means.
Medical Image Analysis, 15(4):565–576, 2011.
[5] Robert A Close, Craig K Abbey, and James Stuart Whiting. Improved image guidance of coronary
stent deployment. In Medical Imaging 2000: Image Display and Visualization, volume 3976, pages
301–304. SPIE, 2000.
[6] Kyle McBeath, Krishnaraj Rathod, Matthew Cadd, Anne-Marie Beirne, Oliver Guttmann, et al.
Use of enhanced stent visualisation compared to angiography alone to guide percutaneous coro-
nary intervention. International Journal of Cardiology, 321:24–29, 2020.
[7] Zhuoling Li, Minghui Dong, Shiping Wen, Xiang Hu, Pan Zhou, et al. CLU-CNNs: Object
detection for medical images. Neurocomputing, 350:53–59, 2019.
[8] Andreas Maier, Christopher Syben, Tobias Lasser, and Christian Riess. A gentle introduction
to deep learning in medical image processing. Zeitschrift für Medizinische Physik, 29(2):86–101, 2019.
[9] Negar Chabi, Oliver Beuing, Bernhard Preim, and Sylvia Saalfeld. Automatic stent and catheter
marker detection in X-ray fluoroscopy using adaptive thresholding and classification. In Current
Directions in Biomedical Engineering, volume 6. De Gruyter, 2020.
[10] Xiaolu Jiang, Yanqiu Zeng, Shixiao Xiao, Shaojie He, Caizhi Ye, et al. Automatic detection of
coronary metallic stent struts based on YOLOv3 and R-FCN. Computational and Mathematical
Methods in Medicine, 2020, 2020.
[11] Rui-Qi Li, Xiao-Liang Xie, Xiao-Hu Zhou, Shi-Qi Liu, Zhen-Liang Ni, et al. A unified framework
for multi-guidewire endpoint localization in fluoroscopy images. IEEE Transactions on Biomedical
Engineering, 69(4):1406–1416, 2021.
[12] Ina Vernikouskaya, Dagmar Bertsche, Tillman Dahme, and Volker Rasche. Cryo-balloon catheter
localization in X-ray fluoroscopy using U-Net. International Journal of Computer Assisted Radi-
ology and Surgery, 16:1255–1262, 2021.
[13] Rui-Qi Li, Xiao-Liang Xie, Xiao-Hu Zhou, Shi-Qi Liu, Zhen-Liang Ni, et al. Real-time multi-
guidewire endpoint localization in fluoroscopy images. IEEE Transactions on Medical Imaging,
40(8):2002–2014, 2021.
[14] Ahmed G Kotb, Ahmed M Mahmoud, and Muhammad A Rushdi. Template-based balloon-
marker and guidewire detection for coronary stents in cardiac fluoroscopy. In 2022 44th Annual
International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages
2199–2202. IEEE, 2022.
[15] Luojie Huang, Yikang Liu, Li Chen, Eric Z Chen, Xiao Chen, et al. Robust landmark-based
stent tracking in X-ray fluoroscopy. In European Conference on Computer Vision, pages 201–216.
Springer, 2022.
[16] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified,
real-time object detection. In Proceedings of the IEEE Conference On Computer Vision and
Pattern Recognition, pages 779–788. IEEE, 2016.
[17] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. YOLOv7: Trainable bag-of-
freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pages 7464–7475. IEEE, 2023.

Generalizable X-Ray View Synthesis

CT Material Decomposition with Deep Learning

Creation of a Workflow Troubleshoot Companion for Magnetic Resonance Imaging Systems using Large Language Models

Fitting a 2D to 3D Transformation with Neural Fields for Vessel Unfolding

Thesis Description

“Time is brain” is a frequently used term associated with the fast diagnosis and treatment of cerebrovascular
disease, especially with stroke. In fact, cerebral cell death in the affected areas starts within only a few
minutes after the impairment of the vessel [vBüd22]. A distinction is made between ischemic strokes (occlusion of
a vessel that results in a blood shortage) and hemorrhagic strokes (injury or rupture of a vessel that results in
bleeding) [Kha+23]. The relevant anatomical structures for this include the circle of Willis (CoW) and the main
surrounding arteries. For the diagnostic process, tracking all of these vessels in the volumetric CT data
can be time-consuming. To improve this process, it would be beneficial to unfold these vessels from the
3D CT data onto a 2D image plane. However, the complex geometry of the CoW and the almost perpendicular
orientation of some vessels to each other make it infeasible for the whole structure to be unfolded properly by
common visualization techniques like the curved planar reformation [Kan+02]. A heuristic mesh-based approach
to solve this problem has been presented with CeVasMap [Ris+23], but the unfolding and merging of all major
vessels creates strong distortions in some areas, especially for the internal carotid artery and the basilar artery.
Instead of a heuristic method, the transformation itself can be seen as an optimization problem, which allows
for a more flexible transformation and a better incorporation of constraints through the loss function.

Neural fields are currently gaining more and more popularity in computer vision due to their ability to
provide a continuous approximation of a field function, which enables sampling at arbitrary points and resolutions
[Ram+22]. The core element is a simple multi-layer perceptron which typically gets coordinates as
input and outputs the respective field value at those coordinates [Xie+22]. Since these values lie in the
reconstruction domain, a mapping to the sensor domain is needed in order to evaluate the results and compute
the loss [Xie+22]. Examples of fields that can be described this way are 2D images, 3D shapes, or even
full 3D scenes [Xie+22; Mil+20]. Outside of computer vision, neural fields have also gained popularity
in robotics, audio processing, physics and medical imaging [Xie+22]. In the latter, many applications
have emerged, including CT and MRI image reconstruction [Zan+21; Sun+21] and medical image segmentation
[Kha+22]. In addition to that, the implicit deformable image registration model proposed by Wolterink et
al. [Wol+22] fits a 3D deformation vector field for the registration of CT images, which has similarities to the
transformation that this thesis aims to find. In this work, however, the neural field is trained to produce,
for each 2D image coordinate, the 3D coordinate at which the CT volume is sampled, such that the resulting image
contains the unfolded vessels. The goal is to find a suitable transformation for each sample in the dataset, such
that the neural field is trained to overfit on only one sample, unlike a typical neural network which needs a
large amount of data for training and testing. In order to achieve a visually pleasing training result, different
quality criteria have to be evaluated and incorporated into the loss function. These include multiple constraints
for the transformation, such as diffeomorphism and isometry. Due to the high degrees of freedom for the transformation,
a good initialization has to be found as well. The mentioned constraints and defined loss terms are
iteratively improved to further enhance the quality of the unfolded image. First, the neural field is trained to
unfold a single vessel, reducing the complexity of the geometric structure that has to be unfolded. Once
this yields satisfying results, the neural field can be extended to unfold multiple vessels at the same time.
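As a rough illustration of the isometry constraint mentioned above, the sketch below evaluates an (untrained) coordinate MLP that maps 2D image coordinates to 3D points, and measures via finite differences how strongly the map stretches unit steps in the plane. The architecture, initialization, and all numbers are illustrative assumptions, not the setup of this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny coordinate MLP: 2D plane coordinate -> 3D point (weights random, untrained).
W1, b1 = rng.normal(0, 0.5, (32, 2)), np.zeros(32)
W2, b2 = rng.normal(0, 0.5, (3, 32)), np.zeros(3)

def field(uv):
    """Evaluate the neural field at 2D coordinates uv, shape (N, 2) -> (N, 3)."""
    h = np.tanh(uv @ W1.T + b1)
    return h @ W2.T + b2

def isometry_penalty(uv, eps=1e-3):
    """Finite-difference isometry loss: a unit step on the 2D image plane should
    map to a (near-)unit step in 3D, so distances in the unfolded image stay faithful."""
    du = field(uv + [eps, 0.0]) - field(uv)
    dv = field(uv + [0.0, eps]) - field(uv)
    stretch_u = np.linalg.norm(du, axis=1) / eps
    stretch_v = np.linalg.norm(dv, axis=1) / eps
    return float(((stretch_u - 1) ** 2 + (stretch_v - 1) ** 2).mean())

uv = rng.uniform(-1, 1, (256, 2))
print(f"isometry penalty of the untrained field: {isometry_penalty(uv):.3f}")
```

During optimization, a weighted sum of such a penalty and the remaining quality criteria would form the loss that the field is overfit against for each individual sample.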

1. Implement a basic neural field architecture to fit a 2D to 3D transformation inspired by Wolterink et
al. [Wol+22]
2. Investigate and tune different loss terms and quality measures for the neural field to unfold phantom
geometries and individual vascular structures
3. Assess and discuss the quality of the resulting images
4. If possible, extend the approach to unfold multiple vessels



[Kan+02] A. Kanitsar, D. Fleischmann, R. Wegenkittl, P. Felkel, and E. Gröller. CPR – curved planar reformation.
In IEEE Visualization (VIS 2002), pages 37–44, 2002. doi: 10.1109/VISUAL.2002.
[Kha+22] M. O. Khan and Y. Fang. Implicit Neural Representations for Medical Imaging Segmentation. In L.
Wang, Q. Dou, P. T. Fletcher, S. Speidel, and S. Li, editors, Medical Image Computing and Computer
Assisted Intervention – MICCAI 2022, pages 433–443, Cham. Springer Nature Switzerland, 2022.
isbn: 978-3-031-16443-9.
[Kha+23] A. S. Khaku and P. Tadi. Cerebrovascular Disease. NBK430927/, January 2023.
[Mil+20] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng. NeRF:
Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV, 2020.
[Ram+22] S. Ramasinghe and S. Lucey. Beyond Periodicity: Towards a Unifying Framework for Activations
in Coordinate-MLPs. In S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, and T. Hassner, editors,
Computer Vision – ECCV 2022, pages 142–158, Cham. Springer Nature Switzerland, 2022. isbn:
[Ris+23] L. Rist, O. Taubmann, H. Ditt, M. Sühling, and A. Maier. Flexible Unfolding of Circular Structures
for Rendering Textbook-Style Cerebrovascular Maps. In H. Greenspan, A. Madabhushi, P.
Mousavi, S. Salcudean, J. Duncan, T. Syeda-Mahmood, and R. Taylor, editors, Medical Image
Computing and Computer Assisted Intervention – MICCAI 2023, pages 737–746, Cham. Springer
Nature Switzerland, 2023. isbn: 978-3-031-43904-9.
[Sun+21] Y. Sun, J. Liu, M. Xie, B. Wohlberg, and U. S. Kamilov. CoIL: Coordinate-Based Internal Learning
for Tomographic Imaging. IEEE Transactions on Computational Imaging, 7:1400–1412, 2021. doi:
[vBüd22] H. J. von Büdingen. Was Bedeutet "Zeit ist Hirn"? https://schlaganfallbegleitung.de/wissen/zeit-ist-hirn, July 2022.
[Wol+22] J. M. Wolterink, J. C. Zwienenberg, and C. Brune. Implicit Neural Representations for Deformable
Image Registration. In E. Konukoglu, B. Menze, A. Venkataraman, C. Baumgartner, Q. Dou, and
S. Albarqouni, editors, Proceedings of The 5th International Conference on Medical Imaging with
Deep Learning, volume 172 of Proceedings of Machine Learning Research, pages 1349–1359. PMLR,
June 2022. url:
[Xie+22] Y. Xie, T. Takikawa, S. Saito, O. Litany, S. Yan, N. Khan, F. Tombari, J. Tompkin, V. Sitzmann,
and S. Sridhar. Neural Fields in Visual Computing and Beyond. Computer Graphics Forum, 2022.
issn: 1467-8659. doi: 10.1111/cgf.14505.
[Zan+21] G. Zang, R. Idoughi, R. Li, P. Wonka, and W. Heidrich. IntraTomo: Self-supervised Learning-based
Tomography via Sinogram Synthesis and Prediction. In 2021 IEEE/CVF International Conference
on Computer Vision (ICCV), pages 1940–1950, 2021. doi: 10.1109/ICCV48922.2021.00197.