Index
Advanced Model Architectures for Interactive Segmentation and Segmentation Enhancement in CT Images
Thesis Description
Cerebrovascular accidents are a disease of worldwide relevance with a severe impact on patients and healthcare systems. Approximately 15 million people worldwide suffer an ischemic stroke each year [1, 2]. More detailed information about the condition of arterial vessels can play a critical role in both preventing stroke and improving stroke therapy [1, 3, 4].
Since about one third of patients die from the consequences of a stroke, it is of great interest to detect indications of cerebrovascular diseases as quickly and as efficiently as possible, enabling timely intervention or even preventive measures [1, 4]. Currently, however, vascular imaging in clinical routine is primarily assessed by visual-qualitative means only. The technical difficulties in extracting cerebral arteries and quantifying their parameters have prevented this data from becoming part of routine clinical practice [1, 5].
Image segmentation in general remains challenging for many applications. In particular, advanced applications such as ischemic infarct tissue segmentation require highly accurate results to ensure optimal patient care and treatment [6, 7]. Thus, if performed at all, segmentation of cerebral vessels is to date predominantly carried out manually or semi-manually. Since manual vessel segmentation is time-consuming, research has focused on developing faster and more general automatic vessel segmentation methods [1, 5].
In recent years, deep learning techniques have proven to be a very useful approach to this problem, as they can, unlike traditional thresholding approaches, incorporate spatial information into their predictions [8, 9]. Therefore, the current development trend is shifting away from the rule-based methods proposed in previous decades, such as vessel intensity distributions, geometric models, and vessel extraction methods [10, 11]. Although most rule-based approaches such as midline tracing, active contour models, or region growing use various vessel image features for reconstruction [12, 10], they are either hand-crafted or insufficiently validated [11, 10]. Therefore, it is difficult to achieve the desired level of robustness in vessel segmentation, and none of the proposed methods has found widespread application in the clinical setting or in research [5].
However, even deep learning methods, which have proven particularly powerful and adaptable, have their specific drawbacks, as they demand a large amount of training data [13, 14]. Providing this data is challenging because it usually contains sensitive personal information and therefore is not publicly available [15, 16, 17]. In addition, successful deep segmentation also requires ground-truth data, which, as discussed earlier, is extremely time-consuming and thus costly to create [1, 5].
Recently, several alternative strategies to circumvent this lack of annotations have been explored. For example, methods for semi-supervised semantic segmentation have been successfully developed based on the generative adversarial network (GAN) approach [17, 14, 18]. Subsequent work has further improved this approach by explicitly accounting for particular issues such as domain shift, and by utilizing contrastive learning for translating unpaired images [19, 20].
In addition, pretraining algorithms have emerged that promise to improve performance by preparing the model in an unsupervised manner. This is referred to as self-supervised learning. Its popularity can be traced back to well-known pretraining frameworks such as SimCLR, BYOL, SimSiam, and PAWS [21, 22, 23, 24]. These frameworks are able to incorporate unlabeled samples into the training and thus make use of the entirety of the datasets despite the lack of annotations, ultimately increasing model performance [21, 22, 17].
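The core mechanism behind such contrastive pretraining frameworks can be sketched in a few lines. The following is a minimal NumPy illustration of the NT-Xent loss as used in SimCLR [21], operating on a batch of paired embeddings; it is meant as an illustration only, not as part of the planned implementation:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss for
    a batch of paired view embeddings z1, z2 of shape (N, D)."""
    z = np.concatenate([z1, z2], axis=0)               # 2N x D
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize
    sim = z @ z.T / temperature                        # scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = z1.shape[0]
    # the positive partner of sample i is i+n (and vice versa)
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()
```

Embeddings of two augmented views of the same image act as positives; all other batch members act as negatives, so the loss decreases as paired views become more similar than unrelated samples.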
An alternative approach to eliminating this shortage of clinical annotations might involve accelerating the time-consuming manual segmentation process. The idea of using deep learning methods to optimize this process has recently become more popular [25, 26, 27]. Such interactive segmentation methods can be used not only for the creation of annotations, but also for the improvement of already existing ones. In doing so, a segmentation is created in a first step and optimized in subsequent steps, either automatically, interactively, or manually. These changes are then automatically applied to the entire vessel, saving valuable time [25, 26].
For the reasons stated above, this work aims to investigate whether advanced model architectures can be successfully used for semi-supervised and unsupervised image segmentation, with the overall goal of improving deep vessel segmentation. It will also conduct an in-depth examination of the potential of pretraining methodologies to increase model performance. Finally, this work will investigate whether interactive segmentation can be applied in the medical field and how it can be integrated into the clinical workflow to reduce the annotation workload.
- Literature overview of the current state of the art and collection of frameworks
- Pretraining methods
- Interactive segmentation strategies
- Expanding the current state of the art for carotid artery segmentation
- Utilizing semi-supervised contrastive learning mechanisms
- Enabling interactive segmentation
- Systematic analysis and evaluation of the developed deep learning approaches
References
[1] Michelle Livne, Jana Rieger, Orhun Utku Aydin, Abdel Aziz Taha, Ela Marie Akay, Tabea Kossen, Jan Sobesky, John D Kelleher, Kristian Hildebrand, Dietmar Frey, et al. A u-net deep learning framework for
high performance vessel segmentation in patients with cerebrovascular disease. Frontiers in neuroscience, 13:97, 2019.
[2] Walter Johnson, Oyere Onuma, Mayowa Owolabi, and Sonal Sachdev. Stroke: a global response is needed. Bulletin of the World Health Organization, 94(9):634, 2016.
[3] Jason D Hinman, Natalia S Rost, Thomas W Leung, Joan Montaner, Keith W Muir, Scott Brown, Juan F Arenillas, Edward Feldmann, and David S Liebeskind. Principles of precision medicine in stroke. Journal
of Neurology, Neurosurgery & Psychiatry, 88(1):54–61, 2017.
[4] James C Grotta, Gregory W Albers, Joseph P Broderick, Scott E Kasner, Eng H Lo, Ralph L Sacco, Lawrence KS Wong, and Arthur L Day. Stroke E-Book: Pathophysiology, Diagnosis, and Management.
Elsevier Health Sciences, 2021.
[5] Renzo Phellan, Alan Peixinho, Alexandre Falcão, and Nils D Forkert. Vascular segmentation in tof mra images of the brain using a deep convolutional neural network. In Intravascular Imaging and Computer
Assisted Stenting, and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis, pages 39–46. Springer, 2017.
[6] Maryam Rastgarpour and Jamshid Shanbehzadeh. The problems, applications and growing interest in automatic segmentation of medical images from the year 2000 till 2011. International Journal of Computer Theory and Engineering, 5(1):1, 2013.
[7] Richard Szeliski. Computer vision: algorithms and applications. Springer Science & Business Media, 2010.
[8] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3431–3440, 2015.
[9] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
[10] David Lesage, Elsa D Angelini, Isabelle Bloch, and Gareth Funka-Lea. A review of 3d vessel lumen segmentation techniques: Models, features and extraction schemes. Medical image analysis, 13(6):819–845, 2009.
[11] Fengjun Zhao, Yanrong Chen, Yuqing Hou, and Xiaowei He. Segmentation of blood vessels using rule-based and machine-learning-based methods: a review. Multimedia Systems, 25(2):109–118, 2019.
[12] Yun Tian, Qingli Chen, Wei Wang, Yu Peng, Qingjun Wang, Fuqing Duan, Zhongke Wu, and Mingquan Zhou. A vessel active contour model for vascular segmentation. BioMed research international, 2014, 2014.
[13] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
[14] Wei-Chih Hung, Yi-Hsuan Tsai, Yan-Ting Liou, Yen-Yu Lin, and Ming-Hsuan Yang. Adversarial learning for semi-supervised semantic segmentation. arXiv preprint arXiv:1802.07934, 2018.
[15] Brett K Beaulieu-Jones, Zhiwei Steven Wu, Chris Williams, Ran Lee, Sanjeev P Bhavnani, James Brian Byrd, and Casey S Greene. Privacy-preserving generative deep neural networks support clinical data
sharing. Circulation: Cardiovascular Quality and Outcomes, 12(7):e005122, 2019.
[16] Omer Tene and Jules Polonetsky. Big data for all: Privacy and user control in the age of analytics. Nw. J. Tech. & Intell. Prop., 11:xxvii, 2012.
[17] Nima Tajbakhsh, Laura Jeyaseelan, Qian Li, Jeffrey N Chiang, Zhihao Wu, and Xiaowei Ding. Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation. Medical Image Analysis, 63:101693, 2020.
[18] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing
systems, 27, 2014.
[19] Yawei Luo, Liang Zheng, Tao Guan, Junqing Yu, and Yi Yang. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, pages 2507–2516, 2019.
[20] Taesung Park, Alexei A Efros, Richard Zhang, and Jun-Yan Zhu. Contrastive learning for unpaired imageto-image translation. In European Conference on Computer Vision, pages 319–345. Springer, 2020.
[21] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–
1607. PMLR, 2020.
[22] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, et al. Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733, 2020.
[23] Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15750–15758, 2021.
[24] Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, and Michael Rabbat. Semi-supervised learning of visual features by non-parametrically predicting view assignments with support samples. arXiv preprint arXiv:2104.13963, 2021.
[25] Sabarinath Mahadevan, Paul Voigtlaender, and Bastian Leibe. Iteratively trained interactive segmentation. In British Machine Vision Conference (BMVC), 2018.
[26] Konstantin Sofiiuk, Ilia Petrov, and Anton Konushin. Reviving iterative training with mask guidance for interactive segmentation. arXiv preprint arXiv:2102.06583, 2021.
[27] Xiangde Luo, Guotai Wang, Tao Song, Jingyang Zhang, Michael Aertsen, Jan Deprest, Sebastien Ourselin, Tom Vercauteren, and Shaoting Zhang. Mideepseg: Minimally interactive segmentation of unseen objects from medical images using deep learning. Medical Image Analysis, 72:102102, 2021.
Learning-based reduction of non-significant changes in subtraction volumes
Computed Tomography (CT) is a diagnostic tool that allows doctors and radiologists to visualize the internal morphology of the body. Radiologists compare CT studies to identify tumors, infections, and blood clots, and to assess the response to treatment. To identify changed features, they visually compare the current study with a prior one, aligning both studies while scrolling through the images and switching between the acquisitions to identify relevant changes.
Overlaying the current study with a color-coded confidence mask is a helpful tool to highlight areas with potential changes. To compute such a mask, a registration of both datasets is required. Here, inaccurate registration can introduce misalignments, which will be marked as tissue changes although they are not of clinical relevance. Such misalignments can cause shadow-like effects at tissue boundaries, which can obscure pathologically relevant features. Another source of non-significant changes is related to differing acquisition parameters, resulting in salt-and-pepper noise.
The goal of this master's thesis is to train a deep learning model to detect and remove non-significant changes. Generative Adversarial Networks (GANs) have shown promising results on image processing tasks where no or only limited ground-truth data is available. GANs consist of two models, a generator and a discriminator, that by design learn the distribution of the training data. The generator produces fake data that is fed to the discriminator, which aims to identify the fake examples. With this adversarial training scheme we aim to improve the quality of the difference images and, hence, make it easier for the physician to distinguish clinically relevant from non-significant changes.
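The adversarial objective described above can be written out explicitly. The following NumPy sketch shows only the two loss terms (not the actual networks to be developed): the discriminator is trained to assign high probability to real difference images and low probability to generated ones, while the generator is rewarded when the discriminator is fooled.

```python
import numpy as np

def bce(p, target):
    """Binary cross-entropy of predicted probabilities p
    against a constant target label (0 or 1)."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

def discriminator_loss(d_real, d_fake):
    """D should output 1 for real images and 0 for generated ones."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """G is rewarded when D mistakes its output for real data."""
    return bce(d_fake, 1.0)
```

In training, the two losses are minimized in alternation, which drives the generator's output distribution toward the distribution of the real training data.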
Character Height Estimation in Historical Document Images
During the past decades, the field of Document Image Analysis and Recognition (DIAR) has been the subject of much research due to its wide range of applications. DIAR can be applied to either printed or handwritten, textual or graphical document images with the purpose of automatically analyzing their contents in order to retrieve useful information [1, 2]. Applications of DIAR arise in different fields, such as the storage and indexing of cultural heritage by analyzing historical manuscripts. Text detection and recognition in imagery are two key components of most DIAR techniques [3, 4]. Since the existing methods for text detection rely on texture estimation [5] or edge detection [6], as stated by Wolf et al. [7], the text characteristics may affect the document analysis. For this reason, text recognition pipelines typically resize text lines to the specific height they were trained on.
In this thesis, the influence of the text height on document analysis is investigated. Document resizing to a specific text height will be inserted as the first step of several DIAR methods for running experiments. The thesis consists of the following milestones:
• Producing a data set with labeled text heights for a sufficient number of ancient books and manuscripts [8, 9].
• Developing a system that detects text in the documents and resizes it to a predetermined height in pixels.
• Running various experiments to determine whether this resizing improves the results of different DIAR methods.
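The resizing step in the second milestone could, assuming grayscale line images stored as NumPy arrays, be sketched with a simple nearest-neighbour resize that preserves the aspect ratio (a production system would rather use a library resampler such as OpenCV's):

```python
import numpy as np

def resize_to_height(line_img, target_height):
    """Nearest-neighbour resize of a text-line image (H x W array)
    to a fixed height, preserving the aspect ratio."""
    h, w = line_img.shape
    scale = target_height / h
    target_width = max(1, int(round(w * scale)))
    # map each output pixel back to its nearest source pixel
    rows = np.clip((np.arange(target_height) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(target_width) / scale).astype(int), 0, w - 1)
    return line_img[rows][:, cols]
```

Normalizing every detected text line to the same pixel height in this way is what makes the downstream recognition models operate at the scale they were trained on.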
[1] Deepika Ghai and Neelu Jain. Text extraction from document images-a review. International Journal of Computer Applications, 84(3), 2013.
[2] Vikas Yadav and Nicolas Ragot. Text extraction in document images: Highlight on using corner points. In 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pages 281–286, 2016.
[3] Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, and Jiajun Liang. East: an efficient and accurate scene text detector. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pages 5551–5560, 2017.
[4] Adam Coates, Blake Carpenter, Carl Case, Sanjeev Satheesh, Bipin Suresh, Tao Wang, David J Wu, and Andrew Y Ng. Text detection and character recognition in scene images with unsupervised feature learning. In 2011 International Conference on Document Analysis and Recognition, pages 440–445. IEEE, 2011.
[5] Bangalore S Manjunath and Wei-Ying Ma. Texture features for browsing and retrieval of image data. IEEE Transactions on pattern analysis and machine intelligence, 18(8):837–842, 1996.
[6] Chung-Ching Chen et al. Fast boundary detection: A generalization and a new algorithm. IEEE Transactions on computers, 100(10):988–998, 1977.
[7] Christian Wolf, Jean-Michel Jolion, and LIRIS INSA de Lyon. Model based text detection in images and videos: a learning approach. Laboratoire d'InfoRmatique en Images et Systèmes d'information, Palmas, TO, 2004.
[8] Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, and Andreas Maier. Icdar 2019 competition on image retrieval for historical handwritten documents. In 2019 International Conference on Document Analysis and Recognition (ICDAR), pages 1505–1509. IEEE, 2019.
[9] https://lme.tf.fau.de/competitions/icdar-2021-competition-on-historical-document-classification.
Investigating the class-imbalance problem using deep learning techniques on real industry printed circuit board data
Quality control tasks in industry provide the ideal environment for the application of machine learning
due to large volumes of machine-generated data. However, some of the collected data is heavily
imbalanced or even unlabelled, since labelling the data is very labour- and cost-intensive. The objective
of this work is to investigate and apply deep learning methods to address this problem. For this
purpose, a real industry printed circuit board data set will be utilized, which is provided by Continental
corporation.
The first part of this work is a literature review in order to investigate available methods to overcome the
mentioned class-imbalance problem. The emphasis of this review is set to three different subsections:
The first group of methods will deal with synthetic over-sampling, which is a mechanism to enforce
the generation of data points in the convex hull of the targeted underrepresented classes. A concrete
application to achieve this would be the Polarity-GAN approach proposed in [1]. In order to make use
of unlabelled data, semi-supervised learning approaches are examined next. The starting point for this
will be the work of Hyun et al. [2], who looked into available semi-supervised deep learning methods
for class-imbalances. The last subsection will deal with classical deep learning methods addressing the
class-imbalance problem, such as the Focal Loss [3].
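The convex-hull idea behind synthetic over-sampling can be illustrated with a minimal SMOTE-style interpolation sketch in NumPy; this is a deliberate simplification (the Polarity-GAN approach of [1] is considerably more involved), shown only to make the mechanism concrete:

```python
import numpy as np

def synthetic_oversample(minority, n_new, rng=None):
    """Generate n_new synthetic samples by linear interpolation between
    random pairs of minority-class points (SMOTE-style). Every synthetic
    point lies on a segment between two real points, hence inside the
    convex hull of the minority class."""
    rng = np.random.default_rng(rng)
    i = rng.integers(0, len(minority), n_new)
    j = rng.integers(0, len(minority), n_new)
    lam = rng.random((n_new, 1))  # interpolation weights in [0, 1]
    return minority[i] + lam * (minority[j] - minority[i])
```

Appending such synthetic points to the training set re-balances the class frequencies without duplicating existing samples verbatim.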
After a detailed review, the investigated methods will be implemented and applied to the real industry
use case. For this purpose, the data pre-processing and sampling will be fixed to ensure reproducibility
across all experiments. Furthermore, the baseline against which all experiment results are compared
will be a ResNet50 architecture [4]. With a fixed framing and baseline, the performance of all acquired
methods will be evaluated using the real industry data. In addition, ways to
combine the various methods will be explored in order to make the
classification performance more robust.
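The focal loss [3] mentioned above down-weights the contribution of well-classified examples so that training focuses on the hard, typically minority-class, samples. A minimal NumPy sketch for the binary case (a PyTorch implementation, as planned for this work, would operate on logits instead of probabilities) reads:

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: p are predicted probabilities of the positive
    class, y the binary labels. gamma down-weights easy examples,
    alpha re-balances the two classes."""
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return (-at * (1 - pt) ** gamma * np.log(pt)).mean()
```

For gamma = 0 and alpha = 1 the expression reduces to the standard cross-entropy; increasing gamma shrinks the loss of confidently classified samples toward zero.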
The thesis consists of the following milestones:
• Literature review to acquire possible methods regarding the class-imbalance problem
• Implement fixed machine learning pipeline to ensure reproducible experiments
• Apply found methods to the fixed framing and evaluate performance
• Evaluate performance against a ResNet50
The implementation will be done in Python with the help of PyTorch [5].
References
[1] Kumari Deepshikha and Anugunj Naman. Removing class imbalance using polarity-gan: An uncertainty
sampling approach, 2020.
[2] Minsung Hyun, Jisoo Jeong, and Nojun Kwak. Class-imbalanced semi-supervised learning. CoRR,
abs/2002.06815, 2020.
[3] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object
detection, 2018.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition,
2015.
[5] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen,
Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary
DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and
Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach,
H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural
Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
Digitization of Handwritten Rey Osterrieth Complex Figure Test Score Sheets
The Rey Osterrieth Complex Figure Test (ROCF) is a neuropsychological test to detect cognitive
impairments.
As the scoring is mostly performed by hand by experts, the goal is to automate the ROCF scoring by
means of machine learning.
The whole project consists of four milestones:
1. State-of-the-art literature research
2. Development of an OCR-based algorithm to digitize the handwritten score sheet into machine
readable structured format for training an automatic algorithm
3. Development of a deep learning algorithm for automatic scoring ROFCs based on the 36-point
scoring system
4. Evaluation of the algorithm based on the data and publication of the results
This thesis will mainly examine the first two steps.
The scoring sheets used share an identical structure, and only the score itself is handwritten.
Therefore, only digits have to be recognized.
The idea is to use networks already trained on the MNIST database (e.g. [1], [2], [3]) and to obtain the
best possible performance for the task described.
Therefore, some preprocessing of the scanned scoring sheets, such as detecting areas of interest,
binarization, or rotation, will be necessary to match the input requirements of the specific algorithms
as well as to improve performance.
Other options for preprocessing could be template matching or taking advantage of the Hu moments
[4]. Hereby, text detection, i.e. finding areas of interest, is one of the steps typically performed in any
text processing pipeline [5].
Furthermore, modified algorithms and weights will be used to achieve different outcomes, which can
then be compared with respect to their performance.
The implementation should be done in Python.
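The binarization and area-of-interest steps mentioned above could, as a first rough sketch on a NumPy grayscale array, look as follows; a real pipeline would rather use Otsu thresholding and deskewing, so both the global mean threshold and the simple bounding-box crop here are placeholder assumptions:

```python
import numpy as np

def binarize_and_crop(gray, threshold=None):
    """Binarize a grayscale score-sheet patch (dark ink on light paper)
    and crop to the bounding box of the ink pixels."""
    if threshold is None:
        threshold = gray.mean()          # crude global threshold
    ink = gray < threshold               # True where ink is present
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:                   # blank patch: nothing to crop
        return ink
    return ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

The cropped binary patch can then be resized and normalized to match the 28x28 input expected by MNIST-trained networks.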
References
[1] Gargi Jha. Mnist handwritten digit recognition using neural network, Sep 2020.
[2] Muhammad Ardi. Simple neural network on mnist handwritten digit dataset, Sep 2020.
[3] Dan Claudiu Ciresan, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber. Deep big simple
neural nets excel on handwritten digit recognition. CoRR, abs/1003.0358, 2010.
[4] Zengshi Chen, Emmanuel Lopez-Neri, Snezana Zekovich, and Milan Tuba. Hu moments based handwritten
digits recognition algorithm. In Recent advances in knowledge engineering and systems science: Proceedings
of the 12TH international conference on artificial intelligence, knowledge engineering and data bases, page
98–104. WSEAS Press, 2013.
[5] Simon Hofmann, Martin Gropp, David Bernecker, Christopher Pollin, Andreas Maier, and Vincent Christlein.
Vesselness for text detection in historical document images. In 2016 IEEE International Conference on
Image Processing (ICIP), pages 3259–3263, 2016.
Glioma Growth Prediction Using Reaction-Diffusion Modelling and Machine Learning
Fully Automated Classification of Anatomical Variants of the Coronary Arteries from Cardiac Computed Tomography Angiography
Radiomics, Delta- and Dose-Radiomics in brain metastases
Writer Identification using Transformer-based Deep Neural Networks
Deep learning-based respiratory navigation for abdominal MRI
In Magnetic Resonance Imaging (MRI) of the abdomen, breathing motion
and cardiac motion are the main confounding factors introducing
artifacts and causing diminished image quality. Different strategies to
minimize the susceptibility to (breathing) motion-related artifacts have
been developed over the last decades, the most routinely used ones being
breath-held acquisitions, retrospective gating, and prospective
triggering. Breath-held techniques are sampling efficient but may not be
applicable in seriously ill patients and pediatric patients. In
addition, MRI techniques such as 3D high-resolution Magnetic Resonance
Cholangiopancreatography (MRCP) require parameter sets making it extremely
difficult to perform the exam in a single or multi-breath-hold
fashion. Under the assumption that breathing patterns are
stable and regular, triggered acquisition schemes aim to acquire data in
certain states/sectors of the breathing cycle, typically during the
relatively stable end-expiratory phase. This technique is less sampling
efficient and has additional challenges in irregular breathers.
The aim of this thesis is to analyze existing techniques for breathing
trigger point detection and breathing pattern analysis, and to explore
whether neural networks are suitable to derive optimal trigger points
and to adapt triggering schemes to changing patient conditions, as well
as to investigate whether breathing irregularities can be predicted
from previous breathing cycles.
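A simple rule-based baseline for the trigger-point detection discussed above, assuming the respiratory signal is available as a sampled NumPy array, could look like this; the signal name, sampling rate, and minimum-interval heuristic are illustrative assumptions, and the thesis would replace such heuristics with learned models:

```python
import numpy as np

def end_expiration_triggers(signal, fs, min_interval=2.0):
    """Detect candidate trigger points at end-expiration, i.e. local
    minima of a (filtered) respiratory signal sampled at fs Hz,
    enforcing a minimum interval (in seconds) between triggers."""
    min_gap = int(min_interval * fs)
    # local minima: sample strictly lower than both neighbours
    idx = np.flatnonzero((signal[1:-1] < signal[:-2]) &
                         (signal[1:-1] < signal[2:])) + 1
    triggers = []
    for i in idx:
        if not triggers or i - triggers[-1] >= min_gap:
            triggers.append(i)
    return np.array(triggers, dtype=int)
```

On a regular breathing pattern this fires once per cycle near the end-expiratory plateau; irregular breathers are exactly the case where such fixed rules fail and a predictive model would be needed.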