
Mutual Information-Based Segmentation for Unseen Domain Generalization in Digital Pathology

The introduction of automated slide scanners has facilitated the digitization of histopathological samples, enhancing the capabilities of traditional light microscopy by allowing the use of automated image analysis algorithms. Machine learning algorithms have demonstrated great potential in this regard by extrapolating learned characteristics from annotated datasets to unseen data, thus providing valuable assistance to pathologists in their diagnostic work. The performance of these models, however, can be significantly degraded by variations in image characteristics, including differences in the scanners used for image acquisition, staining methods, resolution, illumination, and artifacts [1, 2]. These challenges highlight the difficulty of applying trained models across environments and motivate the use of domain adaptation techniques.

Previous studies have already addressed color inconsistencies in histological samples; calibration slides are one approach to resolving scanner-dependent variations [3]. Further notable pre-processing (–) and training (⋆) techniques include:
– Data augmentation to simulate variability in the input data (e.g., domain and spatial transformations) [4,5]
– Image-level domain adaptation to align visual features across domains and mitigate distributional discrepancies, e.g., stain normalization to reduce inter-sample and inter-scanner color variation [5,6]
– Multi-scale processing to capture features at different resolutions [2]
⋆ Heterogeneous dataset training to improve model generalization across multiple sources [7]
⋆ Transfer learning to leverage pre-trained models, which is well suited to sparsely annotated data [2]
⋆ Domain-invariant feature learning to ensure robustness to scanner and staining variability [8,9], in particular adversarial training to reinforce robustness against domain shifts [2]
⋆ Disentangled feature learning to isolate distinct underlying factors of data variations, compelling the network to learn shared statistical components across different domains [5]
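
To illustrate image-level stain normalization, the following minimal NumPy sketch follows the general scheme of Macenko et al. [6]. It converts RGB to optical density (OD) and estimates two stain vectors from the extreme angles in the plane of the two largest principal components. It then re-expresses the source image using the stain vectors and concentration range of a target image. The function names and the defaults io=255, beta=0.15, and alpha=1 are assumptions for illustration. Placing hematoxylin before eosin relies on a common heuristic, not on anything specified in this proposal.

```python
import numpy as np


def rgb_to_od(img, io=255.0):
    # Beer-Lambert law: optical density per RGB channel; +1 avoids log(0)
    pixels = np.asarray(img, dtype=np.float64).reshape(-1, 3)
    return -np.log((pixels + 1.0) / io)


def estimate_stains(img, io=255.0, beta=0.15, alpha=1.0):
    """Return (stain matrix 3x2, concentrations 2xN, robust max concentrations)."""
    od = rgb_to_od(img, io)
    od_t = od[np.all(od > beta, axis=1)]  # drop nearly transparent (background) pixels
    # eigh sorts eigenvalues in ascending order, so the last two eigenvectors
    # span the plane of the two largest principal components of the OD values
    _, eigvecs = np.linalg.eigh(np.cov(od_t.T))
    plane = eigvecs[:, 1:3]
    proj = od_t @ plane
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    # robust extreme angles (percentiles) instead of the absolute min/max
    min_phi, max_phi = np.percentile(phi, alpha), np.percentile(phi, 100 - alpha)
    v_min = plane @ np.array([np.cos(min_phi), np.sin(min_phi)])
    v_max = plane @ np.array([np.cos(max_phi), np.sin(max_phi)])
    # heuristic: the hematoxylin vector has the larger red-channel OD component
    if v_min[0] > v_max[0]:
        he = np.stack([v_min, v_max], axis=1)
    else:
        he = np.stack([v_max, v_min], axis=1)
    # per-pixel stain concentrations via least squares: OD^T ~ he @ conc
    conc = np.linalg.lstsq(he, od.T, rcond=None)[0]
    max_c = np.percentile(conc, 99, axis=1)
    return he, conc, max_c


def macenko_normalize(src, target, io=255.0):
    """Re-render src with the stain vectors and concentration range of target."""
    _, conc, max_c = estimate_stains(src, io)
    he_t, _, max_c_t = estimate_stains(target, io)
    conc = conc * (max_c_t / max_c)[:, None]  # match robust concentration ranges
    out = io * np.exp(-he_t @ conc) - 1.0     # invert the OD transform of rgb_to_od
    return np.clip(np.rint(out), 0, 255).T.reshape(np.shape(src)).astype(np.uint8)
```

Normalizing an image to itself should approximately reproduce it, which makes a convenient sanity check. In practice, a single well-stained reference tile would typically serve as the target.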

This thesis investigates the applicability of a mutual information-based feature disentanglement method [5] to cross-domain tumor segmentation in histopathology samples. By separating anatomical features from domain-specific variations, we aim for robust, scanner-invariant segmentation performance. The objective is to enhance the generalizability of the network and enable its direct application to unseen domains without adaptation.
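
The quantity to be minimized between anatomical and domain features is the mutual information I(X;Y) = Σ p(x,y) log(p(x,y) / (p(x) p(y))). MI-SegNet [5] estimates it with a neural estimator on continuous features during training. The plug-in estimator below works on discretized features and only illustrates the quantity itself; it is not the method of [5]. The function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px, py = Counter(xs), Counter(ys)  # marginal counts
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += c / n * math.log(c * n / (px[x] * py[y]))
    return mi
```

Independent variables give a value of zero. Identical binary variables give log 2 ≈ 0.693 nats. A well-disentangled representation would drive the estimate between anatomical and domain features toward zero.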

The proposed work comprises the following items:
– Literature review of device-induced variations in microscopy image data and state-of-the-art methods to address them
– Conceptualization and adaptation of mutual information-based segmentation [5] to address generalization to unseen domains in microscopy image data
– Exploration of targeted augmentation methods for addressing domain shifts in histopathology (e.g., stain augmentation [6])
– Exploration of suitable metrics for evaluating cross-domain generalization performance
– Documentation and presentation of the findings, and documentation of the code
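
One possible starting point for the metric exploration is to report segmentation quality per domain (e.g., per scanner) and summarize it by the mean, the spread, and the worst domain. The worst-domain score matters because an average alone can hide a domain on which the model fails. The sketch below uses the Dice coefficient on flat binary masks. The input format, function names, and domain names are hypothetical.

```python
from statistics import mean, pstdev


def dice(pred, target, eps=1e-7):
    """Dice coefficient for two flat binary masks (sequences of 0/1)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    return (2 * inter + eps) / (sum(pred) + sum(target) + eps)


def cross_domain_summary(results):
    """results: {domain_name: [(pred_mask, target_mask), ...]} (hypothetical format)."""
    per_domain = {d: mean(dice(p, t) for p, t in pairs) for d, pairs in results.items()}
    scores = list(per_domain.values())
    return {
        "per_domain": per_domain,
        "mean": mean(scores),
        "std": pstdev(scores),  # spread across domains
        "worst": min(scores),   # worst-case domain performance
    }
```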

[1] F. Wilm, M. Fragoso, C. A. Bertram, N. Stathonikos, M. Öttl, J. Qiu, R. Klopfleisch, A. Maier, K. Breininger, and M. Aubreville, “Multi-scanner canine cutaneous squamous cell carcinoma histopathology dataset,” in Bildverarbeitung für die Medizin 2023: Proceedings, German Workshop on Medical Image Computing, Braunschweig, July 2-4, 2023 (T. M. Deserno, H. Handels, A. Maier, K. Maier-Hein, C. Palm, and T. Tolxdorff, eds.), Informatik aktuell, Wiesbaden: Springer Fachmedien Wiesbaden, 2023.
[2] C. L. Srinidhi, O. Ciga, and A. L. Martel, “Deep neural network models for computational histopathology: A survey,” Medical Image Analysis, vol. 67, p. 101813, Jan. 2021.
[3] X. Ji, R. Salmon, N. Mulliqi, U. Khan, Y. Wang, A. Blilie, B. G. Pedersen, K. D. Sørensen, B. P. Ulhøi, R. Kjosavik, E. A. M. Janssen, M. Rantalainen, L. Egevad, P. Ruusuvuori, M. Eklund, and K. Kartasalo, “Physical Color Calibration of Digital Pathology Scanners for Robust Artificial Intelligence Assisted Cancer Diagnosis.”
[4] M. Balkenhol, N. Karssemeijer, G. J. S. Litjens, J. Van Der Laak, F. Ciompi, and D. Tellez, “H&E stain augmentation improves generalization of convolutional networks for histopathological mitosis detection,” in Medical Imaging 2018: Digital Pathology (M. N. Gurcan and J. E. Tomaszewski, eds.), (Houston, United States), p. 34, SPIE, Mar. 2018.
[5] Y. Bi, Z. Jiang, R. Clarenbach, R. Ghotbi, A. Karlas, and N. Navab, “MI-SegNet: Mutual Information-Based US Segmentation for Unseen Domain Generalization,” Feb. 2024. arXiv:2303.12649.
[6] M. Macenko, M. Niethammer, J. S. Marron, D. Borland, J. T. Woosley, X. Guan, C. Schmitt, and N. E. Thomas, “A method for normalizing histology slides for quantitative analysis,” in 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, (Boston, MA, USA), pp. 1107–1110, IEEE, June 2009.
[7] M. Aubreville, F. Wilm, N. Stathonikos, K. Breininger, T. A. Donovan, S. Jabari, M. Veta, J. Ganz, J. Ammeling, P. J. Van Diest, R. Klopfleisch, and C. A. Bertram, “A comprehensive multi-domain dataset for mitotic figure detection,” Scientific Data, vol. 10, p. 484, July 2023.
[8] A. Moyes, “A Novel Method For Unsupervised Scanner-Invariance With DCAE Model.”
[9] M. W. Lafarge, J. P. W. Pluim, K. A. J. Eppenhof, P. Moeskops, and M. Veta, “Domain-adversarial neural networks to address the appearance variability of histopathology images,” 2017. arXiv:1707.06183.

Modernizing and Extending miRNexpander: A Web-Based Interface for Network Expansion of Molecular Interactions in Biomedical Research

RPA Bots for Process Automation in Workflow Management at DATEV eG

Context-Aware Emotion Recognition from Pictures using Frozen CLIP

Thesis Proposal Vinzenz Dewor

Evaluation of SHViT for Volumetric Semantic Segmentation in Industrial CT Scans

Industrial computed tomography (iCT) is a widely applied tool in non-destructive testing, material analysis, quality control, and metrology. Semantic segmentation of industrial CT data plays a central role in these applications by enabling quality inspection, material differentiation and part separation [1]. While convolutional neural networks (CNNs) have traditionally performed well in segmentation tasks by capturing local structures, their limited ability to model long-range dependencies poses challenges in complex 3D datasets.

Transformer-based models have recently emerged as promising alternatives. By dividing the input into patches and using self-attention mechanisms, transformers can model global dependencies. However, early vision transformers had difficulties capturing spatial structure and learning from limited data. The Swin Transformer was one of the first models to address these issues by introducing a hierarchical structure and shifted windows, combined with an inductive bias that improves generalization on small datasets [2].
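
The windowing idea can be sketched for a volumetric input. Self-attention is computed within non-overlapping windows. In alternating layers, the windows are cyclically shifted by half a window so that information flows across window borders [2]. The NumPy sketch below shows only the partitioning and the shift for a (D, H, W, C) volume. It omits the attention mask that Swin applies to tokens wrapped around by the cyclic shift, and the function names are hypothetical.

```python
import numpy as np


def window_partition_3d(x, window):
    """Split a (D, H, W, C) volume into non-overlapping cubic windows.

    Returns shape (num_windows, window**3, C); D, H, W must be divisible by window.
    """
    d, h, w, c = x.shape
    x = x.reshape(d // window, window, h // window, window, w // window, window, c)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # window indices first, in-window offsets after
    return x.reshape(-1, window ** 3, c)


def shifted_windows_3d(x, window):
    # cyclic shift by half a window along each spatial axis before partitioning,
    # so the new windows straddle the borders of the previous layer's windows
    shift = window // 2
    x = np.roll(x, shift=(-shift, -shift, -shift), axis=(0, 1, 2))
    return window_partition_3d(x, window)
```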

Despite these advances, transformers remain resource-intensive. Newer models such as the Single-Head Vision Transformer (SHViT) aim to reduce computational costs while maintaining performance. SHViT reduces redundancy at two levels. Its memory-efficient macro design uses a larger-stride patch embedding stem and fewer stages, and its single-head attention module applies self-attention to only a fraction of the channels [3].
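
The single-head attention idea can be illustrated with a simplified NumPy sketch. Only the first c_att channels take part in a single attention head. The remaining channels are passed through unchanged and mixed back in by an output projection. The sketch omits normalization layers, the convolutional blocks, and the exact channel ratio used in SHViT [3]. All weight names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def single_head_partial_attention(x, w_q, w_k, w_v, w_o, c_att):
    """Single-head self-attention applied to the first c_att channels only.

    x: (N, C) tokens; w_q, w_k: (c_att, d_qk); w_v: (c_att, c_att); w_o: (C, C).
    """
    x_att, x_rest = x[:, :c_att], x[:, c_att:]
    q, k, v = x_att @ w_q, x_att @ w_k, x_att @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))        # (N, N) map from one head
    mixed = np.concatenate([attn @ v, x_rest], axis=1)   # other channels pass through
    return mixed @ w_o                                   # projection mixes all channels
```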

This thesis focuses on the implementation and evaluation of a volumetric SHViT model for 3D semantic segmentation. The model is tested on a real-world dataset of industrial CT scans of boxed shoes, which includes several segmentation tasks: separating the shoes from their surroundings and identifying individual components such as the insole, outsole, and upper [4]. As is typical for industrial CT data, the dataset is limited in size. Yet its structural variability makes it an interesting benchmark for assessing model generalization. Because the segmentation classes are imbalanced, the F₁-score is used as the primary evaluation metric. The network is also evaluated in terms of memory and computational resource use.
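
For binary masks, the F₁-score equals the Dice coefficient, 2TP / (2TP + FP + FN). Below is a minimal sketch of per-class F₁ on integer label volumes. Averaging the per-class scores (macro averaging) keeps a large background class from dominating the result. The function name is an assumption, as is the convention of returning NaN for classes absent from both volumes.

```python
import numpy as np


def per_class_f1(pred, target, num_classes):
    """Per-class F1 (equivalently Dice) for integer label volumes of equal shape."""
    scores = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        tp = np.logical_and(p, t).sum()
        fp = np.logical_and(p, ~t).sum()
        fn = np.logical_and(~p, t).sum()
        denom = 2 * tp + fp + fn
        # NaN marks a class that appears in neither prediction nor target
        scores.append(2 * tp / denom if denom > 0 else float("nan"))
    return np.array(scores, dtype=float)
```

A macro score can then be obtained with `np.nanmean(per_class_f1(pred, target, num_classes))`.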

The SHViT model will be compared to a CNN-based baseline in terms of accuracy, robustness, and computational efficiency in the context of 3D industrial segmentation. While the study aims to inform the selection of neural architectures for iCT applications, its conclusions are limited by the use of a single dataset. Nonetheless, SHViT shows potential for broader use in iCT, as it could enable the efficient application of transformer-based models to volumetric segmentation across diverse industrial datasets.

Literature

[1]	S. Bellens, P. Guerrero, P. Vandewalle and W. Dewulf, “Machine learning in industrial X-ray computed tomography – a review,” CIRP Journal of Manufacturing Science and Technology, vol. 51, pp. 324–341, 2024.
[2]	Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin and B. Guo, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 2021.
[3]	S. Yun and Y. Ro, “SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5756–5767, 2024.
[4]	M. Leipert, G. Herl, J. Stebani, S. Zabler and A. Maier, “Three Step Volumetric Segmentation for Automated Shoe Fitting,” e-Journal of Nondestructive Testing, vol. 28, no. 3, 2023.

Automated calibration of the scan trajectory in dedicated breast CT with circle-spiral trajectory

Video-based pose and distance estimation of an excavator bucket

Automatic Speech Recognition at Phoneme and Word-Level To Analyze Parkinson’s Disease

A Comparative Study of Transformer-Based Models and CNNs for Semantic Segmentation in Industrial CT Scans

Industrial computed tomography (iCT) is extensively utilized for non-destructive testing, material analysis, quality control, and metrology. In these applications, semantic segmentation is crucial, particularly for material analysis [1]. In recent years, Convolutional Neural Networks (CNNs) have been employed successfully for material segmentation, handling low-quality reconstructions, and performing complex segmentations where local context is vital. However, CNNs often struggle to capture long-range dependencies due to their localized nature.

Recently, transformer architectures have shown strong performance in various segmentation tasks. Unlike CNNs, which rely on filter banks to extract local features, transformers encode image patches into visual tokens and use self-attention to process the entire input in parallel. This allows transformers to capture long-distance relationships effectively, although they may be less effective at preserving fine local details [2]. In three-dimensional (3D) data segmentation, both CNNs and transformers face challenges due to increased memory requirements and the higher complexity of patterns that arise in 3D space.
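
A back-of-the-envelope calculation shows why 3D is demanding. The number of tokens grows cubically with the volume edge length, and a global attention map grows quadratically with the number of tokens. The helper below is hypothetical. It counts only a single attention map for one head in one layer, stored as 32-bit floats. The volume, patch, and window sizes are illustrative assumptions.

```python
def attention_cost(volume, patch, window=None, bytes_per_value=4):
    """Return (number of tokens, bytes of one attention map) for a cubic volume.

    volume and patch are edge lengths in voxels; window is the edge length of a
    cubic attention window in tokens (None means global attention).
    """
    tokens_per_axis = volume // patch
    n_tokens = tokens_per_axis ** 3
    if window is None:
        entries = n_tokens * n_tokens      # every token attends to every token
    else:
        if tokens_per_axis % window != 0:
            raise ValueError("window must divide the number of tokens per axis")
        entries = n_tokens * window ** 3   # each token attends within its window only
    return n_tokens, entries * bytes_per_value
```

For a 128³ volume with 4³ patches, `attention_cost(128, 4)` gives 32,768 tokens and a 4 GiB global attention map. `attention_cost(128, 4, window=8)` reduces this to 64 MiB. By comparison, a 128 × 128 image with the same patch size would need only 1,024 tokens and a 4 MiB map.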

One promising model for addressing these challenges is the Swin Transformer, which incorporates a hierarchical structure with shifted windows, enabling it to capture both local and global dependencies more efficiently [3]. To address the individual limitations of CNNs and transformers, hybrid models combining both architectures have been proposed. For instance, Cai et al. introduced a model that combines the Swin Transformer with CNNs for 3D segmentation tasks, taking advantage of each architecture’s strengths [2]. Both CNNs and Swin Transformers are known for their ability to generalize well on smaller datasets, thanks to their inductive bias [3], [4].

This thesis focuses on applying a hybrid approach combining the Swin Transformer and CNNs to a complex dataset of iCT scans of shoes, where the objective is to segment the shoes into their individual components, as demonstrated in previous work by Leipert et al. [5]. Despite the dataset’s small size, its high intrinsic variability and the relevance of both local and global dependencies make it a well-suited candidate for evaluating segmentation methods. The segmentation will be performed in 3D, highlighting both the challenges and the opportunities of these models.

Through a comparative analysis of CNNs, the Swin Transformer, and their combined approaches, this thesis aims to provide insights into the strengths and limitations of each method in the context of 3D semantic segmentation on complex industrial CT datasets. The findings will contribute to improving segmentation techniques for iCT applications, potentially enhancing the accuracy and efficiency of material analysis in industrial contexts. The main limitation of the study is its restriction to a single dataset.

References

[1]	S. Bellens, P. Guerrero, P. Vandewalle and W. Dewulf, “Machine learning in industrial X-ray computed tomography – a review,” CIRP Journal of Manufacturing Science and Technology, vol. 51, pp. 324-341, 2024.
[2]	Y. Cai, Y. Long, Z. Han, M. Liu, Y. Zheng, W. Yang and L. Chen, “Swin Unet3D: a three-dimensional medical image segmentation network combining vision transformer and convolution,” BMC Medical Informatics and Decision Making, vol. 23, p. 33, 2023.
[3]	Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin and B. Guo, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows,” in IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 2021.
[4]	X. He et al., “Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-15, 2022.
[5]	M. Leipert, G. Herl, J. Stebani, S. Zabler and A. K. Maier, “Three Step Volumetric Segmentation for Automated Shoe Fitting,” in 12th Conference on Industrial Computed Tomography (iCT) 2023, Fürth, Germany, 2023.

EcoScapes: LLM-powered advice for crafting sustainable cities

Climate adaptation is vital for the sustainability, and sometimes the mere survival, of our urban areas [1, chapters TS.C.8 and TS.D.1]. However, small cities often struggle with limited personnel resources and with integrating vast amounts of data from multiple sources into a comprehensive analysis [1, chapter TS.D.1.3]. Moreover, the complexity of the topic can overwhelm administrative staff and local politicians alike. To overcome these challenges, this thesis proposes a multi-layered system combining specialized Large Language Models (LLMs), satellite imagery, and a knowledge base to aid in developing effective climate adaptation strategies.

First, the system uses the provided location information to request relevant satellite imagery, which is then available to all subsequent components.

The architecture’s modular core comprises several LLMs and expert systems that examine the satellite data to offer insights into different aspects of climate adaptation. Potential functions include, but are not limited to, identifying heat islands and areas threatened by flooding, and assessing vegetation cover.
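
As an example of what a vegetation-cover module might compute before any LLM is involved, the normalized difference vegetation index NDVI = (NIR - Red) / (NIR + Red) can be derived from the red and near-infrared bands of the satellite imagery. The sketch assumes reflectance arrays as input. It uses a vegetation threshold of 0.3, an illustrative assumption that would need calibration for the actual sensor and region. The function name is hypothetical.

```python
import numpy as np


def vegetation_cover(nir, red, threshold=0.3):
    """Return the NDVI map and the fraction of pixels above threshold (assumed value)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # NDVI = (NIR - Red) / (NIR + Red); pixels with zero denominator are set to 0
    ndvi = np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)
    return ndvi, float((ndvi > threshold).mean())
```
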
In the last step, the system consolidates the findings from the preceding modules to generate a comprehensive report on the existing situation and to recommend potential adaptation strategies.

To assess the system’s performance, we will compare the generated outputs with those of unaltered LLMs and of a model inspired by ChatClimate [2].
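
The three stages described above (imagery retrieval, independent analysis modules, and consolidation into a report) can be sketched as a simple orchestration function. The interfaces below are hypothetical. Each module is assumed to take the shared imagery and return a short finding. The consolidation step, which in the proposed system would be LLM-based, is passed in as a function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    module: str   # name of the analysis module that produced the finding
    summary: str  # short textual result passed on to the consolidation step


# Hypothetical interfaces: imagery is passed around as a plain dict here.
FetchImagery = Callable[[str], Dict]
Module = Callable[[Dict], Finding]
Consolidate = Callable[[List[Finding]], str]


def run_pipeline(location: str, fetch_imagery: FetchImagery,
                 modules: List[Module], consolidate: Consolidate) -> str:
    imagery = fetch_imagery(location)         # step 1: shared input for all modules
    findings = [m(imagery) for m in modules]  # step 2: independent analyses
    return consolidate(findings)              # step 3: consolidated report
```

A module for heat islands or flood risk would then be any callable with this signature, which keeps the core extensible.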

References

[1] Pörtner, H.-O., D.C. Roberts, H. Adams, et al. 2022: Technical Summary. [H.-O. Pörtner, D.C. Roberts, E.S. Poloczanska, K. Mintenbeck, M. Tignor, A. Alegría, M. Craig, S. Langsdorf, S. Löschke, V. Möller, A. Okem (eds.)]. In: Climate Change 2022: Impacts, Adaptation, and Vulnerability. Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [H.-O. Pörtner, D.C. Roberts, M. Tignor, E.S. Poloczanska, K. Mintenbeck, A. Alegría, M. Craig, S. Langsdorf, S. Löschke, V. Möller, A. Okem, B. Rama (eds.)]. Cambridge University Press, Cambridge, UK and New York, NY, USA, pp. 37-118, doi:10.1017/9781009325844.002.

[2] Vaghefi, S.A., Stammbach, D., Muccione, V. et al. ChatClimate: Grounding conversational AI in climate science. Commun Earth Environ 4, 480 (2023). https://doi.org/10.1038/s43247-023-01084-x