Index

reAction: Automatic Speech Recognition in German Automotive Domain

Deep Learning for Cancer Patient Survival Prediction Using 2D Portrait Photos Based on StyleGAN Embedding

Risk Classification of Brain Metastases via Deep Learning Radiomics

Simulation of Spike Artifact Obstructed MR Images for Machine Learning Methods

Automated Scoring of Rey-Osterrieth Complex Figure Test Using Deep Learning

Novel View Synthesis for Augmentation of Fine-Grained Image Datasets

Current deep-learning-based classification methods require large amounts of training data, but in certain scenarios, such as surveillance imaging, only a limited amount of data is available. The aim of this research is to generate new training images of vehicles that share the characteristics of the training data but are rendered from novel viewpoints, and to investigate their suitability for fine-grained classification of vehicles.

Generative models such as generative adversarial networks (GANs) [1] allow images to be customized. However, adjusting the perspective with methods such as conditional GANs for unsupervised image-to-image translation has proven to be particularly difficult [1]. StyleGAN [2] and neural radiance fields (NeRFs) [3] are relevant approaches for generating images with different styles and perspectives.
StyleGAN is an extension of the GAN architecture that changes the generator model, most notably by introducing a mapping network. The mapping network generates intermediate latent codes, which are transformed into styles that are integrated at each point in the generator network. It also includes a progressive growing approach for training generator models capable of synthesizing very large, high-quality images.
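The mapping network and the per-layer style injection described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `mapping_network` and `adain`, the layer sizes, and the random weights are assumptions made for this example, and the modulation follows the adaptive instance normalization formula from [2] rather than any particular implementation.

```python
import numpy as np

def mapping_network(z, weights, biases):
    """Map a latent code z to an intermediate latent code w with a small MLP."""
    w = z
    for W, b in zip(weights, biases):
        w = w @ W + b
        w = np.where(w > 0, w, 0.2 * w)  # leaky ReLU
    return w

def adain(features, w, A_scale, A_bias, eps=1e-8):
    """Inject a style into one generator layer.

    features: (C, H, W) feature maps; w: (D,) intermediate latent code;
    A_scale, A_bias: (D, C) learned affine maps (random here, hypothetical).
    """
    # Normalize each channel to zero mean and unit variance.
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    # The style supplies a new per-channel scale and bias.
    scale = (w @ A_scale)[:, None, None]
    bias = (w @ A_bias)[:, None, None]
    return scale * normalized + bias

rng = np.random.default_rng(0)
D, C = 8, 4  # hypothetical latent size and channel count
weights = [rng.normal(size=(D, D)) * 0.5 for _ in range(3)]
biases = [np.zeros(D) for _ in range(3)]
w = mapping_network(rng.normal(size=D), weights, biases)
styled = adain(rng.normal(size=(C, 16, 16)), w,
               rng.normal(size=(D, C)), rng.normal(size=(D, C)))
```

After the modulation, the mean of each channel equals the style bias, which is how the style controls the statistics of the feature maps at that layer.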
NeRF can generate novel views of complex 3D scenes from a sparse set of 2D images. It is trained to map directly from a 5D input (spatial location and viewing direction) to volume density (opacity) and color, and it uses volume rendering [4] to render new views.
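The volume rendering step can be illustrated with the discrete compositing rule used in [3]: each sample along a camera ray contributes its color weighted by its own opacity and by the transmittance accumulated in front of it. The sketch below assumes the densities and colors have already been predicted by a network; the function name `render_ray` and the sample values are hypothetical.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite samples along one ray.

    sigmas: (N,) volume densities; colors: (N, 3) RGB values;
    deltas: (N,) distances between adjacent samples.
    """
    # Opacity of each segment along the ray.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: fraction of light that reaches sample i unabsorbed.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Hypothetical ray with three samples: empty space, then a red surface.
sigmas = np.array([0.0, 5.0, 5.0])
colors = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
deltas = np.full(3, 0.5)
rgb, weights = render_ray(sigmas, colors, deltas)
```

Because the weights are differentiable functions of the densities, the same rule can be used to train the network from 2D images alone.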

The thesis consists of the following milestones:

  • Literature review on state-of-the-art approaches for GAN-based and NeRF-based image
    synthesis
  • Adaptation of existing GAN-based and NeRF-based image synthesis methods to generate
    car images using different styles and camera poses [5]
  • Experimental evaluation and comparison of different image synthesis methods
  • Investigation of the suitability of the generated images for fine-grained vehicle classification using
    different classification methods [6], [7]

The implementation will be done in Python.

References
[1] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, “Generative Adversarial Networks”, in NIPS, 2014
[2] Tero Karras, Samuli Laine, Timo Aila, “A Style-Based Generator Architecture for Generative Adversarial Networks”, in Proceedings of the IEEE/CVF Conference on CVPR, 2019
[3] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”, in ECCV, 2020
[4] Robert A. Drebin, Loren Carpenter, Pat Hanrahan, “Volume Rendering”, in Proceedings of SIGGRAPH, 1988
[5] Jiatao Gu, Lingjie Liu, Peng Wang, Christian Theobalt, “StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis”, in ICLR, 2022
[6] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”, in Proceedings of the IEEE/CVF Conference on ICCV, 2021
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep Residual Learning for Image Recognition”, in Proceedings of the IEEE Conference on CVPR, 2016

Modelling of the breast during the mammography examination

Metal-conscious Transformer Enhanced CBCT Projection Inpainting

Computed tomography (CT) is a tomographic imaging technology that has developed rapidly. Due to the beam
hardening effect, metal artifacts occur and degrade the quality of CT images. Metal artifacts remain both a focus
and a difficulty in CT imaging research because of their direct impact on clinical diagnosis and the diversity of
their manifestations and causes [1]. Inpainting is therefore an essential step in reconstructing metal-free CT
images.
The traditional inpainting method replaces the metal-affected region of the projection data by interpolation
[2][3]. Recently, deep convolutional neural networks (CNNs) have shown strong potential in many computer vision
tasks, including image inpainting. Several approaches have been proposed for image restoration using CNN-based
encoder-decoder networks. Shift-Net, which is based on the U-Net architecture, is one of these approaches and
achieves good restoration accuracy in structure and texture [4]. Zeng et al. [5] built a pyramid-context
architecture called PEN-Net for high-quality image inpainting. Liao et al. [6] proposed a generative mask pyramid
network for CT/CBCT metal artifact reduction. Although CNNs have many advantages, their receptive field is
usually small, which makes it difficult to capture global features. In contrast, the Vision Transformer (ViT) [7]
uses attention to model long-range dependencies among image patches. The shifted window Transformer
(Swin Transformer) was proposed to adapt to the high resolution of images in vision tasks [8], taking into account
the translational invariance of CNNs, the receptive field and the hierarchical relationship.
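The interpolation-based inpainting mentioned at the start of this paragraph can be sketched as follows. This is a minimal illustration, assuming the projection data are stored as a 2D array (views by detector pixels) and that a binary metal mask is already known; the function name `interpolate_metal_trace` and the plain per-view linear interpolation are assumptions for this example and do not reproduce the exact methods of [2] or [3].

```python
import numpy as np

def interpolate_metal_trace(sinogram, metal_mask):
    """Replace metal-affected detector readings by linear interpolation.

    sinogram: (n_views, n_detectors) projection data;
    metal_mask: boolean array of the same shape, True where metal is projected.
    """
    out = sinogram.astype(float).copy()
    cols = np.arange(sinogram.shape[1])
    for v in range(sinogram.shape[0]):
        bad = metal_mask[v]
        # Skip views without metal, or with no valid pixel to interpolate from.
        if bad.any() and not bad.all():
            out[v, bad] = np.interp(cols[bad], cols[~bad], out[v, ~bad])
    return out

# Hypothetical example: a smooth ramp corrupted by a bright metal trace.
clean = np.tile(np.linspace(0.0, 1.0, 32), (4, 1))
mask = np.zeros_like(clean, dtype=bool)
mask[:, 12:18] = True
corrupted = clean.copy()
corrupted[mask] = 10.0
restored = interpolate_metal_trace(corrupted, mask)
```

Linear interpolation recovers smooth regions well but cannot restore structure hidden behind the metal trace, which is the gap that learning-based inpainting aims to close.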
To overcome the shortage of medical image data and the domain shift problem in the field of deep learning, this
research uses simulated X-ray images and an inpainting network with a ViT encoder and a CNN decoder. To
further improve the inpainting performance, several variants of the backbone network are considered, such as
using a Swin Transformer instead of ViT and adding an adversarial loss.
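How a ViT-style encoder relates all image patches to each other can be sketched with a single self-attention layer. The sketch below is only an illustration under simplifying assumptions (one attention head, no positional embedding, random projection matrices); the function names `patchify` and `self_attention` are hypothetical and not taken from [7].

```python
import numpy as np

def patchify(image, patch):
    """Split a 2D image into non-overlapping, flattened patches (tokens)."""
    H, W = image.shape
    assert H % patch == 0 and W % patch == 0
    p = image.reshape(H // patch, patch, W // patch, patch)
    return p.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over all tokens."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    # Every output token is a weighted mix of all patches in the image.
    return attn @ v, attn

rng = np.random.default_rng(0)
projection = rng.normal(size=(32, 32))      # stand-in for an X-ray projection
tokens = patchify(projection, patch=8)      # 16 tokens of length 64
Wq, Wk, Wv = (rng.normal(size=(64, 16)) * 0.1 for _ in range(3))
encoded, attn = self_attention(tokens, Wq, Wk, Wv)
```

Each row of `attn` spans all patches, so a masked metal region can in principle draw on information from anywhere in the projection, unlike a convolution with a small kernel.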
The thesis will include the following points:
• Literature review on inpainting and metal artifact reduction.
• Implementation of a traditional method and a CNN-based model.
• Construction of a ViT-based model; parameter optimization and incorporation of an adversarial loss;
evaluation of results.
• Thesis writing.

References
[1] Netto, C., Mansur, N., Tazegul, T., Lalevee, M., Lee, H., Behrens, A., Lintz, F., Godoy-Santos, A., Dibbern,
K., Anderson, D. Implant Related Artifact Around Metallic and Bio-Integrative Screws: A CT Scan 3D
Hounsfield Unit Assessment. Foot & Ankle Orthopaedics. 7, 2473011421S00174 (2022)
[2] Kalender WA, Hebel R, Ebersberger J. Reduction of CT artifacts caused by metallic implants. Radiology.
1987 Aug;164(2):576-7. doi: 10.1148/radiology.164.2.3602406. PMID: 3602406.
[3] Meyer E, Raupach R, Lell M, Schmidt B, Kachelriess M. Normalized metal artifact reduction (NMAR) in
computed tomography. Med Phys. 2010 Oct;37(10):5482-93. doi: 10.1118/1.3484090. PMID: 21089784.
[4] Zhaoyi Yan, Xiaoming Li, Mu Li, Wangmeng Zuo, and Shiguang Shan. Shift-net: Image inpainting via
deep feature rearrangement, 2018.
[5] Yanhong Zeng, Jianlong Fu, Hongyang Chao, and Baining Guo. Learning pyramid-context encoder
network for high-quality image inpainting, 2019.
[6] Haofu Liao, Wei-An Lin, Zhimin Huo, Levon Vogelsang, William J Sehnert, S Kevin Zhou, and Jiebo Luo.
Generative mask pyramid network for CT/CBCT metal artifact reduction with joint projection-sinogram
correction. In International Conference on Medical Image Computing and Computer-Assisted Intervention,
pages 77–85. Springer, 2019.
[7] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas
Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth
16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[8] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin
transformer: Hierarchical vision transformer using shifted windows, 2021.

Detection and Classification of Photovoltaic Modules in Electroluminescence Videos

The hippocampus and language: Word to word prediction in terms of the successor representation

The theoretical background of the master's thesis is formed by the place and grid cells of the hippocampus, which are involved in a wide variety of navigation tasks. These range from classical spatial navigation in a city or a building to abstract tasks in cognitive spaces, such as relating the maximum speed of a vehicle to its engine power and weight. Since basic place cell firing patterns have already been investigated using machine learning, the thesis will focus on whether this method can also be used to process language, in order to draw conclusions about the involvement of place and grid cells in this domain. For this purpose, the theory of cognitive maps and its mathematical formulation, the successor representation, will be used.
To apply this concept to language, various natural language processing techniques as well as a neural network will be used. The former mainly serve to provide the training data for the network. The data consist of pairs of successive words, with the first word serving as input and the second as output. The goal is to infer grammatical structure from the word-by-word predictions. To achieve this, several configurations are investigated, with the main focus on processing books, which are used as a proxy for valid language data.
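The construction of successive word pairs and the successor representation can be sketched as follows. Note the substitution: instead of the neural network described above, this illustration estimates the word-to-word transition matrix T by counting and uses the closed form M = (I - γT)^(-1) of the successor representation. The function names, the discount factor γ = 0.9, and the toy sentence are assumptions made for this example.

```python
import numpy as np

def word_pairs(text):
    """Return (input word, next word) pairs from whitespace-tokenized text."""
    words = text.lower().split()
    return list(zip(words[:-1], words[1:]))

def successor_representation(pairs, gamma=0.9):
    """Estimate the transition matrix T and the successor representation M."""
    vocab = sorted({w for pair in pairs for w in pair})
    idx = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for current, following in pairs:
        counts[idx[current], idx[following]] += 1
    # Row-normalize counts; words without a successor keep an all-zero row.
    row_sums = counts.sum(axis=1, keepdims=True)
    T = np.divide(counts, row_sums, out=np.zeros_like(counts),
                  where=row_sums > 0)
    # M sums the discounted expected future occurrences of every word.
    M = np.linalg.inv(np.eye(len(vocab)) - gamma * T)
    return vocab, T, M

pairs = word_pairs("the cat sat the cat ran")
vocab, T, M = successor_representation(pairs)
```

Row i of M states how often each word is expected to follow word i in the discounted future, so words with similar rows play similar roles in the text, which is the kind of structure the word-by-word predictions are meant to expose.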