Index
Terahertz Image Reconstruction for Historical Document Analysis
Network Deconvolution as Sparse Representations for Medical Image Analysis
Scene Evolution on Polarimetric Radar Data in Automated Driving Scenarios
Description
Autonomous driving requires vehicles to build a precise understanding of their surroundings. Currently, camera, LiDAR, and radar sensors are used to perceive the vehicle's environment. Conventional automotive radar sensors have the advantage of directly measuring velocities and are less prone to disturbance by adverse weather conditions than the aforementioned alternatives, but they also offer a lower resolution and therefore less information for classification and tracking tasks. [1]
To compensate for this weakness, the use of newly available polarimetric radar sensors in the automotive domain is being researched. Polarimetric radar sensors emit and detect radar waves with different polarizations. Analyzing polarization changes adds information about reflection patterns, allowing, inter alia, the estimation of vehicle orientation and extent in traffic scenarios. [2]
Prior research also indicates improvements in classification tasks using polarimetric radar data, both for stationary targets in test areas [3] and for stationary and moving traffic participants in real-world urban scenarios [4]. In this work, polarimetric radar data is used to track and predict the evolution of the vehicle's environment over time.
The thesis consists of the following milestones:
- Implement and compare optical flow methods to create correlation images from polarimetric and conventional radar imagery (a minimal sketch follows this list)
- Implement and compare track generation methods using the correlation images
- Implement and compare track prediction methods using the generated tracks
- Evaluate advantages of polarimetric radar information
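As a starting point for the first milestone, a dense optical flow between two consecutive radar frames could look as follows. This is a minimal sketch, assuming bird's-eye-view radar intensity images as 2D NumPy arrays and using OpenCV's Farneback method; the function name and all parameter values are illustrative, not part of the thesis.

```python
# Minimal sketch: dense optical flow between two consecutive radar
# bird's-eye-view intensity images using OpenCV's Farneback method.
# `frame_prev` and `frame_next` are hypothetical 2D float arrays
# rendered from (polarimetric or conventional) radar data.
import cv2
import numpy as np

def correlation_image(frame_prev: np.ndarray, frame_next: np.ndarray) -> np.ndarray:
    """Return a dense (H, W, 2) flow field between two radar frames."""
    # Farneback expects single-channel 8-bit images, so normalize first.
    prev_u8 = cv2.normalize(frame_prev, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    next_u8 = cv2.normalize(frame_next, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.calcOpticalFlowFarneback(
        prev_u8, next_u8, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
```

Comparing such a classical method against alternative flow estimators on both polarimetric and conventional imagery would feed directly into the final evaluation milestone.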
References
[3] Tristan Visentin. Polarimetric Radar for Automotive Applications, volume 90 of Karlsruher Forschungsberichte aus dem Institut für Hochfrequenztechnik und Elektronik. KIT Scientific Publishing, Karlsruhe, Baden, 2019.
[4] J. F. Tilly, F. Weishaupt, O. Schumann, J. Dickmann, and G. Wanielik. Road user classification with polarimetric radars. In 2020 17th European Radar Conference (EuRAD), pages 112–115, 2021.
Detection of In-plane Rotation of Extremities on X-ray Images
Thesis Description
A core goal in a medical imaging pipeline is to optimize the workflow to improve patient throughput at a radiography system. By limiting manual tasks in a workflow, we can efficiently optimize the pipeline [1]. A common manual task in an X-ray radiography workflow is rotating digital X-ray images to a canonical orientation preferred by radiologists. This has an impact on the number of patients that can be examined in a given period of time. A deep-learning-based detection system for the in-plane rotation of body parts in X-ray images can solve this problem, but it is still an open research topic. We identified three major challenges that such automatic systems need to address. First, in clinical routine there are up to 23 different examinations, consisting of 13 anatomies (e.g. hand, chest) and 4 projection types (posterior-anterior (PA), anterior-posterior (AP), lateral, and oblique), which makes this task very diverse. Second, computation time must be as low as possible, and third, a high alignment accuracy with respect to the canonical orientation is needed [2]. A simulation estimates that technologists at a medium to large sized hospital spend nearly 20 hours, or 3 working days, a year performing 70,000+ manual clicks to rotate chest images on portable X-ray machines. With an Artificial Intelligence (AI) algorithm that is 99.4% accurate, it is estimated that the 19.59 hours of manual "clicks" would be reduced to 7 minutes a year, and the 70,512 clicks to 423 clicks [3]. This shows that a deep-learning-based AI system has the potential to significantly improve the overall workflow in X-ray radiography.
To the best of our knowledge, this is the first work to detect the in-plane rotation of the extremities of the body in X-ray images. Several methods have been published on automatic orientation detection for a single anatomy, e.g. the chest [3, 4, 5]. However, most of these approaches only orient the X-ray images into 4 sectors (0°, 90°, 180°, 270°) rather than predicting a precise orientation over the full angular range of 0°–360°. Baltruschat et al. proposed a transfer learning approach with a ResNet architecture for precise orientation regression in hand radiographs, achieving state-of-the-art performance with a mean absolute angle error of 2.79° [2]. Luo et al. addressed orientation correction for radiographs in a PACS environment by using well-defined low-level visual features from the anatomical region with an SVM classifier, achieving 96.1% accuracy [6]. Kondo et al. estimate the hand orientation in probability density form, which solves the cyclicity problem of direct angular representations and combines multiple predictions based on different features [7]. Kausch et al. proposed a Convolutional Neural Network (CNN) regression model that predicts 5-degree-of-freedom pose updates directly from an initial X-ray image [8]. They used a two-step approach (a coarse CNN regressor followed by a fine CNN regressor) to detect the orientation of the anatomy.
This thesis aims to develop a framework for the detection of the in-plane rotation of the extremities of the human body in a single 2D X-ray image using deep learning algorithms. Based on this information, the image shall subsequently be rotated to a predefined orientation based on the anatomy instead of the detector orientation (with respect to the X-ray source). This is especially important with portable Wireless Fidelity (WiFi) detectors, where the original orientation of the anatomy with respect to the detector plane can theoretically take on any angular value. In this work, we will initially focus on hands and fingers (including partially visible hands), but other extremities can also be taken into account at a later point in time. In detail, the thesis will comprise the following work items:
- Literature overview of the state-of-the-art regression models for the detection of the body part orientation
- Survey for the optimal canonical orientation of each projection of the X-ray image
- Implementation of a deep learning based method with direct learning of the orientation (a minimal sketch follows this list)
- Comparison and evaluation of the deep learning models trained on a specific projection vs. combined projections and on a specific anatomy vs. combined anatomies
- Visualizing the features learned by the model in each approach
- Quantitative evaluation on real-world data
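The following is a minimal sketch of what direct orientation learning could look like, assuming a torchvision ResNet-18 backbone that regresses (sin θ, cos θ) instead of the raw angle, which sidesteps the 0°/360° cyclicity discussed above; the class and function names are hypothetical, not the thesis implementation.

```python
# Sketch (an assumption, not the thesis implementation): a ResNet-18
# regressing (sin, cos) of the in-plane rotation, avoiding the cyclicity
# problem of direct angle regression.
import torch
import torch.nn as nn
import torchvision

class OrientationRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = torchvision.models.resnet18(weights=None)
        # X-ray images are grayscale; accept a single input channel.
        self.backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                        padding=3, bias=False)
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.backbone(x)                      # (B, 2) raw (sin, cos)
        return nn.functional.normalize(out, dim=1)  # project onto unit circle

def angle_deg(out: torch.Tensor) -> torch.Tensor:
    """Recover the predicted angle in degrees from (sin, cos)."""
    return torch.rad2deg(torch.atan2(out[:, 0], out[:, 1])) % 360
```

Training such a model could use a plain MSE loss between predicted and target (sin, cos) pairs, so the loss stays continuous across the 0°/360° boundary.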
References
[1] Paolo Russo. Handbook of X-ray imaging: physics and technology. CRC press, 2017.
[2] Ivo M Baltruschat, Axel Saalbach, Mattias P Heinrich, Hannes Nickisch, and Sascha Jockel. Orientation regression in hand radiographs: a transfer learning approach. In Medical Imaging 2018: Image Processing, volume 10574, page 105741W. International Society for Optics and Photonics, 2018.
[3] Khaled Younis, Min Zhang, Najib Akram, German Vera, Katelyn Nye, Gireesha Rao, Gopal Avinash, and John M. Sabol. Leveraging deep learning artificial intelligence in detecting the orientation of chest x-ray images. September 2019.
[4] Ewa Pietka and HK Huang. Orientation correction for chest images. Journal of Digital Imaging, 5(3):185–189, 1992.
[5] Hideo Nose, Yasushi Unno, Masayuki Koike, and Junji Shiraishi. A simple method for identifying image orientation of chest radiographs by use of the center of gravity of the image. Radiological Physics and Technology, 5(2):207–212, 2012.
[6] Hui Luo and Jiebo Luo. Robust online orientation correction for radiographs in PACS environments. IEEE Transactions on Medical Imaging, 25(10):1370–1379, 2006.
[7] Kazuaki Kondo, Daisuke Deguchi, and Atsushi Shimada. Hand orientation estimation in probability density form. arXiv preprint arXiv:1906.04952, 2019.
[8] Lisa Kausch, Sarina Thomas, Holger Kunze, Maxim Privalov, Sven Vetter, Jochen Franke, Andreas H Mahnken, Lena Maier-Hein, and Klaus Maier-Hein. Toward automatic c-arm positioning for standard projections in orthopedic surgery. International Journal of Computer Assisted Radiology and Surgery, 15(7):1095–1105, 2020.
Analysis of Deep Learning Methods for Re-identification on Chest Radiographs
Two-Dimensional-Dwell-Time Analysis of Ion-Channel Kinetics using Deep Learning
In this project, we want to explore the capability of neural networks to infer the kinetics of electrophysiological time series with Markov models. Patch-clamp recordings of single ion channels provide a wealth of information on the functional properties of proteins that macroscopic measurements cannot match. However, modelling single-channel data remains a major challenge, not least because it is very time-consuming. Probably the most sophisticated way to relate kinetics and protein function is to utilize hidden Markov models (Huth et al., 2008). Scientists have developed several different methods for this purpose (Sakmann and Neher, 1995). All these methods share at least some of the following disadvantages: they require specific assumptions or corrections dependent on the time series, are sensitive to noise (Huth et al., 2006), are limited to the bandwidth of the recording system (Huth et al., 2006; Qin, 2014), or do not provide statistics to estimate how well they have approximated the data.
We have developed a 2D-Fit algorithm based on simulations and have improved it over the years (Huth et al., 2006). The algorithm is based on the idealization of time series and the generation of two-dimensional dwell-time distributions from neighboring events. To a certain extent, it does not share the aforementioned limitations, and it has some unique features that make it superior to other tools: it captures gating kinetics even with a high background of noise and can extract rate constants beyond the recording bandwidth. That could make the 2D-Fit exceptionally valuable for relating electrophysiological kinetics to data from simulations of single protein molecules. In addition, 2D distributions preserve the coherency of connected states, so the algorithm can extract the full complexity of the underlying models and distinguish different Markov models. However, the computational requirements are enormous, and they recur for each time series that is analyzed. Neural networks take the reverse approach: once datasets are generated and the networks are trained, time series could be analyzed in real time during experiments. It remains to be determined whether deep networks are capable of outperforming the powerful simulation approach.
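To make the construction of such distributions concrete, here is an illustrative NumPy sketch, assuming an already idealized recording given as a sequence of strictly positive dwell times in alternating states; the binning scheme and names are assumptions, not the actual 2D-Fit code.

```python
# Build a 2D dwell-time histogram from neighboring events: each pair of
# adjacent dwell times (t_i, t_{i+1}) becomes one count in a 2D grid
# with logarithmic bin edges, as is common in dwell-time analysis.
import numpy as np

def dwell_time_histogram_2d(dwells: np.ndarray, n_bins: int = 64):
    """dwells: 1D array of successive, strictly positive dwell durations."""
    t_curr, t_next = dwells[:-1], dwells[1:]  # neighboring event pairs
    edges = np.logspace(np.log10(dwells.min()), np.log10(dwells.max()),
                        n_bins + 1)
    hist, _, _ = np.histogram2d(t_curr, t_next, bins=[edges, edges])
    return hist, edges
```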
The basic aim of this thesis is to analyze two-dimensional dwell-time histograms with neural networks, a task of image analysis, to extract the underlying kinetics of Markov models. In parallel, another master student will explore the direct analysis of time series. To our knowledge, neither approach has yet been investigated for patch-clamp data. It will be very interesting to compare the results of both approaches.
In the first part of the project, the objective is to generate training datasets with the 2D-Fit algorithm (already implemented) and to deploy networks capable of analyzing simple Markov models (preliminary results are available for a 3-state model). The master student will evaluate the capabilities of the networks with respect to the bandwidth and noise of the time series. The next step will be to find strategies to increase the number of states of the underlying models that the network is able to distinguish. Finally, and not directly related, the capability of networks to distinguish different Markov models will be explored. We expect that networks could truly excel at this pattern recognition task.
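As a rough illustration of this first project part, a compact CNN for such histogram inputs might look as follows; the layer sizes, the assumption of 64×64 histograms, and the choice of 4 rate constants (e.g. for a linear C1 ↔ C2 ↔ O scheme) are all hypothetical.

```python
# Hedged sketch: a small CNN mapping a 2D dwell-time histogram
# (1 x 64 x 64) to the rate constants of a 3-state Markov model.
import torch.nn as nn

class RateConstantCNN(nn.Module):
    def __init__(self, n_rates: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Predict log-rates, since rate constants span orders of magnitude.
        self.head = nn.Linear(64, n_rates)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))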
Sources
Huth T, Schmidtmayer J, Alzheimer C, Hansen UP (2008) Four-mode gating model of fast inactivation of sodium channel Nav1.2a. Pflugers Archiv European Journal of Physiology 457:103–119.
Huth T, Schroeder I, Hansen U-P (2006) The power of two-dimensional dwell-time analysis for model discrimination, temporal resolution, multichannel analysis and level detection. Journal of Membrane Biology 214:19–32.
Qin F (2014) Principles of single-channel kinetic analysis. Methods in Molecular Biology 1183:371–399.
Sakmann B, Neher E, eds (1995) Single-Channel Recording, 2nd ed. New York and London: Plenum Press.
DeepTechnome – Mitigating Bias Related to Image Formation in Deep Learning Based Assessment of CT Images
Multi-task Learning for Historical Document Classification with Transformers
Description
Recently, transformer models [1] have started to outperform classic deep convolutional neural networks in many computer vision tasks. These transformer models consist of multi-headed self-attention layers followed by linear layers. The self-attention layer soft-routes value information based on three matrix embeddings: query, key, and value. The inner product of queries and keys is passed through a softmax function for normalization, and the resulting similarity matrix is multiplied with the value embedding. Multi-headed self-attention creates multiple sets of query, key, and value matrices that are computed independently, then concatenated and projected back into the original embedding dimension. Visual transformers excel in their ability to incorporate non-local information into their latent representation, allowing for better results when classification-relevant information is scattered across the entire image.
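In symbols, this is the standard scaled dot-product attention (the 1/√d_k scaling is part of the usual definition, though omitted in the prose above):

```latex
\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,
\qquad
\mathrm{head}_i = \mathrm{Attention}(XW_i^{Q},\, XW_i^{K},\, XW_i^{V}),
\qquad
\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}
```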
The downside of pure attention models like ViT [2], which treat image patches as sequence tokens, is that they require large amounts of training data to make up for their lack of inductive priors. This makes them unsuitable for low-data regimes like historical document analysis. Furthermore, computing the similarity matrix incurs memory and compute costs quadratic in the input length, complicating high-resolution processing.
One solution that promises to alleviate the data hunger of transformers while still profiting from their global representation ability is the use of hybrid methods that combine CNN and self-attention layers. These models jointly train a number of convolutional layers, which preprocess and downsample the inputs, followed by a form of multi-headed self-attention. [3] differentiates hybrid self-attention models into "transformer blocks" and "non-local blocks", the latter of which is equivalent to single-headed self-attention save for the lack of value embeddings and positional encodings.
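A hybrid model of this kind for the multi-task setting below could be sketched as follows; the truncated ResNet-18 stem, the embedding width, the use of PyTorch's built-in multi-head attention (without positional encodings, for brevity), and the three classification heads are all illustrative assumptions.

```python
# Hedged sketch of a hybrid CNN + self-attention classifier in the
# spirit of [3]: convolutional stages downsample the document image,
# then multi-headed self-attention over the feature map feeds three
# task heads (script type, date, location).
import torch
import torch.nn as nn
import torchvision

class HybridDocumentClassifier(nn.Module):
    def __init__(self, n_scripts: int, n_dates: int, n_locations: int,
                 dim: int = 256, heads: int = 8):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        # Keep everything up to and including layer3 (256 channels out).
        self.stem = nn.Sequential(*list(resnet.children())[:-3])
        self.proj = nn.Conv2d(256, dim, kernel_size=1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.heads = nn.ModuleDict({
            "script": nn.Linear(dim, n_scripts),
            "date": nn.Linear(dim, n_dates),
            "location": nn.Linear(dim, n_locations),
        })

    def forward(self, x: torch.Tensor) -> dict:
        f = self.proj(self.stem(x))            # (B, dim, H', W')
        tokens = f.flatten(2).transpose(1, 2)  # (B, H'*W', dim)
        attended, _ = self.attn(tokens, tokens, tokens)
        pooled = attended.mean(dim=1)          # global average over tokens
        return {k: head(pooled) for k, head in self.heads.items()}
```

Sharing one backbone across the three tasks is the multi-task element; per-task loss weighting would be one of the architectural questions to analyze.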
The objective of this thesis is the classification of script type, date and location of historical documents, using a single multi-headed hybrid self-attention model.
The thesis consists of the following milestones:
- Construction of hybrid models for classification
- Benchmarking on the ICDAR 2021 competition dataset
- Further architectural analyses of hybrid self-attention models
References
[2] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16×16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
[3] Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, and Ashish Vaswani. Bottleneck transformers for visual recognition, 2021.
3D Segmentation of metal objects based on Cone-Beam CT Projection Images for Metal Artefact Removal
Computed Tomography (CT) imaging from intraoperative mobile C-arms is commonly used to validate tool and implant placement during surgery. As the majority of tools and implants are composed of metal, physical effects such as beam hardening, photon scattering, and high absorption induce artefacts in the volume domain. These metal artefacts arise from a loss of signal in the projection images which is not accounted for in standard reconstruction algorithms. Metal Artefact Reduction (MAR) techniques rely on an accurate segmentation of the metal volume [1], [2]. This first segmentation step is commonly based on thresholding in the volume domain, which makes it prone to errors induced by the metal artefacts themselves. This thesis investigates an end-to-end trainable segmentation model which produces 3D metal masks from the 2D projection data of a 3D cone-beam scan of a Cios Spin system. The robustness against metal artefacts shall be evaluated and compared to common volume-domain metal segmentation approaches.
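For reference, the volume-domain baseline the proposed model would be compared against can be as simple as the following sketch; the ~3000 HU threshold is an assumption (a commonly used starting point for metal), not a value from the thesis.

```python
# Baseline (not the proposed method): volume-domain metal segmentation
# by HU thresholding, which is prone to errors under metal artefacts.
import numpy as np
from scipy import ndimage

def threshold_metal_mask(volume_hu: np.ndarray,
                         threshold_hu: float = 3000.0) -> np.ndarray:
    """Binary 3D metal mask from a reconstructed CT volume in HU."""
    mask = volume_hu >= threshold_hu
    # Remove isolated voxels that artefact streaks tend to produce.
    return ndimage.binary_opening(mask, iterations=1)
```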
Low-Dose Helical CBCT Denoising by Domain Filtering with Deep Reinforcement Learning, Improved by a Neural Ordinary Differential Equations Approach
In previous research, we developed a reinforcement-learning-based method to denoise cone-beam CT. The method uses denoisers in both the sinogram domain and the reconstructed image domain. The denoisers are bilateral filters whose sigma parameters are tuned by a convolutional agent.
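A minimal sketch of the tunable denoising step, assuming 2D float32 slices (or sinogram views) and OpenCV's bilateral filter; in the actual method the two sigmas are predicted by the convolutional agent rather than passed in by hand.

```python
# Edge-preserving bilateral filtering with tunable sigma parameters.
import cv2
import numpy as np

def bilateral_denoise(img: np.ndarray,
                      sigma_intensity: float,
                      sigma_spatial: float) -> np.ndarray:
    """Apply a bilateral filter; d=-1 derives the window from sigmaSpace."""
    return cv2.bilateralFilter(img.astype(np.float32), d=-1,
                               sigmaColor=sigma_intensity,
                               sigmaSpace=sigma_spatial)
```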
Recent research has shown that neural ODEs can improve the convergence speed of neural network training. Neural ODEs have been applied to tasks that can be modelled by differential equations, such as fluid mechanics. They have also been extended to classical deep learning tasks, such as image segmentation.
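As a sketch of the idea, a continuous-depth block built on the torchdiffeq package (an assumed tool choice, not a requirement of the thesis) replaces a stack of residual layers with a single ODE solve:

```python
# Minimal neural ODE block using torchdiffeq
# (https://github.com/rtqichen/torchdiffeq).
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    """dh/dt = f(t, h), parameterized by a small MLP."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(),
                                 nn.Linear(64, dim))

    def forward(self, t, h):
        return self.net(h)

class ODEBlock(nn.Module):
    """Continuous-depth replacement for a stack of residual layers."""
    def __init__(self, dim: int):
        super().__init__()
        self.func = ODEFunc(dim)
        self.t = torch.tensor([0.0, 1.0])

    def forward(self, h0):
        # odeint returns the state at every time point; keep the final one.
        return odeint(self.func, h0, self.t)[-1]
```

Because the block's parameter count is that of a single ODEFunc rather than of many stacked layers, it also addresses the parameter-reduction goal listed below.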
In this thesis we aim to complete the following tasks:
- Experiment with different reconstruction kernels (B40, B70, etc.) to observe the effect of sharpness-dependent noise.
- Implement a neural ODE to speed up reinforcement learning convergence and to reduce the parameter count.
- Implement a data-consistent reward to ensure correct reconstruction and data-consistent denoising.
- Experiment with deep-learned quality metrics as additional reward functions for parameter tuning.
As a dataset, we will use the Mayo Clinic TCIA dataset to test the quality of our denoising algorithms. Quality can be compared against standard-dose images using PSNR and SSIM, and can be calculated reference-free using the IRQM. If time permits, we can use deep model observers to assess low-contrast preservation.
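A minimal evaluation sketch for the reference-based metrics, using scikit-image; the reference-free IRQM is omitted here since no particular implementation is assumed.

```python
# Compare a denoised slice against its standard-dose reference.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_slice(denoised: np.ndarray, standard_dose: np.ndarray) -> dict:
    rng = float(standard_dose.max() - standard_dose.min())
    return {
        "psnr": peak_signal_noise_ratio(standard_dose, denoised, data_range=rng),
        "ssim": structural_similarity(standard_dose, denoised, data_range=rng),
    }
```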
Requirements:
- Knowledge of CT reconstruction techniques. Knowledge of the ASTRA toolbox is a plus.
- Understanding of reinforcement learning
- Experience with PyTorch for developing neural networks
- Experience with image processing