Development of a comprehensive SPECT phantom dataset using Monte Carlo Simulation
Background
Single Photon Emission Computed Tomography (SPECT) [1] is a medical imaging technique used to study biological function and to detect various diseases in humans and animals. Due to the low amount of radioactivity typically used in SPECT scans, the acquired images contain a high level of noise, and because reconstruction is an inverse problem, no exact ground truth is available. For this reason, we simulate objects with a numerical ground truth, which are then used to create our simulated dataset. The resulting dataset can be used to train a neural network, analyze noise, test multiple reconstruction techniques, or evaluate the effects of acquisition geometry.
The objective of this research laboratory is to generate a large dataset of SPECT images that will be useful for deep learning applications in medical image processing.
Methods
We simulate 100 phantoms with different shapes and properties, e.g., attenuation and activity maps. Simulating simple geometric phantoms such as spheres, cubes, and cylinders is the first step of this research laboratory. In the following step, we generate phantoms shaped like alphabetic letters. Finally, we simulate more realistic physical phantoms such as the Shepp-Logan or XCAT phantoms. To simulate measurements of these phantoms, we use SIMIND, a Monte Carlo based simulation program [2]. SIMIND can model different scintillation cameras, which are used to obtain sets of projection images of the simulated phantom. SIMIND allows the adjustment of various acquisition parameters, e.g., photon energy, number of projections, detector size, and energy resolution, enabling the creation of a database of SPECT acquisitions that is comprehensive in terms of geometry and acquisition configuration. After postprocessing the projection data, we obtain reconstructed 3D images by applying iterative reconstruction techniques such as Ordered Subset Expectation Maximization (OSEM) and Ordered Subset Conjugate Gradient Minimization (OSCGM).
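As an illustration of the reconstruction step, the following is a minimal OSEM sketch, assuming a precomputed system matrix A such that proj = A @ x. All names and sizes are illustrative; this is not the pipeline actually applied to SIMIND output.

```python
# Minimal OSEM sketch for y ~ Poisson(A x); A is an assumed,
# precomputed system matrix (rows = projection bins, cols = voxels).
import numpy as np

def osem(A: np.ndarray, proj: np.ndarray, n_subsets: int, n_iter: int) -> np.ndarray:
    x = np.ones(A.shape[1])                        # uniform initial estimate
    subsets = np.array_split(np.arange(A.shape[0]), n_subsets)
    for _ in range(n_iter):
        for s in subsets:                          # one update per subset
            As = A[s]
            ratio = proj[s] / np.maximum(As @ x, 1e-12)
            x *= (As.T @ ratio) / np.maximum(As.sum(axis=0), 1e-12)
    return x

# Toy usage: reconstruct a random 16-voxel "phantom" from 32 noiseless bins.
rng = np.random.default_rng(0)
A = rng.random((32, 16))
x_true = rng.random(16)
recon = osem(A, A @ x_true, n_subsets=4, n_iter=20)
```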
Expected Results
At the end of this research laboratory, the student shall have gained deeper knowledge of Monte Carlo simulation (MCS) and reconstruction for SPECT imaging. Furthermore, the student shall have created a dataset that will be available for future projects, including denoising, reconstruction, and other image processing tasks. Additionally, the student shall summarize their findings in a short report and write documentation describing the database and how to use it.
References
[1] Miles N. Wernick and John N. Aarsvold. Emission Tomography: The Fundamentals of PET and SPECT. Elsevier, 2004.
[2] Michael Ljungberg and Sven-Erik Strand. A Monte Carlo program for the simulation of scintillation camera characteristics. Computer Methods and Programs in Biomedicine, 29(4):257–272, 1989.
A Review of the Diagnosis of Rheumatoid Arthritis, with Evaluation of Micro-CT Scanner Parameters and Laboratory Measurements
Continuous Non-Invasive Blood Pressure Measurement Using 60GHz-Radar – A Feasibility Study
Hypertension – high blood pressure (BP) – is known to be a silent killer. Untreated, it can cause severe damage to the body's organs, mainly the heart and kidneys [5, 6]. BP is usually classified by the highest – systolic – and the lowest – diastolic – pressure during one cardiac cycle [2]. The gold standard for measuring BP remains the oscillometric method, which is employed in traditional arm cuffs [4]. This method, however, suffers from extensive deficiencies: discomfort leads to unreliable measurements [2]. Additionally, it only captures the static status of the very dynamic arterial BP and thus loses important variation information, leading to poor time resolution [2, 3, 4, 7]. However, there is a strong need for continuous beat-to-beat BP readings [4], as they are more reliable predictors of the aforementioned cardiovascular risks than single readings [1].
The goal of this master thesis is to show whether it is feasible to use a 60 GHz radar device to continuously estimate BP. Radar is chosen as it has a very small form factor and very low power consumption – both favorable characteristics for integration into a wearable device. The radar is put into a 3D-printed enclosure which is fastened to the left wrist using a velcro strap. It is capable of extracting the skin displacement caused by the expansion of the underlying artery, which is localized using a beamforming algorithm. The extracted skin displacement contains the pulse waveforms that are used for extracting the BP.
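As an illustration of the localization step, the following is a minimal delay-and-sum beamforming sketch. The antenna geometry, sampling rate, and signal model are assumptions for illustration only, not the radar processing used in this work.

```python
# Hedged sketch: conventional delay-and-sum beamforming that focuses the
# array on a candidate artery location by aligning round-trip delays.
import numpy as np

def delay_and_sum(signals: np.ndarray, positions: np.ndarray,
                  focus: np.ndarray, c: float = 3e8, fs: float = 1e9) -> np.ndarray:
    """signals: (n_antennas, n_samples); positions: (n_antennas, 3) in meters;
    focus: (3,) focal point. Returns the coherently summed channel."""
    delays = 2 * np.linalg.norm(positions - focus, axis=1) / c    # round trip
    shifts = np.round((delays - delays.min()) * fs).astype(int)   # in samples
    n = signals.shape[1] - shifts.max()
    return sum(s[k:k + n] for s, k in zip(signals, shifts)) / len(signals)
```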
In the literature, mainly two methods have been used to design continuous BP devices. One is based on Pulse-Wave-Velocity, and in that context also Pulse-Transit-Time; the other is based on Pulse-Wave-Analysis [4]. Since the first method depends on the use of an electrocardiograph, it was not employed in this work, as the goal is to implement a stand-alone solution that does not require additional devices. Therefore, the second method is implemented.
For that, the extracted skin displacement is split into individual pulse waveforms. Each is fed to a support vector machine that decides whether it is of sufficient quality to serve as input for the neural network, such that only sufficiently good waveforms are used. Then, 21 distinctive features are extracted from each of the good waveforms. These features, together with the calibration parameters gender, age, height, and weight, are used as inputs for a neural network. The network is then used to predict systolic and diastolic values.
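The following is a minimal sketch of this two-stage pipeline with scikit-learn, assuming toy data: an SVM acts as the quality gate, and a small neural network maps the 21 waveform features plus the 4 calibration parameters to systolic and diastolic values. All shapes and variable names are illustrative assumptions, not the thesis implementation.

```python
# Hedged sketch: SVM quality gate followed by an MLP regressor that
# predicts systolic/diastolic BP from 21 features + 4 calibration values.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Stage 1: quality gate on per-beat waveform descriptors (toy data).
wave_desc = rng.standard_normal((200, 10))
is_good = rng.integers(0, 2, 200)             # expert quality labels
gate = SVC().fit(wave_desc, is_good)
keep = gate.predict(wave_desc) == 1

# Stage 2: regression on 21 features + 4 calibration parameters = 25 inputs.
features = rng.standard_normal((200, 25))
bp = np.column_stack([120 + 10 * rng.standard_normal(200),   # systolic
                      80 + 8 * rng.standard_normal(200)])    # diastolic
regressor = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000)
regressor.fit(features[keep], bp[keep])

pred_sys, pred_dia = regressor.predict(features[:1])[0]      # per-beat estimate
```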
It is expected that some correlation between the skin displacement, captured by the radar, and
the corresponding BP will become apparent, allowing for future research to further improve the
accuracy.
References
[1] D. Buxi, J.-M. Redouté, and M. R. Yuce. Blood pressure estimation using pulse transit time from bioimpedance and continuous wave radar. IEEE Transactions on Biomedical Engineering, 64(4):917–927, 2016.
[2] Y. Kurylyak, F. Lamonaca, and D. Grimaldi. A neural network-based method for continuous blood pressure estimation from a PPG signal. In 2013 IEEE International Instrumentation and Measurement Technology Conference (I2MTC). IEEE, 2013.
[3] M. Proença, G. Bonnier, D. Ferrario, C. Verjus, and M. Lemay. PPG-based blood pressure monitoring by pulse wave analysis: calibration parameters are stable for three months. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 5560–5563. IEEE, 2019.
[4] J. Solà and R. Delgado-Gonzalo. The Handbook of Cuffless Blood Pressure Monitoring. Springer, 2019.
[5] WHO. Hypertension. World Health Organization. URL: https://www.who.int/news-room/fact-sheets/detail/hypertension.
[6] X. Xing, Z. Ma, M. Zhang, Y. Zhou, W. Dong, and M. Song. An unobtrusive and calibration-free blood pressure estimation method using photoplethysmography and biometrics. Scientific Reports, 9(1):1–8, 2019.
[7] Y. Yoon, J. H. Cho, and G. Yoon. Non-constrained blood pressure monitoring using ECG and PPG for personal healthcare. Journal of Medical Systems, 33(4):261–266, 2009.
Optimizing the Preprocessing Pipeline for “virtual Dynamic Contrast Enhancement” in Breast MRI
Detection of Positions of K-Wire Tips in X-Ray Images Using Deep Learning
Semi-supervised learning for multi-modal bone segmentation
Since AlexNet won the ImageNet Challenge by a wide margin in 2012, the popularity of deep learning has been steadily increasing. In recent years, a technique that has been especially popular is semantic segmentation, as it is used in self-driving cars and medical image analysis. A big challenge that arises when training neural networks (NNs) for this task is the acquisition of adequate segmentation masks, because the labeling often has to be performed by domain experts and is very time-consuming. As a result, solutions circumventing this problem had to be found. A popular one is semi-supervised learning, where only a certain fraction of the data is annotated. This approach has the obvious advantage of reducing the time needed for data acquisition, but NNs trained this way still perform worse than those trained fully supervised.
A common disease affecting one in three women and one in twelve men is osteoporosis. Its symptoms include low bone mass and a deterioration of bone tissue, leading to an increased fracture risk. The malady especially affects elderly people, and for their protection, providing diagnostic tools and suitable treatments is important [1]. Structures that can be found in the bone include lacunae containing osteocytes and trans-cortical vessels (TCVs). Murine and human tibiae consist of two parts: the inner trabecular bone and the outer cortical bone, where TCVs can be found. To study them and their importance for the development of osteoporosis, we are trying to automatically segment the cortical bone from the surrounding tissue. Additionally, we will attempt to build a NN for the detection of TCVs and lacunae.
We want to achieve this using a model based on convolutional neural networks (CNNs) for semantic segmentation. Similar tasks have already been performed [2], but our approach differs in that we try to use as few labels as possible for the training process. Methods we want to incorporate are pre-training and the use of image transformations to make the most of a limited amount of segmentation masks; a minimal sketch of this idea is given after the work-item list below. If these approaches do not yield the desired results, we will also try to incorporate techniques of weakly- and self-supervised learning.
In detail, the thesis will consist of the following parts:
• implementation of multiple CNN-based architectures [3][4] to find a suitable model for our task,
• optimization of this model using different approaches,
• evaluation of the usefulness of pre-training and different semi-supervised learning techniques,
• integration of different techniques to increase the accuracy.
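The following is a minimal sketch of one such semi-supervised training step, combining a supervised loss on labeled slices with a consistency loss on unlabeled slices under a random horizontal flip (standing in for the image transformations mentioned above). The model and data batches are placeholders, not our final implementation.

```python
# Hedged sketch: consistency-regularized semi-supervised step in PyTorch.
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled, unlabeled, optimizer, lam=0.5):
    x, y = labeled                               # labeled batch: image, mask
    u = unlabeled                                # unlabeled batch: image only
    sup = F.cross_entropy(model(x), y)           # supervised segmentation loss
    with torch.no_grad():                        # teacher prediction, flipped
        target = torch.softmax(model(u), dim=1).flip(-1)
    # Consistency: predictions should commute with a horizontal flip.
    cons = F.mse_loss(torch.softmax(model(u.flip(-1)), dim=1), target)
    loss = sup + lam * cons
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```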
References
[1] S P Tuck and R M Francis. Osteoporosis. Postgraduate Medical Journal, 78(923):526–532, 2002.
[2] Oliver Aust, Mareike Thies, Daniela Weidner, Fabian Wagner, Sabrina Pechmann, Leonid Mill, Darja Andreev, Ippei Miyagawa, Gerhard Krönke, Silke Christiansen, Stefan Uderhardt, Andreas Maier, and Anika Grüneboom. Tibia cortical bone segmentation in micro-CT and X-ray microscopy data using a single neural network. In Klaus Maier-Hein, Thomas M. Deserno, Heinz Handels, Andreas Maier, Christoph Palm, and Thomas Tolxdorff, editors, Bildverarbeitung für die Medizin 2022, pages 333–338, Wiesbaden, 2022. Springer Fachmedien Wiesbaden.
[3] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. CoRR, abs/1505.04597, 2015.
[4] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. CoRR, abs/1411.4038, 2014.
[5] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
Ground Truth based Convolution Kernel Initialization Method for Medical Image Segmentation
Thesis Description
Proper initialization of convolution kernels is crucial for a well-optimized deep neural network [1]. A popular way to instantiate these kernels is random assignment of weights [2], drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 1. Despite being easy to implement, this approach has several downsides, such as failing to find the global optimum or slowing down the training process. As an improvement over random assignment, Glorot and Bengio proposed "Xavier initialization" [3] for convolution kernels. This method follows a uniform distribution with zero mean and a variance of 1/n, where n is the total number of input neurons. Although training is faster and convergence speed increases, the derivation of Xavier initialization assumes a linear activation function, which does not hold for popular activation functions such as the Rectified Linear Unit (ReLU). To mitigate this issue, Kaiming He et al. proposed He initialization [4], targeted more toward the ReLU activation function; it uses a Gaussian distribution with zero mean and a variance of 2/n.
All of the above techniques initialize the kernel weights independently, without taking the already available training samples into account. The randomly generated kernel weights are then gradually matched against the local patterns of the images: in every iteration, the trainer tries to minimize the error between the kernel weights and the local features, which eventually leads to convergence. As this is a probabilistic process, it takes many iterations before the convolution kernels match the local features well, which translates into a slower network with longer training time and slower convergence. Several kernel initialization methods have addressed this issue. OrthoNorm uses an orthogonal matrix for kernel initialization and, unlike random assignment, can also be used successfully in non-linear networks [5]. The "layer-sequential unit-variance (LSUV)" method extends orthogonal initialization to an iterative process, using singular value decomposition (SVD) to replace the weights initialized with Gaussian noise [6]. In 2014, Tsung-Han Chan et al. proposed a Principal Component Analysis (PCA) based method for convolution kernel initialization [7]: the model collects all image patches from a feature map and initializes the convolution kernels with the principal components of these patches.
This thesis aims to further improve the PCA-based kernel initialization method by incorporating ground truth (GT) images. GT images are already labeled and can be used to find suitable feature sets. By leveraging the dominant features from these sets and using them as convolution kernel weights, a dependency between training images and convolution kernels is created, which could theoretically decrease the training time and improve the overall convergence rate [1]. Extensive benchmarking of the proposed initialization method, along with other quantitative measures, is also included in the scope of this thesis.
To achieve the goals of the thesis, existing tools and libraries such as PyTorch Lightning (www.pytorchlightning.ai), MONAI (monai.io), Weights and Biases (wandb.ai), Python (www.python.org), and notable Python scientific packages shall be used and re-used where possible.
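To illustrate the proposed GT-driven initialization, the following is a minimal sketch in a single-channel 2D setting. The function name, patch handling, and layer sizes are illustrative assumptions, not the thesis code: it gathers all patches from GT images, computes their principal components via SVD, and copies them into a convolution layer's weights.

```python
# Hedged sketch: PCA-based convolution kernel initialization from GT images.
import numpy as np
import torch
import torch.nn as nn

def pca_kernels(images: np.ndarray, k: int, n_kernels: int) -> np.ndarray:
    """images: (N, H, W) ground-truth slices. Returns the top principal
    components of all k x k patches, reshaped as (n_kernels, 1, k, k)."""
    patches = [img[i:i + k, j:j + k].ravel()
               for img in images
               for i in range(img.shape[0] - k + 1)
               for j in range(img.shape[1] - k + 1)]
    X = np.asarray(patches, dtype=np.float64)
    X -= X.mean(axis=0)                               # center the patch matrix
    _, _, vt = np.linalg.svd(X, full_matrices=False)  # rows = components
    return vt[:n_kernels].reshape(n_kernels, 1, k, k)

# Usage: initialize the first conv layer of a network from GT images.
gt = np.random.rand(4, 64, 64)                        # stand-in for labeled GT data
conv = nn.Conv2d(1, 8, kernel_size=5, padding=2)
with torch.no_grad():
    conv.weight.copy_(torch.from_numpy(pca_kernels(gt, 5, 8)).float())
```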
The thesis will comprise the following work items:
• Literature overview of improved convolution kernel initialization methods
• Design and formalization of the system to be developed
• Overview and explanation of the algorithms used
• System development including code implementation
• Quantitative evaluation of the implemented system on medical image data
References
[1] Chunyu Xu and Hong Wang. Research on a convolution kernel initialization method for speeding up the convergence of CNN. Applied Sciences, 12:633, 2022.
[2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional
neural networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors,
Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.
[3] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward
neural networks. In Yee Whye Teh and D. Mike Titterington, editors, AISTATS, volume 9 of
JMLR Proceedings, pages 249–256. JMLR.org, 2010.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing
human-level performance on imagenet classification, 2015.
[5] Andrew M. Saxe, James L. McClelland, and Surya Ganguli. Exact solutions to the nonlinear
dynamics of learning in deep linear neural networks, 2013.
[6] Dmytro Mishkin and Jiri Matas. All you need is a good init, 2015.
[7] Tsung-Han Chan, Kui Jia, Shenghua Gao, Jiwen Lu, Zinan Zeng, and Yi Ma. PCANet: A
simple deep learning baseline for image classification? IEEE Transactions on Image Processing,
24(12):5017–5032, dec 2015.
Evaluation of a Modified U-Net with Dropout and a Multi-Task Model for Glacier Calving Front Segmentation
With global temperatures rising, the tracking and prediction of glacier changes become more and more relevant. Part of these efforts is the development of neural network algorithms to automatically detect calving fronts of marine-terminating glaciers. Gourmelon et al. [1] introduce the first publicly available benchmark dataset for calving front delineation on synthetic aperture radar (SAR) imagery, dubbed CaFFe. The dataset consists of the SAR imagery and two corresponding labels: one showing the calving front vs. the background, the other showing different landscape regions. Moreover, the paper provides two deep learning models as baselines, one for each label. As there are many different approaches to calving front delineation, the question arises which method provides the best performance. Subsequently, the aim of this thesis is to evaluate the code of the following two papers [2], [3] on the CaFFe benchmark dataset and compare their performance with the baselines provided by Gourmelon et al. [1].
- paper 1:
Mohajerani et al. [2] employ a Convolutional Neural Network (CNN) with a modified U-Net architecture that incorporates additional dropout layers. In contrast to Gourmelon et al. [1], the CNN uses optical imagery as its input.
- paper 2:
Heidler et al. [3] introduce a deep learning model for coastline detection, which combines the two tasks of water/land segmentation and binary coastline delineation into one cohesive multi-task deep learning model; a minimal sketch of such a combined objective follows.
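The sketch below shows a joint loss with a segmentation head and an edge head. The loss weighting and head names are assumptions for illustration, not the HED-UNet implementation.

```python
# Hedged sketch: joint loss for region segmentation plus edge delineation.
import torch.nn.functional as F

def multi_task_loss(seg_logits, edge_logits, seg_mask, edge_mask, w_edge=1.0):
    seg_loss = F.cross_entropy(seg_logits, seg_mask)            # land/water regions
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_mask)
    return seg_loss + w_edge * edge_loss
```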
To make a fair and reasonable comparison, the hyperparameters of each model will be optimized on the CaFFe benchmark dataset and the model weights will be re-trained on CaFFe’s train set. The evaluation will be conducted on the provided test set and the metrics introduced in Gourmelon et al. [1] will be used for the comparison.
References
[1] Gourmelon, N.; Seehaus, T.; Braun, M.; Maier, A.; and Christlein, V.: Calving Fronts and Where to Find Them: A Benchmark Dataset and Methodology for Automatic Glacier Calving Front Extraction from SAR Imagery, Earth Syst. Sci. Data Discuss. [preprint]. 2022, https://doi.org/10.5194/essd-2022-139, in review.
[2] Mohajerani, Y.; Wood, M.; Velicogna, I.; and Rignot, E.: Detection of Glacier Calving Margins with Convolutional Neural Networks: A Case Study, Remote Sens. 2019, 11, 74. https://doi.org/10.3390/rs11010074
[3] Heidler, K.; Mou, L.; Baumhoer, C.; Dietz A.; and Zhu, X.: HED-UNet: Combined Segmentation and Edge Detection for Monitoring the Antarctic Coastline, IEEE Transactions on Geoscience and Remote Sensing. 2022, vol. 60, 1-14, Art no. 4300514, doi: 10.1109/TGRS.2021.3064606.
Animal-Independent Signal Enhancement Using Deep Learning
Examining and segmenting bioacoustic signals is an essential part of biology. For example, by analysing orca calls it is possible to draw several conclusions regarding the animals' communication and social behavior [1]. However, to avoid having to manually go through hours of audio material to detect those calls, the so-called ORCA-SPOT toolkit was developed, which uses deep learning to separate relevant signals from pure ambient sounds [2]. These signals may nevertheless still contain background noise, which makes their examination rather difficult. To remove this background noise, ORCA-CLEAN was developed. Again using a deep learning approach, adapting the Noise2Noise concept and using machine-generated binary masks as an additional attention mechanism, the orca calls are denoised as well as possible without requiring clean data as a foundation [3].
But, as mentioned, this toolkit is optimized for the denoising of orca calls. Marine biologists are of course not the only ones who require clean audio signals for their research. Ornithologists alone deal with a great variety of noise: a researcher studying urban bird species would like city sounds to be filtered from their audio samples, whereas one working with tropical birds wants their recordings free of forest noise. One could argue that almost every biologist who analyses recordings of animal calls would have use for a denoising toolkit.
Another task where audio denoising is of great relevance is the interpretation and processing of human speech. It can be used to improve the sound quality of a phone call or a video conference, to preprocess a voice command to a virtual assistant on a smartphone, to improve voice recognition software, and for many other purposes. Even in medicine it can help when analysing pulmonary auscultative signals, which are the key method to detect and evaluate respiratory dysfunctions [4].
It therefore makes sense to generalize ORCA-CLEAN and make it trainable for other animal sounds, perhaps even human speech or body sounds. One would then have a generalized version of ORCA-CLEAN which can be trained according to the desired purpose. The goal of this thesis will be to describe and explain the respective changes in the code, as well as to evaluate how differently trained models perform on audio recordings of different animals. The transfer from a model specialized on orcas to one specialized on another animal species will be demonstrated using recordings of hyraxes. The data used contains tapes of 34 hyrax individuals. For each individual, multiple tapes are available, and for each tape there is a corresponding table with information such as the exact location, the length, the peak frequency, and the call type of each call on the tape.
The hyrax is a small hoofed mammal of the family Procaviidae [5, 6]. Hyraxes usually weigh 4 to 5 kg, are about 30 to 50 cm long, and are mostly herbivorous [5]. Their calls, especially the advertisement calls, are helpful for distinguishing different hyrax species and for analysing the animals' behaviour [6].
Here are a few rough approaches to how I would realize this thesis. I would begin by modifying the ORCA-CLEAN code. Since orca calls differ greatly from hyrax calls in frequency range as well as in length, the preprocessing of the audio tapes would have to be modified. I would also like to add some more input/output spectrogram variants to the training process.
One could, for example, use pairs of noisy and denoised human speech samples, or a pure noise spectrogram versus a completely empty one. The probability with which each of these variants is chosen could additionally be made variable.
After that, I would train different models on the hyrax audio tapes, including the original ORCA-CLEAN as well as the newly created adaptations, and evaluate their performance. Since the provided hyrax tapes aren't all equally noisy, they can be sorted by the so-called Signal-to-Noise Ratio (SNR). One can then compare these values before and after denoising, e.g., by correlating them, and check whether the files were denoised correctly or whether relevant parts were removed; a rough sketch of such a comparison follows.
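The sketch below assumes a simple frame-energy SNR heuristic (loudest frames as signal, quietest as noise floor); this heuristic is an illustrative assumption, not the ORCA-CLEAN evaluation code.

```python
# Hedged sketch: estimate per-tape SNR and correlate before/after denoising.
import numpy as np
from scipy.stats import pearsonr

def estimate_snr_db(audio: np.ndarray, frame_len: int = 2048) -> float:
    n = len(audio) // frame_len
    frames = audio[: n * frame_len].reshape(n, frame_len)
    energy = np.sort((frames ** 2).mean(axis=1))
    noise = energy[: max(1, n // 10)].mean()      # quietest 10% of frames
    signal = energy[-max(1, n // 10):].mean()     # loudest 10% of frames
    return 10.0 * np.log10(signal / max(noise, 1e-12))

# Usage on (noisy, denoised) waveform pairs; high r means the noisiness
# ranking of the tapes is preserved by the denoiser.
pairs = [(np.random.randn(48000), np.random.randn(48000)) for _ in range(5)]
before = [estimate_snr_db(x) for x, _ in pairs]
after = [estimate_snr_db(y) for _, y in pairs]
r, _ = pearsonr(before, after)
print(f"SNR correlation before/after denoising: r = {r:.2f}")
```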
With the help of these results, further alterations can be made, for example by changing the probabilities of the training methods or by adapting the hyperparameters of the deep network, until hopefully, in the end, the result is a suitable network that doesn't require huge amounts of data.
I hope I was able to give some insight into what I imagine the subject to be, and how I would roughly execute it.
Sources
[1] https://lme.tf.fau.de/person/bergler/#collapse_0
[2] C. Bergler, H. Schröter, R. X. Cheng, V. Barth, M. Weber, E. Nöth, H. Hofer, and A. Maier, "ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning," Scientific Reports, vol. 9, 12 2019.
[3] C. Bergler, M. Schmitt, A. Maier, S. Smeele, V. Barth, and E. Nöth, "ORCA-CLEAN: A Deep Denoising Toolkit for Killer Whale Communication," Interspeech 2020, pp. 1136–1140. International Speech Communication Association.
[4] F. Jin and F. Sattar, "Enhancement of Recorded Respiratory Sound Using Signal Processing Techniques," in A. Cartelli and M. Palma (Eds.), Encyclopedia of Information Communication Technology, pp. 291–300, 2009.
[5] https://www.britannica.com/animal/hyrax
[6] https://www.wildsolutions.nl/vocal-profiles/hyrax-vocalizations/