Index
Detection of Arterial Occlusion on MRI Angiography of the Lower Limbs using Deep Learning
Proposal Tri Nguyen
Automated detection and defect recognition of photovoltaic modules in photoluminescence videos
Automatic Detection of Microorganisms on Microscopic Images of Fluid Samples using Machine Learning
The objective of this thesis is to apply machine learning tools to rapidly analyze large datasets of microscopic images to identify and classify microbial infections.
Microorganisms can cause a wide range of diseases, e.g. tuberculosis, and, left untreated, infections can quickly become fatal [1]. Fortunately, the discovery of penicillin and the subsequent invention of other antibiotics have significantly decreased the lethality of these diseases. This early success has led to an era of frequent antibiotic use [2]. However, overprescription of these drugs has caused bacteria to develop mechanisms that confer resistance against particular drugs. The emergence of multidrug-resistant strains poses an extreme risk [2]. These developments necessitate a more targeted application of antibiotics, which in turn requires the classification and characterization of bacteria found in patient samples. Existing methods for classifying microorganisms can be categorized into chemical, physical, molecular biological, and morphological methods [1]. While the latter is positioned to be the most direct and cost-effective of the four, it requires a large amount of manual work and is therefore laborious and time-consuming [1].
The Weiss group develops and applies technologies for rapidly imaging and analyzing biological samples using high-resolution fluorescence microscopy and microfluidics. In collaboration with the Pattern Recognition Lab of the Friedrich-Alexander-Universität Erlangen-Nürnberg, this toolkit is extended by computer vision components to analyze the image data.
Computer vision tools are well-suited for image classification problems and have been widely applied to microscopy. On relatively pure laboratory samples, these tools can perform astonishingly well [1]. However, a single droplet of fluid from a patient sample is significantly more challenging, as it can contain many different types of objects in very large quantities, which complicates the detection and classification of single objects. Under the assumption that high-quality, high-resolution microscopy images are provided, the key challenges are thus twofold: first, to find the objects of interest, and second, to classify them.
In general, various approaches to solving this problem can be considered:
– Pursuing a supervised approach by establishing a large database annotated with bounding boxes or segmentation masks to train a deep neural network that can process unseen data and extract the location and class of objects.
– Following semi-supervised techniques by establishing only a small annotated database to train the classifier and using uniform or random segments of the input data.
– Processing uniform or random segments in an unsupervised manner by clustering them into multiple distinct classes.
In this thesis, we will investigate these strategies to address the problem of automatic object detection in microscopic images of fluid samples.
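The unsupervised strategy listed above can be illustrated with a minimal sketch: an image is cut into uniform tiles, each tile is reduced to a feature vector, and the tiles are grouped by clustering. The tile size, the two hand-crafted features (mean and spread of intensity), the use of k-means, and all function names are assumptions made for illustration only; the proposal does not prescribe any of them.

```python
from statistics import mean, pstdev

def uniform_tiles(image, size):
    """Split a 2D image (list of rows) into non-overlapping size x size tiles."""
    tiles = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            tiles.append([image[r + i][c + j] for i in range(size) for j in range(size)])
    return tiles

def tile_features(tile):
    """Hypothetical hand-crafted features: mean and spread of the pixel intensities."""
    return (mean(tile), pstdev(tile))

def dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Lloyd's k-means with farthest-point initialisation; returns one label per point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(mean(dim) for dim in zip(*members))
    return labels

# Toy 8x8 "image": dark background with one bright 4x4 object in the top-left corner.
image = [[200 if r < 4 and c < 4 else 10 for c in range(8)] for r in range(8)]
features = [tile_features(t) for t in uniform_tiles(image, 4)]
labels = kmeans(features, k=2)
print(labels)
```

In practice the hand-crafted features would likely be replaced by learned embeddings, and random rather than uniform segments could be sampled in the same way.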
The thesis comprises the following tasks:
– Literature review concerning sources and state-of-the-art approaches for constructing classifiers when only limited data are available
– Implementation of one or more solutions to address the classification problem
– Evaluation of the proposed method and emerging challenges
– Documentation and presentation of the findings, documentation of code
– Discussion of progress in weekly meetings with mentors Dr. Lucien Weiss and Frauke Wilm
[1] Zhang, Jinghua, Chen Li, and Marcin Grzegorzek. “Applications of Artificial Neural Networks in Microorganism Image Analysis: A Comprehensive Review from Conventional Multilayer Perceptron to Popular Convolutional Neural Network and Potential Visual Transformer.” arXiv preprint arXiv:2108.00358 (2021).
[2] Casadevall, Arturo. “Crisis in infectious diseases: time for a new paradigm?.” Clinical infectious diseases 23.4 (1996): 790-794.
Synthesizing Art Historical datasets with Pixel-wise Annotations
Analysis of EEG data with machine learning
Heat Demand Forecasting with Multi-Resolutional Representation of Heterogeneous Temporal Ensemble
Accurate forecasting of heat consumption plays an important role in the effective management of a heating utility, for example in unit commitment, short-term maintenance, and optimization of the network’s power flow. Inaccurate forecasting of consumption may increase operating costs: over-forecasting leads to unnecessary reserve costs and excess supply, while under-forecasting results in high expenditures for the peaking unit. Hence it is important for a utility, such as a district heating network, to forecast heat consumption accurately. Heat consumption patterns can vary depending on the external temperature and the day of usage, such as a holiday or weekend.
The consumption pattern also varies with consumer type, operational reasons and consumer activities. This motivates capturing the load profile features of a given consumer to tackle model variance when using ML-based forecasters. That is, the forecasting model should capture information (features) of a user’s consumption pattern along three dimensions: time, frequency and magnitude. The time series of heat consumption contains several discontinuities or abrupt jumps, which may carry important information. A highly accurate prediction of an end-user’s heat consumption could therefore be obtained by incorporating these discontinuities through the approximation of functional non-linearity. Moreover, to ensure that the model generalizes across different types of end-users, it should capture features of the consumption patterns of these different types. This project also investigates the generalizability of the model by evaluating its performance quantitatively and qualitatively. The thesis consists of the following aspects:
- Literature review of heat consumption classification in district heating networks.
- Analysis and understanding of the heat consumption data from a utility.
- Development of a sophisticated heat consumption forecaster.
- Comprehensive evaluation of the forecasting performance by comparing with existing forecasting models.
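One way to obtain a representation of a load series along time, frequency and magnitude is a multi-resolution wavelet decomposition. The following is a minimal sketch that assumes an unnormalised Haar wavelet and a series whose length is a power of two; the proposal does not name a specific transform, and the function names and toy data are hypothetical.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (approximation) and half-differences (detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def multiresolution(signal, levels):
    """Return the coarsest approximation and the detail coefficients, finest level first."""
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

def reconstruct(approx, details):
    """Invert multiresolution(): rebuild the series from coarse to fine."""
    for detail in reversed(details):
        approx = [v for a, d in zip(approx, detail) for v in (a + d, a - d)]
    return approx

# Toy hourly load with one abrupt jump (values are illustrative, not real data).
load = [10, 10, 10, 10, 30, 30, 30, 30]
approx, details = multiresolution(load, levels=3)
print(approx, details)
```

Each detail level localises changes at one time scale, so an abrupt jump shows up as a large coefficient at the scale where it occurs, while the coarsest approximation carries the overall magnitude. The coefficients of each level could then be fed to a forecaster as separate input channels.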
References:
[1] Y. Zhao, Y. Shen, Y. Zhu and J. Yao, “Forecasting Wavelet Transformed Time Series with Attentive Neural Networks,” 2018 IEEE International Conference on Data Mining (ICDM), 2018, pp. 1452-1457, doi: 10.1109/ICDM.2018.00201.
[2] Chatterjee, Satyaki and Bayer, Siming and Maier, Andreas K., “Prediction of Household-level Heat-Consumption using PSO enhanced SVR Model”, NeurIPS 2021 Workshop on Tackling Climate Change with Machine Learning, 2021, https://www.climatechange.ai/papers/neurips2021/42
[3] Kováč, Szabolcs and Micha’čonok, German and Halenár, Igor and Važan, Pavel, “Comparison of Heat Demand Prediction Using Wavelet Analysis and Neural Network for a District Heating Network”, Special Issue “Artificial Intelligence in the Energy Industry”, https://www.mdpi.com/1996-1073/14/6/1545
Design and Evaluation of Machine Learning Applications for Space Systems
This thesis aims at designing and evaluating two open source, representative machine learning applications for on-board data processing in space systems, as part of the OBPMark-ML benchmarking suite. Recently, there has been increased interest in adopting machine learning and artificial intelligence methods for on-board processing, as demonstrated by the European Space Agency’s (ESA) Phi-Sat-1 In-Orbit-Demonstration (IOD) mission launched in 2020 [1,2]. In addition, future missions are expected to rely on machine learning and deep learning methods to offer increased autonomy.
However, it is not clear which hardware architectures should be employed in such future systems. Currently, space systems use simple processors specifically designed for space, which cannot provide the required performance. Therefore, several alternatives are currently investigated as future candidates for space, such as embedded GPUs, FPGAs, custom AI accelerators etc.
However, different devices have significantly different properties, e.g. in terms of number of computations per second, numerical format (integer or floating point), data width (1, 8, 16, 32 or 64 bits), memory requirements etc., which makes it difficult to compare their computational performance and accuracy and to assess the trade-offs between them.
In addition, there is a lack of representative application cases that can be used to perform such a comparison. MLPerf, the de facto benchmarking suite for machine learning, is not representative of the type of processing required in space. The only available space-related software is the open source benchmarking suite GPU4S_Bench [3,4] (also known as OBPMark Kernels) and its evolution OBPMark (On-Board Data Processing Benchmarks) Applications, developed at the Barcelona Supercomputing Center (BSC) as part of the ESA-funded project GPU4S (GPU for Space) [5]. However, GPU4S_Bench provides software kernels only for algorithmic building blocks such as the matrix multiplication and convolution used in deep learning, as well as a simple inference chain targeting CIFAR-10, without evaluating performance and accuracy trade-offs or using real space data sets for training and/or evaluation. In contrast, the OBPMark suite implements a set of computational performance benchmarks developed specifically for spacecraft on-board data processing applications, such as radar processing and data compression [6]. OBPMark-ML will be a third variant of OBPMark, which will include realistic space applications covering multiple types of machine learning and deep learning processing.
Two of these applications will be designed and implemented in this Master’s Thesis, which will be carried out during a research visit at the Barcelona Supercomputing Center, within the GPU4S ESA project. The applications will cover two different types of imaging tasks and will be trained with real space data. The design and implementation will cover both the training and the implementation parts of the applications in standard machine learning frameworks, in both Python and C. It will also include the production of the trained models in various formats, as well as the material necessary to reproduce the training for new architectures that are not covered by the generated pre-trained models. In particular:
- Instance Segmentation: Cloud Screening: The first task uses instance segmentation for cloud detection on an open source data set called 95-Cloud [7]. U-Nets have become a standard approach for segmentation tasks and have also been shown to be effective for cloud screening [8]. However, their large number of parameters and high computational cost constrain their use in on-board processing. A good trade-off between computational complexity/memory footprint and prediction accuracy has to be found, which requires scaling down the number of parameters. This can be achieved by reducing the depth and the number of filters in the convolution layers. Another method is to adapt techniques from the MobileNetV2 architecture to the U-Net architecture [9]. Both methods will be evaluated.
- Object Detection: Ship Detection: The second application will be an object detection task, such as ship detection in satellite images [10]. The same restrictions as in the segmentation case apply here. State-of-the-art architectures for object detection are single shot detectors (SSD) [11] and “You only look once” (YOLO) networks [12]. These architectures can use different backbone models. To reduce the number of parameters and the computational complexity, a MobileNet backbone will be used and compared with a heavier network such as ResNet [13].
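The effect of the two parameter-reduction strategies mentioned above can be estimated with simple arithmetic before any model is trained. The sketch below counts weights for a standard convolution and for a depthwise separable convolution (the building block that MobileNet-style networks rely on), and for a U-Net-like encoder whose depth and base filter count are varied. The encoder layout (two convolutions per level, filters doubling per level), the four input channels, and all names are assumptions for illustration, not the architecture of the thesis.

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def separable_conv_params(c_in, c_out, k=3, bias=True):
    """Weights of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

def encoder_params(base_filters, depth, in_channels=4, conv=conv_params):
    """Assumed encoder: two convolutions per level, filter count doubling at each level."""
    total, c_in = 0, in_channels
    for level in range(depth):
        c_out = base_filters * 2 ** level
        total += conv(c_in, c_out) + conv(c_out, c_out)
        c_in = c_out
    return total

print(conv_params(64, 128), separable_conv_params(64, 128))
print(encoder_params(64, 5), encoder_params(16, 3))
print(encoder_params(64, 5, conv=separable_conv_params))
```

For a single 64-to-128-channel layer the separable variant needs roughly an eighth of the weights, and shrinking the encoder from five levels with 64 base filters to three levels with 16 reduces the count by about two orders of magnitude. How much accuracy each reduction costs is exactly what the evaluation has to establish.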
The models will be trained in TensorFlow/Keras and PyTorch and converted to the ONNX format for portability and reproducibility. As some hardware, such as FPGAs, requires fixed-point arithmetic, all models will be trained and provided in different precisions ranging from floating point (double/full/half) to integer (int8/int16). BSC will provide access to supercomputing resources for the training process. In terms of accuracy, the models are to be compared qualitatively with results from the literature. The computational performance will be tested on processors of interest that have been identified as candidates for use in GPU4S, such as the Myriad VPU, a Xilinx FPGA leveraging Vitis AI, or embedded GPUs, depending on the availability of these development boards. The thesis is developed together with the Barcelona Supercomputing Center (BSC) in the GPU4S program co-funded by the European Space Agency, and all code and models will be published open source on GitHub under the current ESA-PL license.
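To make the move from floating point to integer precision concrete, the following minimal sketch applies symmetric per-tensor int8 quantisation to a list of weights and measures the round-trip error. The scheme (one scale per tensor, range -127 to 127) and the function names are assumptions chosen for illustration; the frameworks and toolchains named above each provide their own quantisation flows, which may differ.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantisation: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in quantized]

# Illustrative weights, not taken from a trained model.
weights = [-1.0, -0.42, 0.0, 0.1337, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

The round-trip error of each weight is bounded by half the scale, which is why the accuracy loss depends on the dynamic range of each tensor and needs to be evaluated per model and per precision.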
[1] Jan-Gerd Meß, Frank Dannemann and Fabian Greif. Techniques of Artificial Intelligence for Space Applications – A Survey. In European Workshop on On-Board Data Processing (OBDP), 2019.
[2] https://www.esa.int/Applications/Observing_the_Earth/Ph-sat
[3] Ivan Rodriguez, Leonidas Kosmidis, Jerome Lachaize, Olivier Notebaert, David Steenari, GPU4S Bench: Design and Implementation of an Open GPU Benchmarking Suite for Space On-board Processing, Technical Report UPC-DAC-RR-CAP-2019-1, [online] Available: https://www.ac.upc.edu/app/research-reports/public/html/research_center_index-CAP-2019,en.html
[4] Leonidas Kosmidis, Iván Rodriguez, Alvaro Jover-Alvarez, Sergi Alcaide, Jérôme Lachaize, Olivier Notebaert, Antoine Certain, David Steenari. GPU4S: Major Project Outcomes, Lessons Learnt and Way Forward. Design Automation and Test in Europe Conference (DATE) 2021
[5] David Steenari, Leonidas Kosmidis, Ivan Rodriguez, Alvaro Jover, and Kyra Förster. OBPMark (On-Board Processing Benchmarks) – Open Source Computational Performance Benchmarks for Space Applications. In European Workshop on On-Board Data Processing (OBDP), 2021. [online] Available: https://zenodo.org/record/5638577
[6] OBPMark and GPU4S_Bench open source repositories. [online] Available: https://obpmark.github.io
[7] https://www.kaggle.com/sorour/95cloud-cloud-segmentation-on-satellite-images (accessed on 1.2.2022)
[8] Johannes Drönner, Nikolaus Korfhage, Sebastian Egli, Markus Mühling, Boris Thies, Jörg Bendix, Bernd Freisleben and Bernhard Seeger. Fast Cloud Segmentation Using Convolutional Neural Networks. Remote Sens. 2018
[9] Junfeng Jing, Zhen Wang, Matthias Rätsch and Huanhuan Zhang. Mobile-Unet: An efficient convolutional neural network for fabric defect detection. Textile Research Journal, 2020.
[10] https://www.kaggle.com/c/airbus-ship-detection/overview
[11] Liu W. et al. SSD: Single Shot MultiBox Detector. Computer Vision – ECCV 2016, 2016.
[12] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, Ali Farhadi. You Only Look Once: Unified, Real-Time Object Detection. CoRR 2015, 2015.
[13] MobileNetV2: The Next Generation of On-Device Computer Vision Networks, [online] Available: https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html.
Learning based methods for 3D hemodynamics estimation in the cerebral vasculature
CT Projection Inpainting Using Denoising Diffusion Probabilistic Models
Modeling of Randomized Cerebrovascular Trees for Artificial Data Generation using Blender
Thesis Description
The advent of deep learning in recent years has led to a multitude of new practical applications of machine learning in many fields, including the medical domain [1]. A main limiting factor in the implementation of deep learning algorithms for healthcare applications is the availability of representative training datasets of sufficient size [2].
Solving the data scarcity problem may prove particularly beneficial in the case of stroke. Globally, stroke is the leading cause of serious adult disability and the second-leading cause of death [3]. The main structure distributing blood flow to the brain is the Circle of Willis (CoW) [3, 4]. Several anatomical variations of the CoW can be observed in the population [3, 4]. These normvariants differ in how frequently they occur, and only 40% of the population possess a well-formed, complete CoW [4].
The multitude of CoW normvariants further exacerbates the need for more cerebrovascular training data for tasks like vessel labeling in stroke cases [5]. It may be possible to alleviate this problem by generating artificial data. The open-source 3D graphics software Blender appears suitable for this.
The aim of this thesis is to model a CoW graph which can be randomly and realistically deformed while being able to probabilistically incorporate common normvariants. The thesis shall comprise the following points:
- Literature research regarding the distribution of CoW normvariants in the population
- Model a standard variant of the cerebrovascular tree in Blender
- Create artificial trees by randomly sampling its parameter values
- Probabilistically adapt the model to normvariants
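The last two points can be sketched independently of Blender as sampling from a parameterised graph: node positions are perturbed randomly, and individual segments are dropped with a given probability to mimic normvariants. Everything in the sketch below is hypothetical: the simplified seven-node graph, the coordinates, the Gaussian jitter, and in particular the absence probabilities, which are placeholders and not values from the literature.

```python
import random

# Simplified, hypothetical CoW graph: node name -> (x, y, z) in arbitrary units.
BASE_NODES = {
    "BA": (0.0, -2.0, 0.0),
    "PCA_L": (-1.0, -1.0, 0.0), "PCA_R": (1.0, -1.0, 0.0),
    "ICA_L": (-1.0, 0.0, 0.0), "ICA_R": (1.0, 0.0, 0.0),
    "ACA_L": (-0.3, 1.0, 0.0), "ACA_R": (0.3, 1.0, 0.0),
}
# Segment name -> (start node, end node).
BASE_EDGES = {
    "P1_L": ("BA", "PCA_L"), "P1_R": ("BA", "PCA_R"),
    "PComA_L": ("PCA_L", "ICA_L"), "PComA_R": ("PCA_R", "ICA_R"),
    "A1_L": ("ICA_L", "ACA_L"), "A1_R": ("ICA_R", "ACA_R"),
    "AComA": ("ACA_L", "ACA_R"),
}
# Placeholder probabilities that a segment is absent; NOT taken from the literature.
ABSENCE_PROB = {"AComA": 0.1, "PComA_L": 0.2, "PComA_R": 0.2}

def sample_tree(rng, jitter=0.1):
    """Draw one artificial tree: jittered node positions and a random subset of segments."""
    nodes = {
        name: tuple(c + rng.gauss(0.0, jitter) for c in pos)
        for name, pos in BASE_NODES.items()
    }
    edges = {
        name: ends for name, ends in BASE_EDGES.items()
        if rng.random() >= ABSENCE_PROB.get(name, 0.0)
    }
    return nodes, edges

rng = random.Random(0)  # fixed seed so that generated datasets are reproducible
samples = [sample_tree(rng) for _ in range(1000)]
complete = sum(1 for _, edges in samples if len(edges) == len(BASE_EDGES))
print(f"{complete} of {len(samples)} sampled trees have a complete circle")
```

The sampled nodes and edges would then be turned into vessel geometry inside Blender, a step that is deliberately left out here. Replacing the placeholder probabilities with values from the literature research is what ties the generator to the real distribution of normvariants.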
References
[1] Neha Sharma, Reecha Sharma, and Neeru Jindal. Machine learning and deep learning applications-a vision. Global Transitions Proceedings, 2(1):24–28, 2021. 1st International Conference on Advances in Information, Computing and Trends in Data Engineering (AICDE – 2020).
[2] Martin J. Willemink, Wojciech A. Koszek, Cailin Hardell, Jie Wu, Dominik Fleischmann, Hugh Harvey, Les R. Folio, Ronald M. Summers, Daniel L. Rubin, and Matthew P. Lungren. Preparing medical imaging data for machine learning. Radiology, 295(1):4–15, 2020.
[3] Mohammed Oumer and Mekuriaw Alemayehu. Association between circle of willis and ischemic stroke: a systematic review and meta-analysis. BMC Neuroscience, 22(10), 2021.
[4] Debanjan Mukherjee, Neel D. Jani, Jared Narvid, and Shawn C. Shadden. The role of circle of willis anatomy variations in cardio-embolic stroke – a patient-specific simulation based study. bioRxiv, 2018.
[5] Florian Thamm, Markus Jürgens, Hendrik Ditt, and Andreas Maier. VirtualDSA++: Automated Segmentation, Vessel Labeling, Occlusion Detection and Graph Search on CT-Angiography Data. In Barbora Kozlíková, Michael Krone, Noeska Smit, Kay Nieselt, and Renata Georgia Raidou, editors, Eurographics Workshop on Visual Computing for Biology and Medicine. The Eurographics Association, 2020.