Index

A Hybrid Approach for Leakage Localization in Water Distribution Networks

Climate change is expected to cause more frequent and intense weather events such as droughts and floods, which can place additional stress on water distribution networks (WDN). Leakage in a WDN is a significant challenge that exacerbates the effects of climate change by increasing the amount of water that must be extracted and treated, as well as the energy consumption and greenhouse gas emissions associated with pumping and treatment. Accurate leakage localization can therefore reduce the amount of water lost from distribution networks and, with it, the need for additional water extraction and treatment. This leads to energy savings and lower greenhouse gas emissions, and it ensures that water resources are used efficiently and effectively. Additionally, by reducing the amount of water lost to leakage, a WDN can be made more resilient to the impacts of climate change, such as droughts and water scarcity.
State-of-the-art methods for leakage localization in a WDN comprise acoustic methods [1], pressure transient methods [2], flow measurement methods [3], and machine learning (ML) based methods [4-5]. However, these methods have significant limitations that hinder their application in the daily routine of water utilities. For instance, acoustic methods are cost-intensive, as they require additional sensors and equipment, and their accuracy is strongly affected by the pipe material and the presence of noise. Although the sensors required for pressure transient and flow measurement methods might be available from the daily operation of a WDN, these methods are often not sensitive enough to detect small leaks. Data-driven methods using ML have gained importance in recent years; however, data availability, data quality, and the explainability of ML models remain major limitations.
Therefore, we would like to investigate the effectiveness of a hybrid AI approach that combines a hydraulic model with ML to tackle leakage localization within a WDN using real-world data. The following aspects need to be considered:
• Literature review of leakage localization for WDN.
• Development and implementation of a hybrid framework combining a hydraulic model and ML methods for leakage localization with existing sensor data.
• Comprehensive evaluation of the performance of the implemented framework with respect to accuracy, robustness, and explainability.
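As an illustration of the hybrid idea (a sketch only: in practice the per-node leak signatures would come from a calibrated hydraulic model, e.g. EPANET simulations, and all numbers and node names below are made up), a measured pressure residual can be matched against simulated leak signatures to pick the most likely leak location:

```python
import numpy as np

def localize_leak(residual, sensitivity, node_ids):
    """Return the candidate leak node whose simulated pressure signature
    best matches the measured pressure residual (cosine similarity)."""
    r = residual / np.linalg.norm(residual)
    # Normalize each node's leak signature (one column per candidate node).
    S = sensitivity / np.linalg.norm(sensitivity, axis=0, keepdims=True)
    scores = S.T @ r  # cosine similarity per candidate node
    return node_ids[int(np.argmax(scores))], scores

# Hypothetical example: 3 pressure sensors, 4 candidate leak nodes.
# Each column is the simulated pressure-drop pattern for a leak at that node.
sensitivity = np.array([[0.9, 0.1, 0.3, 0.2],
                        [0.2, 0.8, 0.4, 0.1],
                        [0.1, 0.2, 0.7, 0.9]])
residual = np.array([0.85, 0.25, 0.15])  # measured minus model-predicted pressure
node, scores = localize_leak(residual, sensitivity, ["n1", "n2", "n3", "n4"])
```

This residual-matching scheme is one common way model-based and data-driven components are coupled; an ML model could instead learn the mapping from residuals to nodes directly.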
[1] Khulief, Y. et al. Acoustic detection of leaks in water pipelines using measurements inside pipe. Journal of Pipeline Systems Engineering and Practice 2012, 3(2), 47–54. doi:10.1061/(ASCE)PS.1949-1204.0000089.
[2] Levinas, D. et al. Water leak localization using high-resolution pressure sensors. Water 2021, 13, 591. doi:10.3390/w13050591.
[3] Lindström, L. et al. Leakage localization in water distribution networks: A model-based approach. 2022 European Control Conference (ECC), London, United Kingdom, 2022, pp. 1515–1520. doi:10.23919/ECC55457.2022.9838006.
[4] Huang, P. et al. Real-time burst detection in district metering areas in water distribution system based on patterns of water demand with supervised learning. Water 2018, 10(12), 1765. doi:10.3390/w10121765.
[5] Soldevila, A. et al. Data-driven approach for leak localization in water distribution networks using pressure sensors and spatial interpolation. Water 2019, 11(7), 1500. doi:10.3390/w11071500.

ML based Classification of States in LPWAN Current Consumption Curves

Evaluation of imperfect segmentation labels and the influence on deep learning models

Multi-organ segmentation in CT is of great clinical and research value [1]: it can benefit the development of automatic computer-aided diagnosis tools and the accuracy of interventional therapies, such as treatment planning for radiation therapy. With the development of deep learning (DL), the performance of DL-based models has improved dramatically compared with traditional segmentation methods [2].
Training a DL model for a segmentation task requires a paired segmentation dataset, i.e., accurate annotations of all voxels in all CT volumes, which are tedious and time-consuming to produce. For this reason, large-scale segmentation datasets covering multiple organs in large body regions are rarely published and mostly contain annotation errors. Several studies have investigated the influence of imperfect segmentation labels on the training of segmentation networks [3, 4], but to the best of our knowledge, none has done so for the multi-organ segmentation task in CT.
In this thesis, our research problem is how a segmentation network is influenced by typical annotation errors. To this end, several typical annotation errors will be simulated on a public multi-organ segmentation dataset, CT-ORG [5], and their influence will be analysed both quantitatively and qualitatively.
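As an illustration of how such errors might be simulated (a simplified sketch: experiments on CT-ORG would more likely use scipy.ndimage morphology on 3-D labels, and the toy 2-D mask below is hypothetical), systematic over- and under-segmentation can be modelled as binary dilation and erosion of an organ label:

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a cross-shaped structuring element, simulating
    systematic over-segmentation. Note: np.roll wraps around the borders,
    which is harmless here because the toy mask does not touch them."""
    out = mask.astype(bool)
    for _ in range(iterations):
        grown = out.copy()
        for axis in (0, 1):
            for shift in (-1, 1):
                grown |= np.roll(out, shift, axis=axis)
        out = grown
    return out

def erode(mask, iterations=1):
    """Binary erosion (under-segmentation) via dilation of the background."""
    return ~dilate(~mask.astype(bool), iterations)

# Hypothetical toy label: a 5x5 square "organ" inside a 7x7 slice.
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True
over = dilate(mask)   # label grows by one voxel in each direction
under = erode(mask)   # label shrinks by one voxel in each direction
```

Other typical errors, such as boundary jitter or missing slices, could be simulated analogously by perturbing or zeroing parts of the label volume.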

The thesis will comprise the following work items:

  • Literature overview of related analysis of imperfect segmentation labels.
  • Simulate some typical annotation errors on a segmentation dataset.
  • Train the baseline model on the perfect dataset and the models on the imperfect datasets.
  • Evaluate the influence of the errors with the baseline.
  • Record the results in the thesis.
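For the quantitative part of the evaluation against the baseline, the Dice similarity coefficient is the standard segmentation overlap measure; a minimal sketch (the toy masks are hypothetical):

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice similarity coefficient between two binary masks (1.0 = perfect overlap)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

# Toy example: 2 of 3 foreground voxels agree -> Dice = 2*2 / (3+3) = 2/3.
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
```

In the multi-organ setting, Dice would be computed per organ and compared between the baseline model and the models trained on the perturbed labels.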

[1] Andreas Maier, Christopher Syben, Tobias Lasser, and Christian Riess. A gentle introduction to deep learning in medical image processing. Zeitschrift für Medizinische Physik, 29(2):86–101, 2019.
[2] Mohammad Hesam Hesamian, Wenjing Jia, Xiangjian He, and Paul Kennedy. Deep learning techniques for medical image segmentation: achievements and challenges. Journal of digital imaging, 32:582–596, 2019.
[3] Eugene Vorontsov and Samuel Kadoury. Label noise in segmentation networks: mitigation must deal with bias. In DGM4MICCAI 2021 and DALI 2021, pages 251–258. Springer, 2021.
[4] Nicholas Heller, Joshua Dean, and Nikolaos Papanikolopoulos. Imperfect segmentation labels: How much do they matter? In CVII-STENT 2018 and LABELS 2018, pages 112–120. Springer, 2018.
[5] Blaine Rister, Darvin Yi, Kaushik Shivakumar, Tomomi Nobashi, and Daniel L Rubin. CT-ORG, a new dataset for multiple organ segmentation in computed tomography. Scientific Data, 7(1):381, 2020.

Super-Resolving XRM Scans Using Conditioned Diffusion Models

Image-to-Image Translation Using Diffusion Generative Models

Recognition of Optical Chemical Structures

Development of an AI-based ring detection algorithm for CT image quality control

Diffusion-based Super Resolution for X-ray Microscopy

Diffusion Models for Generating Offline Handwritten Text Images

Unsupervised Contextual Anomaly Detection in Frequency Converter Data

Thesis Description

Anomaly detection is a widely researched topic and is already used extensively in many different domains, such as fraud detection for credit cards, intrusion detection in cyber-security, or fault detection in safety-critical systems [1]. Anomaly detection techniques are also applied in industry, where electric motors are among the most important components, accounting for approximately 40% of global electricity consumption. Detecting anomalies in their sensor data can help predict faults and thus avoid expensive reactive maintenance and the cost of unplanned downtime [2].

Since it is difficult to devise general anomaly detection techniques, and since different domains have different requirements regarding performance and other limitations, a wide range of methods exists, including neural networks, rule-based techniques, clustering, nearest-neighbor methods, and statistical models. The different techniques are limited in what kinds of anomalies they can detect (point, contextual, or collective anomalies) and in what labels need to be available. For example, a nearest-neighbor anomaly detector can detect point anomalies without labeled data, which makes it an unsupervised algorithm. In general, labeled data is exceptionally hard to obtain in some domains, and in such cases an unsupervised solution is preferable [1].
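The nearest-neighbor example can be sketched in a few lines (the data is synthetic; a real pipeline would more likely use scikit-learn's LocalOutlierFactor):

```python
import numpy as np

def knn_anomaly_scores(X, k=2):
    """Score each point by its mean distance to its k nearest neighbors;
    large scores indicate point anomalies. Unsupervised: no labels needed."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-distance
    nearest = np.sort(d, axis=1)[:, :k]  # k smallest distances per point
    return nearest.mean(axis=1)

# Synthetic data: a tight cluster plus one far-away point anomaly.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
scores = knn_anomaly_scores(X)
# The last point receives a much larger score than the clustered points.
```

Thresholding these scores turns them into binary anomaly flags; note that this plain distance-based variant catches point anomalies but not contextual ones.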

In this thesis the focus is on unlabeled multivariate time-series data from a SINAMICS frequency converter, which converts the mains voltage into a suitable signal for powering an electric motor [3]. A partial timeframe of this data is declared the reference measurement and assumed to be mostly regular, i.e., free of anomalies. The frequency converters expose a set of sensor parameters that can be extracted and processed using the Siemens Industrial Edge Device [4]. State-of-the-art methods for this problem include computing a distance metric between time series using Dynamic Time Warping and applying the Local Outlier Factor algorithm, as shown in [5]; using a neural network of stacked autoencoders that classifies sequences based on whether the autoencoder can reconstruct them [2]; or using a Generative Adversarial Network (GAN) with LSTMs (Long Short-Term Memory networks) to capture temporal correlation, as demonstrated in [6].
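To make the DTW component of the approach in [5] concrete, a minimal pure-Python implementation of the DTW distance (a sketch; production code would use an optimized library and typically a warping window):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D series: the minimal
    cumulative cost of aligning them while allowing local time shifts.
    This is the pairwise metric fed into Local Outlier Factor in [5]."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two series with the same shape but shifted in time align almost perfectly,
# whereas a plain Euclidean comparison would report a large distance.
x = [0, 1, 2, 1, 0]
y = [0, 0, 1, 2, 1, 0]
```

Because DTW tolerates temporal misalignment, it is well suited to comparing converter measurement windows whose events do not occur at exactly the same offsets.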

Autoencoders in particular, which encode an input into a compact hidden representation that is then decoded with the aim of reconstructing the original input, are commonly used [7]. The idea is that, since the hidden representation is reduced, it can only represent regular data patterns, not the patterns of anomalies. If an anomaly is fed through the autoencoder, it is expected to produce a higher reconstruction error than usual. However, the approach suffers if the training data contains anomalies, since their patterns might then be learned as well; this applies to many real-world applications [8].
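The reconstruction-error idea can be illustrated with a deliberately simplified linear stand-in: PCA plays the role of the autoencoder's encode/decode bottleneck (a one-dimensional "hidden representation"), and the data below is synthetic:

```python
import numpy as np

def fit_linear_bottleneck(X, dim=1):
    """PCA as a linear stand-in for an autoencoder bottleneck:
    learn a `dim`-dimensional subspace capturing the regular patterns."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:dim]                # mean + principal directions

def reconstruction_error(X, mu, W):
    """Per-sample reconstruction error; anomalies score high."""
    Z = (X - mu) @ W.T                 # encode into the bottleneck
    Xr = Z @ W + mu                    # decode back to input space
    return np.linalg.norm(X - Xr, axis=1)

# Regular samples lie near a line; the last sample breaks the pattern.
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, size=20)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=20)])
X = np.vstack([X, [0.5, -1.0]])        # anomaly far off the regular pattern
mu, W = fit_linear_bottleneck(X)
errors = reconstruction_error(X, mu, W)
```

A real autoencoder replaces the linear projection with nonlinear encoder/decoder networks, but the detection principle, i.e. thresholding the reconstruction error, is the same.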

In this thesis, the unlabeled data of industrial motors will be analyzed and different solutions for anomaly detection in multivariate time-series data will be tested. Since the computational power of the edge device is limited, the algorithms will also be evaluated with regard to their runtime performance. Classifying anomalies would be desirable, but this depends strongly on the application domain and is therefore not feasible in a general anomaly detection approach.

In summary, the thesis will address the following points:

  1. Data analysis
    (a) Statistics
    (b) (manual) identification of anomalies
  2. Development of various models for anomaly detection
  3. Model evaluation with regard to different metrics, at least including the following
    (a) Accuracy
    (b) Training performance
    (c) Runtime performance
  4. Optional: Classification
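Assuming the manually identified anomalies from step 1(b) are available as point-wise binary flags (a simplifying assumption; the flag values below are hypothetical), plain accuracy can be complemented by precision, recall, and F1, which are more informative on heavily imbalanced anomaly data:

```python
def precision_recall_f1(y_true, y_pred):
    """Point-wise anomaly detection metrics from binary flags (1 = anomaly).
    On imbalanced data, accuracy alone is misleading: predicting 'normal'
    everywhere already scores highly."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# 10 time steps, 2 true anomalies; the detector finds one and raises one false alarm.
y_true = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```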

References
[1] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Comput. Surv., 41(3), July 2009.
[2] Sean Givnan, Carl Chalmers, Paul Fergus, Sandra Ortega-Martorell, and Tom Whalley. Anomaly detection using autoencoder reconstruction upon industrial motors. Sensors, 22(9), 2022.
[3] Siemens AG. Converter, 2022. Accessed on 14.12.2022.
[4] Siemens AG. Industrial edge / production machines, 2022. Accessed on 14.12.2022.
[5] Wang Yong, Mao Guiyun, Chen Xu, and Wei Zhengying. Anomaly detection of semiconductor processing data based on DTW-LOF algorithm. In 2022 China Semiconductor Technology International Conference (CSTIC), pages 1–3, 2022.
[6] Alexander Geiger, Dongyu Liu, Sarah Alnegheimish, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. TadGAN: Time series anomaly detection using generative adversarial networks. CoRR, abs/2009.07769, 2020.
[7] Ane Blázquez-García, Angel Conde, Usue Mori, and José Antonio Lozano. A review on outlier/anomaly detection in time series data. CoRR, abs/2002.04236, 2020.
[8] Tung Kieu, Bin Yang, Chenjuan Guo, Razvan-Gabriel Cirstea, Yan Zhao, Yale Song, and Christian S. Jensen. Anomaly detection in time series with robust variational quasi-recurrent autoencoders. In 2022 IEEE 38th International Conference on Data Engineering (ICDE), pages 1342–1354, 2022.