AI-Enhanced Data Imputation for Water Network Modeling

Type: MA thesis

Status: running

Date: February 2, 2026 - August 2, 2026

Supervisors: Andreas Maier, Siming Bayer, Mohammad Moataz Tolba

Background and Motivation

Accurate and complete data are critical for modeling and operating Water Distribution Networks (WDNs). In practice, however, sensor data (from SCADA systems and smart meters) often contain gaps caused by missing measurements, faulty devices, or communication failures. Hydraulic simulators, already in use within this project, rely on continuous and representative data to calibrate and forecast system behavior. Data imputation is therefore an essential step toward reliable model outputs and sound downstream decisions (e.g., leak localization, pipe status verification).

This thesis proposes to develop and evaluate machine learning methods for spatio-temporal data imputation in WDNs. By combining GIS-based network topology with operational data (SCADA, meters) and environmental context (air temperature, soil temperature), the goal is to reconstruct missing data more accurately than traditional interpolation methods. The improved datasets will then be used to enhance the stability and accuracy of existing hydraulic simulations.
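One way the GIS topology could enter the model is as a graph whose nodes are junctions (or sensor locations) and whose edges are pipes. The sketch below assumes exactly that mapping, treats pipes as undirected because flow direction can reverse, and uses a hypothetical four-junction network with made-up per-node features (pressure plus air and soil temperature broadcast to each node). It builds the symmetrically normalized adjacency of the GCN by Kipf and Welling and applies one propagation step with random weights; in the real pipeline these weights would be learned, for example with PyTorch Geometric's GCNConv.

```python
import numpy as np


def normalized_adjacency(edges, num_nodes):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected pipe graph."""
    a = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        # Each pipe joins two junctions; flow direction is ignored here.
        a[u, v] = 1.0
        a[v, u] = 1.0
    a_hat = a + np.eye(num_nodes)  # self-loops keep each node's own signal
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_layer(a_norm, x, w):
    """One GCN propagation step: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ w, 0.0)


# Hypothetical 4-junction line network: 0 - 1 - 2 - 3
edges = [(0, 1), (1, 2), (2, 3)]
a_norm = normalized_adjacency(edges, num_nodes=4)

# Hypothetical node features at one time step: [pressure, air temp, soil temp]
x = np.array([
    [52.0, 4.0, 8.0],
    [50.5, 4.0, 8.1],
    [49.0, 3.8, 8.0],
    [47.5, 3.9, 7.9],
])
rng = np.random.default_rng(0)
w = rng.normal(size=(3, 2))  # learned in practice; random for illustration
h = gcn_layer(a_norm, x, w)
print(h.shape)  # (4, 2)
```

Applying such a layer at every time step and passing the node embeddings to an LSTM or GRU gives the GCN+LSTM family of models named under Objectives; in practice the features would also be standardized first.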

Objectives

  • Develop a machine learning-based pipeline for imputing missing SCADA and meter data.
  • Incorporate spatio-temporal structure (network topology + time) into the imputation process.
  • Compare advanced methods (GCN+LSTM, graph-temporal models) with classical baselines (linear interpolation, KNN, matrix factorization).
  • Validate improvements in data quality (MSE, MAE on masked gaps) and assess their impact on hydraulic model residuals.
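The masked-gap evaluation can be sketched as follows, assuming the data form a time × sensor array in which NaN marks readings that are genuinely missing. A random fraction of the observed readings is hidden, an imputer fills them, and MSE and MAE are computed only on those hidden entries, where ground truth is known. The helper names, the 10% masking rate, the column-mean stand-in imputer, and the synthetic data (15-minute sampling, so 96 steps per day) are illustrative assumptions, not project choices.

```python
import numpy as np


def mask_observed(x, frac, rng):
    """Hide a random fraction of the observed entries of x (NaN = already missing).

    Returns the corrupted copy and a boolean mask of the artificially hidden entries.
    """
    observed = ~np.isnan(x)
    hide = observed & (rng.random(x.shape) < frac)
    x_corrupt = x.copy()
    x_corrupt[hide] = np.nan
    return x_corrupt, hide


def masked_errors(x_true, x_imputed, hide):
    """MSE and MAE computed only on the artificially hidden entries."""
    diff = x_imputed[hide] - x_true[hide]
    return float(np.mean(diff ** 2)), float(np.mean(np.abs(diff)))


def column_mean_impute(x):
    """Trivial reference imputer: fill each sensor's gaps with its mean."""
    means = np.nanmean(x, axis=0)
    return np.where(np.isnan(x), means, x)


rng = np.random.default_rng(42)
# Hypothetical data: 200 time steps x 5 sensors with a daily cycle (96 steps/day)
t = np.arange(200)[:, None]
x_true = 50 + 3 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 0.2, size=(200, 5))

x_corrupt, hide = mask_observed(x_true, frac=0.1, rng=rng)
mse, mae = masked_errors(x_true, column_mean_impute(x_corrupt), hide)
print(f"MSE={mse:.3f}  MAE={mae:.3f}")
```

Random point masking is the simplest protocol; hiding contiguous blocks would mimic communication outages more closely and is likely the harder, more informative test. The same masked-error functions apply unchanged to any imputer.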

Research Questions

  • How much can spatio-temporal ML models improve imputation accuracy compared to standard statistical methods?
  • Does the inclusion of network topology (via GNNs) provide measurable benefits for missing data reconstruction?
  • To what extent do improved imputations reduce residuals between hydraulic simulations and SCADA data?

Modeling

  • Baselines: Linear/forward-fill interpolation, KNN imputer, matrix factorization.
  • ML/DL: LSTM/GRU for temporal patterns; Graph Convolutional Networks (GCNs) combined with temporal models for spatio-temporal imputation.
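The four baselines can be sketched with NumPy alone, assuming a time × sensor array with NaN for gaps. The function names, k = 3, rank 2, and the synthetic pressure data are hypothetical. In the project itself, pandas (DataFrame.interpolate, DataFrame.ffill) and scikit-learn (KNNImputer) already provide the first three; the matrix-factorization baseline is shown here as iterative truncated SVD ("hard impute"), which is only one of several possible factorization schemes.

```python
import numpy as np


def linear_interpolate(x):
    """Per-sensor linear interpolation in time; edge gaps copy the nearest reading."""
    out = x.copy()
    t = np.arange(x.shape[0])
    for j in range(x.shape[1]):
        obs = ~np.isnan(x[:, j])
        if obs.any():
            out[~obs, j] = np.interp(t[~obs], t[obs], x[obs, j])
    return out


def forward_fill(x):
    """Carry the last valid reading forward; gaps at the very start stay NaN."""
    n, m = x.shape
    idx = np.where(~np.isnan(x), np.arange(n)[:, None], 0)
    idx = np.maximum.accumulate(idx, axis=0)  # index of the last valid row so far
    return x[idx, np.arange(m)]


def knn_impute(x, k=3):
    """Fill each gap with the mean of that sensor at the k most similar time steps."""
    out = x.copy()
    obs = ~np.isnan(x)
    x0 = np.nan_to_num(x)
    for i in np.where(~obs.all(axis=1))[0]:
        shared = obs & obs[i]  # sensors observed in both row i and each candidate row
        n_shared = shared.sum(axis=1)
        d2 = np.where(shared, (x0 - x0[i]) ** 2, 0.0).sum(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            # nan-Euclidean style: rescale by the fraction of shared sensors
            dist = np.sqrt(x.shape[1] / n_shared * d2)
        dist[n_shared == 0] = np.inf
        for j in np.where(~obs[i])[0]:
            donors = np.where(obs[:, j] & np.isfinite(dist))[0]
            if donors.size:
                nearest = donors[np.argsort(dist[donors])[:k]]
                out[i, j] = x[nearest, j].mean()
    return out


def low_rank_impute(x, rank=2, n_iter=50):
    """Matrix factorization via iterative truncated SVD ("hard impute")."""
    obs = ~np.isnan(x)
    mu = np.nanmean(x, axis=0)
    x_c = x - mu                      # centre each sensor; NaNs stay NaN
    filled = np.where(obs, x_c, 0.0)  # start every gap at its sensor mean
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
        filled = np.where(obs, x_c, approx)  # keep observed readings, refit the gaps
    return filled + mu


rng = np.random.default_rng(0)
t = np.arange(300)[:, None]
# Hypothetical pressures at 4 sensors sharing a daily cycle (96 steps = 1 day at 15 min)
x_true = np.array([50.0, 48.0, 45.0, 52.0]) + 3 * np.sin(2 * np.pi * t / 96)
x_true = x_true + rng.normal(0, 0.1, size=x_true.shape)
x = x_true.copy()
x[100:120, 1] = np.nan                  # a 5-hour outage on sensor 1
x[rng.random(x.shape) < 0.05] = np.nan  # scattered dropouts

for name, impute in [("linear", linear_interpolate), ("ffill", forward_fill),
                     ("knn", knn_impute), ("low-rank", low_rank_impute)]:
    x_hat = impute(x)
    gap = np.isnan(x) & ~np.isnan(x_hat)  # forward fill may leave a leading gap
    print(f"{name:9s} MAE on gaps: {np.abs(x_hat[gap] - x_true[gap]).mean():.3f}")
```

Linear interpolation and forward fill use only each sensor's own history, whereas KNN and the low-rank model borrow information across sensors without knowing how they are connected; the graph-based models aim to make that cross-sensor borrowing topology-aware.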

Tools and Frameworks

The implementation will be done in Python, using libraries such as PyTorch Geometric, PyTorch, scikit-learn, and Pandas.
