Index
Cephalometric Landmark Detection Using Deep Learning
Projection Inpainting Using Partial Convolution for Metal Artifact Reduction
Truncation Correction in Computed Tomography Using Deep Learning
Investigate the application of DBP and GANs for truncation correction (field-of-view extension)
End-use Classification using High-Resolution Smart Water Meter Data
Smart water meters are widely used for billing water consumption. Real-time data acquired with smart water meters provide new opportunities for utility companies to create an intelligent and efficient water distribution network aimed at reducing costs and non-revenue losses. However, factors such as the degree of limescale, water quality, and flow strongly affect the accuracy of a smart meter's measurements. In particular, the accumulated long-term influence of these external factors results in significant non-conformance in the measurements. Hence, predictively estimating the measurement error of a smart meter, and thus the degree of erosion of the device, benefits the overall maintenance process from the meter level to the distribution network level.
To estimate the measurement error proactively, end-use consumption patterns [1–3] (e.g. use of a dishwasher) can first be classified and extracted from the smart meter data, i.e. real-time measurements in a water distribution network. This step is the so-called data disaggregation. Subsequently, the variation in measurement accuracy is determined by comparing the results of disaggregation with reference measurements (e.g. the same consumption pattern recognized in historical data). Therefore, a highly accurate classification of end-uses is fundamental for a precise predictive estimation of the condition of a smart meter at the sensor level.
In this work, we focus on the classification of end-use consumption patterns using high-resolution smart meter data, especially for water distribution networks. The thesis consists of the following aspects:
- Literature review of water event clustering and water end-use classification techniques
- Analysis and understanding of the existing data
- Development and implementation of a water end-use classification framework
  - Feature extraction
  - Clustering of water events
  - Classification of established water end-uses
- Evaluation of the implemented approach
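The feature extraction and classification steps listed above can be illustrated with a minimal sketch. It assumes a flow-rate series in litres per minute sampled at a fixed period, splits it into events wherever the flow exceeds a threshold, and assigns each event to the nearest labelled prototype. The names `WaterEvent`, `extract_events`, and `nearest_label`, the threshold, and the prototype values are hypothetical and are not taken from [1–3].

```python
from dataclasses import dataclass


@dataclass
class WaterEvent:
    start: int            # index of the first sample of the event
    duration_s: float     # event length in seconds
    volume_l: float       # total volume in litres
    peak_flow_lpm: float  # maximum flow rate in litres per minute


def extract_events(flow_lpm, sample_period_s=1.0, min_flow_lpm=0.1):
    """Split a flow-rate series into events and compute simple features."""
    samples = list(flow_lpm)
    events, start = [], None
    # A trailing zero closes an event that runs to the end of the series.
    for i, q in enumerate(samples + [0.0]):
        if q > min_flow_lpm and start is None:
            start = i
        elif q <= min_flow_lpm and start is not None:
            segment = samples[start:i]
            events.append(WaterEvent(
                start=start,
                duration_s=len(segment) * sample_period_s,
                volume_l=sum(segment) * sample_period_s / 60.0,
                peak_flow_lpm=max(segment),
            ))
            start = None
    return events


def nearest_label(event, prototypes):
    """Return the label whose prototype feature vector is closest (Euclidean)."""
    feats = (event.duration_s, event.volume_l, event.peak_flow_lpm)

    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(feats, prototypes[label]))

    return min(prototypes, key=dist)


# Hypothetical prototypes: (duration in s, volume in l, peak flow in l/min).
prototypes = {"tap": (3.0, 0.3, 6.0), "shower": (300.0, 40.0, 8.0)}
events = extract_events([0, 0, 6, 6, 6, 0, 0, 12, 12, 0])
labels = [nearest_label(e, prototypes) for e in events]
```

In practice the features would be scaled before computing distances, and the prototypes would come from the clustering step rather than being set by hand.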
[1] Mario Vašak, Goran Banjac, and Hrvoje Novak. Water use disaggregation based on classification of feature vectors extracted from smart meter data. Procedia Engineering, 119(1):1381–1390, 2015.
[2] Khoi Anh Nguyen, Rodney A. Stewart, and Hong Zhang. An autonomous and intelligent expert system for residential water end-use classification. Expert Systems with Applications, 41(2):342–356, 2014.
[3] L. Pastor-Jabaloyes, F. J. Arregui, and R. Cobacho. Water end use disaggregation based on soft computing techniques. Water, 10(1):321–341, 2018.
Ranking Loss for Writer Identification on Music Scores
Writer identification is a one-shot classification problem that is often performed solely on textual
handwriting, as such data is easy to obtain. However, the task also promises a significant knowledge
gain for handwritten music scores, especially as old music scores were often copied by hand even though
music engraving has existed since the late sixteenth century [1]. As pointed out by [3], Naïve
Bayes Nearest Neighbor (NBNN) classifiers cannot natively be used with Convolutional Neural Network
(CNN) activations or as the final layer in end-to-end CNN training. The authors of [3] proposed a
scalable version of Naive Bayes Non-linear Learning (NBNL) to address this problem. In addition,
Mohammed et al. [4] improved the classifier’s robustness to unbalanced data and added constraints to
prevent the matching of irrelevant keypoints, adapting it specifically to the task of writer identification.
Ranking is commonly used in evaluation metrics and appears highly desirable for the task of writer
recognition. By incorporating the ’SoDeep’ layer proposed by Engilberge et al. [2], we can
learn a loss function for our classification task. This also allows us to introduce loss functions that are
closer to the actual metrics of interest. The focus of this work is to incorporate an NBNN classifier
into SoDeep.
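To make the role of ranking concrete, the following minimal sketch computes hard ranks and a rank-based loss in plain Python. The hard rank function is not differentiable, which is why SoDeep [2] trains a sorter network to approximate it; the sketch only shows the quantity being approximated, not the SoDeep layer itself. The function names `hard_ranks` and `rank_loss` are hypothetical.

```python
def hard_ranks(scores):
    """Rank 1 goes to the highest score; ties keep their input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rank_loss(pred_scores, true_scores):
    """Mean squared difference between predicted and target rank vectors."""
    rp, rt = hard_ranks(pred_scores), hard_ranks(true_scores)
    return sum((a - b) ** 2 for a, b in zip(rp, rt)) / len(rp)


same_order = rank_loss([0.2, 0.9, 0.5], [1, 3, 2])
swapped = rank_loss([0.9, 0.2, 0.5], [1, 3, 2])
```

A loss of zero means the predicted scores induce exactly the target ordering. A differentiable surrogate would replace `hard_ranks` with the learned sorter.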
In this work, a ranking loss layer is incorporated into a deep neural network architecture, allowing a
better classification by the local naive Bayes nearest neighbor approach for writer recognition.
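The decision rule behind an NBNN classifier can be sketched as follows. This is the plain NBNN rule (for each class, sum the squared distances from every query descriptor to its nearest reference descriptor of that class, then pick the class with the smallest sum). It is not the local or normalised variant of [4], and the function names and toy descriptors are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nbnn_classify(query_descriptors, class_descriptors):
    """Return (best_label, totals) using the plain NBNN decision rule."""
    totals = {}
    for label, references in class_descriptors.items():
        # Distance of each query descriptor to its nearest neighbour in this class.
        totals[label] = sum(
            min(squared_distance(q, r) for r in references)
            for q in query_descriptors
        )
    best = min(totals, key=totals.get)
    return best, totals


# Toy example: two writers, each described by two 2-D local descriptors.
writers = {
    "writer_a": [(0.0, 0.0), (1.0, 0.0)],
    "writer_b": [(5.0, 5.0), (6.0, 5.0)],
}
label, totals = nbnn_classify([(0.1, 0.0), (0.9, 0.1)], writers)
```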
The thesis consists of the following milestones:
• Setting up a writer identification framework.
• Incorporating the ’Normalized Local Naive Bayes Nearest Neighbor’ classifier as proposed
in [4].
• Implementing the sorting layer from the SoDeep paper [2] for our metrics.
The implementation should be done in Python using PyTorch.
[1] Music engraving. URL: https://en.wikipedia.org/wiki/Music_engraving.
[2] Martin Engilberge, Louis Chevallier, Patrick Pérez, and Matthieu Cord. SoDeep: a Sorting Deep net to
learn ranking loss surrogates. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 10792–10801, 2019.
[3] Ilja Kuzborskij, Fabio Maria Carlucci, and Barbara Caputo. When naive bayes nearest neighbors meet
convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pages 2100–2109, 2016.
[4] Hussein Mohammed, Volker Märgner, Thomas Konidaris, and H Siegfried Stiehl. Normalised Local Naïve
Bayes Nearest-Neighbour Classifier for Offline Writer Identification. In 2017 14th IAPR International
Conference on Document Analysis and Recognition (ICDAR), volume 1, pages 1013–1018. IEEE, 2017.
Concentrating on Text for Improved Document Analysis
In recent years, deep learning, a popular research direction, has also been applied to text recognition
tasks by many researchers [1]. The variability of text and background interference make text recognition
a challenging task; for example, deformation or bending of scene text causes various
recognition errors. To improve performance on such problems, Shi et al. [2] proposed RARE
(Robust text recognizer with Automatic Rectification), a recognition model that is robust to irregular
text. Writer identification is the process of identifying the writer of a given text. Convolutional Neural
Networks (CNNs), as a state-of-the-art tool, have been used for writer identification [3][4].
However, the use of CNNs faces some limitations. Training and
running a deep learning model requires a large amount of computational power [5]. In addition,
image-based processes such as scene text recognition or image-based writer identification
can be affected by the background. Typically, fewer than 5% of the pixels in a text document are text
pixels; the rest is background. A mask that helps to concentrate on the foreground, i.e. the text pixels,
would be beneficial. Studies on object detection [5] and image inpainting [6] have proposed
using a binary mask, where the convolution is masked and renormalized to be conditioned only on
valid pixels. Furthermore, “Self-Attention” has been proposed [7]: the authors combine
self-attention with GANs to generate consistent scenarios by leveraging complementary features
in distant portions of the image rather than local regions of fixed shape. The idea of self-attention
could also be investigated in our task.
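The masked and renormalized convolution mentioned above can be illustrated with a minimal single-channel sketch in plain Python. It follows the partial convolution rule of [6]: only valid pixels contribute, the response is rescaled by the ratio of window size to number of valid pixels, and the output mask marks windows that contained at least one valid pixel. Here the valid pixels are assumed to be the text pixels; the function name `partial_conv2d` and the toy inputs are hypothetical, and no padding or stride is handled.

```python
def partial_conv2d(image, mask, kernel, bias=0.0):
    """Single-channel partial convolution without padding (stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    window_size = kh * kw
    output = [[0.0] * out_w for _ in range(out_h)]
    new_mask = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            valid, acc = 0, 0.0
            for i in range(kh):
                for j in range(kw):
                    m = mask[r + i][c + j]
                    valid += m
                    acc += kernel[i][j] * image[r + i][c + j] * m
            if valid > 0:
                # Renormalize so the response does not depend on how many
                # pixels in the window were valid.
                output[r][c] = acc * window_size / valid + bias
                new_mask[r][c] = 1
    return output, new_mask


image = [[2.0] * 3 for _ in range(3)]
kernel = [[1.0] * 3 for _ in range(3)]
centre_only = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
all_valid = [[1] * 3 for _ in range(3)]
out_sparse, mask_sparse = partial_conv2d(image, centre_only, kernel)
out_full, mask_full = partial_conv2d(image, all_valid, kernel)
```

On this constant image, the window with a single valid pixel yields the same response as the fully valid window, which is the effect of the renormalization.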
In this work, a binary mask or the method of self-attention is incorporated as a matrix into a deep neural
network architecture to focus on the foreground and ignore the background, allowing for end-to-end
training of the network for different document analysis tasks, such as writer identification.
The thesis consists of the following milestones:
• Incorporate partial convolutions [6] and evaluate them with pre-computed binary masks for
writer identification.
• Evaluate the influence of using a binarization mask to regularize the loss by means of the
Frobenius norm.
• Learn which parts to focus on using techniques such as self-attention [7] or BABO [5].
• Thorough evaluation of the different methods and combinations for different document analysis
tasks.
• Further experiments regarding learning procedure and network architecture.
The implementation should be done in Python.
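One possible reading of the Frobenius-norm milestone is sketched below: a penalty equal to the Frobenius norm of an activation (or attention) map restricted to background pixels, added to the task loss with a weight. This formulation is an assumption made for illustration and is not specified in the description above; `background_penalty`, the toy values, and the weight are hypothetical.

```python
import math


def background_penalty(activation, text_mask):
    """Frobenius norm of the activation map on background pixels (mask == 0)."""
    total = 0.0
    for act_row, mask_row in zip(activation, text_mask):
        for a, m in zip(act_row, mask_row):
            total += (a * (1 - m)) ** 2
    return math.sqrt(total)


# Toy 2x2 activation map; the right column is text (mask 1), the left is background.
activation = [[3.0, 1.0], [4.0, 2.0]]
text_mask = [[0, 1], [0, 1]]
penalty = background_penalty(activation, text_mask)

task_loss = 0.7   # placeholder value for the main training loss
weight = 0.1      # hypothetical regularization weight
total_loss = task_loss + weight * penalty
```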
[1] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and
its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence,
39(11):2298–2304, 2017.
[2] Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, and Xiang Bai. Robust scene text recognition
with automatic rectification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
June 2016.
[3] Vincent Christlein, Markus Diem, Florian Kleber, Günter Mühlberger, Verena Schwägerl-Melchior, Esther
van Gelder, and Andreas Maier. Automatic Writer Identification in Historical Documents: A Case Study.
Zeitschrift für digitale Geisteswissenschaften, 2016.
[4] Vincent Christlein, David Bernecker, Andreas Maier, and Elli Angelopoulou. Offline Writer Identification
Using Convolutional Neural Network Activation Features. In Juergen Gall, Peter Gehler, and Bastian Leibe,
editors, Pattern Recognition, Lecture Notes in Computer Science, pages 540–552, Berlin, 2015.
[5] Byungseok Roh, Han-Cheol Cho, Myung-Ho Ju, and Soon Hyung Pyo. Babo: Background activation
black-out for efficient object detection, 2020.
[6] Guilin Liu, Fitsum A. Reda, Kevin J. Shih, Ting-Chun Wang, Andrew Tao, and Bryan Catanzaro. Image
inpainting for irregular holes using partial convolutions. In The European Conference on Computer Vision
(ECCV), September 2018.
[7] Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial
networks, 2018.
End-to-end Deep Learning based Writer Identification
Writer identification is the task of finding the correct writer for a certain document. State-of-the-art
writer identification systems applying deep convolutional neural networks (CNN) consist of three
components [1] [2]. First of all, the local image patches are extracted based on the keypoints of
Scale-Invariant Feature Transform (SIFT). Subsequently, the local descriptors for these patches are
computed using the penultimate layer of a deep CNN. Finally, a global descriptor used for comparison
is computed by aggregation after embedding of local descriptors. For example, Keglevic et al. [1] map
image patches into a global descriptor using a triplet network followed by VLAD encoding.
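The aggregation step can be illustrated with a minimal VLAD sketch in plain Python: each local descriptor is assigned to its nearest cluster center, the residuals are summed per center, and the concatenated vector is L2-normalized. The centers are assumed to be given (e.g. from k-means); further normalization steps used in practice are omitted, and the name `vlad_encode` and the toy values are hypothetical.

```python
import math


def vlad_encode(descriptors, centers):
    """Aggregate local descriptors into one L2-normalized VLAD vector."""
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in centers]
    for d in descriptors:
        # Hard assignment to the nearest cluster center.
        nearest = min(
            range(len(centers)),
            key=lambda k: sum((x - c) ** 2 for x, c in zip(d, centers[k])),
        )
        for j in range(dim):
            residual_sums[nearest][j] += d[j] - centers[nearest][j]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


centers = [(0.0, 0.0), (10.0, 10.0)]
local_descriptors = [(1.0, 0.0), (0.0, 1.0), (10.0, 13.0)]
global_descriptor = vlad_encode(local_descriptors, centers)
```

The resulting vector has length (number of centers) × (descriptor dimension) and can be compared between documents, for example with the cosine distance.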
However, the above systems, which consist of three individual algorithmic components, have the
disadvantage that patch extraction and feature encoding cannot be optimized in an end-to-end
fashion. Ren et al. [3] introduce a Region Proposal Network (RPN) to generate high-quality
region proposals with a deep network. The RPN, which shares convolutional layers with object detection
networks, requires almost no additional computation. Moreover, Zhang et al. [4] generalize VLAD
encoding and propose a Deep Texture Encoding Network (Deep-TEN) with a learnable encoding layer
that achieves supervised feature aggregation. Instead of using an average pooling, Christlein et al. [5]
propose to use Deep Generalized Max Pooling (DGMP) for the computation of the weights of local
activation vectors in order to balance frequent and rare embeddings that consist of locally coherent
activations.
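The learnable encoding layer of Deep-TEN [4] can be sketched as a forward pass in plain Python. Instead of a hard assignment, each descriptor contributes its residual to every codeword, weighted by a softmax over the negative scaled squared residual norms. In the actual layer the codewords and smoothing factors are learned by backpropagation; here they are fixed, and the name `encoding_layer` and the toy values are hypothetical.

```python
import math


def encoding_layer(descriptors, codewords, smoothing):
    """Soft-assignment residual aggregation (forward pass only)."""
    dim = len(codewords[0])
    encoded = [[0.0] * dim for _ in codewords]
    for x in descriptors:
        residuals = [[xj - cj for xj, cj in zip(x, c)] for c in codewords]
        logits = [
            -s * sum(r * r for r in res)
            for s, res in zip(smoothing, residuals)
        ]
        top = max(logits)  # subtract the maximum for numerical stability
        weights = [math.exp(l - top) for l in logits]
        total = sum(weights)
        for k, res in enumerate(residuals):
            for j in range(dim):
                encoded[k][j] += weights[k] / total * res[j]
    return encoded


codewords = [(0.0, 0.0), (4.0, 0.0)]
# A descriptor halfway between both codewords is shared equally.
soft = encoding_layer([(2.0, 0.0)], codewords, smoothing=[1.0, 1.0])
# Large smoothing factors approach the hard assignment used by VLAD.
nearly_hard = encoding_layer([(1.0, 0.0)], codewords, smoothing=[50.0, 50.0])
```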
In this work, the current keypoint-based patch extraction is replaced by an RPN integrated into the
network. The VLAD encoding is replaced by a Deep-TEN layer. In addition, the current average
pooling mechanism is exchanged for DGMP. Ultimately, an end-to-end trainable network for
writer identification should be established.
The thesis consists of the following milestones:
• Incorporating RPN, Deep-TEN layer and DGMP into a neural network.
• Evaluating performance on the ICDAR17 competition dataset on historical document writer
identification [6].
• Comparing the effect of each stage against its non-deep-learning counterpart.
• Further experiments regarding learning procedure and network architecture.
The implementation should be done in Python.
[1] M. Keglevic, S. Fiel, and R. Sablatnig. Learning features for writer retrieval and identification using triplet
cnns. In 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pages
211–216, Aug 2018.
[2] Vincent Christlein, Martin Gropp, Stefan Fiel, and Andreas K. Maier. Unsupervised feature learning for
writer identification and writer retrieval. 2017 14th IAPR International Conference on Document Analysis
and Recognition (ICDAR), 01:991–997, 2017.
[3] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection
with region proposal networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett,
editors, Advances in Neural Information Processing Systems 28, pages 91–99. Curran Associates, Inc., 2015.
[4] Hang Zhang, Jia Xue, and Kristin Dana. Deep ten: Texture encoding network. In The IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), July 2017.
[5] Vincent Christlein, Lukas Spranger, Mathias Seuret, Anguelos Nicolaou, Pavel Král, and Andreas Maier.
Deep generalized max pooling. 2019.
[6] S. Fiel, F. Kleber, M. Diem, V. Christlein, G. Louloudis, S. Nikos, and B. Gatos. Icdar2017 competition on
historical document writer identification (historical-wi). In 2017 14th IAPR International Conference on
Document Analysis and Recognition (ICDAR), volume 01, pages 1377–1382, Nov. 2018.