Index

Multi-task Learning for Module Power Prediction and Failure Classification on EL Images

Manual on-site inspection of solar modules is labor-intensive. The efficiency of a module is determined by the efficiency of all its cells. These cells degrade over time and may suffer from dust cover or from failures such as cracks or fractures. Automated inspection can reduce the time needed for on-site module inspection. Three imaging modalities are mainly used: electroluminescence (EL) imaging is used by the majority of works [1, 2, 3], Li et al. use visual images of solar modules [4], and Pierdicca et al. use a dataset of thermal images [5]. Related works also differ in whether the images were taken in a manufacturing setting [1, 2, 3] or during on-site inspection using drones [4, 5]. In this work, we will use a dataset that consists of 691 EL images taken under controlled lab conditions.

We aim to apply deep learning to enable automated inspection of solar modules. Existing research focuses on classification of failures [2-5] or regression of module efficiency [1]. We join these ideas and treat failure classification and prediction of module efficiency as a multi-task learning problem. To this end, we aim to learn an embedding from cell images that can be used for defect classification at cell level and for power prediction at module level at the same time. Further, we want to assess whether a very low-dimensional embedding space is suited for power prediction, since we know that the module power depends mainly on the fraction of active area per cell. Hence, we hope that reducing the dimension of the embedding space constrains the problem in a favorable way. Finally, we plan to explore the learned embedding space by visualization and/or correlation with well-known features.

All models will be implemented in Python and PyTorch. The classification task will be done with the ResNet-18 architecture. The ResNet model will be trained with and without transfer learning.
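
A minimal PyTorch sketch of how such a multi-task model could be wired up is given here. It is an illustration under assumptions, not the implementation of this work: the small convolutional encoder is only a stand-in for ResNet-18, and the names `MultiTaskCellNet` and `multitask_loss`, the number of defect classes, the mean pooling of cell embeddings per module, and the loss weight are all hypothetical choices.

```python
import torch
from torch import nn


class MultiTaskCellNet(nn.Module):
    """Shared cell encoder with a cell-level defect head and a module-level power head."""

    def __init__(self, embed_dim: int = 2, num_classes: int = 3):
        super().__init__()
        # Small stand-in encoder (assumption); the proposal names ResNet-18 instead.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, embed_dim),  # deliberately small embedding space
        )
        self.defect_head = nn.Linear(embed_dim, num_classes)
        self.power_head = nn.Linear(embed_dim, 1)

    def forward(self, cells: torch.Tensor):
        # cells: (modules, cells_per_module, 1, height, width)
        m, c = cells.shape[:2]
        emb = self.encoder(cells.flatten(0, 1))           # (m * c, embed_dim)
        defect_logits = self.defect_head(emb)              # one prediction per cell
        # Assumption: a module is summarized by the mean of its cell embeddings.
        module_emb = emb.view(m, c, -1).mean(dim=1)        # (m, embed_dim)
        power = self.power_head(module_emb).squeeze(-1)    # one prediction per module
        return defect_logits, power


def multitask_loss(defect_logits, defect_labels, power, power_labels, weight=1.0):
    """Weighted sum of cell-level cross-entropy and module-level squared error."""
    ce = nn.functional.cross_entropy(defect_logits, defect_labels)
    mse = nn.functional.mse_loss(power, power_labels)
    return ce + weight * mse
```

With `embed_dim=2`, the learned embedding could be plotted directly, which would fit the planned exploration of the embedding space; whether such a small dimension suffices is exactly the open question.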

In our dataset, all modules have six rows with ten cells each, resulting in a total of 60 cells per module. These single cell images, 41460 in total, will be used to train the neural networks either with the cell level failure label, the module level power label or both.
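
The step from module images to single cell images can be sketched as follows. This is a simplified illustration that assumes each module image is already rectified so that the 6 x 10 cell grid is axis-aligned and evenly spaced; the function name `split_module` and the nested-list image representation are hypothetical.

```python
ROWS, COLS = 6, 10  # module layout stated in the text


def split_module(image, rows=ROWS, cols=COLS):
    """Split a module image (list of pixel rows) into rows * cols equally sized cell crops."""
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the cell grid")
    cell_h, cell_w = height // rows, width // cols
    cells = []
    for r in range(rows):
        for c in range(cols):
            # Take the pixel rows of grid row r, then the columns of grid column c.
            crop = [line[c * cell_w:(c + 1) * cell_w]
                    for line in image[r * cell_h:(r + 1) * cell_h]]
            cells.append(crop)
    return cells
```

Applied to all 691 modules, this yields 691 x 60 = 41460 cell images, matching the count given above.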

Literature

[1] Buerhop-Lutz, Claudia, et al. “Applying Deep Learning Algorithms to EL-images for Predicting the Module Power.” Presented at 36th European Photovoltaic Solar Energy Conference and Exhibition, Marseille 2019.

[2] Deitsch, Sergiu, et al. “Automatic classification of defective photovoltaic module cells in electroluminescence images.” Solar Energy 185 (2019): 455-468.

[3] Sun, Mingjian, et al. “Defect detection of photovoltaic modules based on convolutional neural network.” International Conference on Machine Learning and Intelligent Communications. Springer, Cham, 2017.

[4] Li, Xiaoxia, et al. “Intelligent Fault Pattern Recognition of Aerial Photovoltaic Module Images Based on Deep Learning Technique.” J Syst Cybern Inf 16.2 (2018): 67-71.

[5] Pierdicca, R., et al. “Deep Convolutional Neural Network for Automatic Detection of Damaged Photovoltaic Cells.” International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences 42.2 (2018).

Extension of the Lottery Ticket Hypothesis for Saving Computational Cost and Energy

Many state-of-the-art neural networks have millions of parameters, e.g. VGG’s smallest configuration has 133 million parameters [1]. They achieve high training and test accuracies but at a high computational cost. Since powerful hardware exists, coping with that computational cost is possible but very inefficient. To counteract these inefficiencies, network pruning can be applied to decrease the size of neural networks. For many years, however, the accuracies after pruning did not match those of the original network, while hardware became more powerful and cheaper. Pruning was therefore considered suboptimal, and the trend went towards building big networks that perform consistently well on high-performance hardware.

In 2019, the Lottery Ticket Hypothesis (LTH) [2] was introduced as a new approach for pruning neural networks. Using a binary mask, the weights with the lowest magnitudes are selectively set to zero, which removes the corresponding connections from the network. The hypothesis suggests that fully-connected and convolutional neural networks can be iteratively pruned into a sparse subnetwork such that the parameter count is reduced by over 90%, the number of training iterations is at most as high as for the original network, and the test accuracy meets or even exceeds the original one. This paper opened up a large area of discussion: some papers do not find improvements of the LTH over random initialization [3], whereas others offer insights into why the approach works well [4].
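
The iterative mask-based pruning described above can be sketched in plain Python. This is a simplified illustration on a flat list of weights, not the procedure of [2] in full: the function names are hypothetical, `train` is a placeholder for an actual training run, and layer-wise pruning rates are ignored.

```python
def apply_mask(weights, mask):
    """Zero out every weight whose mask entry is 0."""
    return [w * m for w, m in zip(weights, mask)]


def prune_smallest(values, mask, fraction):
    """Set the mask to 0 for the given fraction of surviving weights with the smallest magnitude."""
    alive = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(alive) * fraction)
    new_mask = list(mask)
    for i in sorted(alive, key=lambda i: abs(values[i]))[:n_prune]:
        new_mask[i] = 0
    return new_mask


def find_ticket(initial, train, rounds, fraction):
    """Repeat: train the masked network, prune, and reset survivors to their initial values."""
    mask = [1] * len(initial)
    for _ in range(rounds):
        # `train` is a stand-in (assumption); it must keep pruned weights at zero.
        trained = train(apply_mask(initial, mask))
        mask = prune_smallest(trained, mask, fraction)
    return mask, apply_mask(initial, mask)
```

Note that the weight lists keep their full length throughout: pruned connections are only zeroed, not removed.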

A drawback of the lottery ticket hypothesis, however, is that the network’s structure of neurons stays the same: no neurons are removed, so the computational cost does not decrease. The goal of this thesis is to investigate whether neural network pruning by reducing the number of neurons, based on the idea of the lottery ticket hypothesis, is possible. Additional goals would be to compute the amount of energy savings [5], to compare masks and structures created by different datasets and optimizers for the same network in order to gain potentially deeper insights, and to see whether the idea of a supermask [4] also carries over to neuron pruning.
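
The difference to masking can be illustrated with a small sketch that physically removes hidden neurons from a two-layer fully-connected network, so that both weight matrices shrink. It is only one conceivable variant: ranking neurons by the L1 norm of their incoming weights is an assumption, biases are omitted, and the function name `remove_neurons` is hypothetical.

```python
def remove_neurons(w_in, w_out, keep_fraction):
    """Drop the hidden neurons with the smallest incoming L1 norm.

    w_in:  hidden x inputs weights (one row per hidden neuron)
    w_out: outputs x hidden weights (one column per hidden neuron)
    """
    n_hidden = len(w_in)
    n_keep = max(1, int(n_hidden * keep_fraction))
    # Assumed importance score: L1 norm of each neuron's incoming weights.
    scores = [sum(abs(w) for w in row) for row in w_in]
    ranked = sorted(range(n_hidden), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:n_keep])
    # Remove rows of the first layer and the matching columns of the second layer.
    small_in = [w_in[j] for j in keep]
    small_out = [[row[j] for j in keep] for row in w_out]
    return small_in, small_out, keep
```

In contrast to a binary mask, the resulting matrices are genuinely smaller, so fewer multiplications are needed at inference time.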

The thesis aims to achieve the following goals:
• Extending the lottery ticket hypothesis by actually removing neurons, instead of using a binary mask.
• Comparing the accuracies and network sizes to those of the original lottery ticket hypothesis on different datasets.

Additional investigations should cover:
• Comparing the network structures and masks of different datasets and optimizers.
• Computing the amount of energy savings [5].
• Incorporating the idea of a supermask [4] into the approach of neuron pruning.
• Investigating pruning procedures that aim at removing neurons instead of cutting connections.

[1] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2015.
[2] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Training pruned neural networks. CoRR, abs/1803.03635, 2018.
[3] Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking the value of network pruning. CoRR, abs/1810.05270, 2018.
[4] Hattie Zhou, Janice Lan, Rosanne Liu, and Jason Yosinski. Deconstructing lottery tickets: Zeros, signs, and the supermask. CoRR, abs/1905.01067, 2019.
[5] Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. Designing energy-efficient convolutional neural networks using energy-aware pruning. CoRR, abs/1611.05128, 2016.

Multi-task Learning for Historical Handwritten Document Classification

In the Competition on Image Retrieval for Historical Handwritten Documents 2019 [1], several methods have been proposed to identify the writer of a document. Although most of the proposed methods are based on feature descriptors and traditional machine learning techniques, deep learning based methods are emerging in the field of historical document classification. At the same time, deep learning based methods dominate other image retrieval and classification tasks.
Multi-task learning (MTL) is an approach to learning multiple tasks at the same time using one neural network. This approach has not been used for historical handwritten document classification. Over the years, MTL has been applied to many fields, not only computer vision but also speech processing, bioinformatics, etc., to boost performance [2]. In the context of computer vision, MTL is used to detect facial landmarks to improve the performance of expression recognition [3]. Furthermore, a convolutional neural network has been proposed for pose estimation with auxiliary tasks such as body part detection [4].
In this work, we will investigate multi-task learning with neural networks for historical handwritten document classification. We will use the datasets published online for previous competitions for training and testing.
We will implement two multi-task neural networks. The first focuses on writer identification with binarization as an auxiliary task. Its performance will be evaluated using the datasets from the ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI) [5]. The second focuses on dating and style classification; its performance will be evaluated using the datasets from the ICDAR2017 Competition on the Classification of Medieval Handwritings in Latin Script [6].
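A minimal PyTorch sketch of the first network, writer identification with binarization as an auxiliary task, could look as follows. It is meant as an illustration only: the tiny encoder, the class name `WriterBinarizationNet`, the loss weight `aux_weight`, and the assumption that a binarization ground-truth mask is available for every training patch are not taken from the cited competitions.

```python
import torch
from torch import nn


class WriterBinarizationNet(nn.Module):
    """Shared encoder with a global writer head and a per-pixel binarization head."""

    def __init__(self, num_writers: int):
        super().__init__()
        # Full-resolution encoder so the auxiliary head can predict one value per pixel.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Main task: one writer prediction per patch from globally pooled features.
        self.writer_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_writers)
        )
        # Auxiliary task: one ink/background logit per pixel.
        self.binarization_head = nn.Conv2d(16, 1, kernel_size=1)

    def forward(self, patches: torch.Tensor):
        features = self.encoder(patches)
        return self.writer_head(features), self.binarization_head(features)


def joint_loss(writer_logits, writer_ids, ink_logits, ink_masks, aux_weight=0.5):
    """Writer cross-entropy plus a down-weighted per-pixel binarization loss."""
    main = nn.functional.cross_entropy(writer_logits, writer_ids)
    aux = nn.functional.binary_cross_entropy_with_logits(ink_logits, ink_masks)
    return main + aux_weight * aux
```

The second network (dating and script type classification) would need two classification heads on the pooled features instead of a per-pixel head.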
The thesis consists of the following milestones:
• Review of the related work and methods of historical handwritten document classification
• Implementation of two neural networks for multi-task learning for:
– writer identification and binarization
– date classification and script type classification
• Evaluation of the results of these two multi-task neural networks
• Comparison of this approach to current document classification approaches
• Examination and discussion of whether a multi-task neural network is useful for document classification
The implementation should be done in PyTorch.

[1] Vincent Christlein, Anguelos Nicolaou, Mathias Seuret, Dominique Stutzmann, and Andreas Maier. ICDAR 2019 competition on image retrieval for historical handwritten documents. In International Conference on Document Analysis and Recognition (ICDAR), 2019.
[2] Yu Zhang and Qiang Yang. A survey on multi-task learning. arXiv preprint arXiv:1707.08114, 2017.
[3] Terrance Devries, Kumar Biswaranjan, and Graham W. Taylor. Multi-task learning of facial landmarks and expression. In 2014 Canadian Conference on Computer and Robot Vision, pages 98–103. IEEE, 2014.
[4] Sijin Li, Zhi-Qiang Liu, and Antoni B. Chan. Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 482–489, 2014.
[5] Stefan Fiel, Florian Kleber, Markus Diem, Vincent Christlein, Georgios Louloudis, Nikos Stamatopoulos, and Basilis Gatos. ICDAR2017 competition on historical document writer identification (Historical-WI). In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 1, pages 1377–1382. IEEE, 2017.
[6] Florence Cloppet, Veronique Eglin, Marlene Helias-Baron, Cuong Kieu, Nicole Vincent, and Dominique Stutzmann. ICDAR2017 competition on the classification of medieval handwritings in Latin script. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 1, pages 1371–1376. IEEE, 2017.

Pose Based Image Retrieval in Greek Vase Paintings

Adversarial Modeling of Emotions in Visual Scenes

Emotion Recognition Guided by Gaze and Context on Images

Machine-learning based localization of latest epicardial activation for cardiac resynchronization therapy guidance

Reinforcement Learning for the Planning of Liver Tumor Thermal Ablation

Adapting Pyro-NN with SPECT operators

The goal of this project is the implementation of the SPECT forward and backward projection model in the Pyro-NN framework. This would make it possible to include the SPECT reconstruction process in a neural network architecture.

Advancing the digital twin method

The aim of this research project was to develop a program that registers an XCAT phantom to a CT scan with a rigid and a non-rigid registration. The registered XCAT phantom can be used to perform SPECT experiments and simulations without burdening the patient with an additional SPECT examination. To perform the registration, the open source software Plastimatch version 1.8.0 was used. The results of the registration were evaluated visually and empirically. The registration was successful in most cases, but in some cases the rigid registration direction failed.