Detection of Label Noise in Solar Cell Datasets

On-site inspection of solar panels is time-consuming and laborious, as the panels are often hard to reach. Furthermore, defects can be hard to identify, especially small cracks. Electroluminescence (EL) imaging enables the detection of small cracks, for example using a convolutional neural network (CNN) [1,2]. Hence, it can be used to identify such cracks before they propagate and measurably reduce the efficiency of a solar panel [3]. This way, costly inspection and replacement of solar panels can be avoided.

To train a CNN for crack detection, a comprehensive dataset of labeled solar cells is required. Unfortunately, assessing whether a certain structure on a polycrystalline solar cell corresponds to a crack is hard, even for human experts. As a result, setting up a consistently labeled dataset is nearly impossible. That is why EL datasets of solar cells tend to contain a significant amount of label noise.

It has been shown that CNNs are robust against small amounts of label noise, but performance may degrade drastically once 5%-10% of the labels are noisy [4]. This thesis will

(1) analyze the given dataset with respect to label noise and
(2) attempt to minimize the negative impact of label noise on the performance of the trained network.

Recently, Ding et al. proposed to identify label noise by clustering the features learned by the CNN [4]. As part of this thesis, the proposed method will be applied to a dataset of more than 40k labeled solar cell samples that is known to contain a significant amount of label noise. It will be investigated whether the method can be used to identify noisy samples. Furthermore, it will be evaluated whether excluding noisy samples improves the performance of the resulting model. To this end, a subset of the dataset will be labeled by at least three experts to obtain a cleaned subset. Finally, an extension of the method will be developed. Here, it shall be evaluated whether the clustering can be omitted, since it proved unstable in prior experiments using the same data.
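
The general idea of flagging label noise by clustering learned features can be illustrated with a minimal sketch. This is not the method of [4] itself: it assumes that feature vectors have already been extracted from a trained CNN into a NumPy array, uses a plain k-means with a deterministic farthest-point initialisation, and marks a sample as suspicious if its label differs from the majority label of its cluster. The function names `kmeans` and `flag_suspicious` and the majority rule are assumptions made for illustration only.

```python
import numpy as np


def kmeans(features, k, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster index per sample."""
    # Farthest-point initialisation (an assumption of this sketch):
    # deterministic, and it spreads the initial centroids apart.
    centroids = [features[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min(
            [((features - c) ** 2).sum(axis=1) for c in centroids], axis=0
        )
        centroids.append(features[dist_to_nearest.argmax()])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = features[assign == c]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return assign


def flag_suspicious(features, labels, k=2):
    """Flag samples whose label differs from the majority label of their cluster."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    assign = kmeans(features, k)
    suspicious = np.zeros(len(labels), dtype=bool)
    for c in np.unique(assign):
        in_cluster = assign == c
        values, counts = np.unique(labels[in_cluster], return_counts=True)
        majority = values[counts.argmax()]
        suspicious |= in_cluster & (labels != majority)
    return suspicious
```

Calling `flag_suspicious(features, labels, k=2)` returns a boolean mask over the samples. In the setting described above, flagged samples would be candidates for comparison against the expert-labeled subset. The sketch also makes the instability concern concrete, since the result depends on the choice of `k` and on the initialisation.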

[1] Deitsch, Sergiu, et al. “Automatic classification of defective photovoltaic module cells in electroluminescence images.” Solar Energy 185 (2019): 455-468.
[2] Mayr, Martin, et al. “Weakly Supervised Segmentation of Cracks on Solar Cells Using Normalized Lp Norm.” 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019.
[3] Köntges, Marc, et al. “Impact of transportation on silicon wafer‐based photovoltaic modules.” Progress in Photovoltaics: Research and Applications 24.8 (2016): 1085-1095.
[4] Ding, Guiguang, et al. “DECODE: Deep confidence network for robust image classification.” IEEE Transactions on Image Processing 28.8 (2019): 3752-3765.