Synthetic data creation of defect images for CNN training using GAN

Master Thesis in Cooperation with Infineon Technologies AG

External Supervision:
Weichselbaumer Christoph (BE R UPE TEST) (Christoph.Weichselbaumer@infineon.com)

Working Title:
Synthetic data creation of defect images for CNN training using GAN

Research Question:
Can the use of Generative Adversarial Networks (GANs) for generating synthetic data of particles and scratches on a transparent background improve the accuracy of Convolutional Neural Networks (CNNs) in defect detection?

Description:
The proposed Master’s thesis addresses the challenge of providing ground truth for model training in AI use cases, specifically in the context of detecting defects in products. Currently, manually reviewing and labeling images for CNN [1] training is time-consuming and costly. In addition, the most critical and relevant defects are often the rarest in real data, because products are designed to minimize such defects. Data shifts in production can also affect model training.
To overcome these challenges, the proposed Master’s thesis will focus on using Generative Adversarial Networks (GANs) [2], [3], [4] to generate synthetic data for the minority classes in CNN training. The goal of the thesis is to create one or more GANs capable of generating defect images of particles and scratches on a transparent background, and to evaluate the generated images by measuring the classification accuracy that an existing CNN achieves on them [5].
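As an illustration of the evaluation step, the following is a minimal sketch of how classification accuracy could be reported per defect class. It assumes that the existing CNN has already produced one predicted class index per image; the function name `per_class_accuracy` and the integer label encoding are hypothetical and not part of the proposal.

```python
import numpy as np

def per_class_accuracy(true_labels, predicted_labels, class_names):
    """Return overall accuracy and the accuracy within each true class.

    Assumption: labels are integer indices into class_names, and the
    predictions come from an already trained CNN (not shown here).
    """
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    if true_labels.shape != predicted_labels.shape:
        raise ValueError("label arrays must have the same shape")

    results = {"overall": float(np.mean(true_labels == predicted_labels))}
    for index, name in enumerate(class_names):
        mask = true_labels == index
        if mask.any():  # skip classes that do not occur in the data
            results[name] = float(np.mean(predicted_labels[mask] == index))
    return results

# Hypothetical labels: 0 = particle, 1 = scratch
scores = per_class_accuracy([0, 0, 1, 1], [0, 1, 1, 1], ["particle", "scratch"])
```

Reporting accuracy per class in addition to the overall value matters here because the defect classes of interest are minority classes, so an overall figure alone could hide poor performance on them.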

The proposed research will involve two sub-targets:
1. Developing GANs capable of generating defect images of particles and scratches on a transparent background.
2. Creating GANs capable of image-to-image translation to generate defect images of particles and scratches.
The focus of the research will be on generating defect images on a transparent background, with image-to-image translation considered an optional target depending on the progress of the research.

In the proposed approach, the GANs are trained on existing defect image datasets and then used to generate synthetic images that fill gaps in the real data. The GANs will be trained to generate synthetic images of particles and scratches on a transparent background, and these images will be combined with a “golden die” background to create an enhanced dataset for CNN training.
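The combination of a generated defect with a “golden die” background could be sketched as a simple alpha blend. This is a minimal sketch under stated assumptions, not the method fixed by the proposal: it assumes the generated defect is an 8-bit RGBA array whose alpha channel encodes transparency, that the background is an 8-bit RGB array, and that the paste position is chosen by the caller. The function name `composite_defect` is hypothetical.

```python
import numpy as np

def composite_defect(background_rgb, defect_rgba, top, left):
    """Alpha-blend an RGBA defect patch onto an RGB background image.

    Assumptions: both arrays are uint8, background_rgb has shape (H, W, 3),
    defect_rgba has shape (h, w, 4), and the patch fits inside the background.
    """
    h, w = defect_rgba.shape[:2]
    if top < 0 or left < 0 or top + h > background_rgb.shape[0] or left + w > background_rgb.shape[1]:
        raise ValueError("defect patch does not fit inside the background")

    out = background_rgb.astype(np.float32)  # astype returns a new array
    alpha = defect_rgba[..., 3:4].astype(np.float32) / 255.0
    color = defect_rgba[..., :3].astype(np.float32)

    # Standard "over" blend: the defect covers the background in proportion to alpha
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * color + (1.0 - alpha) * region
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Toy data standing in for a golden die image and a generated defect
golden_die = np.full((4, 4, 3), 100, dtype=np.uint8)
defect = np.zeros((2, 2, 4), dtype=np.uint8)
defect[0, 0] = [200, 200, 200, 255]  # one opaque defect pixel, the rest transparent
enhanced = composite_defect(golden_die, defect, top=1, left=1)
```

Because the position is a parameter, the same generated defect can be placed at several locations on the background, which is one way such a pipeline could increase the variety of the enhanced dataset.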
The proposed approach has the potential to significantly reduce the time and cost of manually reviewing and labeling images for CNN training, and to provide a more diverse and relevant dataset for model training. It can also be applied to other defect types, such as stains and chipping, to further enhance the dataset.

Literature:
[1]. O’Shea, Keiron, and Ryan Nash. “An introduction to convolutional neural networks.” arXiv preprint arXiv:1511.08458 (2015).
[2]. Ali, Safinah, Daniella DiPaola, and Cynthia Breazeal. “What are GANs? introducing generative adversarial networks to middle school students.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 17. 2021.
[3]. Figueira, Alvaro, and Bruno Vaz. “Survey on synthetic data generation, evaluation methods and GANs.” Mathematics 10.15 (2022): 2733.
[4]. Eilertsen, Gabriel, et al. “Ensembles of GANs for synthetic training data generation.” arXiv preprint arXiv:2104.11797 (2021).
[5]. Buda, Mateusz, Atsuto Maki, and Maciej A. Mazurowski. “A systematic study of the class imbalance problem in convolutional neural networks.” Neural Networks 106 (2018): 249-259.