Improving the Robustness of In-Circuit Test in Electronic Production using Deep Neural Network and Synthetic Tabular Data

Type: MA thesis

Status: running

Date: July 1, 2023 - December 31, 2023

Supervisors: Martin Leipert, Reinhardt Seidel, M. Sc., Lehrstuhl für Fertigungsautomatisierung und Produktionssystematik, Erik Schwulera, Siemens AG


Since the fourth industrial revolution, known as ‘Industry 4.0’, manufacturing
processes have become more flexible, more traceable, and less complicated.
The worldwide demand for electronic products, particularly Printed Circuit
Boards (PCBs), is continuously increasing [1]. Manufacturing PCBs requires
high precision to ensure product quality and reliability, because any defect
in a PCB can lead to fatal flaws at the product level. Even if only one
component on the PCB does not work properly, the whole board may be
defective. Thus, accurate production of PCBs is critical. Machine learning
techniques are widely used in the electronics industry to improve the
manufacturing process and to detect faults, so that high-quality PCBs are
delivered to customers.

The thesis is conducted at Siemens AG in Erlangen, an independent manufacturer
of electronic products in a production environment driven by customer
requirements. The main technologies involved in the production are
Surface-Mount Technology (SMT) and Through-Hole Technology (THT) [2]. In the
SMT assembly line, SMT components are mounted on the circuit boards using
solder paste and a reflow oven, which ensures that each component adheres to
the PCB and has an electrical connection. The PCBs then pass through the THT
assembly line, where the leads of the THT components are inserted into holes
drilled in the board and soldered to connect the components to the board.
While the components are being mounted, several inspection steps such as
Automatic Optical Inspection (AOI) or Automatic X-ray Inspection (AXI) are
applied. At the end of the process, the Printed Circuit Board Assemblies
(PCBAs) are tested using the In-Circuit Test (ICT) and, subsequently, a
functional test to confirm that the complete PCB is functioning well [3].

In the ICT, the values of all passive components on the PCBA are measured in
order to check for defective components or defective solder connections. Each
component is assigned an upper and a lower threshold value, and the measured
value is compared against these limits. The test fails when the measured value
does not fall within this range, which raises an error flag in the ICT. This
error could also be a Pseudo error (false alarm) caused by a measurement
error. Therefore, every error flag from the ICT is additionally inspected
manually at the production line. Checking all error calls manually during
production is highly time-consuming and cost-intensive.
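
The limit check described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: the names Measurement,
ict_result, and board_has_error_flag are hypothetical and are not taken from
the ICT software or from CATE, and a value exactly on a limit is assumed to
pass.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Measurement:
    """One ICT measurement of a passive component (hypothetical structure)."""
    component_id: str   # e.g. "R1000"
    value: float        # value captured by the ICT
    lower_limit: float  # lower threshold for this component
    upper_limit: float  # upper threshold for this component


def ict_result(m: Measurement) -> str:
    """Return "PASS" if the value lies within the limits, otherwise "FAIL".

    Assumption: a value exactly equal to a limit still passes.
    """
    return "PASS" if m.lower_limit <= m.value <= m.upper_limit else "FAIL"


def board_has_error_flag(measurements: Sequence[Measurement]) -> bool:
    """A board raises an error flag as soon as one component fails."""
    return any(ict_result(m) == "FAIL" for m in measurements)
```

Note that such a check cannot tell a Real error from a Pseudo error: both
produce the same FAIL result, which is why a flagged board needs a further
decision step.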

Besides out-of-range measured values, other factors such as contaminated or
misaligned ICT needles can also cause a failure [4]. The measured values vary
substantially due to external effects such as changing temperatures caused by
soldering. Therefore, test engineers have to tune the upper and lower
thresholds. This leads to a high degree of confusion between Real errors and
Pseudo errors (false alarms) [4]. Moreover, a high percentage of Pseudo errors
during production increases the chance of missing a real defect due to
negligence, potentially increasing the error slip ratio.

In an unautomated setting, a PCBA passes through the ICT again if the first
ICT fails. This additional testing ensures that the error is not due to a
temporary fault caused by a non-robust measurement. Because of these reruns,
the dataset contains boards with the same serial number multiple times. If the
rerun also fails, a manual inspection is carried out. To reduce
labor-intensive manual inspections, especially for non-critical cases, namely
Pseudo errors, Cognitive Analytics for Test Engineering (CATE) is used [5].
CATE is an automated machine learning model implemented to support the ICT of
PCBAs in order to eliminate the ICT reruns and to distinguish Pseudo errors
from Real errors [6]. Using machine learning models like CATE in PCB
production can help test engineers avoid false positive error flags on the shop
floor, which helps them accelerate and optimize their work.

Problem Statement

Machine learning techniques can provide remarkable solutions when large
amounts of uncorrupted data are available. However, such solutions are
ineffective if the data is limited and noisy, which is typical for a
manufacturing environment. CATE only sends the actual Real errors to the
inspection, which reduces the manual effort to the mandatory minimum [7]. This
thesis aims to overcome the shortcomings of CATE, such as the highly
imbalanced and noisy datasets of Real errors and Pseudo errors and the
one-sided decision threshold (upper limit) of the ICT on the measurement
values of the components [5]. Besides, the current approach is not robust
against production process variations and known error patterns at new
components.

To overcome the current challenges in CATE, this thesis presents approaches
for using synthetic tabular data for model training and for optimizing the
model parameters to include a two-sided test evaluation. The noisy nature of
the dataset and the imbalanced target class will be addressed using synthetic
tabular data. Thus, real error patterns such as open solder joints or solder
bridges can be simulated [5]. Moreover, the impact of different approaches on
model structure and parameters will be analyzed.
Based on these improvements, the following five hypotheses shall be examined:
The error patterns in the synthetic dataset will be similar to those of the original dataset [12].
Applying CTGAN, which learns the distribution of the model features, will improve the model’s robustness and generalization [8].
The GAN approach will produce better outcomes than traditional sampling techniques for class imbalance [9] [10].
Training XGBoost on synthetic data will improve the model’s performance metrics and reduce Pseudo errors, thereby improving production efficiency [11] [12].
Adding an uncertainty measure to the neural network model will reduce the misclassification rate by means of a probability cut-off value [13].

The data for CATE is constructed from the output produced by the ICT, the
ground truth from inspections, and the error database. It consists of the
following fields:
Sartikelnr: The type of board that is in the production line.
Sid: Serial number of the boards of the same type, e.g., T-P56079964.
ICT: If a board passes the ICT, the result is TRUE; otherwise, FALSE.
Functional test: If a board passes the subsequent functional test, the
result is TRUE; otherwise, FALSE.
Component ID: The type of component mounted on the board, e.g., R1000 for
a resistor with ID 1000.
Measurement Value: The actual value captured by the ICT for the respective
component, e.g., a measurement of 100 Ohm for a resistor.
Unit: The unit of measurement is logged to normalize the measurement.
Lower limit: If the measurement value is lower than this threshold, the
result is set to FAIL.
Upper limit: If the measurement value is higher than this threshold, the
result is set to FAIL.
Result: The actual test result, represented as either PASS or FAIL.
When neither the upper nor the lower limit is exceeded, the result is PASS.
Ground Truth: The label derived from the inspection data. If a board
passes the test the first time, the label is FIRST PASS. If it passes in a
rerun, the label is SECOND PASS or MULTI PASS. If it fails, the label is
ONLY FAIL. ERROR and PSEUDO ERROR represent the error flag of the board.
The INSPECTION label indicates whether the board was in manual inspection.
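
Because a board appears once per ICT run, the pass labels can be derived by
grouping the rows by serial number. The following is a minimal sketch, not the
CATE implementation. It assumes that SECOND PASS means a pass on the second
run and MULTI PASS a pass on any later run, which the description above does
not state explicitly. The function names and the example serial numbers in the
usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pass_label(run_results: List[bool]) -> str:
    """Map the chronological ICT results of one board to a pass label.

    Assumption: SECOND PASS = passed on the second run,
    MULTI PASS = passed on the third or a later run.
    """
    if not run_results:
        raise ValueError("a board needs at least one ICT run")
    if True not in run_results:
        return "ONLY FAIL"
    first_passing_run = run_results.index(True) + 1
    if first_passing_run == 1:
        return "FIRST PASS"
    if first_passing_run == 2:
        return "SECOND PASS"
    return "MULTI PASS"


def label_boards(runs: Iterable[Tuple[str, bool]]) -> Dict[str, str]:
    """Group (serial number, ICT passed) rows by serial number and label them.

    The rows are assumed to be in chronological order.
    """
    by_board: Dict[str, List[bool]] = defaultdict(list)
    for sid, passed in runs:
        by_board[sid].append(passed)
    return {sid: pass_label(results) for sid, results in by_board.items()}
```

For example, label_boards([("T-0001", False), ("T-0001", True)]) labels the
made-up board T-0001 as SECOND PASS.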


Building a model

A model will be built to generate a synthetic tabular dataset. One of the
approaches considered for synthetic tabular data generation is the Conditional
Tabular Generative Adversarial Network (CTGAN), a specialization of the GAN
architecture for synthesizing tabular data that was presented in 2019 by Lei
Xu et al. in ‘Modeling Tabular Data Using Conditional GAN’ [8]. CTGAN will be
used to handle the imbalanced and noisy dataset, as the approach has the
potential to enhance the prediction accuracy [14]. The similarity of
categorical correlations between the synthetic and original datasets will be
measured.
The eXtreme Gradient Boosting (XGBoost) algorithm is used in the CATE use case
for the classification task of distinguishing Real errors from Pseudo errors.
XGBoost will be tested with CTGAN and with traditional sampling techniques.
It will then be validated on a dedicated validation dataset to verify its
robustness against data drift and known error patterns at new components.
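
As a point of reference for the traditional sampling techniques, the sketch
below shows random oversampling, one common baseline for class imbalance. The
text does not name the techniques that will be compared, so this choice is an
assumption. The function name random_oversample and the row layout (feature
list, label) are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[list, str]  # (feature values, class label), hypothetical layout


def random_oversample(rows: Sequence[Row], seed: int = 0) -> List[Row]:
    """Duplicate randomly chosen rows of the smaller classes until every
    class is as frequent as the largest one."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    counts = Counter(label for _, label in rows)
    target = max(counts.values())
    balanced = list(rows)
    for label, count in counts.items():
        pool = [row for row in rows if row[1] == label]
        # Draw with replacement from the existing rows of this class.
        balanced.extend(rng.choice(pool) for _ in range(target - count))
    return balanced
```

Random oversampling only repeats existing rows, whereas a generative model
such as CTGAN is meant to produce new rows that follow the learned
distribution. That difference is what the comparison is intended to assess.
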
To reduce the misclassification rate, an uncertainty measure will be added to
the model. It indicates how uncertain a model prediction is. The aim is for
the model to be as confident as possible about a prediction while limiting the
number of uncertain predictions [13]. A probability cut-off value on the
uncertainty of the ICT model will alert the test engineers that an error
requires a manual check.
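
One simple way to apply such a cut-off is to treat the larger of the two class
probabilities as the model's confidence and to route every prediction below
the cut-off to a manual check. This is a sketch under assumptions and not the
uncertainty measure the thesis will use: the function name triage, the label
MANUAL CHECK, and the default cut-off of 0.8 are all hypothetical.

```python
def triage(p_real_error: float, cutoff: float = 0.8) -> str:
    """Route one ICT error flag based on the predicted probability.

    p_real_error: predicted probability that the flag is a Real error.
    cutoff: minimum confidence needed to trust the prediction
            (0.8 is an arbitrary illustrative value).
    """
    if not 0.0 <= p_real_error <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Confidence of the more likely class in a two-class problem.
    confidence = max(p_real_error, 1.0 - p_real_error)
    if confidence < cutoff:
        return "MANUAL CHECK"
    return "ERROR" if p_real_error >= 0.5 else "PSEUDO ERROR"
```

Raising the cut-off sends more flags to manual inspection and leaves fewer
automatic decisions, so the value trades inspection effort against the risk of
misclassification.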


Several toolkits can be used to implement this thesis. The most prominent are
sdv for CTGAN [8], scikit-learn [15], and Keras [16]. The model will be
trained centrally with the help of AWS SageMaker.


The model evaluation will be partitioned into three areas. First, the
synthetic dataset is compared with the original dataset to obtain a similarity
measure. Here, the evaluation metric presented by Melle Mendikowski et al. can
be used [17]. Second, the performance of the model using CTGAN, other sampling
approaches, and the original dataset as input to XGBoost will be assessed by
the model’s F1-score. Lastly, to make more informed and precise decisions on
error flags, uncertainty estimation will be examined.
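
For reference, the F1-score is the harmonic mean of precision and recall. The
sketch below computes it from label lists, assuming that ERROR (a Real error)
is the positive class. That choice and the function name f1_score are
assumptions made for illustration.

```python
from typing import Sequence


def f1_score(y_true: Sequence[str], y_pred: Sequence[str],
             positive: str = "ERROR") -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for the given positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denominator = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when the positive class never occurs.
    return 2 * tp / denominator if denominator else 0.0
```

Unlike accuracy, the F1-score does not reward a model for labelling everything
as the majority class, which makes it a reasonable choice for the imbalanced
Real error and Pseudo error data.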


[1] Tingting Chen et al. “Machine Learning in Manufacturing towards Industry
4.0: From ‘For Now’ to ‘Four-Know’”. In: Applied Sciences, vol. 13, no. 3,
p. 1903, Feb. 2023. doi: 10.3390/app13031903.
[2] Eva Jabbar et al. “Conditional Anomaly Detection for Quality and
Productivity Improvement of Electronics Manufacturing Systems”. In: Lecture
Notes in Computer Science. Vol. 11943 LNCS. Springer, 2019, pp. 711–724.
doi: 10.1007/978-3-030-37599-7_59.
[3] S S Zakaria et al. “Automated Detection of Printed Circuit Boards (PCB)
Defects by Using Machine Learning in Electronic Manufacturing: Current
Approaches”. In: IOP Conference Series: Materials Science and Engineering
(Feb. 2020), p. 012064. issn: 1757-8981. doi: 10.1088/1757-899X/767/1/012064.
[4] Nabil El Belghiti Alaoui et al. “Upgrading In-Circuit Test of High
Density PCBAs Using Electromagnetic Measurement and Principal Component
Analysis”. In: Journal of Electronic Testing 34.6 (Dec. 2018), pp. 749–762.
issn: 0923-8174. doi: 10.1007/s10836-018-5763-4.
[5] Erik Schwulera and Michael Plendl. CATE (Cognitive Analytics for Test
Engineering) or Use of Artificial Intelligence (AI) to Optimize Test
Parameters in Real time at the Edge. Tech. rep. Siemens AG, 2020.
[6] Siemens AG. Prescriptive Analytics for Test Engineering. Tech. rep.
Siemens AG, 2018.
[7] Ziqiu Kang et al. “Machine learning applications in production lines: A
systematic literature review”. In: Computers & Industrial Engineering (Jul.
2020).
[8] Lei Xu et al. “Modeling Tabular Data using Conditional GAN”. In: 33rd
Conference on Neural Information Processing Systems (NeurIPS 2019).
arXiv:1907.00503.
[9] Fajardo, Val Andrei et al. “On oversampling imbalanced data with deep
conditional generative models”. In: Expert Syst. Appl. 169 (2021): 114463.
[10] Wentao Mao et al. “Imbalanced Fault Diagnosis of Rolling Bearing Based
on Generative Adversarial Network: A Comparative Study”. In: IEEE
Access, vol. 7, pp. 9515-9530, 2019. doi: 10.1109/ACCESS.2018.2890693.
[11] Vadim Borisov et al. “Deep Neural Networks and Tabular Data: A Survey”.
In: IEEE Transactions on Neural Networks and Learning Systems,
pp(99):1-21. doi: 10.1109/TNNLS.2022.3229161.
[12] Aziira, A. H. et al. “Generation of Synthetic Continuous Numerical Data
Using Generative Adversarial Networks”. In: Journal of Physics: Conference
Series, Volume 1577, Issue 1, article id. 012027 (Jul. 2020).
[13] Gawlikowski et al. “A Survey of Uncertainty in Deep Neural Networks”,
ArXiv abs/2107.03342, 2021.
[14] Subhajit Chatterjee et al. “A Synthetic Data Generation Technique for
Enhancement of Prediction Accuracy of Electric Vehicles Demand”. In:
Sensors. 2023; 23(2):594.
[15] F. Pedregosa et al. “Scikit-Learn: Machine Learning in Python”. In:
Journal of Machine Learning Research 12 (2011), pp. 2825–2830.
[16] Chollet et al. “Keras”, 2015.
[17] Melle Mendikowski et al. “Creating Customers That Never Existed –
Synthesis of E-commerce Data Using CTGAN”. In: Machine Learning and
Data Mining in Pattern Recognition. International Conference on Machine
Learning and Data Mining in Pattern Recognition (MLDM-2022), pp. 91–105,
volume: 284, isbn: 978-3-942952-93-4.