Automation of flow cytometry diagnostics workflow for leukemia diagnostics by leveraging machine learning

Type: MA thesis

Status: running

Date: May 1, 2021 - November 2, 2021

Supervisors: Daniel Stromer, Vincent Christlein, Stefan Krause, Andreas Maier

Background: Flow cytometry (FCM) is a technique for measuring the physical and chemical properties
of individual cells suspended in a fluid stream. FCM is widely used in immunology and in many clinical and
biomedical laboratories for the diagnosis, subclassification, and post-treatment monitoring of blood cancers
such as leukemias. A single FCM session typically produces multidimensional readouts of 10,000 to 1,000,000
cells with 4 to 12 parameters each.
The conventional diagnostic workflow involves visualizing the FCM dataset in a series of 2-D scatter
plots, in which experts evaluate the characteristics of the cell populations. Based on this inspection, the
pathologists identify a sub-population of cells (gating) and quantify it for further analysis and diagnosis.
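The gating step described above can be illustrated with a minimal sketch. It assumes a simple rectangular gate on two parameters; gates drawn in practice are often polygons. The function names `rectangular_gate` and `gated_fraction` and the toy event values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Event = Sequence[float]  # one cell: a tuple of measured parameters


def rectangular_gate(
    events: Sequence[Event],
    x_idx: int,
    y_idx: int,
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
) -> List[int]:
    """Return the indices of events that fall inside a rectangular 2-D gate."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    return [
        i
        for i, event in enumerate(events)
        if x_lo <= event[x_idx] <= x_hi and y_lo <= event[y_idx] <= y_hi
    ]


def gated_fraction(events: Sequence[Event], selected: Sequence[int]) -> float:
    """Quantify the gated sub-population as a fraction of all events."""
    return len(selected) / len(events) if events else 0.0


# Toy data (made up for illustration): three parameters per event.
events = [
    (1.0, 2.0, 0.5),
    (5.0, 5.0, 0.1),
    (1.5, 2.5, 0.9),
    (9.0, 1.0, 0.3),
]

# Gate on parameter 0 (x axis) and parameter 1 (y axis).
selected = rectangular_gate(
    events, x_idx=0, y_idx=1, x_range=(0.0, 2.0), y_range=(1.0, 3.0)
)
fraction = gated_fraction(events, selected)
```

In the manual workflow, an expert repeats this kind of selection over a sequence of scatter plots, each time choosing the two parameters and the gate boundaries by eye.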
Motivation: The conventional analysis is performed manually on a sequence of two-dimensional scatter
plots. Repeating this process on multiple datasets is very time-consuming and labour-intensive. The outcome
also depends on the individual who performs the analysis, which can lead to different clinical decisions for
the same data and poses additional challenges.
Approach: Our approach is to automate these conventional workflows using machine learning
techniques, thereby supporting pathologists and clinicians in their daily routine and research work. The main
objective of this thesis is the automated identification of small amounts of residual atypical cells in patients
with leukemia (minimal residual disease, MRD).
The following is an overview of the tasks involved in the development of the project:
1. Data Selection: Finding an unsupervised algorithm to search for “islands” that contain mainly events
from the same sample, but only a few events from different samples.
2. Dimensionality Reduction Algorithms [1]: Implementing alternative algorithms (UMAP) and validating
their effect against the existing t-SNE algorithm.
3. Optimization: Optimizing t-SNE based on the opt-SNE algorithm [2].
4. Performing evaluation and testing
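The "island" criterion from task 1 can be sketched as a purity check. This is an assumption about one possible formulation, not the method used in the thesis. It presumes that each event already carries a cluster label from some unsupervised step (for example, clustering on a t-SNE or UMAP embedding) and the ID of the sample it came from. The function names and the thresholds `min_purity` and `min_size` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

# (dominant sample, fraction of events from that sample, cluster size)
ClusterInfo = Tuple[Hashable, float, int]


def island_purity(
    cluster_labels: Sequence[Hashable], sample_ids: Sequence[Hashable]
) -> Dict[Hashable, ClusterInfo]:
    """For each cluster, report its dominant sample, purity, and size."""
    counts_by_cluster: Dict[Hashable, Counter] = defaultdict(Counter)
    for cluster, sample in zip(cluster_labels, sample_ids):
        counts_by_cluster[cluster][sample] += 1

    result = {}
    for cluster, counts in counts_by_cluster.items():
        dominant_sample, dominant_count = counts.most_common(1)[0]
        size = sum(counts.values())
        result[cluster] = (dominant_sample, dominant_count / size, size)
    return result


def find_islands(
    cluster_labels: Sequence[Hashable],
    sample_ids: Sequence[Hashable],
    min_purity: float = 0.9,  # assumed threshold, to be tuned
    min_size: int = 3,        # assumed threshold, to be tuned
) -> Dict[Hashable, ClusterInfo]:
    """Keep clusters whose events come mainly from a single sample."""
    return {
        cluster: info
        for cluster, info in island_purity(cluster_labels, sample_ids).items()
        if info[1] >= min_purity and info[2] >= min_size
    }


# Toy example (made-up labels): cluster 0 is mixed, cluster 1 is mostly sample "C".
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
samples = ["A", "B", "A", "B", "C", "C", "C", "C", "C", "A"]
islands = find_islands(clusters, samples, min_purity=0.8)
```

Here only cluster 1 qualifies, because five of its six events come from sample "C". How the cluster labels are produced, and which thresholds are appropriate for MRD detection, are left open in this sketch.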
[1] Y. Saeys, S. Van Gassen, and B. Lambrecht, “Computational flow cytometry: Helping to make sense of
high-dimensional immunology data,” Nature Reviews Immunology, vol. 16, Jun. 2016.
[2] A. C. Belkina, C. O. Ciccolella, R. Anno, R. Halpert, J. Spidlen, and J. E. Snyder-Cappione,
“Automated optimized parameters for t-distributed stochastic neighbor embedding
improve visualization and allow analysis of large datasets,” bioRxiv, 2019. [Online]. Available: