Index
Motion Compensation Using Epipolar Consistency Condition in Computed Tomography
The hippocampus and the Successor Representation – An analysis of the properties of the Successor Representation, place cells, and grid cells.
The human brain is an important role model for computer science. Many applications,
such as neural networks, mimic brain functions with great success. However, many
functions are still not well understood and are therefore the subject of current research.
The hippocampus is one of the regions of particular interest: it is a central part of the
limbic system, plays a key role in memory processing, and is used for spatial navigation.
Place cells and grid cells are two important cell types found in the hippocampus that
help to encode information for navigational tasks [1].
Newer theories, however, extend this view from spatial navigation to more abstract
navigation that can be applied to all kinds of information. In the paper "The
hippocampus as a predictive map", a mathematical description of the place cells
in the hippocampus, the Successor Representation (SR), is developed. The SR
can be used to imitate the data processing of the hippocampus and has already
reproduced experimental results [2]. Other experiments have likewise extended the
view from spatial navigation to broader information processing, showing, for example,
that grid cells do not only encode Euclidean distances [3] or that grid and place cells
are used to orient within our field of vision [4]. All of this could lead to a powerful
data processing tool that adapts flexibly to all kinds of problems.
This thesis aims to build a framework for applying and analyzing the properties of the
SR. The framework should make it possible to create different environments for simple
navigation tasks, but also to represent more abstract information relationships as graphs.
Furthermore, mathematical properties of the SR should be analyzed to improve the
learning process and to gain a broader understanding of its functionality.
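For illustration, the following is a minimal NumPy sketch of the temporal-difference update that defines the SR in a tabular setting, in the spirit of [2]; the ring-shaped random-walk environment and all parameter values are purely illustrative assumptions and not part of the thesis description.

import numpy as np

def sr_td_update(M, s, s_next, alpha=0.1, gamma=0.95):
    """One temporal-difference update of the successor matrix M.

    M[s, s'] estimates the expected discounted number of future visits
    to s' when starting in s (the Successor Representation).
    """
    n_states = M.shape[0]
    one_hot = np.eye(n_states)[s]                  # indicator of the current state
    td_error = one_hot + gamma * M[s_next] - M[s]
    M[s] = M[s] + alpha * td_error
    return M

# Illustrative example: a random walk on a 5-state ring environment.
n_states = 5
M = np.zeros((n_states, n_states))
rng = np.random.default_rng(0)
s = 0
for _ in range(10000):
    s_next = (s + rng.choice([-1, 1])) % n_states  # step to a random neighbour
    M = sr_td_update(M, s, s_next)
    s = s_next
print(np.round(M, 2))  # rows approximate (I - gamma * T)^-1 of the random walk

The learned rows of M then play the role of the place-cell-like predictive representations discussed above.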
Development of a framework to simulate learning and task solving inspired by the hippocampus and successor representation
Since nervous systems have developed very efficient
mechanisms to store, retrieve, and even extrapolate from
learned experience, machine learning has always taken its
orientation from nature. Even though the introduction of
neural networks, support vector machines, and deep networks
has pushed performance significantly, science is still far
from completely understanding the brain's implementation of
these capabilities.
The hippocampus is a structure of the brain present in both
hemispheres. It has been shown to be responsible for both
spatial orientation and memory management [1, 2], but recent
studies suggest it is involved in far more profound aspects
of learning. This newer theory assumes that the hippocampus
creates abstract cognitive maps with the ability to predict
unknown states, unifying the established findings mentioned
above [3]. To further investigate this behaviour and possibly
add evidence for the theory, it is crucial to examine the two
dominant neural cell types which have already been identified
in the context of spatial orientation: so-called place cells
on the one hand and grid cells on the other.
Place and grid cells were originally found to encode spatial
information and were named accordingly. According to the
theory of abstract cognitive mapping in the hippocampus, the
activity of place cells is believed to represent states in
general. Grid cells were originally observed firing in regular
patterns over space at different orientations, generating a
kind of coordinate system. In the context of this more
holistic theory, grid cells could provide a reference frame
for the abstract cognitive map. Many experiments have been
conducted to investigate the behaviour of the hippocampal
structures related to learning.
The aim of this thesis is to create a framework that allows
researchers to simulate and work in environments that are
kept simple enough for the results to be transferred to any
other cognitive map. Ideally, this can help to avoid
complicated experimental setups and laboratory animal
experiments, and speed up future research on the
hippocampus's role in learning.
[1] D. S. Olton, J. T. Becker, and G. E. Handelmann.
“Hippocampus, space, and memory”. In: Behavioral and
Brain Sciences 2.3 (1979), pp. 313–322. issn: 0140-525X.
doi: 10.1017/S0140525X00062713.
[2] B. Milner, S. Corkin, and H.-L. Teuber. “Further
analysis of the hippocampal amnesic syndrome: 14-year
follow-up study of HM”. In: Neuropsychologia 6.3 (1968),
pp. 215–234. issn: 0028-3932.
[3] K. L. Stachenfeld, M. M. Botvinick, and S. J. Gershman.
“The hippocampus as a predictive map”. In: Nature
neuroscience 20.11 (2017), p. 1643. issn: 1546-1726.
——
Semi-Supervised Segmentation of Cell Images using Differentiable Rendering.
With the recent advances in machine learning, and in particular deep learning [1], deep convolutional neural networks
(CNNs) [2–7] have been developed that can learn from data sets containing millions of images [8] to solve
object detection tasks. When trained on such large data sets, CNNs achieve object detection performance that is
comparable to, or even surpasses, human capabilities [9, 10]. A key problem of using deep learning for cell detection
is the large amount of data needed to train such networks. The main difficulty lies in acquiring a representative
data set of cell images that ideally covers various sizes, shapes, and distributions for a variety of cell types.
Additionally, manual annotation of the acquired data is required to obtain the so-called 'ground truth' or 'labels',
which is in general error-prone, time-consuming, and costly.
Differentiable rendering [11–13], on the other hand, is an emerging technique that makes it possible to generate synthetic,
photo-realistic images from photographs of real-world objects by estimating their 3D shape and material properties.
Besides generating photo-realistic images, this approach can also be used to generate the corresponding
ground truth labels for segmentation and object detection masks. Combining differentiable rendering with
deep learning could therefore help to overcome the data bottleneck for machine learning algorithms in various fields,
including materials science and biomedical engineering.
The work of this thesis is based on the differentiable rendering framework 'Redner' [11], using data from the Cell
Tracking Challenge [14, 15]. In a first step, a literature review will be conducted on the topic of differentiable rendering.
In a second step, an existing implementation for the light, shader, and geometry estimation of nanoparticles
will be adapted for the semi-supervised segmentation of GFP-GOWT1 mouse stem cells. Afterwards, the results of
this approach will be evaluated in terms of segmentation accuracy.
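To illustrate the optimization principle underlying such an approach, the following is a minimal, framework-agnostic PyTorch sketch of inverse rendering by gradient descent; the render() function is a toy placeholder standing in for a differentiable renderer such as Redner, not its actual API, and the target image is random stand-in data.

import torch

def render(params):
    """Placeholder for a differentiable renderer (e.g. Redner).

    Maps scene parameters (geometry, material, lighting) to an image in a
    way that supports autograd; a trivial stand-in is used here so the
    sketch remains runnable.
    """
    # Toy 'renderer': broadcast a per-channel intensity from the parameters.
    return torch.sigmoid(params).repeat(64, 64, 1)

# Target image, e.g. a photograph of the real object (random stand-in here).
target = torch.rand(64, 64, 3)

# Scene parameters to be estimated, e.g. shape and material coefficients.
params = torch.zeros(3, requires_grad=True)
optimizer = torch.optim.Adam([params], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    image = render(params)                     # forward: render the scene
    loss = torch.mean((image - target) ** 2)   # compare to the photograph
    loss.backward()                            # gradients flow through the renderer
    optimizer.step()                           # update shape/material/lighting

The same loop structure applies when the rendered image and its fitted scene parameters are used to produce segmentation masks as training labels.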
The thesis will include the following points:
• Getting familiar with the concepts of Differentiable Rendering and Gradient-based learning methods
• Implementation of a proof-of-concept for the semi-supervised segmentation of cells based on the ‘Redner’
framework using existing data from the Cell Tracking Challenge
• Evaluation of the method in terms of segmentation accuracy
• Elaboration of potential improvements for the method
Academic advisors:
References
[1] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” Proceedings of the IEEE International Conference
on Computer Vision, vol. 2017-October, pp. 2980–2988, 2017.
[3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv
preprint arXiv:1409.1556, 2014.
[4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, pp. 770–778, 2016.
[5] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,”
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016-December,
pp. 779–788, 2016.
[6] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,”
in International Conference on Medical image computing and computer-assisted intervention, pp. 234–241,
Springer, 2015.
[7] T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald,
A. Dovzhenko, O. Tietz, C. Dal Bosco, S. Walsh, D. Saltukoglu, T. L. Tay, M. Prinz, K. Palme, M. Simons,
I. Diester, T. Brox, and O. Ronneberger, “U-Net: deep learning for cell counting, detection, and morphometry,”
Nature Methods, vol. 16, no. 1, pp. 67–70, 2019.
[8] Jia Deng, Wei Dong, R. Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, “ImageNet: A large-scale hierarchical image
database,” 2009 IEEE Conference on Computer Vision and Pattern Recognition, no. May 2014, pp. 248–255,
2009.
[9] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou,
V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap,
M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks
and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,”
in Advances in neural information processing systems, pp. 1097–1105, 2012.
[11] T.-M. Li, M. Aittala, F. Durand, and J. Lehtinen, “Differentiable monte carlo ray tracing through edge sampling,”
ACM Trans. Graph., vol. 37, Dec. 2018.
[12] M. Nimier-David, D. Vicini, T. Zeltner, and W. Jakob, “Mitsuba 2: a retargetable forward and inverse renderer,”
ACM Transactions on Graphics (TOG), vol. 38, no. 6, p. 203, 2019.
[13] G. Loubet, N. Holzschuch, and W. Jakob, “Reparameterizing discontinuous integrands for differentiable rendering,”
ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 1–14, 2019.
[14] M. Maška, V. Ulman, D. Svoboda, P. Matula, P. Matula, C. Ederra, A. Urbiola, T. España, S. Venkatesan,
D. M. Balak, et al., “A benchmark for comparison of cell tracking algorithms,” Bioinformatics, vol. 30, no. 11,
pp. 1609–1617, 2014.
[15] V. Ulman, M. Maška, K. E. Magnusson, O. Ronneberger, C. Haubold, N. Harder, P. Matula, P. Matula, D. Svoboda,
M. Radojevic, et al., “An objective comparison of cell-tracking algorithms,” Nature methods, vol. 14,
no. 12, p. 1141, 2017.
Detection of Hand Drawn Electrical Circuit Diagrams and their Components using Deep Learning Methods and Conversion into LTspice Format
Thesis Description
An electrical circuit diagram (ECD) is a graphical representation of an electrical circuit. ECDs consist of electrical circuit components (ECCs), where for each ECC a unique symbol is defined in the international standard [1]. The ECCs are connected with lines, which correspond to wires in the real world. Furthermore, ECCs are specified in more detail by an annotation next to their symbol, which consists of a number followed by a unit; for instance, a resistor can be denoted as "100 mΩ" (milliohms). Voltage sources and current sources are ECCs which provide either a voltage (U) or a current (I) to the circuit. While U and I provided by sources are given, U and I with respect to particular ECCs have to be obtained through calculations. For small circuits this can be done by hand; however, the calculation complexity grows with the size of the circuit, and even more so when alternating U/I sources are used, since certain component calculations then depend on the frequency of the source. Therefore, circuit simulation software (CSS) is often used, in which complex simulations can easily be performed in an automated way. Before a circuit can be simulated in a CSS, it first has to be modeled in the application. Refaat et al. [2] compared the speed of drawing structured diagrams by hand and with the diagram drawing tool Microsoft Visio. Their experiments showed that drawing by hand was around 90% faster than drawing with Microsoft Visio. Since ECDs are also structured diagrams, it seems that a hand drawn approach could be more efficient than an application-based drawing approach. Hence, an automated method to convert an image of a hand drawn ECD into a digital format processable by a CSS would ease the use of CSS.
So far, various studies have been conducted on the segmentation and recognition of ECCs and the tracing of connections between them, which will be briefly described in the following. The proposed approaches can be structured as follows: 1) classification of ECCs [3, 4, 5], 2) segmentation and classification of ECCs [6, 7], 3) segmentation and classification of ECCs and ECD topology acquisition [8], 4) object detection of ECCs and ECD topology acquisition [9]. Moetesum et al. [6] used computer vision methods to segment ECCs from an ECD, where different strategies were used to obtain a segmentation mask for different ECC types. For instance, sources were segmented by filling the region inside the source symbol, followed by a bounding box drawn around the segmentation mask. A Histogram of Oriented Gradients was applied to the region inside the bounding box to obtain a feature vector for a subsequent Support Vector Machine classifier. While this approach yielded good classification results, it is only partially extendable: for ECCs that have a shape similar to components already covered by a segmentation strategy, the existing strategy can probably be reused, but for completely new shapes a new strategy has to be introduced. The aim of the method proposed by Dhanushika et al. [9] was to extract a boolean expression from an ECD made of logic gate components (AND, OR, NOT, etc.). The ECC classification was performed with the object detection algorithm YOLO (You Only Look Once) [10], which localizes and classifies an object in a single step. The ECD topology was recognized by removing the bounding boxes from the image and applying a Hough transform to the remaining connections. Hough lines and bounding box intersections were then used to form the ECD topology, from which the final boolean expression was generated.
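As an illustration of this topology-extraction idea (a sketch in the same spirit, not the exact implementation of [9]), the following masks out hypothetical detector bounding boxes and applies OpenCV's probabilistic Hough transform to the remaining wire strokes; the boxes argument is an assumed detector output format.

import cv2
import numpy as np

def extract_wire_segments(image_path, boxes):
    """Find straight wire segments outside the detected component boxes.

    boxes: list of (x, y, w, h) bounding boxes, assumed to come from an
    object detector such as YOLO.
    """
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Remove the detected components so only the connecting wires remain
    # (assumes a bright paper background).
    for (x, y, w, h) in boxes:
        img[y:y + h, x:x + w] = 255
    edges = cv2.Canny(img, 50, 150)
    # Probabilistic Hough transform: returns line segments as (x1, y1, x2, y2).
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                            minLineLength=20, maxLineGap=5)
    return [] if lines is None else [tuple(l[0]) for l in lines]

Segment endpoints that touch a bounding box indicate a wire-to-component connection and can then be used to build the circuit topology graph.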
All of the above-mentioned methods were restricted to drawings on white paper. As it is quite common to also draw on gridded paper, this restriction might be too severe for real-world use. Furthermore, no method has been proposed so far that covers the full conversion, from the image to a simulation based on a CSS-formatted file.
Thus, this thesis aims to develop a full processing pipeline that converts images of hand drawn ECDs into an intermediate format reflecting the topology of the ECD. Extensibility should be ensured by using a deep neural network architecture for object detection, which, due to the nature of neural networks, can be extended simply by providing new data and labels for the training step. The pipeline should also be robust to image quality (paper type, lighting conditions, background, etc.), at least for white and gridded paper. Furthermore, the pipeline should include the recognition of component annotations, e.g. component values and voltage/current flow symbols. The conversion into a CSS format will be demonstrated using LTspice as an example. Additionally, the methods should be chosen such that the pipeline can be executed on mobile hardware, so the computational effort of the whole pipeline must be kept as low as possible.
The thesis will comprise the following work items:
- Collection of a suitable dataset
- Object detection of ECCs and annotations in images of hand drawn ECDs
- Segmentation of the ECD from the drawing
- Identification of the ECD topology
- Postprocessing
- Building the ECD topology
- Assigning annotations to corresponding ECCs
- Embedding gathered information into an LTspice file
- Optional: Mobile demo application
References
[1] IEC-60617. https://webstore.iec.ch/publication/2723. Accessed: 21-12-2020.
[2] K. Refaat, W. Helmy, A. Ali, M. AbdelGhany, and A. Atiya. A new approach for context-independent
handwritten offline diagram recognition using support vector machines. In 2008 IEEE International Joint
Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pages 177–182,
2008.
[3] M. Rabbani, R. Khoshkangini, H.S. Nagendraswamy, and M. Conti. Hand drawn optical circuit recognition.
Procedia Computer Science, 84:41 – 48, 2016. Proceeding of the Seventh International Conference on
Intelligent Human Computer Interaction (IHCI 2015).
[4] M. Günay, M. Köseoğlu, and Ö. Yıldırım. Classification of hand-drawn basic circuit components using con-
volutional neural networks. In 2020 International Congress on Human-Computer Interaction, Optimization
and Robotic Applications (HORA), pages 1–5, 2020.
[5] S. Roy, A. Bhattacharya, and N. Sarkar et al. Offline hand-drawn circuit component recognition using
texture and shape-based features. Springer Science+Business Media, August 2020.
[6] M. Moetesum, S. Waqar Younus, M. Ali Warsi, and I. Siddiqi. Segmentation and recognition of electronic
components in hand-drawn circuit diagrams. EAI Endorsed Transactions on Scalable Information Systems,
5(16), 4 2018.
[7] M. D. Patare and M. Joshi. Hand-drawn digital logic circuit component recognition using svm. International
Journal of Computer Applications, 143:24–28, 2016.
[8] B. Edwards and V. Chandran. Machine recognition of hand-drawn circuit diagrams. In 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100),
volume 6, pages 3618–3621 vol.6, 2000.
[9] T. Dhanushika and L. Ranathunga. Fine-tuned line connection accompanied boolean expression generation
for hand-drawn logic circuits. In 2019 14th Conference on Industrial and Information Systems (ICIIS),
pages 436–441, 2019.
[10] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object
detection. CoRR, abs/1506.02640, 2015.
Semi-Supervised Beating Whole Heart Segmentation Based on 3D Cine MRI in Congenital Heart Disease Using Deep Learning
The heart is a dynamic, beating organ, and until now it has been challenging to fully capture its com-
plexity by magnetic resonance imaging (MRI). In an ideal world, doctors could create a 3-dimensional
(3D) visual representation of each patient’s unique heart and watch as it pumps, moving through each
phase of the cardiac cycle. [2]
The standard cardiac MRI includes multiple 2D image slices stacked next to each other that must
be carefully positioned by the MRI technologist based on a patient’s anatomy. Planning the location
and angle for the slices requires a highly-knowledgeable operator and takes time. [2]
Recently, a new MRI-based technology, referred to as “3D cine”, has been developed that can
produce moving 3D images of the heart. It allows cardiologists and cardiac surgeons to see a patient’s
heart from any angle and observe its movement throughout the entire cardiac cycle [2], as well as to
assess cardiac morphology and function [4].
Fully automatic methods for the analysis of 3D cine cardiovascular MRI would improve the clinical
utility of this promising technique. At the moment, there is no automatic segmentation algorithm
available for 3D cine images of the heart. Furthermore, manual segmentation of 3D cine images is
time-consuming and impractical. Therefore, in this master thesis, different deep learning (DL)
techniques based on 3D MRI data will be investigated in order to automate the segmentation process.
In particular, two time frames of every 3D image might first be semi-automatically segmented [3].
The segmentation of these two time frames will then be used to train a deep neural network for
automatic segmentation of the other time frames.
The datasets are acquired from 125 different patients at Boston Children's Hospital1. In
contrast to standard cardiac MRI, in which patients must hold their breath while the images are
acquired, these datasets are obtained by tracking the patient's breathing motion and collecting data
only during expiration, when the patient is breathing out [1].
The segmentation results will be quantitatively validated using the Dice score and qualitatively
evaluated by clinicians.
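For reference, the Dice score used for the quantitative validation can be computed as in the following minimal NumPy sketch for binary masks; the array shapes and toy volumes are illustrative assumptions.

import numpy as np

def dice_score(prediction, ground_truth, eps=1e-7):
    """Dice similarity coefficient between two binary segmentation masks.

    prediction, ground_truth: boolean or {0, 1} arrays of the same shape,
    e.g. 3D volumes of one cardiac phase.
    """
    prediction = prediction.astype(bool)
    ground_truth = ground_truth.astype(bool)
    intersection = np.logical_and(prediction, ground_truth).sum()
    return 2.0 * intersection / (prediction.sum() + ground_truth.sum() + eps)

# Example with small toy volumes (stand-ins for segmented heart phases).
pred = np.zeros((4, 4, 4), dtype=int)
gt = np.zeros((4, 4, 4), dtype=int)
pred[1:3, 1:3, 1:3] = 1
gt[1:3, 1:4, 1:3] = 1
print(f"Dice = {dice_score(pred, gt):.3f}")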
The thesis has to comprise the following work items:
• Data processing and manual annotation of the available datasets in order to utilize them for the DL methods
• Development and implementation of 3D cine segmentation models based on DL techniques
• Quantitative evaluation of the segmentation results with respect to the Dice score
The thesis will be carried out at the Department of Pediatrics at Harvard University Medical School
and the Department of Cardiology at Boston Children’s Hospital, in cooperation with the Pattern
Recognition Lab at FAU Erlangen-Nuremberg and the Computer Science and Artificial Intelligence
Lab of MIT. Furthermore, the results of the study are expected to be published as an abstract and
article at the International Society for Cardiovascular Magnetic Resonance in Medicine2.
1Department of Cardiology, Boston Children’s Hospital, 300 Longwood Avenue, Boston, MA 02115, USA
2https://scmr.org/
References
[1] Mehdi Hedjazi Moghari, Ashita Barthur, Maria Amaral, Tal Geva, and Andrew Powell. Free-
breathing whole-heart 3d cine magnetic resonance imaging with prospective respiratory motion
compensation: Whole-heart 3d cine mri. Magnetic Resonance in Medicine, 80, 2017.
[2] Erin Horan. The future of cardiac mri: 3-d cine. Boston Children’s Hospital’s science and clinical
innovation blog, 2016. [Online]. Available: https://vector.childrenshospital.org/2016/12/
the-future-of-cardiac-mri-3-d-cine.
[3] Danielle F. Pace. Image segmentation for highly variable anatomy: Applications to congenital heart
disease. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA, USA, 2020.
[4] Jens Wetzl, Michaela Schmidt, Francois Pontana, Benjamin Longere, Felix Lugauer, Andreas
Maier, Joachim Hornegger, and Christoph Forman. Single-breath-hold 3-d cine imaging of the
left ventricle using cartesian sampling. Magnetic Resonance Materials in Physics, Biology and
Medicine, 31:1–13, 2017.
Deep Learning based image enhancement for contrast agent minimization in cardiac MRI
Late gadolinium enhancement (LGE) imaging has become an indispensable tool in the diagnosis
and assessment of myocardial infarction (MI). Size, location, and extent of the infarcted
tissue are important indicators to assess treatment efficacy and to predict functional
recovery [1]. In LGE imaging, T1-weighted inversion recovery pulse sequences are applied
several minutes after injection of a gadolinium-based contrast agent (GBCA). However,
contraindications (e.g. renal insufficiency) and severe adverse effects (e.g. nephrogenic
systemic fibrosis) of GBCAs are known [2]. Therefore, the minimization of administered contrast
agent doses is a subject of current research. Existing neural network-based approaches either
rely on cardiac wall motion abnormalities [3] or have been developed for brain MRI [4].
The aim of this thesis is to develop a post-processing approach based on convolutional neural
networks (CNNs) to accurately segment and quantify myocardial scar in 2-D LGE images
acquired with reduced doses of GBCA. For this purpose, synthetic data generated with an
in-house MRI simulation suite is used as a starting point. The 4-D XCAT phantom [5] is used
for the simulation, as it offers multiple possibilities for variation in patient anatomy as well
as in the geometry and location of myocardial scar. Furthermore, the simulated images will
include variability in certain acquisition parameters to best reflect in-vivo data. In addition
to LGE images, T1-maps are simulated with different levels of contrast agent dose. In the scope
of this thesis, multiple approaches using different combinations of input data (i.e. LGE images
and/or T1-maps at zero-dose and/or low-dose) are explored. The performance of the network
will be evaluated on simulated and in-vivo data. Depending on availability, in-vivo data will
also be incorporated into the training process.
The thesis covers the following aspects:
• Generation of simulated training data that best reflects in-vivo data
• Development of the CNN-based system, including implementation using PyTorch
• Optional: depending on data availability and on previous results, incorporation of in-vivo data into the training process
• Quantitative evaluation of the implemented network on simulated and in-vivo data using the Dice score and clinically relevant MI quantification metrics, e.g. the full width at half maximum (FWHM) method (a minimal FWHM sketch is given after this list)
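The FWHM sketch referenced in the last item above: a minimal NumPy implementation of one common form of FWHM scar quantification, in which myocardial pixels whose intensity exceeds half of the maximum myocardial intensity (typically found in the hyperenhanced region) are labelled as scar; the synthetic image and region definitions are purely illustrative assumptions.

import numpy as np

def fwhm_scar_mask(lge_image, myocardium_mask):
    """FWHM scar segmentation on a 2-D LGE image.

    Pixels inside the myocardium whose intensity exceeds 50% of the maximum
    myocardial intensity are labelled as scar (full width at half maximum).
    """
    myocardial_intensities = lge_image[myocardium_mask > 0]
    threshold = 0.5 * myocardial_intensities.max()
    return (lge_image >= threshold) & (myocardium_mask > 0)

def scar_fraction(lge_image, myocardium_mask):
    """Scar size as a fraction of the myocardial area."""
    scar = fwhm_scar_mask(lge_image, myocardium_mask)
    return scar.sum() / (myocardium_mask > 0).sum()

# Illustrative toy example with a synthetic 2-D image.
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 0.3, size=(64, 64))
myo = np.zeros((64, 64), dtype=int)
myo[20:40, 20:40] = 1                 # hypothetical myocardium region
image[25:30, 25:30] = 1.0             # hypothetical bright (scarred) area
print(f"Scar fraction: {scar_fraction(image, myo):.2%}")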
References
[1] V. Hombach, N. Merkle, P. Bernhard, V. Rasche, and W. Rottbauer, “Prognostic significance
of cardiac magnetic resonance imaging: Update 2010,” Cardiology Journal, 2010.
[2] L. Bakhos and M. A. Syed, Contrast Media, pp. 271–281. Cham: Springer International
Publishing, 2015.
[3] N. Zhang, G. Yang, Z. Gao, C. Xu, Y. Zhang, R. Shi, J. Keegan, L. Xu, H. Zhang,
Z. Fan, and D. Firmin, “Deep learning for diagnosis of chronic myocardial infarction on
nonenhanced cardiac cine MRI,” Radiology, 2019.
[4] E. Gong, J. M. Pauly, M. Wintermark, and G. Zaharchuk, “Deep learning enables reduced
gadolinium dose for contrast-enhanced brain MRI,” Journal of Magnetic Resonance
Imaging, 2018.
[5] W. P. Segars, G. Sturgeon, S. Mendonca, J. Grimes, and B. M. Tsui, “4D XCAT phantom
for multimodality imaging research,” Medical Physics, 2010.
Transfer Learning for Re-identification on Chest Radiographs
Getting the Most out of U-Net Architecture for Glacier (Front) Segmentation
Glaciers and ice sheets currently contribute two thirds of the observed global sea level rise. Many glaciers in glaciated regions, e.g., Antarctica, have already shown considerable ice mass loss over the last decade. Most of this mass loss is caused by the dynamic adjustment of glaciers, with considerable glacier retreat and elevation change being the major observables. The continuous and precise extraction of glacier calving fronts is hence of paramount importance for monitoring these rapid glacier changes.
This project intends to bridge the gap towards a fully automatic, end-to-end deep learning-based glacier (front) segmentation using synthetic aperture radar (SAR) imagery. U-Net has recently been used, in its simple form, for this task and has shown promising results [1]. In this thesis, we would like to thoroughly study the fundamentals and incorporate more advanced ideas to improve the segmentation performance of the simple U-Net. In other words, this thesis investigates approaches that enhance the image segmentation performance without deviating from the U-Net's root architecture. The outcome of this thesis is expected to be a comparative study, similar to [11], on glacier (front) segmentation. To this end, the following ideas are going to be investigated:
1. Pre-processing: So far in the literature, simple denoising/multi-looking algorithms have been used as pre-processing. It would be interesting to conduct a more thorough study on the effect of further pre-processing algorithms:
1.1. Attribute Profiles (APs) [2, 3] have resulted in performance enhancement for very high-resolution remote sensing image classification. They have also been used for SAR image segmentation [4]. Their extension, Feature Attribute Profiles [5], has been shown to outperform APs in most scenarios and has also been used for pixel-wise classification of SAR images [6]. We would like to study the performance of APs and their extension in SAR image segmentation. This task is optional and will be addressed if time allows.
1.2. There are multiple classical denoising algorithms, such as the median filter, Gaussian filter, bilateral filter, Lee filter, and Kuan filter. The denoised images may be followed by contrast enhancement algorithms, e.g., contrast limited adaptive histogram equalization (CLAHE). Different combinations will be studied quantitatively and qualitatively.
2. Different network architectures in the U-Net’s bottleneck:
2.1. Dilated convolution (atrous convolution): dilated convolution [7] has been shown to introduce multi-scale processing to the network without increasing the number of parameters,
2.2. Dilated ResNet [8],
2.3. Pre-trained networks (VGG, ResNet, etc.),
3. Different normalization algorithms: One common issue in training deep CNNs is internal covariate shift, which is caused by the changing distribution of input features and degrades both training speed and performance. As a remedy, multiple normalization techniques have been proposed, such as batch normalization, instance normalization, layer normalization, and group normalization [9]. In this thesis, we will study the effect of these algorithms on the segmentation results of the U-Net, both qualitatively and quantitatively.
4. The optimal loss function for this application (a minimal sketch of a combined loss is given after this list):
• (Binary) Cross Entropy
• Dice coefficient
• Focal loss
• Weighted combination of the loss functions above
5. Effect of dropout and drop connect: In which layer is dropout most effective? Is using it in all layers the best approach? Is dropout in combination with normalization techniques (e.g., batch normalization) even advantageous?
6. Effect of different data augmentation techniques, e.g., flipping, rotation, random cropping, random transformations, etc., on the segmentation performance.
7. Effect of transfer learning:
7.1. Is pre-training the decoder, encoder, and bottleneck of the U-Net separately or all at once on other datasets beneficial? Is it effective in tackling the limited training data and the class-imbalance problem in the dataset?
7.2. The effect of transfer learning from the high quality images (quality factor=[1:3]) to the low quality ones (quality factor=[1:3]).
8. Improved architectures of U-Net: For a thorough review of some of these architectures in one place, please refer to Taghanaki et al. [11].
8.1. Feedforward auto-encoder
8.2. FCN
8.3. Seg-Net
8.4. U-Net
8.5. U-Net++ [10]
8.6. Tiramisu Network [12]
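Regarding item 4, the following is a minimal PyTorch sketch of a weighted combination of binary cross entropy, Dice, and focal loss for binary segmentation; the weights, the focal parameter gamma, and the toy inputs are illustrative assumptions, not values prescribed by this project.

import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss on sigmoid probabilities."""
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)

def focal_loss(logits, target, gamma=2.0):
    """Focal loss: down-weights easy pixels via the (1 - p_t)^gamma factor."""
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    p_t = torch.exp(-bce)                      # probability of the true class
    return ((1.0 - p_t) ** gamma * bce).mean()

def combined_loss(logits, target, w_bce=1.0, w_dice=1.0, w_focal=1.0):
    """Weighted combination of BCE, Dice, and focal loss (weights are assumptions)."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return (w_bce * bce
            + w_dice * dice_loss(logits, target)
            + w_focal * focal_loss(logits, target))

# Example: logits from a U-Net for one batch of SAR patches (random stand-ins).
logits = torch.randn(2, 1, 64, 64)
target = (torch.rand(2, 1, 64, 64) > 0.7).float()
print(combined_loss(logits, target))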
References
[1] Zhang et al. “Automatically delineating the calving front of Jakobshavn Isbræ from multitemporal TerraSAR-X images: a deep learning approach.” The Cryosphere 13, no. 6 (2019): 1729-1741.
[2] Dalla Mura, Mauro, et al. “Morphological attribute profiles for the analysis of very high resolution images.” IEEE Transactions on Geoscience and Remote Sensing 48.10 (2010): 3747-3762.
[3] Ghamisi, Pedram, Mauro Dalla Mura, and Jon Atli Benediktsson. “A survey on spectral–spatial classification techniques based on attribute profiles.” IEEE Transactions on Geoscience and Remote Sensing 53.5 (2014): 2335-2353.
[4] Boldt, Markus, et al. “SAR image segmentation using morphological attribute profiles.” The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 40.3 (2014): 39.
[5] Pham, Minh-Tan, Erchan Aptoula, and Sébastien Lefèvre. “Feature profiles from attribute filtering for classification of remote sensing images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11.1 (2017): 249-256.
[6] Tombak, Ayşe, et al. “Pixel-Based Classification of SAR Images Using Feature Attribute Profiles.” IEEE Geoscience and Remote Sensing Letters 16.4 (2018): 564-567.
[7] Chen, Liang-Chieh, et al. “Rethinking atrous convolution for semantic image segmentation.” arXiv preprint arXiv:1706.05587 (2017).
[8] Zhang, Qiao, et al. “Image segmentation with pyramid dilated convolution based on ResNet and U-Net.” International Conference on Neural Information Processing. Springer, Cham, 2017.
[9] Zhou, Xiao-Yun, and Guang-Zhong Yang. “Normalization in training U-Net for 2-D biomedical semantic segmentation.” IEEE Robotics and Automation Letters 4.2 (2019): 1792-1799.
[10] Zhou, Zongwei, et al. “Unet++: A nested u-net architecture for medical image segmentation.” Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, Cham, 2018. 3-11.
[11] Taghanaki, Saed Asgari, et al. “Deep Semantic Segmentation of Natural and Medical Images: A Review.” arXiv preprint arXiv:1910.07655 (2019).
[12] Jégou, Simon, et al. “The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017.
Detecting Defects on Transparent Objects using Polarization Cameras
The classification of images is a well-known task in computer vision. However, transparent or semi-
transparent objects have several properties that make computer vision tasks harder: such objects
usually exhibit little texture and sometimes strong reflections, and varying backgrounds can make
it hard to recognize the edges or the shape of an object [1, 2].
To overcome these difficulties, we use polarization cameras in this work. In contrast to ordinary cameras,
polarization cameras additionally record information about the polarization of the light rays. Most
natural light sources emit unpolarized light; by using a light source that emits polarized light, it is
possible to remove reflections or to increase contrast. Furthermore, it is known that the Angle of Linear
Polarization (AoLP) provides information about the surface normal [3].
In this work, we will follow a deep learning approach and use Convolutional Neural Networks (CNNs)
to explore the following topics:
1. Comparison of different sorts of preprocessing:
• Using only raw data / reshaped raw data
• Using the additional features Degree of Linear Polarization (DoLP) and AoLP (see the sketch below)
2. Influence of different light sources.
3. Comparison of different defect classes.
To evaluate the results, we use different metrics such as accuracy and F1 score, as well as gradient-weighted
class activation maps (Grad-CAM) [4].
The implementation should be done in Python.
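As background for the DoLP/AoLP features mentioned above, the following is a minimal NumPy sketch of how they are commonly derived from four intensity images (0°, 45°, 90°, 135°) of a division-of-focal-plane polarization camera via the linear Stokes parameters; camera-specific demosaicing is not covered, and the input arrays are assumed to be already extracted.

import numpy as np

def dolp_aolp(i0, i45, i90, i135, eps=1e-8):
    """Degree and Angle of Linear Polarization from four polarizer-angle images.

    i0, i45, i90, i135: intensity images behind linear polarizers at
    0°, 45°, 90°, and 135° (float arrays of the same shape).
    """
    # Linear Stokes parameters.
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)   # in [0, 1]
    aolp = 0.5 * np.arctan2(s2, s1)                  # in [-pi/2, pi/2]
    return dolp, aolp

# Example with random stand-in images; real inputs would come from the
# polarization camera's four-channel raw data.
rng = np.random.default_rng(0)
imgs = [rng.uniform(0, 1, (8, 8)) for _ in range(4)]
dolp, aolp = dolp_aolp(*imgs)

These two maps can then be stacked with the intensity image as additional input channels for the CNN.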
References
[1] Agastya Kalra, Vage Taamazyan, Supreeth Krishna Rao, Kartik Venkataraman, Ramesh Raskar, and Achuta
Kadambi. Deep Polarization Cues for Transparent Object Segmentation. In 2020 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), pages 8599–8608, Seattle, WA, USA, June 2020. IEEE.
[2] Ilya Lysenkov, Victor Eruhimov, and Gary Bradski. Recognition and Pose Estimation of Rigid Transparent
Objects with a Kinect Sensor. page 8, 2013.
[3] Francelino Freitas Carvalho, Carlos Augusto de Moraes Cruz, Greicy Costa Marques, and Kayque Martins
Cruz Damasceno. Angular Light, Polarization and Stokes Parameters Information in a Hybrid Image Sensor
with Division of Focal Plane. Sensors, 20(12):3391, June 2020.
[4] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and
Dhruv Batra. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization.
International Journal of Computer Vision, 128(2):336–359, February 2020.