Index

Weakly supervised localization of defects in electroluminescence images of solar cells

With the recent rise of renewable energy, the usage of solar energy has also grown rapidly. Detecting faulty panels in production and on-site has therefore become more important. Prior works focus on fault detection using, e.g., the current, voltage, and temperature of solar modules as inputs [6, 1], but the localization of defects using imaging and machine learning has only recently gained attention [5, 4].

This work studies the detection of defects in electroluminescence (EL) images of solar cells using state-of-the-art computer vision techniques, with a focus on crack detection. Previously, training a model to predict pixel-wise classifications required exhaustive labelling of every pixel in every image of the dataset. State-of-the-art training methods allow models to predict coarse segmentations from image-wise classification labels alone by means of weakly supervised training. Recently, it has been shown that these methods can also be applied to perform a coarse segmentation of cracks in EL images of solar cells [5].

This thesis aims to improve upon the existing method. To this end, weakly supervised learning methods such as guided backpropagation, Grad-CAM, Score-CAM, and adversarial learning [5, 9, 2, 7, 8, 3] will be implemented to train a model that reliably and accurately localizes cracks in a dataset of about 40k image-wise annotated EL images of solar cells. Finally, a thorough evaluation will show whether these methods can improve over the state of the art.
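The CAM-style methods above share a common core: spatially weight the feature maps of a convolutional layer by an importance score, sum them, and keep the positive evidence. As an illustration only (not part of the thesis plan), a minimal NumPy sketch of the Grad-CAM map computation, assuming the activations and gradients have already been extracted from a trained network:

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Grad-CAM heat map from the activations A_k of a conv layer
    and the gradients dY_c/dA_k of the class score w.r.t. them.

    activations, gradients: arrays of shape (K, H, W).
    Returns an (H, W) map, ReLU-ed and normalized to [0, 1]."""
    # channel weights: global average pooling of the gradients
    weights = gradients.mean(axis=(1, 2))             # (K,)
    # weighted sum of the feature maps
    cam = np.tensordot(weights, activations, axes=1)  # (H, W)
    cam = np.maximum(cam, 0.0)                        # keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

Score-CAM replaces the gradient-derived weights with scores obtained by masking the input with each upsampled activation map; the aggregation step stays the same.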

References

[1] Ali, Mohamed Hassan, et al. “Real time fault detection in photovoltaic systems.” Energy Procedia 111 (2017): 914-923.
[2] Chattopadhay, Aditya, et al. “Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks.” 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018.
[3] Choe, Junsuk, and Hyunjung Shim. “Attention-based dropout layer for weakly supervised object localization.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
[4] Deitsch, Sergiu, et al. “Automatic classification of defective photovoltaic module cells in electroluminescence images.” Solar Energy 185 (2019): 455-468.
[5] Mayr, Martin, et al. “Weakly Supervised Segmentation of Cracks on Solar Cells Using Normalized Lp Norm.” 2019 IEEE International Conference on Image Processing (ICIP). IEEE, 2019.
[6] Triki-Lahiani, Asma, Afef Bennani-Ben Abdelghani, and Ilhem Slama-Belkhodja. “Fault detection and monitoring systems for photovoltaic installations: A review.” Renewable and Sustainable Energy Reviews 82 (2018): 2680-2692.
[7] Wang, Haofan, et al. “Score-CAM: Score-weighted visual explanations for convolutional neural networks.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020.
[8] Zhang, Xiaolin, et al. “Adversarial complementary learning for weakly supervised object localization.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
[9] Zhou, Bolei, et al. “Learning deep features for discriminative localization.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

Comparison of different text attention techniques for writer identification

Distillation Learning for Speech Enhancement

Noise suppression has remained a field of interest for more than five decades, and a number of techniques have been employed to extract clean and/or noise-free data. Continuous audio and video signals pose greater challenges for noise reduction, and deep neural network (DNN) techniques have been designed to enhance those signals (Valin, 2018). While DNNs are effective, they are computationally expensive and demand substantial memory resources. The aim of the proposed thesis is to address these constraints when working with limited memory and computational power, without compromising much of the model's performance.

A Neural Network (NN) can easily overfit the training data, owing to the large number of parameters and the number of training sessions for which the network was trained on the given data (Dakwale & Monz, 2019). One solution is to use an ensemble (combination) of models trained on the same data to achieve generalization. The limitation of this solution comes with hardware constraints, when the network needs to run on hardware with limited memory and computational power, such as mobile phones. This resource limitation seeds the idea of distillation learning, in which the knowledge of a complex or ensembled network is transferred to a relatively simpler and computationally less expensive model.

Following the framework of distillation learning, a Teacher-Student network will be designed, starting from an existing trained Teacher network. The teacher network has been trained on audio data with hard labels, using a dense parameter matrix. The high number of parameters dictates the complexity of the neural network and also its ability to identify and suppress signal noise (Hinton et al., 2015). The proposed method is to design a student network that tries to imitate the output of the teacher, i.e., its probability distribution, without needing to be trained with the same number of parameters. By transferring the learning of the teacher to the student network, a simpler model with a reduced set of parameters can be designed, which would be better suited for hardware with lower memory and computational power.
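As a sketch of the underlying objective (with hypothetical temperature T and weighting alpha, not values fixed by this proposal), the student can be trained against a weighted combination of the hard-label loss and the temperature-softened teacher distribution, following Hinton et al. (2015):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      T=4.0, alpha=0.5):
    """Hinton-style distillation objective: cross-entropy against the
    hard labels plus cross-entropy between the temperature-softened
    teacher and student distributions. T and alpha are illustrative."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft_loss = -(p_teacher * np.log(p_student + 1e-12)).sum(-1).mean()
    p_hard = softmax(student_logits)                 # T = 1 for hard targets
    n = np.arange(len(hard_labels))
    hard_loss = -np.log(p_hard[n, hard_labels] + 1e-12).mean()
    # T^2 keeps the soft-target gradients on a comparable scale
    return alpha * hard_loss + (1 - alpha) * (T ** 2) * soft_loss
```

The same principle carries over to speech enhancement, where the "soft targets" would be the teacher's per-frame output distribution rather than class probabilities.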

Motion Compensation Using Epipolar Consistency Condition in Computed Tomography

The hippocampus and the Successor Representation – An analysis of the properties of the Successor Representation, place cells, and grid cells

The human brain is a big role model for computer science. Many applications like neural networks mimic brain functions with great success. However, a lot of functions are still not well understood and therefore subject to current research. The hippocampus is one of the regions of greater interest. It is a central part of memory processing and the limbic system, and it is used for spatial navigation. Place and grid cells are two important cell types found in the hippocampus, which help to encode information for navigational tasks [1].

New theories, however, extend this view from spatial navigation to more abstract navigation, which can be used for all concepts of information. In the paper "The hippocampus as a predictive map", a mathematical description of the place cells in the hippocampus, the Successor Representation (SR), is developed. The SR can be used to imitate the data processing method of the hippocampus and has already been able to recreate experimental results [2]. Other experiments have also extended the view from spatial navigation to broader information processing, for example that grid cells do not encode only Euclidean distances [3], or that we use grid and place cells to orient ourselves in our field of vision [4]. All of this could lead to a powerful data processing tool which can adapt flexibly to all kinds of problems.

This thesis wants to build a framework which can be used to apply and analyze the properties of the SR. The framework should make it possible to create different environments for simple navigation tasks, but also to capture more abstract information relationships in graphs. Furthermore, mathematical properties should be analyzed to improve the learning process and to gain a broader understanding of the functionality of the SR.
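To make the SR concrete, here is a minimal NumPy sketch of its temporal-difference learning rule under the predictive-map view [2], where M[s, s'] estimates the discounted expected future occupancy of state s' when starting from s. The environment, learning rate, and discount are illustrative assumptions, not choices from the thesis:

```python
import numpy as np

def learn_sr(transitions, n_states, gamma=0.95, alpha=0.1, epochs=50):
    """TD-learning of the successor representation M.

    transitions: list of (s, s_next) pairs from observed trajectories."""
    M = np.eye(n_states)
    for _ in range(epochs):
        for s, s_next in transitions:
            onehot = np.eye(n_states)[s]
            # TD update toward the one-step bootstrapped target
            M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    return M
```

On a deterministic 3-state loop 0 → 1 → 2 → 0, the learned M ranks states by temporal proximity: from state 0, the occupancy estimate for state 1 exceeds that for state 2, mirroring how place-cell activity is predicted to skew toward soon-to-be-visited locations.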

Development of a framework to simulate learning and task solving inspired by the hippocampus and successor representation

Since nervous systems have developed very efficient mechanisms to store, retrieve and even extrapolate from learned experience, machine learning has always oriented itself toward nature. Even though the advent of neural networks, support vector machines and deep networks has significantly pushed performance, science is still far away from completely understanding the brain's implementation of those phenomena.

The hippocampus is a structure of the brain present in both hemispheres. It has been proven to be responsible for both spatial orientation and memory management [1, 2], but recent studies suggest it is involved in far more profound tasks of learning. This new theory assumes the hippocampus creates abstract cognitive maps with the ability to predict unknown states, joining the proven findings already mentioned above [3]. To further investigate and study this behaviour, and possibly add proof to the theory, it is crucial to examine the two dominant neural cell types which have already been identified in the context of spatial orientation: so-called place cells on the one hand and grid cells on the other.

Place and grid cells were originally discovered to encode spatial information and thus named accordingly. According to the theory of abstract cognitive mapping in the hippocampus, place cells' activities are believed to represent states in general. Grid cells were originally discovered firing uniformly over space in different orientations, generating a kind of coordinate system. In the context of this more holistic theory, grid cells could provide a reference frame for the abstract cognitive map. Many experiments have been conducted to investigate the behaviour of the hippocampus's structures related to learning.

The aim of this thesis is to create a framework for researchers to simulate and work in environments which are kept so simple that the results can be transferred to any other cognitive map. Hopefully this can help to avoid complicated experimental setups and the use of laboratory animal experiments, and speed up research on the hippocampus's role in learning in the future.

[1] D. S. Olton, J. T. Becker, and G. E. Handelmann.
“Hippocampus, space, and memory”. In: Behavioral and
Brain Sciences 2.3 (1979), pp. 313–322. issn: 0140-525X.
doi: 10.1017/S0140525X00062713.

[2] B. Milner, S. Corkin, and H.-L. Teuber. “Further
analysis of the hippocampal amnesic syndrome: 14-year
follow-up study of HM”. In: Neuropsychologia 6.3 (1968),
pp. 215–234. issn: 0028-3932.

[3] K. L. Stachenfeld, M. M. Botvinick, and S. J. Gershman.
“The hippocampus as a predictive map”. In: Nature
neuroscience 20.11 (2017), p. 1643. issn: 1546-1726.

Detection of Hand Drawn Electrical Circuit Diagrams and their Components using Deep Learning Methods and Conversion into LTspice Format

Thesis Description

An electrical circuit diagram (ECD) is a graphical representation of an electrical circuit. ECDs consist of electrical circuit components (ECCs), where for each ECC a unique symbol is defined in the international standard [1]. The ECCs are connected with lines, which correspond to wires in the real world. Furthermore, ECCs are further specified by an annotation next to their symbol, which consists of a number followed by a unit. For instance, a resistor can be denoted as ”100 mΩ” (milliohms). Voltage sources and current sources are ECCs which provide either a voltage (U) or a current (I) through the circuit. While U and I provided by sources are given, U and I with respect to certain ECCs have to be obtained through calculations. For small circuits this can be done by hand; however, the calculation complexity grows with the size of the circuit, and even more so when alternating U/I sources are used, since certain component calculations become dependent on the frequency of the used source. Therefore, circuit simulation software (CSS) is often used, in which complex simulations can easily be performed in an automated way. Before a circuit can be simulated in a CSS, it first has to be modeled in the application. Refaat et al. [2] compared the drawing speed of structured diagrams by hand and with the diagram drawing tool Microsoft Visio. Their experiments showed that drawing by hand was around 90% faster than drawing with Microsoft Visio. Since ECDs are also structured diagrams, it seems that a hand-drawn approach could be more efficient than an application-based drawing approach. Hence, an automated method to convert an image of a hand-drawn ECD into a digital format processable by a CSS would ease the use of CSSs.

So far, various studies have been conducted on the segmentation, the recognition, and the tracing of connections between ECCs, which will be briefly described in the following. The proposed approaches can be structured as follows: 1) classification of ECCs [3, 4, 5], 2) segmentation and classification of ECCs [6, 7], 3) segmentation and classification of ECCs and ECD topology acquisition [8], 4) object detection of ECCs and ECD topology acquisition [9]. Moetesum et al. [6] used computer vision methods to segment ECCs from an ECD, where for different ECC types different strategies were used to obtain a segmentation mask. For instance, sources were segmented by filling the region inside the source symbol, followed by a bounding box drawn around the segmentation mask. A Histogram of Oriented Gradients was applied to the region inside the bounding box to obtain a feature vector for a subsequent Support Vector Machine classifier. While this approach yielded good classification results, it is only partially extendable. For ECCs which have a similar shape to components already covered by a segmentation strategy, the existing strategy can probably be reused, but for completely new shapes, a new strategy has to be introduced. The aim of the method proposed by Dhanushika et al. [9] was to extract a boolean expression from an ECD made out of logical gate components (AND, OR, NOT, etc.). The ECC classification was modeled here using the object detection algorithm YOLO (You Only Look Once) [10], which localizes and classifies an object in a single step. The ECD topology was recognized by removing the bounding boxes from the image and applying a Hough Transform to the remaining connections. Hough lines and bounding box intersections were then used to form the ECD topology, from which the final boolean expression was generated.

All of the above-mentioned methods were restricted to drawings on white paper only. As it is quite common to also draw on gridded paper, this might be too restrictive for use in real-world scenarios. Furthermore, no method has been proposed so far which aims to cover the full conversion, from the image all the way to the simulation based on a CSS-formatted file.

Thus, this thesis aims to cover the development of a full processing pipeline able to convert images of hand-drawn ECDs into an intermediate format which reflects the topologies of the ECDs. Extensibility should be ensured by using an object detection deep neural network architecture, which, owing to the nature of neural networks, can be extended simply by providing new data and labels for the training step. The pipeline should also be invariant to image quality (paper type, lighting conditions, background, etc.), at least considering white and grid paper. Furthermore, the pipeline should include the recognition of component annotations, e.g., component values and voltage/current flow symbols. The conversion into a CSS format should be realized using the example of LTspice. Additionally, the methods should be chosen such that the pipeline could be executed on mobile hardware; thus the computational effort for the whole pipeline must be kept as low as possible.

The thesis will comprise the following work items:

  1. Collection of a suitable dataset
  2. Object detection of ECCs and annotations in images of hand-drawn ECDs
  3. Segmentation of the ECD from the drawing
  4. Identification of the ECD topology
  5. Postprocessing
    1. Building the ECD topology
    2. Assigning annotations to corresponding ECCs
    3. Embedding the gathered information into an LTspice file
  6. Optional: Mobile demo application
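For work item 5.3, LTspice can open and simulate standard SPICE netlists, so the embedding step could serialize the recognized topology along these lines. The component tuple format and the helper below are hypothetical illustrations, not part of the proposal:

```python
def to_spice_netlist(components, title="hand-drawn circuit"):
    """Serialize recognized components into a SPICE netlist that
    LTspice can simulate. Each component is a hypothetical tuple
    (name, node_a, node_b, value) produced by the earlier
    topology and annotation steps."""
    lines = [f"* {title}"]
    for name, node_a, node_b, value in components:
        lines.append(f"{name} {node_a} {node_b} {value}")
    lines.append(".op")    # a simple operating-point analysis
    lines.append(".end")
    return "\n".join(lines)

# e.g. a 5 V source driving a 100 mOhm resistor
netlist = to_spice_netlist([("V1", "N001", "0", "5"),
                            ("R1", "N001", "0", "100m")])
```

Here the node names (N001, 0 for ground) follow the usual SPICE convention; the recognized annotation "100 mΩ" maps directly onto the SPICE value suffix "100m".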
References

[1] IEC-60617. https://webstore.iec.ch/publication/2723. Accessed: 21-12-2020.

[2] K. Refaat, W. Helmy, A. Ali, M. AbdelGhany, and A. Atiya. A new approach for context-independent
handwritten offline diagram recognition using support vector machines. In 2008 IEEE International Joint
Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pages 177–182,
2008.

[3] M. Rabbani, R. Khoshkangini, H.S. Nagendraswamy, and M. Conti. Hand drawn optical circuit recognition.
Procedia Computer Science, 84:41 – 48, 2016. Proceeding of the Seventh International Conference on
Intelligent Human Computer Interaction (IHCI 2015).

[4] M. Günay, M. Köseoğlu, and Ö. Yıldırım. Classification of hand-drawn basic circuit components using
convolutional neural networks. In 2020 International Congress on Human-Computer Interaction, Optimization
and Robotic Applications (HORA), pages 1–5, 2020.

[5] S. Roy, A. Bhattacharya, and N. Sarkar et al. Offline hand-drawn circuit component recognition using
texture and shape-based features. Springer Science+Business Media, August 2020.

[6] M. Moetesum, S. Waqar Younus, M. Ali Warsi, and I. Siddiqi. Segmentation and recognition of electronic
components in hand-drawn circuit diagrams. EAI Endorsed Transactions on Scalable Information Systems,
5(16), 4 2018.

[7] M. D. Patare and M. Joshi. Hand-drawn digital logic circuit component recognition using svm. International
Journal of Computer Applications, 143:24–28, 2016.

[8] B. Edwards and V. Chandran. Machine recognition of hand-drawn circuit diagrams. In 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100),
volume 6, pages 3618–3621 vol.6, 2000.

[9] T. Dhanushika and L. Ranathunga. Fine-tuned line connection accompanied boolean expression generation
for hand-drawn logic circuits. In 2019 14th Conference on Industrial and Information Systems (ICIIS),
pages 436–441, 2019.

[10] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi. You only look once: Unified, real-time object
detection. CoRR, abs/1506.02640, 2015.

Semi-Supervised Beating Whole Heart Segmentation Based on 3D Cine MRI in Congenital Heart Disease Using Deep Learning

The heart is a dynamic, beating organ, and until now it has been challenging to fully capture its complexity by magnetic resonance imaging (MRI). In an ideal world, doctors could create a 3-dimensional (3D) visual representation of each patient’s unique heart and watch as it pumps, moving through each phase of the cardiac cycle. [2]

The standard cardiac MRI includes multiple 2D image slices stacked next to each other that must be carefully positioned by the MRI technologist based on a patient’s anatomy. Planning the location and angle for the slices requires a highly knowledgeable operator and takes time. [2]

Recently, a new MRI-based technology, referred to as “3D cine”, has been developed that can produce moving 3D images of the heart. It allows cardiologists and cardiac surgeons to see a patient’s heart from any angle and observe its movement throughout the entire cardiac cycle [2], and also enables the assessment of cardiac morphology and function [4].

Fully automatic methods for the analysis of 3D cine cardiovascular MRI would improve the clinical utility of this promising technique. At the moment, there is no automatic segmentation algorithm available for 3D cine images of the heart. Furthermore, manual segmentation of 3D cine images is time-consuming and impractical. Therefore, in this master thesis, different deep learning (DL) techniques based on 3D MRI data will be investigated in order to automate the segmentation process. In particular, two time frames of every 3D image might first be semi-automatically segmented [3]. The segmentation of these two time frames will be used to train a deep neural network for automatic segmentation of the other time frames.

The datasets are acquired from 125 different patients at Boston Children’s Hospital¹. In contrast to standard cardiac MRI, where patients must hold their breath while the picture is being taken, these datasets are obtained by tracking the patient’s breathing motion and only collecting data during expiration, when the patient is breathing out [1].

The segmentation results will be quantitatively validated using the Dice score and qualitatively evaluated by clinicians.
The thesis has to comprise the following work items:
• Data processing and manual annotation of the available datasets in order to utilize them for the DL methods.
• Development and implementation of 3D cine segmentation models based on DL techniques.
• Quantitative evaluation of the segmentation results with respect to the Dice score.
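The Dice score used for the quantitative evaluation measures the overlap between a predicted and a manual mask as 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch for binary masks (illustrative only):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary segmentation masks,
    in [0, 1]; 1 means perfect overlap."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)
```

For the multi-structure whole-heart case, the score would typically be computed per label (ventricles, atria, myocardium, great vessels) and averaged.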
The thesis will be carried out at the Department of Pediatrics at Harvard University Medical School and the Department of Cardiology at Boston Children’s Hospital, in cooperation with the Pattern Recognition Lab at FAU Erlangen-Nuremberg and the Computer Science and Artificial Intelligence Lab of MIT. Furthermore, the results of the study are expected to be published as an abstract and article at the International Society for Cardiovascular Magnetic Resonance in Medicine².

¹Department of Cardiology, Boston Children’s Hospital, 300 Longwood Avenue, Boston, MA 02115, USA
²https://scmr.org/

References
[1] Mehdi Hedjazi Moghari, Ashita Barthur, Maria Amaral, Tal Geva, and Andrew Powell. Free-breathing whole-heart 3D cine magnetic resonance imaging with prospective respiratory motion compensation: Whole-heart 3D cine MRI. Magnetic Resonance in Medicine, 80, 2017.
[2] Erin Horan. The future of cardiac MRI: 3-D cine. Boston Children’s Hospital’s science and clinical innovation blog, 2016. [Online]. Available: https://vector.childrenshospital.org/2016/12/the-future-of-cardiac-mri-3-d-cine.
[3] Danielle F. Pace. Image segmentation for highly variable anatomy: Applications to congenital heart disease. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA, USA, 2020.
[4] Jens Wetzl, Michaela Schmidt, Francois Pontana, Benjamin Longere, Felix Lugauer, Andreas Maier, Joachim Hornegger, and Christoph Forman. Single-breath-hold 3-D cine imaging of the left ventricle using Cartesian sampling. Magnetic Resonance Materials in Physics, Biology and Medicine, 31:1–13, 2017.

Deep Learning based image enhancement for contrast agent minimization in cardiac MRI

Late gadolinium enhancement (LGE) imaging has become an indispensable tool in the diagnosis and assessment of myocardial infarction (MI). Size, location, and extent of the infarcted tissue are important indicators to assess treatment efficacy and to predict functional recovery [1]. In LGE imaging, T1-weighted inversion recovery pulse sequences are applied several minutes after injection of a gadolinium-based contrast agent (GBCA). However, contraindications (e.g. renal insufficiency) and severe adverse effects (e.g. nephrogenic systemic fibrosis) of GBCAs are known [2]. Therefore, the minimization of administered contrast agent doses is a subject of current research. Existing neural network-based approaches either rely on cardiac wall motion abnormalities [3] or have been developed for brain MRI [4].

The aim of this thesis is to develop a post-processing approach based on convolutional neural networks (CNNs) to accurately segment and quantify myocardial scar in 2-D LGE images acquired with reduced doses of GBCA. For this purpose, synthetic data generated with an in-house MRI simulation suite is used for a start. The 4-D XCAT phantom [5] is used for the simulation, as it offers multiple possibilities for variations in patient anatomy as well as in the geometry and location of myocardial scar. Furthermore, the simulated images will include variability in certain acquisition parameters to best reflect in-vivo data. In addition to LGE images, T1-maps are simulated with different levels of contrast agent dose. In the scope of this thesis, multiple approaches using different combinations of input data (i.e. LGE images and/or T1-maps at zero-dose and/or low-dose) are explored. The performance of the network will be evaluated on simulated and in-vivo data. Depending on availability, in-vivo data will also be incorporated into the training process.
The thesis covers the following aspects:
• Generation of simulated training data, best reflecting in-vivo data
• Development of the CNN-based system, including implementation using PyTorch
• Optional: depending on data availability and on previous results, incorporation of in-vivo data into the training process
• Quantitative evaluation of the implemented network on simulated and in-vivo data using the Dice score and clinically relevant MI quantification metrics, e.g. the full width at half maximum (FWHM) method
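The FWHM criterion mentioned above labels as scar those voxels within the myocardium whose intensity exceeds half of the maximum (hyperenhanced) intensity found there. A minimal NumPy sketch, assuming a myocardium mask is already available (the helper name is hypothetical):

```python
import numpy as np

def fwhm_scar_mask(image, myocardium_mask):
    """FWHM scar quantification sketch: within the myocardium,
    voxels with intensity above half of the maximum intensity
    are labeled as scar."""
    myo_intensities = image[myocardium_mask]
    threshold = 0.5 * myo_intensities.max()
    return myocardium_mask & (image >= threshold)
```

Infarct size would then follow as the scar voxel count times the voxel volume, enabling a direct comparison between reduced-dose and full-dose acquisitions.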

References
[1] V. Hombach, N. Merkle, P. Bernhard, V. Rasche, and W. Rottbauer, “Prognostic significance of cardiac magnetic resonance imaging: Update 2010,” Cardiology Journal, 2010.
[2] L. Bakhos and M. A. Syed, Contrast Media, pp. 271–281. Cham: Springer International Publishing, 2015.
[3] N. Zhang, G. Yang, Z. Gao, C. Xu, Y. Zhang, R. Shi, J. Keegan, L. Xu, H. Zhang, Z. Fan, and D. Firmin, “Deep learning for diagnosis of chronic myocardial infarction on nonenhanced cardiac cine MRI,” Radiology, 2019.
[4] E. Gong, J. M. Pauly, M. Wintermark, and G. Zaharchuk, “Deep learning enables reduced gadolinium dose for contrast-enhanced brain MRI,” Journal of Magnetic Resonance Imaging, 2018.
[5] W. P. Segars, G. Sturgeon, S. Mendonca, J. Grimes, and B. M. Tsui, “4D XCAT phantom for multimodality imaging research,” Medical Physics, 2010.

Getting the Most out of U-Net Architecture for Glacier (Front) Segmentation

Glaciers and ice sheets currently contribute two thirds of the observed global sea-level rise. Many glaciers in glaciated regions, e.g., Antarctica, have already shown considerable ice mass loss in the last decade. Most of this mass loss is caused by the dynamic adjustment of glaciers, with considerable glacier retreat and elevation change being the major observables. The continuous and precise extraction of glacier calving fronts is hence of paramount importance for monitoring these rapid glacier changes.
This project intends to bridge the gap toward a fully automatic, end-to-end deep-learning-based glacier (front) segmentation using synthetic aperture radar (SAR) imagery. U-Net has recently been used, in its simple form, for this task and has shown promising results [1]. In this thesis, we would like to thoroughly study the fundamentals and incorporate more advanced ideas to improve the segmentation performance of the simple U-Net. In other words, this thesis investigates approaches that enhance the image segmentation performance without deviating from the U-Net’s root architecture. The outcome of this thesis is expected to be a comparative study, similar to [11], on glacier (front) segmentation. To this end, the following ideas are going to be investigated:

1. Pre-processing: So far in the literature, simple denoising/multi-looking algorithms have been used as pre-processing. It is interesting to conduct a more thorough study on the effect of additional pre-processing algorithms:

1.1. Attribute Profiles (APs) [2, 3] have resulted in performance enhancement for very high-resolution remote sensing image classification. They have been used for SAR image segmentation too [4]. Their extension, Feature Attribute Profiles [5], have been shown to outperform APs in most scenarios. They have also been used for pixel-wise classification of SAR images [6]. We would like to study the performance of APs and their extension in SAR image segmentation. This task is optional and will be addressed if time allows.

1.2. There are multiple classical denoising algorithms, like the median filter, Gaussian filter, bilateral filter, Lee filter, Kuan filter, etc. The denoised images may be followed by contrast enhancement algorithms, e.g., contrast limited adaptive histogram equalization (CLAHE). Different combinations will be studied quantitatively and qualitatively.

2. Different network architectures in the U-Net’s bottleneck:

2.1. dilated convolution (atrous convolution): dilated convolution [7] is shown to introduce multi-scaling to the network without increasing the number of parameters,
2.2. dilated Resnet [8],

2.3. pre-trained networks (VGG, Resnet, etc.),
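To illustrate the idea behind 2.1: spacing the kernel taps `dilation` samples apart enlarges the receptive field to k + (k − 1)(dilation − 1) without adding parameters. A 1-D NumPy sketch (illustrative only; in practice this would be a dilated Conv2d layer in the network):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D convolution with a dilated kernel: taps are
    spaced `dilation` samples apart, so the receptive field grows
    to k + (k - 1) * (dilation - 1) with the same k parameters."""
    k = len(kernel)
    span = k + (k - 1) * (dilation - 1)   # effective receptive field
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * dilation] for j in range(k))
    return out
```

With dilation=1 this reduces to an ordinary convolution; stacking layers with increasing dilation factors is what introduces multi-scale context into the bottleneck.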

3. Different Normalization Algorithms: One common issue in training deep CNNs is the internal covariate shift, which is caused by the distribution change of input features. It causes both the training speed and the performance to decrease. As a remedy, multiple normalization techniques have been proposed, like Batch Normalization, Instance Normalization, Layer Normalization, and Group Normalization [9]. In this thesis, we will study the effect of the algorithms above on the segmentation results of the U-Net, both qualitatively and quantitatively.
4. The optimal loss function for this application:
• (Binary) Cross Entropy
• Dice coefficient
• Focal loss
• Weighted combination of the loss functions above
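A minimal NumPy sketch of the last option, a weighted combination of binary cross-entropy and soft Dice loss over predicted foreground probabilities (the weights are hypothetical and would be tuned in the study):

```python
import numpy as np

def combined_loss(pred, target, w_bce=0.5, w_dice=0.5, eps=1e-7):
    """Weighted sum of binary cross-entropy and soft Dice loss.

    pred: predicted foreground probabilities in (0, 1);
    target: binary ground-truth mask of the same shape."""
    pred = np.clip(pred, eps, 1 - eps)    # avoid log(0)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()
    inter = (pred * target).sum()
    dice = 1.0 - 2.0 * inter / (pred.sum() + target.sum() + eps)
    return w_bce * bce + w_dice * dice
```

The Dice term directly targets the overlap metric used in evaluation and is less sensitive to the foreground/background imbalance typical of glacier-front masks, while the BCE term keeps the per-pixel gradients well behaved.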
5. Effect of dropout and drop connect: In which layer is dropout most effective? Maybe using it in all layers is the best approach? Is using dropout in combination with normalization techniques (e.g., batch normalization) even advantageous?
6. Effect of different data augmentation techniques, e.g., flip, rotate, random crop, random transformation, etc. on the segmentation performance.
7. Effect of transfer learning:

7.1. Is pre-training the decoder, encoder, and bottleneck of the U-Net separately, or all at once on other datasets, beneficial? Is it effective for tackling the limited training data and the class-imbalance problem in the dataset?
7.2. The effect of transfer learning from the high quality images (quality factor=[1:3]) to the low quality ones (quality factor=[1:3]).

8. Improved architectures of U-Net: For a thorough review of some of these architectures in one place, please refer to Taghanaki et al. [11].

8.1. Feedforward Auto-Encoder
8.2. FCN
8.3. Seg-Net
8.4. U-Net
8.5. U-Net++ [10]
8.6. Tiramisu Network [12]

 

References

[1] Zhang et al. “Automatically delineating the calving front of Jakobshavn Isbræ from multitemporal TerraSAR-X images: a deep learning approach.” The Cryosphere 13, no. 6 (2019): 1729-1741.

[2] Dalla Mura, Mauro, et al. “Morphological attribute profiles for the analysis of very high resolution images.” IEEE Transactions on Geoscience and Remote Sensing 48.10 (2010): 3747-3762.
[3] Ghamisi, Pedram, Mauro Dalla Mura, and Jon Atli Benediktsson. “A survey on spectral–spatial classification techniques based on attribute profiles.” IEEE Transactions on Geoscience and Remote Sensing 53.5 (2014): 2335-2353.
[4] Boldt, Markus, et al. “SAR image segmentation using morphological attribute profiles.” The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences 40.3 (2014): 39.
[5] Pham, Minh-Tan, Erchan Aptoula, and Sébastien Lefèvre. “Feature profiles from attribute filtering for classification of remote sensing images.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 11.1 (2017): 249-256.
[6] Tombak, Ayşe, et al. “Pixel-Based Classification of SAR Images Using Feature Attribute Profiles.” IEEE Geoscience and Remote Sensing Letters 16.4 (2018): 564-567.
[7] Chen, Liang-Chieh, et al. “Rethinking atrous convolution for semantic image segmentation.” arXiv preprint arXiv:1706.05587 (2017).
[8] Zhang, Qiao, et al. “Image segmentation with pyramid dilated convolution based on ResNet and U-Net.” International Conference on Neural Information Processing. Springer, Cham, 2017.
[9] Zhou, Xiao-Yun, and Guang-Zhong Yang. “Normalization in training U-Net for 2-D biomedical semantic segmentation.” IEEE Robotics and Automation Letters 4.2 (2019): 1792-1799.
[10] Zhou, Zongwei, et al. “Unet++: A nested u-net architecture for medical image segmentation.” Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, Cham, 2018. 3-11.
[11] Taghanaki, Saed Asgari, et al. “Deep Semantic Segmentation of Natural and Medical Images: A Review.” arXiv preprint arXiv:1910.07655 (2019).
[12] Jégou, Simon, et al. “The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2017.