Thesis Description
Proper initialization of convolution kernels is crucial for a well-optimized deep neural network [1]. A popular way to instantiate these kernels is random assignment of weights [2], drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 1. Although this approach is easy to implement, it has several downsides, such as failing to reach the global optimum or slowing down the training process. As an improvement over random assignment, Xavier Glorot et al. proposed the “Xavier initialization value (Xe)” [3] for convolution kernels. This method draws the weights from a uniform distribution with a mean of 0 and a variance of 1/n, where n is the total number of input neurons. Although it speeds up convergence, the derivation of Xe initialization assumes that the activation function is linear, which does not hold for popular activation functions such as the Rectified Linear Unit (ReLU). To mitigate this issue, Kaiming He et al. proposed He initialization [4], which targets the ReLU activation function and uses a Gaussian distribution with a mean of 0 and a variance of 2/n.
All of the above techniques initialize the kernel weights independently of the data and ignore the training samples that are already available. During training, the randomly generated values are gradually adjusted to match the local patterns of the images. In every iteration, the optimizer reduces the mismatch between the kernel weights and the local features, which eventually leads to convergence. Because the quality of the initial match is left to chance, many iterations are needed before the convolution kernels match the local features well. This results in longer training times and slower convergence.
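The three schemes described above can be compared with a short NumPy sketch. It is an illustration under stated assumptions, not the implementation used in the cited papers: n is taken to be the fan-in of a kernel of shape (out_channels, in_channels, height, width), and the function names are hypothetical.

```python
import numpy as np


def fan_in(shape):
    """Inputs per output unit for a kernel of shape (out_ch, in_ch, kh, kw).

    Assumption: this is the "n" used in the variance formulas above.
    """
    _, in_ch, kh, kw = shape
    return in_ch * kh * kw


def gaussian_init(shape, rng, std=1.0):
    """Random assignment: zero-mean Gaussian with a fixed standard deviation."""
    return rng.normal(0.0, std, size=shape)


def xavier_uniform_init(shape, rng):
    """Zero-mean uniform distribution with variance 1/n."""
    n = fan_in(shape)
    # Var[U(-a, a)] = a^2 / 3, so a = sqrt(3 / n) gives a variance of 1/n.
    limit = np.sqrt(3.0 / n)
    return rng.uniform(-limit, limit, size=shape)


def he_normal_init(shape, rng):
    """Zero-mean Gaussian with variance 2/n, intended for ReLU layers."""
    n = fan_in(shape)
    return rng.normal(0.0, np.sqrt(2.0 / n), size=shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (64, 32, 3, 3)  # hypothetical layer: 64 kernels, 32 input channels, 3x3
    for name, init in [
        ("gaussian", gaussian_init),
        ("xavier", xavier_uniform_init),
        ("he", he_normal_init),
    ]:
        weights = init(shape, rng)
        print(f"{name}: empirical variance = {weights.var():.5f}")
```

The empirical variances should lie close to 1, 1/n and 2/n respectively. None of the three functions looks at the training images, which is the limitation discussed above.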
Other methods for initializing the convolution kernel have taken this issue into account. OrthoNorm initializes the kernels with orthogonal matrices and, unlike random assignment, can also be used successfully in non-linear networks [5]. The “Layer-sequential unit-variance (LSUV)” method turns orthogonal initialization into an iterative process: it uses singular value decomposition (SVD) to replace weights initialized with Gaussian noise by orthonormal ones and then rescales each layer until its output variance is close to one [6]. In 2014, Tsung-Han Chan et al. proposed a Principal Component Analysis (PCA) based method for convolution kernel initialization [7]. The model extracts all image patches from a feature map and initializes the convolution kernels with the principal components of these patches.
This thesis aims to further improve the PCA-based kernel initialization method by incorporating ground truth (GT) images. GT images are already labeled and can be used to find suitable feature sets. Using the dominant features of these sets as convolution kernel weights creates a dependency between the training images and the convolution kernels. In theory, this could decrease the training time and improve the overall convergence rate [1]. Extensive benchmarking of the proposed initialization method, along with other quantitative measures, has to be considered while developing the system and is therefore also within the scope of this thesis.
To achieve the goals of the thesis, existing tools and libraries such as PyTorch Lightning (www.pytorchlightning.ai), MONAI (monai.io), Weights and Biases (wandb.ai), Python (www.python.org) and notable Python scientific packages shall be used and reused where possible.
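As a rough illustration of the PCA-based kernel initialization [7] that this thesis builds on, the following NumPy sketch collects all k x k patches from a stack of single-channel images, removes the mean of each patch, and uses the leading principal components as kernels. The function names, the single-channel input and the random test images are assumptions made for this example. How the GT images enter the procedure is not specified here and is therefore left out.

```python
import numpy as np


def extract_patches(images, k):
    """Return every k x k patch of a stack of 2-D images as rows of a matrix.

    images: array of shape (num_images, height, width).
    Result: array of shape (num_patches, k * k).
    """
    _, height, width = images.shape
    patches = [
        img[i:i + k, j:j + k].ravel()
        for img in images
        for i in range(height - k + 1)
        for j in range(width - k + 1)
    ]
    return np.stack(patches)


def pca_kernels(images, k, num_kernels):
    """Sketch of PCA-based initialization: kernels are principal components of patches."""
    patches = extract_patches(images, k).astype(np.float64)
    # Remove the mean of each individual patch before computing the covariance.
    patches -= patches.mean(axis=1, keepdims=True)
    covariance = patches.T @ patches / patches.shape[0]
    # eigh returns eigenvalues in ascending order, so reorder to descending.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1][:num_kernels]
    # Each selected eigenvector (a column) becomes one k x k kernel.
    return eigenvectors[:, order].T.reshape(num_kernels, k, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((8, 12, 12))  # placeholder data, not real medical images
    kernels = pca_kernels(images, k=3, num_kernels=4)
    print(kernels.shape)
```

In contrast to the random schemes, the resulting kernels depend on the training images. In a PyTorch model, such an array could be copied into the weight tensor of a first convolution layer with one input channel; that step is omitted to keep the sketch free of further dependencies.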
The thesis will comprise the following work items:
Literature overview of improved convolution kernel initialization methods
Design and formalization of the system to be developed
Overview and explanation of the algorithms used
System development including code implementation
Quantitative evaluation of the implemented system on medical image data
References
[1] Chunyu Xu and Hong Wang. Research on a convolution kernel initialization method for speeding up the convergence of CNN. Applied Sciences, 12:633, January 2022.
[2] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.
[3] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Yee Whye Teh and D. Mike Titterington, editors, AISTATS, volume 9 of JMLR Proceedings, pages 249–256. JMLR.org, 2010.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, 2015.
[5] Andrew M. Saxe, James L. McClelland, and Surya Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013.
[6] Dmytro Mishkin and Jiri Matas. All you need is a good init, 2015.
[7] Tsung-Han Chan, Kui Jia, Shenghua Gao, Jiwen Lu, Zinan Zeng, and Yi Ma. PCANet: A simple deep learning baseline for image classification? IEEE Transactions on Image Processing, 24(12):5017–5032, December 2015.