Extension of the Lottery Ticket Hypothesis for Saving Computational Cost and Energy

Type: BA thesis

Status: finished

Date: June 8, 2020 - December 8, 2020

Supervisor: Mathias Seuret

Many state-of-the-art neural networks have millions of parameters; for example, VGG's smallest configuration has 133 million parameters [1]. They achieve high training and test accuracies but come at a high computational cost. Since powerful hardware exists, coping with this cost is possible but inefficient. To counteract these inefficiencies, network pruning can be applied to decrease the size of a neural network. Because for many years the accuracies obtained after pruning did not match those of the original network, while hardware kept becoming more powerful and cheaper, pruning was considered suboptimal, and the trend went towards constructing large networks that perform consistently well on high-performance hardware.

In 2019, the Lottery Ticket Hypothesis (LTH) [2] was introduced as a new approach to pruning neural networks. Using a binary mask, the lowest-magnitude weights are set to zero, thereby removing the corresponding connections from the network. The hypothesis states that fully connected and convolutional neural networks can be iteratively pruned into a sparse subnetwork such that the parameter count is reduced by over 90%, while the number of training iterations is at most as high as for the original network and the test accuracy matches or even exceeds the original one. This paper opened up a large area of discussion: on the one hand, some papers find no improvement of the LTH over random initialization [3]; on the other hand, some offer insights into why the approach works well [4].
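As a rough sketch (not the code from [2]), one round of this iterative magnitude-pruning procedure can be written as follows; the function prune_step, the pruning fraction of 20% per round, and the toy "training" update are illustrative assumptions:

```python
import numpy as np

def prune_step(weights, mask, prune_frac=0.2):
    """One pruning round: zero out the lowest-magnitude fraction of the
    still-active weights by updating the binary mask."""
    active = np.abs(weights[mask == 1])
    threshold = np.quantile(active, prune_frac)  # magnitude cutoff for this round
    return mask * (np.abs(weights) > threshold).astype(weights.dtype)

# Toy usage: the noisy update below stands in for a real training run.
rng = np.random.default_rng(0)
w_init = rng.normal(size=(256, 128)).astype(np.float32)
mask = np.ones_like(w_init)

for round_idx in range(5):
    w_trained = (w_init + rng.normal(scale=0.1, size=w_init.shape)) * mask
    mask = prune_step(w_trained, mask, prune_frac=0.2)
    w_rewound = w_init * mask  # rewind surviving weights to their initial values
    print(f"round {round_idx}: {mask.mean():.1%} of weights remain")
```

The rewinding step is the distinctive element of the lottery ticket procedure: after each pruning round, the surviving weights are reset to their original initialization before retraining.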

A drawback of the lottery ticket hypothesis, however, is that the network's neuron structure stays the same: no neurons are removed, so the computational cost does not decrease. The goal of this thesis is to investigate whether neural networks can be pruned by reducing the number of neurons, based on the idea of the lottery ticket hypothesis. Additional goals are to compute the amount of energy saved [5], to compare the masks and structures produced by different datasets and optimizers for the same network in order to gain deeper insights, and to see whether the idea of a supermask [4] also carries over to neuron pruning.
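A minimal sketch of what removing a neuron means structurally, assuming a toy two-layer dense network: the neuron's row in the incoming weight matrix, its bias, and its column in the outgoing weight matrix are deleted, so the matrices themselves shrink. The function remove_neurons and the magnitude-based saliency score are hypothetical choices, not the criterion fixed by this thesis:

```python
import numpy as np

def remove_neurons(w_in, b_in, w_out, keep_idx):
    """Structurally remove hidden neurons from a two-layer dense network:
    delete their rows in w_in, their biases in b_in, and the matching
    columns in w_out, so the layers actually shrink."""
    return w_in[keep_idx, :], b_in[keep_idx], w_out[:, keep_idx]

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(64, 32)), rng.normal(size=64)  # hidden layer: 64 neurons
w2 = rng.normal(size=(10, 64))                           # output layer

# Illustrative saliency: total outgoing weight magnitude per hidden neuron.
scores = np.abs(w2).sum(axis=0)
keep_idx = np.sort(np.argsort(scores)[scores.size // 2:])  # keep the top half
w1, b1, w2 = remove_neurons(w1, b1, w2, keep_idx)
print(w1.shape, b1.shape, w2.shape)  # (32, 32) (32,) (10, 32)
```

Unlike masking, the pruned matrices are genuinely smaller, so forward passes require fewer operations on standard hardware.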

The thesis aims to achieve the following goals:
• Extending the lottery ticket hypothesis by actually removing neurons, instead of using a binary mask.
• Comparing the resulting accuracies and network sizes to those of the original lottery ticket hypothesis [2] on different datasets.

Additional investigations may include:
• Comparing the network structures and masks obtained with different datasets and optimizers.
• Computing the amount of energy saved [5].
• Incorporating the idea of a supermask [4] into the neuron-pruning approach.
• Investigating further pruning procedures that remove neurons instead of cutting connections.

[1] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[2] Jonathan Frankle and Michael Carbin. The lottery ticket hypothesis: Training pruned neural networks. CoRR, abs/1803.03635, 2018.
[3] Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. Rethinking the value of network pruning. CoRR, abs/1810.05270, 2018.
[4] Hattie Zhou, Janice Lan, Rosanne Liu, and Jason Yosinski. Deconstructing lottery tickets: Zeros, signs, and the supermask. CoRR, abs/1905.01067, 2019.
[5] Tien-Ju Yang, Yu-Hsin Chen, and Vivienne Sze. Designing energy-efficient convolutional neural networks using energy-aware pruning. CoRR, abs/1611.05128, 2016.