Novel View Synthesis for Augmentation of Fine-Grained Image Datasets

Current deep-learning-based classification methods require large amounts of training data, but in certain scenarios, such as surveillance imaging, only a limited amount of data is available. The aim of this research is to generate new training images of vehicles that share the characteristics of the training data but show them from novel viewpoints, and to investigate their suitability for fine-grained vehicle classification.

Generative models such as generative adversarial networks (GANs) [1] allow for customization of images. However, adjusting the perspective through methods such as conditional GANs for unsupervised image-to-image translation has proven to be particularly difficult [1]. StyleGAN [2] and neural radiance fields (NeRFs) [3] are relevant approaches for generating images with different styles and perspectives.
StyleGAN extends the GAN architecture with changes to the generator model, most notably the introduction of a mapping network. The mapping network generates intermediate latent codes, which are transformed into styles that are injected at each layer of the generator network. It also uses a progressive growing approach during training, which makes the generator capable of synthesizing large, high-quality images.
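The style mechanism can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the StyleGAN implementation: the layer count, the dimensions, the random weights, and the names mapping_network, adain, affine_weight and affine_bias are all chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def mapping_network(z, layers):
    """Toy mapping network: a small MLP that turns a latent z into an intermediate latent w."""
    # Normalize the input latent before the MLP.
    h = z / np.sqrt(np.mean(z ** 2) + 1e-8)
    for weight, bias in layers:
        h = h @ weight + bias
        h = np.where(h > 0, h, 0.2 * h)  # leaky ReLU
    return h

def adain(features, w, affine_weight, affine_bias):
    """Adaptive instance normalization: w is mapped to a per-channel scale and bias."""
    channels = features.shape[0]
    style = w @ affine_weight + affine_bias          # shape (2 * channels,)
    scale, bias = style[:channels], style[channels:]
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True) + 1e-8
    normalized = (features - mean) / std
    return scale[:, None, None] * normalized + bias[:, None, None]

# Hypothetical sizes: 8-dimensional latents, one feature map with 4 channels.
latent_dim, channels = 8, 4
z = rng.standard_normal(latent_dim)
layers = [(0.5 * rng.standard_normal((latent_dim, latent_dim)), np.zeros(latent_dim))
          for _ in range(2)]
w = mapping_network(z, layers)

features = rng.standard_normal((channels, 16, 16))
affine_weight = rng.standard_normal((latent_dim, 2 * channels))
affine_bias = np.zeros(2 * channels)
styled = adain(features, w, affine_weight, affine_bias)
```

After the call, each channel of styled has the mean and spread dictated by the style derived from w, which is how a single intermediate latent code can steer the appearance of every generator layer it is fed into.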
NeRF can generate novel views of complex 3D scenes from a sparse set of 2D images. It is trained to map directly from a spatial location and viewing direction (a 5D input) to volume density (opacity) and color, and uses volume rendering [4] to render new views.
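The volume rendering step can be sketched for a single camera ray as follows. This is a minimal NumPy illustration of the usual discrete approximation, in which each sample contributes its color weighted by its opacity and by the transmittance accumulated in front of it. The densities and colors are assumed to come from a trained network evaluated at the sample positions; here they are hand-picked toy values, and the function name render_ray is hypothetical.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Composite per-sample densities and colors along one ray into a pixel color.

    sigmas: (N,) volume densities at the samples
    colors: (N, 3) RGB colors at the samples
    t_vals: (N,) increasing distances of the samples along the ray
    """
    # Distance between neighboring samples; the last interval is treated as very long.
    deltas = np.append(np.diff(t_vals), 1e10)
    # Opacity of each interval.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: fraction of light that reaches sample i without being absorbed earlier.
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = transmittance * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Toy ray: empty space, then a dense red surface, then a blue sample hidden behind it.
t_vals = np.array([0.0, 1.0, 2.0, 3.0])
sigmas = np.array([0.0, 0.0, 1e3, 5.0])
colors = np.array([[0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
rgb, weights = render_ray(sigmas, colors, t_vals)
```

In this toy case the dense red sample absorbs essentially all of the light, so the rendered pixel is red and the blue sample behind it contributes nothing. Because every operation is differentiable, the same compositing can be used to train the network from 2D images alone.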

The thesis consists of the following milestones:

Literature review of state-of-the-art approaches for GAN- and NeRF-based image synthesis
Adoption of existing GAN- and NeRF-based image synthesis methods to generate car images with different styles and camera poses [5]
Experimental evaluation and comparison of the different image synthesis methods
Investigation of the suitability of the generated images for fine-grained vehicle classification using different classification methods [6], [7]

The implementation will be done in Python.

References
[1] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, “Generative Adversarial Networks”, in NIPS, 2014
[2] Tero Karras, Samuli Laine, Timo Aila, “A Style-Based Generator Architecture for Generative Adversarial Networks”, in CVPR, 2019
[3] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis”, in ECCV, 2020
[4] Robert A. Drebin, Loren Carpenter, Pat Hanrahan, “Volume Rendering”, in SIGGRAPH, 1988
[5] Jiatao Gu, Lingjie Liu, Peng Wang, Christian Theobalt, “StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis”, in ICLR, 2022
[6] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo, “Swin Transformer: Hierarchical Vision Transformer using Shifted Windows”, in ICCV, 2021
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep Residual Learning for Image Recognition”, in CVPR, 2016