Implementation and Evaluation of Cluster-based Self-supervised Learning Methods

Prototypical Contrastive Learning (PCL) [1] is an unsupervised representation learning method
that unifies two directions of unsupervised learning: clustering and contrastive learning. The
method can train deep neural networks on millions of unlabeled images. Conventional contrastive
learning is instance-based rather than prototype-based. The authors introduce prototypes, defined as
the cluster centers of similar images. Training follows an EM-like scheme: the E-step estimates the
distribution of prototypes by clustering, and the M-step optimizes the network by contrastive learning.
Additionally, they propose the ProtoNCE loss, which generalizes the commonly used InfoNCE loss.
With this method, the authors report a performance improvement of over 10% on multiple benchmarks.
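The relation between the two losses can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the default temperature, and the `(prototypes, assignments, concentration)` tuple layout are assumptions made for illustration, all prototypes of a clustering serve as negatives, and the thesis itself targets PyTorch. The instance-level term is InfoNCE; the prototype-level term treats the assigned cluster center as the positive and scales similarities by a per-cluster concentration.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def info_nce(query, keys, temperature=0.1):
    """Instance-level InfoNCE: key i is the positive for query i."""
    logits = query @ keys.T / temperature              # (n, n) similarities
    return -np.mean(np.diag(log_softmax(logits)))

def proto_term(query, prototypes, assignments, concentration):
    """Prototype-level term: the assigned prototype is the positive."""
    logits = query @ prototypes.T / concentration      # (n, k), one scale per cluster
    log_prob = log_softmax(logits)
    return -np.mean(log_prob[np.arange(len(query)), assignments])

def proto_nce(query, keys, clusterings, temperature=0.1):
    """InfoNCE plus the prototype term averaged over all given clusterings."""
    proto = [proto_term(query, c, a, phi) for c, a, phi in clusterings]
    return info_nce(query, keys, temperature) + np.mean(proto)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy usage with random embeddings (all values are illustrative).
rng = np.random.default_rng(0)
q = l2_normalize(rng.normal(size=(8, 16)))             # query embeddings
k = l2_normalize(q + 0.05 * rng.normal(size=(8, 16)))  # augmented views
protos = l2_normalize(rng.normal(size=(3, 16)))        # 3 cluster centers
assign = (q @ protos.T).argmax(axis=1)                 # nearest prototype
phi = np.full(3, 0.2)                                  # per-cluster concentration
loss = proto_nce(q, k, [(protos, assign, phi)])
```

In this sketch the prototype term has the same softmax form as InfoNCE, with cluster centers taking the place of instance keys, which is the sense in which the prototype-based loss extends the instance-based one.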
PCL computes its clustering with k-means. However, this clustering may deteriorate over time and
produce degenerate solutions, such as assigning all samples to the same cluster. The solution proposed
by Asano et al. [2] is to add the constraint that the samples be distributed equally among the labels,
which amounts to maximizing the information between the sample indices and the labels. Under this
constraint, the label assignment problem is equivalent to optimal transport. To scale to millions of
samples and thousands of categories, a fast version of the Sinkhorn-Knopp algorithm is used to find an
approximate solution. In summary, they replace the k-means step of DeepCluster [3] with the Sinkhorn-Knopp
algorithm to approximate the label assignment Q, and then use a cross-entropy loss to learn the representation.
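The equal-partition constraint can be illustrated with a minimal NumPy sketch of Sinkhorn-Knopp scaling. This is a plain textbook iteration under stated assumptions, not the fast variant used in [2]: the function name, the sharpening exponent `lam`, its default value, and the fixed iteration count are hypothetical choices for illustration. Starting from the predicted class probabilities, the columns and rows of the matrix are rescaled alternately until every cluster receives the same total mass.

```python
import numpy as np

def sinkhorn_knopp(probs, lam=10.0, n_iters=100):
    """Approximate a balanced soft assignment Q.

    probs: (n_samples, n_clusters) strictly positive class probabilities.
    Returns Q whose rows sum to 1 and whose columns sum to roughly
    n_samples / n_clusters.
    """
    probs = np.asarray(probs, dtype=np.float64)
    n, k = probs.shape
    q = probs ** lam              # sharpen the predictions (lam is an assumption)
    q /= q.sum()
    for _ in range(n_iters):
        q *= (1.0 / k) / q.sum(axis=0, keepdims=True)  # equal mass per cluster
        q *= (1.0 / n) / q.sum(axis=1, keepdims=True)  # equal mass per sample
    return q * n                  # rescale so each row is a distribution

# Toy usage: random logits for 12 samples and 3 clusters.
rng = np.random.default_rng(0)
logits = rng.normal(size=(12, 3))
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
q_balanced = sinkhorn_knopp(p)
pseudo_labels = q_balanced.argmax(axis=1)  # hard labels for a cross-entropy loss
```

The resulting pseudo-labels would then serve as targets for the cross-entropy loss. A larger `lam` pushes Q toward a hard assignment but can slow convergence of the iteration.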
In this work, the k-means clustering of PCL shall be replaced by the Sinkhorn-Knopp algorithm,
and the resulting method shall be thoroughly evaluated on multiple datasets.
The thesis consists of the following milestones:
• Literature study on self-supervised learning [4][5]
• Implementation of [1] and [2]
• Implementation of the combination of PCL and Self-labelling with Optimal Transport
• Thorough evaluation on different datasets and comparison with PCL and Self-labelling
• Comparison with other self-supervised learning papers
• Further experiments regarding learning procedure and network architecture
The implementation should be done in Python using PyTorch.

References
[1] Junnan Li, Pan Zhou, Caiming Xiong, Richard Socher, and Steven C. H. Hoi. Prototypical Contrastive
Learning of Unsupervised Representations. arXiv:2005.04966 [cs], July 2020. arXiv: 2005.04966.
[2] Yuki Markus Asano, Christian Rupprecht, and Andrea Vedaldi. Self-labelling via simultaneous clustering
and representation learning. arXiv:1911.05371 [cs], February 2020. arXiv: 1911.05371.
[3] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep Clustering for Unsupervised
Learning of Visual Features. In Computer Vision – ECCV 2018, volume 11218. Cham: Springer International
Publishing, 2018.
[4] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A Simple Framework for
Contrastive Learning of Visual Representations. arXiv:2002.05709 [cs, stat], June 2020. arXiv: 2002.05709.
[5] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum Contrast for Unsupervised
Visual Representation Learning. arXiv:1911.05722 [cs], November 2019. arXiv: 1911.05722.