Prathmesh Madhu
Concepts to Computational Constructs: Advanced Scene Understanding for Heterogeneous Artworks Using Deep Learning
Due to the mass digitisation of paintings, manually examining and understanding
individual images has become a cumbersome task. Developing automatic methods using
computer vision and machine learning techniques is extremely useful for humanities
experts, who are generally interested in understanding the origin of objects, iconographies,
and narratives in artworks. Over the last decade, digital humanities has become a prominent
field for understanding and connecting the past, present, and future through
artworks in the digitised form of texts and images. The aim is to provide quicker access
to information, uncover hidden trends, and validate theoretical knowledge from
large data collections. Understanding artworks is challenging in digital humanities
due to their subjective nature and the lack of annotated data. Recently, deep learning-based
methods have shown commendable performance on real-world images. One
straightforward approach is to train such models on existing real-world photographs and test them
on artwork images. However, since artwork images follow a markedly different data distribution,
these models often fail to generalise, a problem commonly referred to as
domain shift.
This thesis develops several scene understanding methods from a digital humanities
perspective, targeting the domains of Art History and Christian and Classical archaeology.
The focus lies on (a) developing methods for character, iconography, and object recognition,
and (b) methods beyond recognition, especially targeting pose estimation and novel
image compositions. Particular attention is given to the methods beyond recognition,
where theoretical concepts from Art History are converted into computational
methods for understanding and linking iconography. For the recognition methods,
a two-step style transfer learning algorithm is first developed for recognising characters
in Art History. This work is extended to iconography recognition, where
the impact of styles is analysed in detail using supervised and self-supervised models.
To mitigate the scarcity of annotations, a
one-shot object detection pipeline with advanced augmentations, such as context- and
crop-based augmentation, is developed for heterogeneous artworks.
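
As an illustration of this recognition setting, the following is a minimal sketch of a two-step transfer learning scheme in PyTorch/torchvision, assuming an ImageNet-pretrained backbone, a first adaptation stage (for example on style-transferred photographs), and a second fine-tuning stage on a small annotated artwork set; the loaders, class count, and hyperparameters are placeholders, not the exact configuration used in this thesis.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_classifier(num_classes: int) -> nn.Module:
    # Start from a backbone pretrained on natural photographs (ImageNet).
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def train_stage(model: nn.Module, loader, epochs: int, lr: float, device: str = "cuda"):
    # One training stage: plain cross-entropy fine-tuning on the given loader.
    model.to(device).train()
    optimiser = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimiser.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimiser.step()
    return model

# Hypothetical usage with two successive stages:
# model = build_classifier(num_classes=10)
# model = train_stage(model, styled_photo_loader, epochs=10, lr=1e-3)  # step 1: stylised photographs
# model = train_stage(model, artwork_loader, epochs=10, lr=1e-4)       # step 2: small artwork set
```
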
For methods beyond recognition, the task of linking narratives in Greek
vase paintings is first considered using pose estimation with as few as 1500 pose annotations.
The proposed two-step style transfer learning is extended to
enhance pose estimation and to build a pose-based image retrieval system that links narratives
in Classical archaeology.
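
A pose-based retrieval system of this kind can be sketched, in simplified form, as a nearest-neighbour search over normalised 2D keypoints; the keypoint layout, normalisation, and similarity measure below are illustrative assumptions rather than the exact pipeline developed in the thesis.

```python
import numpy as np

def normalise_pose(keypoints: np.ndarray) -> np.ndarray:
    # keypoints: (K, 2) array of (x, y) joint positions from any 2D pose estimator.
    # Centre on the mean joint and scale to unit norm so that a figure's position
    # and size in the painting do not dominate the comparison.
    centred = keypoints - keypoints.mean(axis=0, keepdims=True)
    return (centred / (np.linalg.norm(centred) + 1e-8)).ravel()

def retrieve(query_pose: np.ndarray, gallery_poses: list, top_k: int = 5):
    # Rank gallery figures by cosine similarity of their normalised pose vectors.
    q = normalise_pose(query_pose)
    scored = []
    for idx, pose in enumerate(gallery_poses):
        g = normalise_pose(pose)
        scored.append((float(np.dot(q, g)), idx))
    return sorted(scored, reverse=True)[:top_k]
```
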
Finally, a novel computational algorithm, the Image Composition Canvas (ICC), is developed:
an operationalisation of the compositional analysis of paintings introduced by Hetzer and extended
by Max Imdahl for understanding artworks. The mid-level feature extraction concept presented
by Imdahl is implemented and extended into an image retrieval system (ICC++) with
explainable features. The proposed mid-level composition features are extremely
lightweight and outperform the existing state of the art, which uses only detected
pose keypoints to link images. The detailed qualitative and quantitative results
show the potential to extend these image composition methods further with more
complex composition features.
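
To give a flavour of what such mid-level, explainable features can look like, the following is a toy illustration (not the ICC algorithm itself) that extracts coarse composition lines from an artwork with Canny edges and a probabilistic Hough transform; the thesis's ICC features, such as global action lines derived from the depicted figures, are considerably more elaborate.

```python
import cv2
import numpy as np

def composition_lines(image_path: str, max_lines: int = 10) -> np.ndarray:
    # Detect straight, roughly global line segments as a crude stand-in for
    # compositional guide lines; returns up to max_lines (x1, y1, x2, y2) rows.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=img.shape[1] // 4, maxLineGap=20)
    if lines is None:
        return np.empty((0, 4), dtype=int)
    segments = lines.reshape(-1, 4)
    # Keep the longest segments: a lightweight, human-readable descriptor.
    lengths = np.hypot(segments[:, 2] - segments[:, 0], segments[:, 3] - segments[:, 1])
    return segments[np.argsort(-lengths)[:max_lines]]
```
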
This work, therefore, builds new constructs and proofs
of concept for artwork scene understanding tasks, including recognition and beyond,
allowing a detailed understanding of styles for domain adaptation from both the digital
humanities and computer vision perspectives.