Concepts to Computational Constructs: Advanced Scene Understanding for Heterogeneous Artworks Using Deep Learning

Document Type
Doctoral Thesis
Madhu, Prathmesh

The mass digitisation of paintings has made manual examination and interpretation of individual images impractical. Automatic methods based on computer vision and machine learning are therefore valuable to humanities experts, who are generally interested in understanding the origin of objects, iconographies, and narratives in artworks. Over the last decade, digital humanities has become a prominent field for understanding and connecting the past, present, and future through artworks digitised as text and images. The aim is to access information more quickly, uncover hidden trends, and validate theoretical knowledge against large data collections. Understanding artworks remains challenging in digital humanities because of their subjective nature and the lack of annotated data. Recently, deep learning-based methods have shown commendable performance on real-world images. One simple approach is to train models on existing real-world photographs and test them on artwork images. Because artwork images follow a very different data distribution, such models often fail to generalise well, an issue commonly referred to as the domain shift problem.

This thesis develops several scene understanding methods from a digital humanities perspective, targeting the domains of art history, Christian archaeology, and Classical archaeology. The focus lies on (a) methods for character, iconography, and object recognition, and (b) methods beyond recognition, especially pose estimation and novel image compositions. Particular attention is given to methods beyond recognition, where theoretical concepts from art history are converted into a computational method for understanding and linking iconography. For recognition, starting with character recognition for art history, a two-step style transfer learning algorithm is developed. This work is extended to iconography recognition, where a detailed analysis of the impact of styles using supervised and self-supervised models is presented. To mitigate the scarcity of annotations, a one-shot object detection pipeline with advanced augmentations such as context and crop is developed for heterogeneous artworks.

For methods beyond recognition, the task of linking narratives in Greek vase paintings is considered first, using pose estimation with as few as 1500 pose annotations. The two-step style transfer learning proposed for recognition is extended to enhance pose estimation and to build a pose-based image retrieval system that links narratives in Classical archaeology. Finally, a novel computational algorithm is developed, the Image Composition Canvas (ICC), which operationalises the concept of composition in paintings presented by Hetzer and extended by Max Imdahl to understand artworks. The mid-level feature extraction concept presented by Imdahl is implemented and extended to an image retrieval system (ICC++) with explainable features. The proposed mid-level composition features are extremely lightweight and outperform the existing state of the art, which uses only detected pose keypoints to link images. The detailed qualitative and quantitative results show that the image composition methods can be improved further by introducing more complex composition features. This work therefore builds new constructs and proofs of concept for artwork scene understanding tasks, covering recognition and beyond, and allows a detailed understanding of styles for domain adaptation from both digital humanities and computer vision perspectives.
