Multimodal imaging is used by conservators and scientists to study the composition of paintings. To aid the combined analysis of these scans, the images must first be aligned. Rather than proposing a new domain-specific descriptor, we explore and evaluate how existing feature descriptors from related fields can improve the performance of feature-based painting scan registration. We benchmark these descriptors on pixel-precise, manually aligned scans of “Girl with a Pearl Earring” by Johannes Vermeer (c. 1665, Mauritshuis) and of “18th Century Portrait of a Woman”. As a baseline we compare against the well-established classical SIFT descriptor. We consider two recent descriptors: the handcrafted multimodal MFD descriptor and the learned unimodal SuperPoint descriptor. Experiments show that SuperPoint markedly increases descriptor matching accuracy, by 40%, for modalities with few modality-specific artefacts. Furthermore, performing craquelure segmentation and using the MFD descriptor yields significant improvements in descriptor matching accuracy for modalities with many modality-specific artefacts.
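Feature-based registration of the kind benchmarked here typically detects keypoints in each scan, computes descriptors (e.g. SIFT, MFD, or SuperPoint), and pairs them by nearest-neighbour search filtered with Lowe's ratio test. The following is a minimal NumPy sketch of that matching step only; the function name, the `ratio` threshold, and the synthetic descriptors are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour descriptor matching with Lowe's ratio test.

    desc_a: (N, D) descriptors from the first modality scan.
    desc_b: (M, D) descriptors from the second modality scan (M >= 2).
    Returns (i, j) index pairs whose best match is sufficiently closer
    than the second-best candidate, rejecting ambiguous matches.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # L2 distance to every candidate
        best, second = np.argsort(dists)[:2]
        if dists[best] < ratio * dists[second]:     # ratio test: best must clearly win
            matches.append((i, int(best)))
    return matches

# Synthetic check: the second "modality" carries noisy copies of the
# first modality's descriptors, so matching should recover the
# identity pairing (0,0), (1,1), ...
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 8))
desc_b = desc_a + 0.01 * rng.normal(size=(5, 8))
print(match_descriptors(desc_a, desc_b))
```

In a full registration pipeline, the surviving matches would then feed a robust homography or affine estimation (e.g. RANSAC) to warp one scan onto the other; the matching-accuracy comparison in the abstract concerns the quality of these descriptor pairings across modalities.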