Paper
5 October 2021
A new method for duplicate document image detection with page layout
Yafeng Li
Proceedings Volume 11911, 2nd International Conference on Computer Vision, Image, and Deep Learning; 1191107 (2021) https://doi.org/10.1117/12.2604709
Event: 2nd International Conference on Computer Vision, Image and Deep Learning, 2021, Liuzhou, China
Abstract
Document images often appear in digital libraries, social media, e-mail, and similar channels. Duplicate copies of the same content burden management systems and waste network traffic and storage resources. This paper proposes a new algorithm for detecting duplicate document images in large-scale image data sets. The key idea is to exploit the structure that page layout gives a document image. Text lines are extracted as the elementary features of the document image, and the Fréchet distance is introduced to measure the similarity of these features. Experimental results on different types of electronic documents show the advantages of the proposed algorithm in accuracy and stability.
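To make the similarity measure concrete, the following is a minimal sketch of the discrete Fréchet distance between two ordered point sequences, computed with the standard dynamic-programming recurrence. The abstract does not say how text lines are encoded or whether the discrete or continuous variant is used, so everything specific here is an assumption for illustration: the function name `discrete_frechet`, the choice of the discrete variant, the Euclidean ground distance, and the encoding of each text line as a hypothetical (vertical position, line width) pair listed from top to bottom.

```python
from math import hypot
from typing import List, Sequence, Tuple

# Assumed encoding (not from the paper): one point per text line,
# e.g. (vertical position, line width), in top-to-bottom order.
Point = Tuple[float, float]


def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Discrete Frechet distance between two ordered point sequences."""
    if not p or not q:
        raise ValueError("both sequences must be non-empty")
    n, m = len(p), len(q)
    # ca[i][j] holds the coupling distance between p[:i+1] and q[:j+1].
    ca: List[List[float]] = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = hypot(p[i][0] - q[j][0], p[i][1] - q[j][1])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance along p, along q, or along both; keep the best option.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Hypothetical layout signatures of two pages, offset vertically by one unit.
page_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
page_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
distance = discrete_frechet(page_a, page_b)
```

Under this sketch, two pages would be flagged as duplicate candidates when the distance falls below a threshold. Choosing that threshold, and normalizing for scan resolution or page size, is left open here because the abstract does not describe it.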
© (2021) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Yafeng Li "A new method for duplicate document image detection with page layout", Proc. SPIE 11911, 2nd International Conference on Computer Vision, Image, and Deep Learning, 1191107 (5 October 2021); https://doi.org/10.1117/12.2604709
KEYWORDS
Image retrieval

Digital image processing

Image processing

Digital imaging

Image processing algorithms and systems

Optical character recognition
