by Meyer-Lerbs, Lothar, Schuldt, Arne and Gottfried, Björn
Abstract:
This paper is about the reproduction of ancient texts with vectorized fonts. A reproduction process does not necessarily require the recognition of characters. In OCR workflows only recognition rates count. Our system aims to extract all characters from printed historic documents, without the employment of knowledge of language, font or writing system. It searches for the best prototypes and creates a document-specific font from these glyphs. To reach this goal, many common OCR preprocessing steps are no longer adequate. We describe the necessary changes, used in our system that deals particularly with documents typeset in Fraktur. On the one hand, algorithms are described that extract glyphs accurately for the purpose of precise reproduction. On the other hand, classification results of extracted Fraktur glyphs are presented for different shape descriptors.
Reference:
Meyer-Lerbs, Lothar, Schuldt, Arne and Gottfried, Björn, "Glyph Extraction from Historic Document Images", In DocEng2010, ACM Press, Manchester, UK, 2010. To appear
Bibtex Entry:
@INPROCEEDINGS{MeyerLerbs2010a,
author = {Meyer-Lerbs, Lothar and Schuldt, Arne and Gottfried, Bj{\"o}rn},
title = {{Glyph Extraction from Historic Document Images}},
booktitle = {DocEng2010},
year = {2010},
address = {Manchester, UK},
month = {September 21--24},
publisher = {ACM Press},
note = {To appear},
abstract = {This paper is about the reproduction of ancient texts with vectorized
fonts. A reproduction process does not necessarily require the recognition
of characters. In OCR workflows only recognition rates count. Our
system aims to extract all characters from printed historic documents,
without the employment of knowledge of language, font or writing
system. It searches for the best prototypes and creates a document-specific
font from these glyphs. To reach this goal, many common OCR preprocessing
steps are no longer adequate. We describe the necessary changes,
used in our system that deals particularly with documents typeset
in Fraktur. On the one hand, algorithms are described that extract
glyphs accurately for the purpose of precise reproduction. On the
other hand, classification results of extracted Fraktur glyphs are
presented for different shape descriptors.},
}