Glyph Extraction from Historic Document Images

by Lothar Meyer-Lerbs, Arne Schuldt and Björn Gottfried

Abstract:

This paper is about the reproduction of ancient texts with vectorized fonts. A reproduction process does not necessarily require the recognition of characters. In OCR workflows only recognition rates count. Our system aims to extract all characters from printed historic documents, without the employment of knowledge of language, font or writing system. It searches for the best prototypes and creates a document-specific font from these glyphs. To reach this goal, many common OCR preprocessing steps are no longer adequate. We describe the necessary changes, used in our system that deals particularly with documents typeset in Fraktur. On the one hand, algorithms are described that extract glyphs accurately for the purpose of precise reproduction. On the other hand, classification results of extracted Fraktur glyphs are presented for different shape descriptors.

Reference:

Meyer-Lerbs, Lothar, Schuldt, Arne and Gottfried, Björn, "Glyph Extraction from Historic Document Images", In DocEng2010, ACM Press, Manchester, UK, 2010. To appear

Bibtex Entry:

@INPROCEEDINGS{MeyerLerbs2010a,
  author = {Meyer-Lerbs, Lothar and Schuldt, Arne and Gottfried, Bj{\"o}rn},
  title = {{Glyph Extraction from Historic Document Images}},
  booktitle = {DocEng2010},
  year = {2010},
  address = {Manchester, UK},
  month = {September 21--24},
  publisher = {ACM Press},
  note = {To appear},
  abstract = {This paper is about the reproduction of ancient texts with vectorized
	fonts. A reproduction process does not necessarily require the recognition
	of characters. In OCR workflows only recognition rates count. Our
	system aims to extract all characters from printed historic documents,
	without the employment of knowledge of language, font or writing
	system. It searches for the best prototypes and creates a document-specific
	font from these glyphs. To reach this goal, many common OCR preprocessing
	steps are no longer adequate. We describe the necessary changes,
	used in our system that deals particularly with documents typeset
	in Fraktur. On the one hand, algorithms are described that extract
	glyphs accurately for the purpose of precise reproduction. On the
	other hand, classification results of extracted Fraktur glyphs are
	presented for different shape descriptors.},
  owner = {pmania},
  timestamp = {2012.11.06}
}