A New Generation of Textual Corpora: Mining Corpora from Very Large Collections

Crane, Gregory
Stewart, Gordon
Babeu, Alison

While digital libraries based on page images and automatically generated text have made possible massive projects such as the Million Book Library, Open Content Alliance, Google, and others, humanists still depend upon textual corpora expensively produced with labor-intensive methods such as double-keyboarding and manual correction. This paper reports the results from an analysis of OCR-generate...

Digital libraries
OCR evaluation
Ancient Greek
Text alignment
Perseus Project
ID: tufts:PB.001.001.00006