The Early Modern OCR Project (eMOP) is an effort, on the one hand, to make access to texts more transparent and, on the other, to preserve a literary cultural heritage. The printing process in the hand-press period (roughly 1475-1800), while systematized to a certain extent, nonetheless produced texts with fluctuating baselines, mixed fonts, and varied concentrations of ink (among many other variables). These factors, combined with the poor quality of the images in which many of these books have been preserved (in EEBO and, to a lesser extent, ECCO), create a problem for Optical Character Recognition (OCR) software that tries to translate images of these pages into archivable, mineable texts. By using innovative applications of OCR technology and crowd-sourced corrections, eMOP will solve this OCR problem.
Meet our Team and Collaborators
Find all the eMOP tools, code, and data in our Early-Modern-OCR repo on GitHub
See our OCR Instruction pages and the eMOP Workflows.