I'd be wary of taking on a task like this; it could be horribly difficult. If the images are consistent, you can map the page into fixed regions, and then you only have to worry about image quality and the reliability of the OCR results. But if the pages aren't consistent, you have two (potentially huge) problems: 1) how to locate the stuff you want to read, and 2) how to read it accurately.
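For the consistent-layout case, the region mapping could look roughly like the sketch below. It is only an illustration: the field names and coordinates in `REGIONS` are made up, the page is assumed to be a grayscale image held as rows of pixel values, and `ocr` is a placeholder for whatever engine you end up using, passed in as a plain function.

```python
from typing import Callable, Dict, List, Tuple

# A page is a grayscale image stored as rows of pixel values (0-255).
Page = List[List[int]]
# A region is (left, top, right, bottom) in pixels; right and bottom are exclusive.
Region = Tuple[int, int, int, int]

# Hypothetical layout for one fixed form; names and coordinates are invented.
REGIONS: Dict[str, Region] = {
    "invoice_number": (2, 0, 6, 2),
    "total": (0, 3, 4, 5),
}


def crop(page: Page, region: Region) -> Page:
    """Return the sub-image covered by region, checking that it fits on the page."""
    left, top, right, bottom = region
    height = len(page)
    width = len(page[0]) if page else 0
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError(f"region {region} does not fit a {width}x{height} page")
    return [row[left:right] for row in page[top:bottom]]


def read_fields(
    page: Page,
    regions: Dict[str, Region],
    ocr: Callable[[Page], str],
) -> Dict[str, str]:
    """Crop every named region and pass each crop to the supplied OCR function."""
    return {name: ocr(crop(page, box)) for name, box in regions.items()}
```

You would call `read_fields(page, REGIONS, ocr)` once per scanned page. The bounds check in `crop` is there because a page that doesn't match the expected layout is exactly the case where fixed regions stop working, so it is better to fail loudly than to OCR the wrong patch.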