Extract data from PDF Image files?

PaulJoseph
Level 2
I have 50 PDF files that contain images, and I need to extract the data from them using OCR. The PDF files do NOT follow a consistent format. How can we achieve this? Thanks a lot for your support.
2 REPLIES

John__Carter
Staff
I'd be wary about taking on such a task; it could be horribly difficult. If the images were consistent, you could map the page with regions, and then you'd only have to think about image quality and the reliability of the OCR results. But if the pages aren't consistent, you have two (potentially huge) problems: 1) how to locate the stuff you want to read, and 2) how to read it accurately.
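
To illustrate the consistent-layout case, here is a minimal Python sketch of mapping a page with regions. Everything in it is an assumption made for illustration: the `Region` class, the `INVOICE_TEMPLATE` field names and coordinates, and the `ocr_region` callable, which stands in for whatever OCR engine you actually use. Regions are stored as fractions of the page size so the same template still works if the scans come in at different resolutions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (left, top, right, bottom) in pixels
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Region:
    """A named field location, given as fractions (0.0 to 1.0) of the page size."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, page_width: int, page_height: int) -> Box:
        # Scale the fractional coordinates to this page's pixel dimensions.
        return (
            round(self.left * page_width),
            round(self.top * page_height),
            round(self.right * page_width),
            round(self.bottom * page_height),
        )


# Hypothetical template for ONE consistent layout; the field names and
# coordinates are made up and would have to be measured on real pages.
INVOICE_TEMPLATE: List[Region] = [
    Region("invoice_number", 0.60, 0.05, 0.95, 0.10),
    Region("total", 0.70, 0.85, 0.95, 0.90),
]


def read_fields(
    page_size: Tuple[int, int],
    template: List[Region],
    ocr_region: Callable[[Box], str],
) -> Dict[str, str]:
    """Run the supplied OCR callable once per region and collect the text.

    `ocr_region` is a placeholder for your OCR engine: it receives a pixel
    box and returns the text recognised inside that box.
    """
    width, height = page_size
    return {
        region.name: ocr_region(region.to_pixels(width, height)).strip()
        for region in template
    }
```

With inconsistent pages there is no single template to apply, so you would first need some way to decide which layout a page has (or to search for the fields), and that is the "how to locate it" problem described above.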

PaulJoseph
Level 2
Thanks, John. I was wondering, do we have good documentation on the parameters we deal with when working with OCR?