Extract data from PDF Image files?

PaulJoseph · 07-08-18

I have 50 PDF files that contain images, and I need to extract the data from them using OCR. The PDF files do NOT share a consistent format. How can we achieve this? Thanks a lot for your support.

John__Carter · 07-08-18

I'd be wary about taking on such a task; it could be horribly difficult. If the images were consistent, you could map each page into regions, and then you'd only have to think about image quality and the reliability of the OCR results. But if the pages aren't consistent, you have two (potentially huge) problems: 1) how to locate the stuff you want to read and 2) how to read it accurately.
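
To illustrate what mapping a page with regions could look like in the consistent case, here is a rough Python sketch. It is only an assumption about one way to do it, not a Blue Prism feature: the `Region` class, the `INVOICE_LAYOUT` field names and coordinates, and the `ocr` callable are all hypothetical, and the OCR engine itself is left as something you plug in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Region:
    """A field's location as fractions of the page (0.0 to 1.0)."""
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, width: int, height: int) -> Box:
        # Scale the fractions by the rendered page size so the same
        # layout works at any scan resolution.
        return (
            round(self.left * width),
            round(self.top * height),
            round(self.right * width),
            round(self.bottom * height),
        )


# Hypothetical layout for ONE consistent page template.
# The field names and coordinates are made up for illustration.
INVOICE_LAYOUT: Dict[str, Region] = {
    "invoice_number": Region(0.60, 0.05, 0.95, 0.10),
    "total": Region(0.70, 0.85, 0.95, 0.90),
}


def read_fields(
    layout: Dict[str, Region],
    width: int,
    height: int,
    ocr: Callable[[Box], str],
) -> Dict[str, str]:
    """Run the supplied OCR callable on each mapped region."""
    return {
        name: ocr(region.to_pixels(width, height))
        for name, region in layout.items()
    }
```

With inconsistent pages, a fixed layout like this no longer applies, which is exactly the first problem above: you would need some way to find each field before you can read it.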

PaulJoseph · 10-08-18

Thanks, John. I was wondering: do we have good documentation on the parameters we deal with when working with OCR?

SS&C Blue Prism Community