Reading PDF picture with OCR and comparing text to template

andresales — Fri, 07 Jun 2019 10:26:00 GMT

Hello everyone!

I'm currently facing a challenge in a client project where we have to read all the text from a PDF image (specifically, a signed and scanned document) and compare it to the source template to spot any differences.

Our biggest issues right now are:

What is the best way to read the PDF? If we go the OCR route, it will never be 100% accurate (and it has to be, since we are comparing the result to the original document to spot differences). On top of that, we would have to spy Adobe Reader and worry about zooming, scrolling down, etc.

How can we compare the two texts and get a match percentage? Is there a VBO available that does this?
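
To make the comparison question concrete, this is roughly the kind of result we have in mind. It is only a minimal sketch in Python using the standard-library difflib module, not a VBO. The function names match_percentage and word_differences are made up for illustration, and it assumes the OCR output is already available as a plain string.

```python
import difflib


def match_percentage(template_text: str, ocr_text: str) -> float:
    """Return the character-level similarity of two texts as a percentage (0-100)."""
    # Collapse all whitespace runs so that line wrapping in the scan
    # is not counted as a difference.
    a = " ".join(template_text.split())
    b = " ".join(ocr_text.split())
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return matcher.ratio() * 100


def word_differences(template_text: str, ocr_text: str):
    """List the word-level changes as (operation, template_words, ocr_words)."""
    a = template_text.split()
    b = ocr_text.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replace / delete / insert
    ]


if __name__ == "__main__":
    template = "The total amount payable is 100 EUR."
    scanned = "The total amount payable is 700 EUR."
    print(round(match_percentage(template, scanned), 1))
    print(word_differences(template, scanned))
```

The percentage alone would not tell us whether a mismatch is a real change or an OCR misread, which is why the sketch also lists the differing words so they can be reviewed.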

We know there are third-party apps that can do this, such as ABBYY. However, we would like to test non-third-party solutions before we go that route, since this document contains sensitive data.

Thanks in advance for any help you may provide.

Best Regards,
André Sales.

------------------------------
André Sales Lopes
Consultant
EY
Europe/London
------------------------------
