Hello everyone!
I'm currently facing a challenge in a client project where we have to read all the text from an image-based PDF (specifically, a signed and scanned document) and compare it against the source template to spot any differences.
Our biggest issues right now are:
- What is the best way to read the PDF? If we go the OCR route, it will never be 100% accurate (and it needs to be, since we are comparing it to the original document to spot differences); on top of that, we would have to spy Adobe Reader and deal with zooming, scrolling, etc.
- How can we compare the two texts and get a match percentage? Is there any VBO available that does this?
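For the comparison part, one common technique is a sequence diff that yields both a similarity score and the list of changed spans. Below is a minimal sketch in Python using the standard library's `difflib.SequenceMatcher`, whose `ratio()` is 2*M/T (M = matched elements, T = total elements in both texts). It assumes the scanned text has already been extracted (e.g. by OCR) into a plain string; the function names `match_percentage` and `list_differences` are hypothetical, and this is not an existing VBO. In Blue Prism the same logic would have to be ported to a code stage. Note that any OCR misreads will lower the score, so a threshold would need tuning.

```python
import difflib


def _tokens(text: str) -> list[str]:
    # Compare word by word, ignoring case and whitespace/line-break
    # differences that come from scanning and page layout.
    return text.lower().split()


def match_percentage(template_text: str, scanned_text: str) -> float:
    """Return a 0-100 similarity score between the template and scanned text."""
    matcher = difflib.SequenceMatcher(
        None, _tokens(template_text), _tokens(scanned_text), autojunk=False
    )
    return round(matcher.ratio() * 100, 2)


def list_differences(template_text: str, scanned_text: str) -> list[tuple[str, str, str]]:
    """Return (operation, template words, scanned words) for every non-matching span."""
    a, b = _tokens(template_text), _tokens(scanned_text)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # 'replace', 'delete' or 'insert'
            diffs.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


if __name__ == "__main__":
    template = "The quick brown fox"
    scanned = "the quick brown cat"
    print(match_percentage(template, scanned))   # 75.0
    print(list_differences(template, scanned))   # [('replace', 'fox', 'cat')]
```

Working at word level rather than character level keeps the score from being dominated by single-character OCR noise, at the cost of treating a one-letter misread as a whole changed word.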
We know there are third-party apps that can do this, such as ABBYY; however, we would like to test non-third-party solutions first before going that route, since this document contains sensitive data.
Thanks in advance for any help you may provide.
Best regards,
André Sales.
------------------------------
André Sales Lopes
Consultant
EY
Europe/London
------------------------------