<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Reading PDF picture with OCR and comparing text to template in Product Forum</title>
    <link>https://community.blueprism.com/t5/Product-Forum/Reading-PDF-picture-with-OCR-and-comparing-text-to-template/m-p/73138#M25743</link>
    <description>&lt;P&gt;Hello everyone!&lt;BR /&gt;&lt;BR /&gt;I'm currently facing a challenge in a client project where we have to read all the text from a PDF image (particularly a signed and scanned document) and compare it to the template source to spot any differences.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
Our biggest issues right now are:&lt;BR /&gt;&lt;BR /&gt;
&lt;UL&gt;
&lt;LI&gt;What is the best way to read the PDF? If we go the OCR route, it will never be 100% accurate (and it has to be, since we are comparing it to the original document to spot differences); on top of that, we would have to spy Adobe Reader and worry about zooming, scrolling down, etc.&lt;/LI&gt;
&lt;LI&gt;How can we compare the two texts and get a match percentage? Is there any VBO available that does this?&lt;/LI&gt;
&lt;/UL&gt;
&lt;BR /&gt;We know there are third-party apps that can do this, like Abbyy; however, we would like to test non-third-party solutions before going that route, since this document contains sensitive data.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for any help you may provide.&lt;BR /&gt;&lt;BR /&gt;Best Regards,&lt;BR /&gt;André Sales.&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;André Sales Lopes&lt;BR /&gt;Consultant&lt;BR /&gt;EY&lt;BR /&gt;Europe/London&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
    <pubDate>Fri, 07 Jun 2019 10:26:00 GMT</pubDate>
    <dc:creator>andresales</dc:creator>
    <dc:date>2019-06-07T10:26:00Z</dc:date>
    <item>
      <title>Reading PDF picture with OCR and comparing text to template</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Reading-PDF-picture-with-OCR-and-comparing-text-to-template/m-p/73138#M25743</link>
      <description>&lt;P&gt;Hello everyone!&lt;BR /&gt;&lt;BR /&gt;I'm currently facing a challenge in a client project where we have to read all the text from a PDF image (particularly a signed and scanned document) and compare it to the template source to spot any differences.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
Our biggest issues right now are:&lt;BR /&gt;&lt;BR /&gt;
&lt;UL&gt;
&lt;LI&gt;What is the best way to read the PDF? If we go the OCR route, it will never be 100% accurate (and it has to be, since we are comparing it to the original document to spot differences); on top of that, we would have to spy Adobe Reader and worry about zooming, scrolling down, etc.&lt;/LI&gt;
&lt;LI&gt;How can we compare the two texts and get a match percentage? Is there any VBO available that does this?&lt;/LI&gt;
&lt;/UL&gt;
&lt;BR /&gt;We know there are third-party apps that can do this, like Abbyy; however, we would like to test non-third-party solutions before going that route, since this document contains sensitive data.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for any help you may provide.&lt;BR /&gt;&lt;BR /&gt;Best Regards,&lt;BR /&gt;André Sales.&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;André Sales Lopes&lt;BR /&gt;Consultant&lt;BR /&gt;EY&lt;BR /&gt;Europe/London&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Fri, 07 Jun 2019 10:26:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Reading-PDF-picture-with-OCR-and-comparing-text-to-template/m-p/73138#M25743</guid>
      <dc:creator>andresales</dc:creator>
      <dc:date>2019-06-07T10:26:00Z</dc:date>
    </item>
  </channel>
</rss>

