<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Read Text with OCR Issues in Product Forum</title>
    <link>https://community.blueprism.com/t5/Product-Forum/Read-Text-with-OCR-Issues/m-p/83053#M34384</link>
    <description>A client was having some small issues with "Read Text with OCR". The OCR worked most of the time but occasionally produced small errors. Mainly it had trouble reading Vs and Ws, and some 7s were being read as 1s. We were able to figure out a way to get Read Text with OCR to work correctly. I hope this works for you and your environment.&lt;BR /&gt;&lt;BR /&gt;The OCR engine that Blue Prism uses is Tesseract. When we updated the Tesseract trained data file to the newest version, it was better able to recognize the text we were working with.&lt;BR /&gt;&lt;BR /&gt;We downloaded the English trained data for Tesseract from &lt;A href="https://github.com/tesseract-ocr/tessdata_best" target="_blank" rel="noopener"&gt;here&lt;/A&gt;. The files are in alphabetical order, so if English is desired select the eng.traineddata option.&lt;BR /&gt;&lt;BR /&gt;On the machine running Blue Prism, navigate to the Tesseract data folder of the Blue Prism installation. The default location is "C:\Program Files\Blue Prism Limited\Blue Prism Automate\Tesseract\tessdata". Once there you will see a file called eng.traineddata. At this point you have 2 options:&lt;BR /&gt;1- Keep both versions. Either rename the existing file to something like OLD.traineddata and move the newly downloaded eng.traineddata into the folder, or leave the existing file alone and give the new file a different name. We renamed the new file BEST.traineddata (that will come into play later).&lt;BR /&gt;2- Delete the existing eng.traineddata and replace it with the newly downloaded eng.traineddata.&lt;BR /&gt;&lt;BR /&gt;With my original still named eng.traineddata and the new one named BEST.traineddata, my files look like this:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="23607.png"&gt;&lt;img 
src="https://community.blueprism.com/t5/image/serverpage/image-id/23740i0B37D35ECFCB5BE5/image-size/large?v=v2&amp;amp;px=999" role="button" title="23607.png" alt="23607.png" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;With the new traineddata in place, go back to the Read stage that was having difficulties. Continue using the same object. If you gave the new traineddata a different name, you will need to enter that name in the Language section. In my case, that was "BEST". If you replaced the traineddata under the original name, you do not need to make any changes.&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="23608.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/23741iD1B7842D52635F10/image-size/large?v=v2&amp;amp;px=999" role="button" title="23608.png" alt="23608.png" /&gt;&lt;/span&gt;&lt;BR /&gt;Having both versions of the traineddata on the machine allowed us to call either version. If I wanted to use the original version I would set the Language section to "eng" (everything before the period in the original traineddata file name). This makes the OCR use the original Tesseract data. If I wanted the new version I would set the language to "BEST".&lt;BR /&gt;&lt;BR /&gt;I did not have to restart the service or application for the changes to take effect. With this update, we found the OCR to be more accurate. I hope this helps you get the OCR to "read" better.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Josh Bryan&lt;BR /&gt;------------------------------</description>
    <pubDate>Thu, 16 Dec 2021 00:09:00 GMT</pubDate>
    <dc:creator>JoshBryan</dc:creator>
    <dc:date>2021-12-16T00:09:00Z</dc:date>
    <item>
      <title>Read Text with OCR Issues</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Read-Text-with-OCR-Issues/m-p/83053#M34384</link>
      <description>A client was having some small issues with "Read Text with OCR". The OCR worked most of the time but occasionally produced small errors. Mainly it had trouble reading Vs and Ws, and some 7s were being read as 1s. We were able to figure out a way to get Read Text with OCR to work correctly. I hope this works for you and your environment.&lt;BR /&gt;&lt;BR /&gt;The OCR engine that Blue Prism uses is Tesseract. When we updated the Tesseract trained data file to the newest version, it was better able to recognize the text we were working with.&lt;BR /&gt;&lt;BR /&gt;We downloaded the English trained data for Tesseract from &lt;A href="https://github.com/tesseract-ocr/tessdata_best" target="_blank" rel="noopener"&gt;here&lt;/A&gt;. The files are in alphabetical order, so if English is desired select the eng.traineddata option.&lt;BR /&gt;&lt;BR /&gt;On the machine running Blue Prism, navigate to the Tesseract data folder of the Blue Prism installation. The default location is "C:\Program Files\Blue Prism Limited\Blue Prism Automate\Tesseract\tessdata". Once there you will see a file called eng.traineddata. At this point you have 2 options:&lt;BR /&gt;1- Keep both versions. Either rename the existing file to something like OLD.traineddata and move the newly downloaded eng.traineddata into the folder, or leave the existing file alone and give the new file a different name. We renamed the new file BEST.traineddata (that will come into play later).&lt;BR /&gt;2- Delete the existing eng.traineddata and replace it with the newly downloaded eng.traineddata.&lt;BR /&gt;&lt;BR /&gt;With my original still named eng.traineddata and the new one named BEST.traineddata, my files look like this:&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="23607.png"&gt;&lt;img 
src="https://community.blueprism.com/t5/image/serverpage/image-id/23740i0B37D35ECFCB5BE5/image-size/large?v=v2&amp;amp;px=999" role="button" title="23607.png" alt="23607.png" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;With the new traineddata in place, go back to the Read stage that was having difficulties. Continue using the same object. If you gave the new traineddata a different name, you will need to enter that name in the Language section. In my case, that was "BEST". If you replaced the traineddata under the original name, you do not need to make any changes.&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="23608.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/23741iD1B7842D52635F10/image-size/large?v=v2&amp;amp;px=999" role="button" title="23608.png" alt="23608.png" /&gt;&lt;/span&gt;&lt;BR /&gt;Having both versions of the traineddata on the machine allowed us to call either version. If I wanted to use the original version I would set the Language section to "eng" (everything before the period in the original traineddata file name). This makes the OCR use the original Tesseract data. If I wanted the new version I would set the language to "BEST".&lt;BR /&gt;&lt;BR /&gt;I did not have to restart the service or application for the changes to take effect. With this update, we found the OCR to be more accurate. I hope this helps you get the OCR to "read" better.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Josh Bryan&lt;BR /&gt;------------------------------</description>
      <pubDate>Thu, 16 Dec 2021 00:09:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Read-Text-with-OCR-Issues/m-p/83053#M34384</guid>
      <dc:creator>JoshBryan</dc:creator>
      <dc:date>2021-12-16T00:09:00Z</dc:date>
    </item>
    <item>
      <title>RE: Read Text with OCR Issues</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Read-Text-with-OCR-Issues/m-p/83054#M34385</link>
      <description>Thanks a lot for sharing this with us, &lt;A class="user-content-mention" data-sign="@" data-contactkey="f939d0cd-592d-466c-908e-eb4cef1660ca" data-tag-text="@Josh Bryan" href="https://community.blueprism.com/network/profile?UserKey=f939d0cd-592d-466c-908e-eb4cef1660ca" data-itemmentionkey="add08618-335f-4367-807c-ab07279c96a4"&gt;@Josh Bryan&lt;/A&gt;.&lt;BR /&gt;&lt;BR /&gt;Working with Tesseract OCR has definitely been a tricky, hit-or-miss kind of thing, even for me in my prior engagements. This is surely a great tip to try and explore :)&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Hope it helps you, and if it resolves your query please mark it as the best answer so that others having the same problem can track the answer easily.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Devneet Mohanty&lt;BR /&gt;Intelligent Automation Consultant&lt;BR /&gt;Blueprism 6x Certified Professional&lt;BR /&gt;Website: &lt;A href="https://devneet.github.io/" target="test_blank"&gt;https://devneet.github.io/&lt;/A&gt;&lt;BR /&gt;Email: devneetmohanty07@gmail.com&lt;BR /&gt;------------------------------</description>
      <pubDate>Thu, 16 Dec 2021 06:33:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Read-Text-with-OCR-Issues/m-p/83054#M34385</guid>
      <dc:creator>devneetmohanty07</dc:creator>
      <dc:date>2021-12-16T06:33:00Z</dc:date>
    </item>
  </channel>
</rss>

