<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Extract text from PDF (scanned image) in Product Forum</title>
    <link>https://community.blueprism.com/t5/Product-Forum/Extract-text-from-PDF-scanned-image/m-p/98838#M46509</link>
    <description>&lt;P&gt;Hi all, I am trying to read the text from a scanned image that is saved in PDF format. Since this is an image and not an editable PDF, I am not able to use some of the existing VBOs from the Digital Exchange to read the text.&lt;/P&gt;
&lt;P&gt;One option I came across is the Cloud Vision API skill, which has a 'Document Text Extraction' action, but it only accepts an image as input, so I am not able to send the PDF file. An alternative is to take a screenshot and pass it to the API as an image, but I'm not sure that's the best approach.&lt;/P&gt;
&lt;P&gt;I also came across another Cloud Vision API feature (&lt;A href="https://cloud.google.com/vision/docs/pdf" target="_blank" rel="noopener"&gt;https://cloud.google.com/vision/docs/pdf&lt;/A&gt;), 'Detect text in files (PDF/TIFF)', but it is not available via the Blue Prism skill.&lt;/P&gt;
&lt;P&gt;Please let me know of any solutions that you've implemented for this use case!&lt;/P&gt;
&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
    <pubDate>Wed, 06 Dec 2023 20:58:52 GMT</pubDate>
    <dc:creator>maneesh.vemula1</dc:creator>
    <dc:date>2023-12-06T20:58:52Z</dc:date>
    <item>
      <title>Extract text from PDF (scanned image)</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Extract-text-from-PDF-scanned-image/m-p/98838#M46509</link>
      <description>&lt;P&gt;Hi all, I am trying to read the text from a scanned image that is saved in PDF format. Since this is an image and not an editable PDF, I am not able to use some of the existing VBOs from the Digital Exchange to read the text.&lt;/P&gt;
&lt;P&gt;One option I came across is the Cloud Vision API skill, which has a 'Document Text Extraction' action, but it only accepts an image as input, so I am not able to send the PDF file. An alternative is to take a screenshot and pass it to the API as an image, but I'm not sure that's the best approach.&lt;/P&gt;
&lt;P&gt;I also came across another Cloud Vision API feature (&lt;A href="https://cloud.google.com/vision/docs/pdf" target="_blank" rel="noopener"&gt;https://cloud.google.com/vision/docs/pdf&lt;/A&gt;), 'Detect text in files (PDF/TIFF)', but it is not available via the Blue Prism skill.&lt;/P&gt;
&lt;P&gt;Please let me know of any solutions that you've implemented for this use case!&lt;/P&gt;
&lt;P&gt;Thanks in advance!&lt;/P&gt;</description>
      <pubDate>Wed, 06 Dec 2023 20:58:52 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Extract-text-from-PDF-scanned-image/m-p/98838#M46509</guid>
      <dc:creator>maneesh.vemula1</dc:creator>
      <dc:date>2023-12-06T20:58:52Z</dc:date>
    </item>
    <item>
      <title>Re: Extract text from PDF (scanned image)</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Extract-text-from-PDF-scanned-image/m-p/98839#M46510</link>
      <description>&lt;P&gt;Hi Maneesh Vemula,&lt;/P&gt;
&lt;P&gt;In general, text recognition in AWS (e.g. Amazon Textract) and Form Recognizer in Azure will work for your requirement.&lt;BR /&gt;There are plenty of other document extraction tools (such as Hyperscience, ABBYY, and the Google Vision API); for some of them you need to convert the PDF data to Base64 before trying to extract it.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Dec 2023 00:35:39 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Extract-text-from-PDF-scanned-image/m-p/98839#M46510</guid>
      <dc:creator>harish.mogulluri</dc:creator>
      <dc:date>2023-12-07T00:35:39Z</dc:date>
    </item>
    <item>
      <title>Re: Extract text from PDF (scanned image)</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Extract-text-from-PDF-scanned-image/m-p/98840#M46511</link>
      <description>&lt;DIV&gt;Hello,&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;You can use Blue Prism Decipher or some other third-party tool to convert the image to text and then extract it.&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV&gt;&lt;SPAN&gt;Regards,&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 07 Dec 2023 03:57:04 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Extract-text-from-PDF-scanned-image/m-p/98840#M46511</guid>
      <dc:creator>LeonardoSQueiroz</dc:creator>
      <dc:date>2023-12-07T03:57:04Z</dc:date>
    </item>
  </channel>
</rss>

