<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The best way to read Values from a PDF in Product Forum</title>
    <link>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78754#M30743</link>
    <description>Hello,

I want to use Blue Prism to read some values (invoice number, date, total amount) from a PDF invoice. We then want to use these values in several actions.

What is the best and easiest way to solve this problem?

I would be glad to hear your suggestions.

Best regards

Stephan</description>
    <pubDate>Thu, 07 Jun 2018 14:44:00 GMT</pubDate>
    <dc:creator>StephanSaar</dc:creator>
    <dc:date>2018-06-07T14:44:00Z</dc:date>
    <item>
      <title>The best way to read Values from a PDF</title>
      <link>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78754#M30743</link>
      <description>Hello,

I want to use Blue Prism to read some values (invoice number, date, total amount) from a PDF invoice. We then want to use these values in several actions.

What is the best and easiest way to solve this problem?

I would be glad to hear your suggestions.

Best regards

Stephan</description>
      <pubDate>Thu, 07 Jun 2018 14:44:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78754#M30743</guid>
      <dc:creator>StephanSaar</dc:creator>
      <dc:date>2018-06-07T14:44:00Z</dc:date>
    </item>
    <item>
      <title>Hi Stephan,
The first…</title>
      <link>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78755#M30744</link>
      <description>Hi Stephan,
The first question is: what kind of PDF are you dealing with?
A text PDF (from which you can select the text) or an image PDF (from which you cannot select the text)?
For an image PDF, you need an OCR solution.
For a text PDF, you can either copy and paste the text, or use a command-line tool such as pdftotext to get the data out of the PDF and then parse it.
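A minimal sketch of the command-line route, assuming the Poppler/Xpdf `pdftotext` tool is installed and on the PATH. The helper names and the file name are hypothetical; `-layout`, `-enc` and `-` (write to stdout) are standard `pdftotext` options.

```python
import subprocess


def build_pdftotext_cmd(pdf_path: str) -> list[str]:
    # "-layout" keeps text roughly where it sits on the page,
    # "-enc UTF-8" fixes the output encoding, and "-" as the output
    # file name makes pdftotext write the text to stdout.
    return ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"]


def pdf_to_text(pdf_path: str) -> str:
    # Runs the external tool and returns the extracted text.
    # Raises FileNotFoundError if pdftotext is not installed, and
    # CalledProcessError if it exits with a non-zero status.
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

Calling `pdf_to_text("invoice.pdf")` (hypothetical path) returns one string, which can then be parsed the same way as text copied into a data item.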
You can also automate the Adobe Acrobat Reader interface, run its 'Save as Text' option, and then read that text file for the details.
Also refer to the Blue Prism manuals on dealing with PDFs.
Bottom line: there are more solutions than challenges.
Good luck!</description>
      <pubDate>Thu, 07 Jun 2018 18:14:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78755#M30744</guid>
      <dc:creator>BastiaanBezemer</dc:creator>
      <dc:date>2018-06-07T18:14:00Z</dc:date>
    </item>
    <item>
      <title>Hello Bastian,
thank you for…</title>
      <link>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78756#M30745</link>
      <description>Hello Bastiaan,
Thank you for your help and for describing how to deal with PDFs.
It is a text-based, standardized PDF, and with Ctrl+A I can select all the information in it.
But I need specific information from the PDF file, such as the invoice number, the amount and so on.
How can I get this information? What is the best and easiest way to get it into Blue Prism, for example into a data item?
Best regards
Stephan</description>
      <pubDate>Mon, 11 Jun 2018 13:27:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78756#M30745</guid>
      <dc:creator>StephanSaar</dc:creator>
      <dc:date>2018-06-11T13:27:00Z</dc:date>
    </item>
    <item>
      <title>Hi Stephan,
After reading…</title>
      <link>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78757#M30746</link>
      <description>Hi Stephan,
After reading the text out of the PDF, you need to extract the data from it.
How you extract the data depends on how the text is structured.
If it looks something like this:
Invoice Number: 3423423423
then you can search for the text "Invoice Number: " with InStr in a Calculation stage to get its position, and read out everything that follows.
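The InStr approach can be sketched in Python, with `str.find` standing in for InStr. This is an illustration of the idea, not Blue Prism code; the function name and sample text are hypothetical, and it assumes the value ends at the end of its line.

```python
from typing import Optional


def value_after_label(text: str, label: str) -> Optional[str]:
    # str.find plays the role of InStr, but it is 0-based and returns -1
    # when the label is missing (InStr counts positions from 1).
    start = text.find(label)
    if start == -1:
        return None
    start += len(label)
    # Assumption: the value ends at the end of its line.
    end = text.find("\n", start)
    if end == -1:
        end = len(text)
    return text[start:end].strip()


# Hypothetical sample text, as it might appear in a data item.
sample = "ACME Ltd\nInvoice Number: 3423423423\nTotal: 120.00"
print(value_after_label(sample, "Invoice Number: "))  # 3423423423
```

Returning `None` for a missing label makes it easy to route such documents to an exception path instead of storing an empty value.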
If it is structured differently, you need to use a different approach.
Regular expressions (RegEx) are a great tool for extracting data. If you search the forum, you'll come across some nice examples.
Feel free to post an (anonymized) version of the text of your PDF, as it appears in your data item, if you need further hints and tricks.</description>
      <pubDate>Tue, 12 Jun 2018 01:58:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78757#M30746</guid>
      <dc:creator>BastiaanBezemer</dc:creator>
      <dc:date>2018-06-12T01:58:00Z</dc:date>
    </item>
    <item>
      <title>HI All,
Just adding to what…</title>
      <link>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78758#M30747</link>
      <description>Hi all,
Just adding to what Bastiaan said.
Take a small example: Invoice- 1234ATU Name- Sumit
Press Ctrl+A to select all the text in the PDF and Ctrl+C to copy it. Then call the GetClipboard() function in a Calculation stage to store the clipboard contents in a data item.
Use the InStr() function with "Invoice-" to check whether it is present in the PDF text; it also returns its starting position, say 1 for the "I" of "Invoice-". Then do the same for "Name-" with InStr() to get the index of the "N" of "Name-". The length of "Invoice-" is 8, so 1 + 8 = 9, which puts you on the blank space between "Invoice-" and "1234ATU".
From position 9, take everything up to the index of "N" using string manipulation, and trim the result to get the value.
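The index arithmetic described above can be checked with a short Python sketch. It keeps the 1-based positions that InStr uses and converts back to Python's 0-based slicing only at the end; the function name is hypothetical.

```python
def between_labels(text: str, first: str, second: str) -> str:
    # 1-based positions, as InStr returns them in the steps above
    # (str.find is 0-based and returns -1 when missing, so +1 gives 0).
    pos_first = text.find(first) + 1
    pos_second = text.find(second) + 1
    if pos_first == 0 or pos_second == 0:
        raise ValueError("label not found")
    # Position right after the first label: 1 + len("Invoice-") = 9.
    start = pos_first + len(first)
    if start > pos_second:
        raise ValueError("second label comes before the value")
    # Take everything from `start` up to (not including) the second label,
    # converting the 1-based positions back to Python's 0-based slicing,
    # then trim the surrounding blanks.
    return text[start - 1 : pos_second - 1].strip()


sample = "Invoice- 1234ATU Name- Sumit"
print(between_labels(sample, "Invoice-", "Name-"))  # 1234ATU
```

The trim step matters: the slice starts on the blank at position 9 and ends on the blank before "Name-", so the raw substring is " 1234ATU ".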
Hope this helps.</description>
      <pubDate>Tue, 26 Jun 2018 15:01:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78758#M30747</guid>
      <dc:creator>Sumitkumar2</dc:creator>
      <dc:date>2018-06-26T15:01:00Z</dc:date>
    </item>
    <item>
      <title>One thing about doing the …</title>
      <link>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78759#M30748</link>
      <description>One thing about doing Ctrl+A, Ctrl+C in a PDF is that you have to make sure the entire PDF has loaded. If your PDF is large, you may need to add a brief wait before copying so that the whole document has loaded.
The overall flow:
Launch .pdf in Acrobat Reader
Small wait
Copy to clipboard
Get clipboard to data item
Split Lines to a collection
Loop or filter the collection to find the lines you want</description>
      <pubDate>Wed, 27 Jun 2018 03:54:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78759#M30748</guid>
      <dc:creator>JimmyMcCrillis</dc:creator>
      <dc:date>2018-06-27T03:54:00Z</dc:date>
    </item>
    <item>
      <title>You can also use Apache…</title>
      <link>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78760#M30749</link>
      <description>You can also use Apache PDFBox from the command line. It is fast and you don't need Adobe.
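A minimal sketch of calling the PDFBox 2.0 command-line app from Python, assuming a Java runtime on the PATH. The jar path and helper names are hypothetical; the `ExtractText` tool and its `-encoding` and `-console` options should be checked against the PDFBox command-line documentation for the version you download.

```python
import subprocess


def build_pdfbox_cmd(jar_path: str, pdf_path: str) -> list[str]:
    # ExtractText is the PDFBox command-line text extractor;
    # "-console" prints the text instead of writing a .txt file
    # next to the PDF, and "-encoding" fixes the output encoding.
    return ["java", "-jar", jar_path, "ExtractText",
            "-encoding", "UTF-8", "-console", pdf_path]


def extract_with_pdfbox(jar_path: str, pdf_path: str) -> str:
    # Raises FileNotFoundError if java is missing, and
    # CalledProcessError if PDFBox exits with a non-zero status.
    result = subprocess.run(build_pdfbox_cmd(jar_path, pdf_path),
                            capture_output=True, encoding="utf-8",
                            check=True)
    return result.stdout
```

Without `-console`, PDFBox writes the text to a file instead, which a process could then read with a file-reading action.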
&lt;A href="https://pdfbox.apache.org/2.0/commandline.html" target="test_blank"&gt;https://pdfbox.apache.org/2.0/commandline.html&lt;/A&gt;</description>
      <pubDate>Wed, 27 Jun 2018 12:59:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78760#M30749</guid>
      <dc:creator>MarcoSchulze1</dc:creator>
      <dc:date>2018-06-27T12:59:00Z</dc:date>
    </item>
    <item>
      <title>Hello Stephan,
Above posts…</title>
      <link>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78761#M30750</link>
      <description>Hello Stephan,
The posts above explain how to get the data from a digital PDF into a Blue Prism data item. To get specific values out of that text, it is a good idea to use regular expressions (RegEx). Regexes match on patterns rather than fixed positions or word counts, so they keep returning the correct values when the words move around, as long as the text around each value follows the same pattern.
Use the "Extract Regex Values" action in the "Utility - Strings" object.
Example (Extract Regex Values):
We want to extract the invoice number, name and date from the text below:
"Kindly find the invoice detail below:
Invoice No : INV32123 Name : Addy G Date : 10/04/2018
Hope this helps."
Create a collection with two text fields, "Name" and "Value". Add three rows in the Initial Values tab: the Name column values should be "Invoice", "Name" and "Date", and the Value column should be empty.
The regex pattern looks like this:
Invoice No\s:\s(?&amp;lt;Invoice&amp;gt;\w*\s*)\s*Name\s:\s(?&amp;lt;Name&amp;gt;\w*\s\w*\s*)\s*Date\s:\s(?&amp;lt;Date&amp;gt;\d\d\/\d\d\/\d\d\d\d)
The group names in the regex (Invoice, Name and Date) must match the values in the Name column of the collection.
Use the same Name/Value collection for the output.
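For readers who want to test the idea outside Blue Prism, here is a Python sketch of the same extraction. It is not the "Extract Regex Values" action itself: it uses positional capture groups mapped onto a list of names (standing in for the Name column), and a slightly tightened pattern so the values come back without trailing spaces. The names and pattern are illustrative assumptions.

```python
import re

# Positional groups stand in for the named groups of the Blue Prism pattern;
# NAMES plays the role of the "Name" column of the collection.
NAMES = ["Invoice", "Name", "Date"]
PATTERN = re.compile(
    r"Invoice No\s*:\s*(\w+)\s+"
    r"Name\s*:\s*(.+?)\s+"
    r"Date\s*:\s*(\d{2}/\d{2}/\d{4})"
)


def extract_values(text: str) -> list[dict[str, str]]:
    # Returns rows shaped like the Name/Value collection,
    # or an empty list when the pattern does not match.
    match = PATTERN.search(text)
    if match is None:
        return []
    return [{"Name": n, "Value": v} for n, v in zip(NAMES, match.groups())]


sample = (
    "Kindly find the invoice detail below:\n"
    "Invoice No : INV32123 Name : Addy G Date : 10/04/2018\n"
    "Hope this helps."
)
for row in extract_values(sample):
    print(row["Name"], "=", row["Value"])
```

The lazy `(.+?)` lets the name contain spaces ("Addy G") while stopping at the next "Date" label.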
Search for "regex cheat sheet" to get a better understanding of regular expressions. There are also a few websites where you can test, debug and build regular expressions online.
Hope this helps. 🙂</description>
      <pubDate>Fri, 29 Jun 2018 11:48:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/The-best-way-to-read-Values-from-a-PDF/m-p/78761#M30750</guid>
      <dc:creator>MayurGangrade</dc:creator>
      <dc:date>2018-06-29T11:48:00Z</dc:date>
    </item>
  </channel>
</rss>

