pdf data extraction

aseelodeh
Level 5
Hello, what is the best free way to extract data from a PDF?

------------------------------
aseel odeh
------------------------------
8 REPLIES

Sai_Devendra_Ku
Level 3
You can look into Decipher IDP, a Blue Prism product that helps extract data from PDFs.

https://portal.blueprism.com/product/related-products/blue-prism-decipher-idp-11

------------------------------
Sai Devendra Kumar Komma
------------------------------

EmersonF
MVP
@aseelodeh, the best option is Decipher, but if it's something simple, like a string, you can copy the content to a data item and use a regex to pull out the desired value. If you just need to check whether a word exists, use InStr().
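As a rough illustration of the regex idea outside Blue Prism, here is a minimal Python sketch. The sample text, the `label: value` layout, and the helper names `extract_value` and `contains_word` are all assumptions made up for this example; the same pattern could be adapted to a regex action inside a process.

```python
import re

# Hypothetical text, as it might look after copying a PDF's content into a data item.
pdf_text = """Invoice Number: INV-20431
Date: 2021-03-15
Total Due: 1,250.00 USD"""


def extract_value(text, label):
    """Return the text that follows 'label:' on the same line, or None if absent."""
    pattern = rf"^{re.escape(label)}\s*:\s*(.+)$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def contains_word(text, word):
    """Case-insensitive existence check, similar in spirit to testing InStr(...) > 0."""
    return word.lower() in text.lower()


invoice_number = extract_value(pdf_text, "Invoice Number")
total_due = extract_value(pdf_text, "Total Due")
has_invoice = contains_word(pdf_text, "invoice")
```

This only works when the value sits on the same line as a known label, so each PDF layout would likely need its own set of labels.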

------------------------------
Emerson Ferreira
Sr Business Analyst
Avanade Brasil
+55 (081) 98886-9544
If my answer helped you, mark it as useful!
------------------------------
Sr Cons at Avanade Brazil

Yes. As you know, copying data from a PDF extracts the text without formatting. Do you have a way other than regex to process the data in an Excel file? I need it for multiple different files.

------------------------------
aseel odeh
------------------------------

ewilson
Staff
There are various ways to extract data from PDFs. The "best" way depends on your specific use case and the makeup of the PDFs that you'll be dealing with. Some examples have been mentioned above. Additional options for extracting data include:
  • Use the PDF Toolkit from the Digital Exchange (DX) to convert the PDF to a Word document, and then use the MS Word VBO to work with the contents.
  • Use the open source Xpdf tools to convert the PDF to text, and then use the Strings utility VBO to work with the text.
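For the Xpdf route, a minimal Python sketch of the convert-then-parse idea might look like the following. It assumes the Xpdf `pdftotext` executable is on the PATH, and it treats runs of two or more spaces as column breaks, which is only a heuristic that will not fit every PDF layout. The function names are hypothetical, and the CSV output is just one convenient way to get the rows into Excel.

```python
import csv
import re
import subprocess


def build_pdftotext_command(pdf_path, txt_path):
    """Build the Xpdf pdftotext command; -layout keeps the original column alignment."""
    return ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, txt_path]


def split_columns(line):
    """Split one layout-preserved line into cells on runs of two or more spaces."""
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


def text_to_rows(text):
    """Turn layout-preserved text into a list of rows, skipping blank lines."""
    return [split_columns(line) for line in text.splitlines() if line.strip()]


def convert_pdf_to_csv(pdf_path, txt_path, csv_path):
    """Run pdftotext, then write the detected rows to a CSV file that Excel can open."""
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    with open(txt_path, encoding="utf-8", errors="replace") as handle:
        rows = text_to_rows(handle.read())
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

Calling `convert_pdf_to_csv("invoice.pdf", "invoice.txt", "invoice.csv")` (placeholder file names) would produce a CSV for one file; looping over a folder would cover multiple files, although differently formatted PDFs may each need their own splitting rules.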
Cheers,

------------------------------
Eric Wilson
Director, Partner Integrations for Digital Exchange
Blue Prism
------------------------------

OK.
If the PDF is editable and its text can be copied, do you have a method for importing and processing the data in Excel?
Note that I have several different PDF formats.

------------------------------
aseel odeh
------------------------------

@aseelodeh,

The PDF Toolkit, mentioned above, uses Adobe's Document Cloud platform. There's an action in the VBO called ExportPDFToDocx. You could copy that action into a new action and then change the following line of code in the code stage. I believe it would then export the input PDF as an XLSX file.

[Screenshot 7529.png: the code stage, with the line to change highlighted]
Change the highlighted line to this:

ExportPDFOperation exportPdfOperation = ExportPDFOperation.CreateNew(ExportPDFTargetFormat.XLSX);

Cheers,


------------------------------
Eric Wilson
Director, Partner Integrations for Digital Exchange
Blue Prism
------------------------------

Does leaving "CredentialsFilePath" empty cause an error? If so, what does this input refer to? I have no credentials for a PDF reader.

------------------------------
aseel odeh
------------------------------

@aseelodeh,

The PDF Toolkit requires an account with Adobe Document Cloud. You can sign up for a free developer account with them for testing.

Cheers,


------------------------------
Eric Wilson
Director, Partner Integrations for Digital Exchange
Blue Prism
------------------------------