pdf data extraction
04-07-21 01:38 PM
Hello, what is the best free way to extract data from a PDF?
------------------------------
aseel odeh
------------------------------
8 REPLIES
05-07-21 04:22 AM
You can look into Decipher IDP, a Blue Prism product that helps extract data from PDFs.
https://portal.blueprism.com/product/related-products/blue-prism-decipher-idp-11
------------------------------
Sai Devendra Kumar Komma
------------------------------
05-07-21 02:37 PM
@aseelodeh, the best option is Decipher, but if you only need something simple, like a single string, you can copy the content to a data item and run a regex for the desired value. If you just need to check whether a word exists, use InStr().
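To make the regex idea concrete, here is a minimal sketch in Python rather than a Blue Prism action. The sample text, the "Invoice No:" label, and both function names are assumptions made up for illustration; the same pattern could be pasted into whatever regex action you use.

```python
import re

# Hypothetical text, as it might look after copying a PDF's content into a data item.
pdf_text = """ACME Ltd
Invoice No: INV-2021-0042
Total due: 1,250.00 USD
"""

# Hypothetical pattern: capture the non-space characters after the "Invoice No:" label.
INVOICE_PATTERN = re.compile(r"Invoice No:\s*(\S+)")


def find_invoice_number(text):
    """Return the first value following the label, or None when nothing matches."""
    match = INVOICE_PATTERN.search(text)
    return match.group(1) if match else None


def contains_word(text, word):
    """Plain substring check, roughly what an InStr() test for a word does."""
    return word in text


print(find_invoice_number(pdf_text))
print(contains_word(pdf_text, "Total"))
```

A pattern like this only works while the label stays the same, so each PDF layout would probably need its own pattern.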
------------------------------
Emerson Ferreira
Sr Business Analyst
Avanade Brasil
+55 (081) 98886-9544
If my answer helped you? Mark as useful!
------------------------------
06-07-21 11:03 AM
Yes. As you know, copying data from a PDF extracts the text without formatting. Is there a way other than regex to process the data into an Excel file? I need it for multiple different files.
------------------------------
aseel odeh
------------------------------
06-07-21 02:26 PM
There are various ways to extract data from PDFs. The "best" way depends on your specific use case and the makeup of the PDFs that you'll be dealing with. Some examples have been mentioned above. Additional options for extracting data include:
- Use the PDF Toolkit from the Digital Exchange (DX) to convert the PDF to a Word document, and then use the MS Word VBO to work with the contents.
- Use the open source Xpdf tools to convert the PDF to text, and then use the Strings utility VBO to work with the text.
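As a rough illustration of the second option, the sketch below calls the Xpdf `pdftotext` command-line tool from Python and returns the non-empty lines of the result. It assumes `pdftotext` is installed and on the PATH; the function names and the choice of the `-layout` and `-enc UTF-8` options are my own, not something the Blue Prism VBOs require.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for Xpdf's pdftotext (hypothetical helper)."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout tries to preserve the physical layout, which helps with columns.
        command.append("-layout")
    # Ask for UTF-8 output so the text file can be read back predictably.
    command += ["-enc", "UTF-8", str(pdf_path), str(txt_path)]
    return command


def pdf_to_lines(pdf_path):
    """Convert one PDF to text and return its non-empty lines, stripped."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    txt_path = Path(pdf_path).with_suffix(".txt")
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    text = txt_path.read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling `pdf_to_lines("invoice.pdf")` (a hypothetical file name) would write `invoice.txt` next to the PDF and hand back the lines for further string processing.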
------------------------------
Eric Wilson
Director, Partner Integrations for Digital Exchange
Blue Prism
------------------------------
06-07-21 03:08 PM
OK.
If the PDF is editable and its text can be copied, do you have a method for bringing the data into Excel and processing it there?
Note that I have several different PDF formats.
------------------------------
aseel odeh
------------------------------
06-07-21 06:28 PM
@aseelodeh,
The PDF Toolkit, mentioned above, uses Adobe's Document Cloud platform. The VBO has an action called ExportPDFToDocx. You could copy that action into a new action and change one line in its code stage, the line that creates the export operation, and I believe it would then export the input PDF as an XLSX file.
Change that line to this:
ExportPDFOperation exportPdfOperation = ExportPDFOperation.CreateNew(ExportPDFTargetFormat.XLSX);
Cheers,
------------------------------
Eric Wilson
Director, Partner Integrations for Digital Exchange
Blue Prism
------------------------------
06-07-21 07:02 PM
Does leaving "CredentialsFilePath" empty cause an error? If so, what is this input for? I have no credentials for a PDF reader.
------------------------------
aseel odeh
------------------------------
07-07-21 01:38 PM
@aseelodeh,
The PDF Toolkit requires an account with Adobe Document Cloud. You can sign up for a free developer account with them for testing.
Cheers,
------------------------------
Eric Wilson
Director, Partner Integrations for Digital Exchange
Blue Prism
------------------------------
