PDF File Read from Blue Prism Object

virendersingh
Level 3
Hello team,

My requirement: we need to use the data from a PDF file as input to a process in another application.

1) Can we read the PDF data without using copy-paste or the clipboard? The reason is that the PDF file is almost 17 pages long.
2) Is there a Blue Prism VBO available for reading PDF data?
3) Is there any other way to read the PDF file data apart from copy and paste?

Please help me sort out this issue. It is a production issue: copy-paste worked initially, but that functionality has now been disabled in the application, so we need a fix in the production environment. Right now our bot is blocked.

Please suggest a solution. Thanks in advance.
2 REPLIES

InJoeKhor
Staff
Hi Virender,

There are usually two types of PDF files: PDF documents and PDF images.
PDF documents are usually created using Microsoft Word or Adobe Acrobat and saved in the read-only .pdf format. You can test whether your file is truly a PDF document by attempting to copy text from it using the Windows clipboard. PDF images are often scanned documents saved as .pdf or .tiff images. You can't copy text from these images, but you can use the 'Reading Text with OCR' technique to extract the data. OCR will only work if the image is of high enough quality; 300 dpi is recommended as a minimum.
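
The copy-text check described above can also be automated once some text has been captured. The following is a minimal Python sketch, not a Blue Prism feature: the function names and the `min_chars` threshold are hypothetical, and it assumes the text returned by your capture step is already available as a string.

```python
def looks_like_text_pdf(extracted_text: str, min_chars: int = 20) -> bool:
    """Guess whether a PDF has a real text layer from the text captured from it.

    Assumption: a scanned (image-only) PDF yields an empty or near-empty
    string, so fewer than `min_chars` non-whitespace characters is treated
    as "no usable text layer".
    """
    visible = "".join(extracted_text.split())  # drop all whitespace
    return len(visible) >= min_chars


def choose_technique(extracted_text: str) -> str:
    """Return an illustrative label for the technique to try next."""
    return "text" if looks_like_text_pdf(extracted_text) else "ocr"
```

The threshold is a judgment call; a cover page with only a logo and a page number could fall below it even in a genuine text PDF, so it may need tuning for your documents.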

Once you have captured the PDF document text using one of the techniques outlined above, you will need to implement some logic to extract the data you want from within the text.
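
As a rough illustration of that extraction logic, here is a minimal Python sketch that pulls labelled values out of captured text with regular expressions. The sample text, field labels, and the `extract_fields` helper are all hypothetical; your PDF will have its own layout, and the patterns would need to be adapted to it.

```python
import re

# Hypothetical captured text; a real document will look different.
SAMPLE_TEXT = """Invoice Number: INV-1001
Customer Name: Example Ltd
Total Amount: 1,250.00
"""

# Hypothetical field names mapped to the patterns that locate their values.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice Number:\s*(\S+)"),
    "customer_name": re.compile(r"Customer Name:\s*(.+)"),
    "total_amount": re.compile(r"Total Amount:\s*([\d,]+\.\d{2})"),
}


def extract_fields(text: str) -> dict:
    """Return each field's first match, or None when the label is missing."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1).strip() if match else None
    return result


if __name__ == "__main__":
    print(extract_fields(SAMPLE_TEXT))
```

Returning `None` for a missing label, rather than raising an error, lets the calling process decide whether a missing field is an exception case or something it can skip.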

Hope this helps.

Thanks,



------------------------------
In Joe Khor
Sr. Product Consultant
Blue Prism
------------------------------

EVIPUTI
MVP
The following approaches can be used:
  1. Use the Windows clipboard to copy all the text from a PDF document.
  2. Use the Blue Prism 'Read Text with OCR' Read stage action to read text from a region within a PDF document.
  3. Use the Adobe Acrobat API to export the PDF into another format (XML or Microsoft Word) from which text is easier to extract.


------------------------------
[Vipul] [Tiwari] [Senior Process Simplification and Optimization Developer]
[Fidelity]
------------------------------
------------------------------ Vipul Tiwari Senior Process Simplification Developer Amazon ------------------------------