PDF File

Anonymous
Not applicable
Hi all, I am currently building an object that reads from a PDF source file. The problem is that the PDF data comes from a scanned document, and its font is not recognized in BP. Does BP have another way of reading the font details in a PDF? Can a third-party DLL such as iTextSharp be used? I am currently using surface automation on this project. Thanks
10 REPLIES

PatrickChilders
Level 3
I'm not sure about BP's built-in functions with respect to PDFs. As for third-party DLLs, it is my understanding that the purpose of being allowed to import DLLs in Object Studio is to add third-party extensions. I imagine it's the same logic that went with us being provided code stages: Blue Prism knew they couldn't think of every possible situation that would need to be automated, so code stages and DLL imports give us the ability to add functionality ourselves. If I understand it correctly, we can use DLLs from whatever source, or even make them ourselves, but of course we can't expect Blue Prism to support them. So using a DLL will probably be fine, but if something breaks or there are issues, I imagine you'll be on your own.

SamanthaShaw
Staff
In the latest release of 4.2 there is also some OCR functionality in Beta, based on the Tesseract OCR engine, which may be of use in this situation. Sam
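For anyone who wants to check what Tesseract makes of a scan outside Blue Prism, here is a minimal sketch that calls the Tesseract command-line tool directly. It is not the Blue Prism Beta feature itself. It assumes Tesseract is installed and on the PATH, and that the scanned PDF page has already been exported to an image file, since Tesseract reads images rather than PDFs. The function names and the file name `page1.png` are made up for illustration.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for a Tesseract CLI call.

    Using "stdout" as the output base makes Tesseract print the
    recognized text instead of writing it to a .txt file.
    """
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Assumption: the 'tesseract' executable is installed and on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

Calling `ocr_image("page1.png")` would return whatever text Tesseract recognizes on that page. How good the result is depends heavily on the quality of the scan.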

Anonymous
Not applicable
@Sam Fidler Thanks for the information on the Tesseract OCR engine. I am now able to capture the details, as long as the PDF file follows a fixed template.

JIGARPARIKH
Level 4
Hi JongOclarit, can you please let me know how we can select a PDF or any other file? Which object can we use to select a JPG or PDF file?

AlexisVecinal
Level 2
Hi Sam, is Blue Prism 4.2.50 able to extract handwritten text from a scanned document in a PDF? Thanks.

DHINAGARANASHOK
Level 4
Hi, will BP be able to read the content from a scanned copy or scanned image?

Anonymous
Not applicable
@jigar.parikh On my side, what I did was open a PDF reader and browse to the file whose details I wanted to capture. Also be sure that the PDF you are reading uses a template, so that the fields for the details are fixed. @alexis.c.vecinal For the handwritten part, some text is not captured. You may want to read up on the engine that is used, which is the Tesseract OCR engine. @DHINAGARAN Yes, BP can read a scanned copy, as long as it is a clear scan or image.
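To illustrate why a fixed template matters, here is a minimal sketch of pulling fields out of text that OCR has already produced. It assumes the text is available as a plain string. The field labels (`Invoice No`, `Date`, `Total`) and the function name `extract_fields` are hypothetical and would need to match the real template. This is not how Blue Prism does it internally, only one way to picture the idea.

```python
import re

# Hypothetical labels for an illustrative invoice template.
# Each pattern captures the value that follows a fixed label.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*No\.?\s*[:\-]?\s*(\S+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*[:\-]?\s*(\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE),
    "total": re.compile(r"Total\s*[:\-]?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return a dict of field name -> captured value (None if not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields
```

Because every document built from the same template uses the same labels, one set of patterns covers all of them. A field that OCR misreads comes back as `None`, so it can be flagged for manual review.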

DHINAGARANASHOK
Level 4
@Jong Oclarit Thanks for your information. For a clearer understanding, are there any tutorials or examples for reading data from a scanned PDF? I need this very urgently. Thanks, DHINAGARAN A

A guide for interfacing with PDF documents is currently being created by the Blue Prism PS team. It should be available on the Portal within the next month.