Convert scanned PDF to searchable PDF

ShwetaDharmadhi
Level 4
I need to convert a scanned PDF file into a searchable one. This is easy in Adobe Acrobat DC: search for any word (Ctrl+F), let it search the document, and then use 'Save As' to save the file.
But I have to use Adobe Acrobat Reader DC, where this functionality doesn't work.
Please let me know of an easier alternative (OCR or C# code, maybe).

Note: Reading the whole PDF is not required; I just want to make it searchable.


------------------------------
Shweta Dharmadhikari
RPA developer
------------------------------
8 REPLIES

Hi Shweta,

There are a number of assets available on the DX (Digital Exchange) that can be used to extract content from a PDF. Typically these connectors return data in JSON format, within which you can perform your search using calculation stages. Many of these OCR providers allow free usage up to a certain limit. Here are links to a few of them:

https://digitalexchange.blueprism.com/dx/entry/9648/solution/typless
https://digitalexchange.blueprism.com/dx/entry/9648/solution/sypht
https://digitalexchange.blueprism.com/dx/entry/3439/solution/form-recognizer-azure-cloud
https://digitalexchange.blueprism.com/dx/entry/3439/solution/textract-capability-aws-cloud
https://digitalexchange.blueprism.com/dx/entry/9648/solution/cloud-vision-api-v1-2
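
To illustrate the idea of searching a connector's JSON output for a word, here is a minimal Python sketch. The JSON shape (a `pages` list whose items carry `page` and `text` fields) and the function name `pages_containing` are assumptions made for this example only; each provider above returns its own structure, so the field names would need to be adapted.

```python
import json


def pages_containing(ocr_json: str, word: str) -> list:
    """Return the page numbers whose OCR text contains `word` (case-insensitive)."""
    result = json.loads(ocr_json)
    needle = word.casefold()
    return [
        page["page"]
        for page in result.get("pages", [])
        if needle in page.get("text", "").casefold()
    ]


# Hypothetical response; real connectors use their own field names.
sample = json.dumps({
    "pages": [
        {"page": 1, "text": "Invoice number 12345"},
        {"page": 2, "text": "Terms and conditions"},
    ]
})

print(pages_containing(sample, "invoice"))  # prints [1]
```

In Blue Prism the same check would more likely live in a calculation stage or a code stage; the sketch only shows the logic.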


------------------------------
Shashank Kumar
DX Integrations Partner Consultant
Blue Prism
Singapore
+6581326707
------------------------------

Hi Shweta,

I am using Adobe Acrobat Reader DC and doing the same work as mentioned above.
I use Ctrl+F to find a word; if it is found I save that file, otherwise I pick up the next file.

Could you please let me know which functionality doesn't work for you?

Thanks
Nilesh


------------------------------
Nilesh Jadhav
Senior RPA Specialist
------------------------------
Nilesh Jadhav.
Consultant
ADP,India

Thanks for the reply Nilesh.

When I open a scanned PDF in Adobe Acrobat Reader DC, press Ctrl+F and search for a word, I get a popup message: "Adobe Reader has finished searching the document. No match found". In Acrobat DC, however, the same search successfully looks for the word across all pages and makes the PDF searchable.

Can you please suggest which approach I should follow for this automation?

Regards,
Shweta

------------------------------
Shweta Dharmadhikari
RPA developer
Accenture Solutions Pvt Ltd
Asia/Kolkata
------------------------------

Thanks for the suggestion, Shashank.

Are there any free OCR providers?
I also thought of using a code stage and writing C# .NET code to achieve this automation, but most of the available DLLs are paid.
Please let me know if you know of any free DLL or have any other suggestions.

Regards,
Shweta

------------------------------
Shweta Dharmadhikari
RPA developer
Accenture Solutions Pvt Ltd
Asia/Kolkata
------------------------------

Hi Shweta,

As mentioned earlier, some of these OCR providers have free tiers available, which means you can use them for free as long as you don't exceed the limit.

------------------------------
Shashank Kumar
DX Integrations Partner Consultant
Blue Prism
Singapore
+6581326707
------------------------------

Hi Shweta,

Is your PDF file a proper (text-based) PDF, or is it a scanned image?

Open the PDF, copy any word from it and try to find it manually. If that works manually, then automation can easily do it. If your PDF is a scanned image, you won't be able to find the word, and you will have to use surface automation to get the text and then use the InStr function to find the value (try adding more words to search for; if any single criterion matches, save that file).

Do let me know if you need more clarity.
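
The "several search words, save if any one matches" check described above could be sketched as follows in Python. This is only an illustration of the logic: `matches_any` is a made-up helper name, the text is assumed to have been extracted already (for example by surface automation), and the substring test plays the role that InStr would play in a Blue Prism calculation stage.

```python
def matches_any(extracted_text: str, search_terms: list) -> bool:
    """Return True if at least one non-empty search term occurs in the text.

    The comparison is case-insensitive, similar in spirit to an
    InStr-style substring check.
    """
    haystack = extracted_text.casefold()
    return any(
        term.casefold() in haystack
        for term in search_terms
        if term.strip()  # ignore blank terms, which would match everything
    )


# Example text standing in for what surface automation read from the page.
text = "Purchase Order No. 7781 issued to ACME Ltd"
should_save = matches_any(text, ["invoice", "purchase order"])
print(should_save)  # prints True
```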

Thanks
Nilesh 



------------------------------
Nilesh Jadhav
Senior RPA Specialist
------------------------------
Nilesh Jadhav.
Consultant
ADP,India

Hi Nilesh,

As I already mentioned, my PDF is a scanned one; it contains only images.
Could you please suggest how I should proceed with surface automation, given that I don't need to read any text but only need to make the file searchable?

Regards,
Shweta

------------------------------
Shweta Dharmadhikari
RPA developer
Accenture Solutions Pvt Ltd
Asia/Kolkata
------------------------------

Prasanth_
Level 2
Hi,
You can automate this scanned-PDF-to-searchable-PDF conversion using OCRvision. This OCR PDF software monitors a folder and converts any new scanned PDFs into searchable PDFs using OCR.
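
For readers who want to see what a "monitor a folder and convert new PDFs" loop looks like in general, here is a minimal Python sketch. It is not how OCRvision is implemented. The OCR step is a hypothetical `convert` callable supplied by the caller, and the function names, folder arguments and polling interval are all assumptions for illustration.

```python
import time
from pathlib import Path
from typing import Callable, Set


def process_new_pdfs(
    inbox: Path,
    outbox: Path,
    convert: Callable[[Path, Path], None],
    seen: Set[Path],
) -> list:
    """Convert every PDF in `inbox` not handled yet; return the new output paths."""
    outbox.mkdir(parents=True, exist_ok=True)
    produced = []
    for pdf in sorted(inbox.glob("*.pdf")):
        if pdf in seen:
            continue
        target = outbox / pdf.name
        convert(pdf, target)  # hypothetical OCR step supplied by the caller
        seen.add(pdf)
        produced.append(target)
    return produced


def watch(
    inbox: Path,
    outbox: Path,
    convert: Callable[[Path, Path], None],
    interval_seconds: float = 5.0,
) -> None:
    """Poll `inbox` forever, converting PDFs as they appear."""
    seen: Set[Path] = set()
    while True:
        process_new_pdfs(inbox, outbox, convert, seen)
        time.sleep(interval_seconds)
```

A real setup would also need to handle files that are still being copied into the folder when the poll runs, as well as conversion failures; both are left out here to keep the sketch short.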


------------------------------
Prasanth 
------------------------------