Digital Exchange

This community is a place to discuss the Blue Prism DX and all of the assets hosted there.

Convert Scanned PDF to searchable pdf

  • 1.  Convert Scanned PDF to searchable pdf

    Posted 24 days ago
    I need to convert a scanned PDF file into a searchable one. This is easy in Adobe Acrobat DC: search for any word (Ctrl+F), let it search the document, and then 'Save As' the file.
    However, I have to use Adobe Acrobat Reader DC, where this functionality doesn't work.
    Please suggest an easier alternative (OCR or C# code, maybe).

    Note: Reading the whole PDF is not required; I just want to make it searchable.


    ------------------------------
    Shweta Dharmadhikari
    RPA developer
    ------------------------------


  • 2.  RE: Convert Scanned PDF to searchable pdf

    Posted 23 days ago
    Hi Shweta,

    There are a number of assets available on the DX that can be used to extract the contents of a PDF. These connectors typically return data in JSON format, which you can then search using calculation stages. Many of these OCR providers allow free usage up to a certain limit. Here are a few of them:

    https://digitalexchange.blueprism.com/dx/entry/9648/solution/typless
    https://digitalexchange.blueprism.com/dx/entry/9648/solution/sypht
    https://digitalexchange.blueprism.com/dx/entry/3439/solution/form-recognizer-azure-cloud
    https://digitalexchange.blueprism.com/dx/entry/3439/solution/textract-capability-aws-cloud
    https://digitalexchange.blueprism.com/dx/entry/9648/solution/cloud-vision-api-v1-2
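
    To illustrate the idea of searching the returned JSON, here is a minimal sketch in Python (used purely for illustration; in Blue Prism the equivalent check would sit in a calculation or decision stage). The response shape and the helper names `collect_strings` and `json_contains` are assumptions, not the format of any particular connector above.

```python
import json


def collect_strings(node):
    """Yield every string value found anywhere in a parsed JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from collect_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_strings(item)


def json_contains(raw_json, keyword):
    """Case-insensitive check for keyword in any string value of the response."""
    needle = keyword.lower()
    parsed = json.loads(raw_json)
    return any(needle in text.lower() for text in collect_strings(parsed))


# Hypothetical response shape, not taken from any specific OCR provider.
sample = '{"pages": [{"lines": [{"text": "Invoice Number 12345"}, {"text": "Total due"}]}]}'

print(json_contains(sample, "invoice number"))
print(json_contains(sample, "purchase order"))
```

    Walking every string value avoids depending on one provider's field names, at the cost of also matching text in fields you may not care about.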


    ------------------------------
    Shashank Kumar
    DX Integrations Partner Consultant
    Blue Prism
    Singapore
    +6581326707
    ------------------------------



  • 3.  RE: Convert Scanned PDF to searchable pdf

    Posted 23 days ago
    Thanks for the suggestion, Shashank.

    Are there any free OCR providers?
    I also thought of using a code stage with C# .NET to automate this, but most of the available DLLs are paid only.
    Please let me know if you know of any free DLL, or have any other suggestions.

    Regards,
    Shweta

    ------------------------------
    Shweta Dharmadhikari
    RPA developer
    Accenture Solutions Pvt Ltd
    Asia/Kolkata
    ------------------------------



  • 4.  RE: Convert Scanned PDF to searchable pdf

    Posted 23 days ago
    Hi Shweta,

    As mentioned earlier, some of these OCR providers have free tiers, which means you can use them for free as long as you don't exceed the limit.

    ------------------------------
    Shashank Kumar
    DX Integrations Partner Consultant
    Blue Prism
    Singapore
    +6581326707
    ------------------------------



  • 5.  RE: Convert Scanned PDF to searchable pdf

    Posted 23 days ago
    Hi Shweta,

    I am using Adobe Acrobat Reader DC and doing the same work as you describe above.
    I use Ctrl+F to find a word; if it is found I need to save that file, otherwise I need to pick up the next file.

    Could you please let me know which functionality doesn't work for you?

    Thanks
    Nilesh


    ------------------------------
    Nilesh Jadhav
    Senior RPA Specialist
    ------------------------------



  • 6.  RE: Convert Scanned PDF to searchable pdf

    Posted 23 days ago
    Thanks for the reply Nilesh.

    When I open a scanned PDF in Adobe Acrobat Reader DC, press Ctrl+F and search for a word, I get a popup message: "Adobe Reader has finished searching the document. No match found". In Acrobat DC, however, it searches for the word across all the pages without problems and makes the PDF searchable.

    Can you please suggest which approach I should follow for this automation?

    Regards,
    Shweta

    ------------------------------
    Shweta Dharmadhikari
    RPA developer
    Accenture Solutions Pvt Ltd
    Asia/Kolkata
    ------------------------------



  • 7.  RE: Convert Scanned PDF to searchable pdf

    Posted 23 days ago
    Hi Shweta,

    Is your PDF file a proper PDF or a scanned image?

    Open the PDF, copy any word from it, and try to find it manually. If that works manually, then automation can easily do it. If your PDF is a scanned image, you won't be able to find the word that way, and you will have to use surface automation to get the text and then use the InStr function to find the value (try searching for more than one word; if any single criterion matches, save that file).
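
    The matching step described above can be sketched roughly as follows. This is Python for illustration only; in Blue Prism the same check would be an InStr expression per search word. It assumes the text has already been extracted from each scanned file, reads "single criteria match" as "at least one search word is found", and the names `matches_any` and `files_to_save` as well as the sample data are hypothetical.

```python
def matches_any(text, keywords):
    """Return True if at least one keyword occurs in text (case-insensitive)."""
    haystack = text.lower()
    return any(keyword.lower() in haystack for keyword in keywords)


def files_to_save(extracted_text_by_file, keywords):
    """Return the file names whose extracted text matches any keyword."""
    return [
        name
        for name, text in extracted_text_by_file.items()
        if matches_any(text, keywords)
    ]


# Hypothetical extracted text, standing in for what surface automation would read.
extracted = {
    "scan_001.pdf": "INVOICE No. 4471 Total due 120.00",
    "scan_002.pdf": "Delivery note, no amounts listed",
}

print(files_to_save(extracted, ["invoice", "total due"]))
```

    Using several search words makes the check more tolerant of recognition errors in any one word; switching `any` to `all` would require every word to be present instead.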

    Do let me know if you need more clarity.

    Thanks
    Nilesh 



    ------------------------------
    Nilesh Jadhav
    Senior RPA Specialist
    ------------------------------



  • 8.  RE: Convert Scanned PDF to searchable pdf

    Posted 23 days ago
    Edited by Shweta Dharmadhikari 23 days ago
    Hi Nilesh,

    As I already mentioned, my PDF is a scanned one; it contains only images.
    Could you please suggest how I should proceed with surface automation, given that I don't need to read any text but only need to make the file searchable?

    Regards,
    Shweta

    ------------------------------
    Shweta Dharmadhikari
    RPA developer
    Accenture Solutions Pvt Ltd
    Asia/Kolkata
    ------------------------------