
Extract of text from PDF

MartynaPokojska
Level 3

Hello,

Do you have any good recommendations for extracting text from a PDF file?
I have been looking at iText and the Adobe API, but neither is available for free (iText requires a commercial license, and Adobe offers only a 6-month free trial).
Looking forward to your replies 🙂

Martyna



------------------------------
Martyna Pokojska
------------------------------
Martyna Pokojska Arla Foods Solution Architect
4 REPLIES

Hi @Martyna Pokojska,

Did you try fetching the data with a small process that launches the PDF, presses Ctrl+A to select the content, Ctrl+C to copy it, and then pastes it into your desired location, e.g. Notepad?
Or are you specifically looking for external tools to read the data?

------------------------------
Manpreet Kaur
Manager
Deloitte
------------------------------

Hi @Manpreet Kaur,

I would like to avoid opening the file, so I'm looking for other options.
An additional problem with this PDF is that part of it is interactive, so even when I open it and save it as a txt file, some values are not transferred to the txt file.


------------------------------
Martyna Pokojska
------------------------------

Hi Martyna,

You can use the following DX Exchange asset, which converts your PDF file to Excel using built-in Office operations, so no licensing issue should arise as long as you have a valid Microsoft Office suite: Function for DX InDev PDF to Excel Converter

Once the data has been written to the Excel file, you can use the 'Utility - MS Excel VBO' to extract the content to a text file if required.

------------------------------
----------------------------------
Hope it helps you out. If my solution resolves your query, please mark it as the 'Best Answer' so that other members of the community with a similar problem can find the answer easily in future.

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Sr. Consultant - Automation Developer,
Wonderbotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------

JatinKalra
Level 5
Hi @Martyna Pokojska,

We can achieve this using Python code.
Write a Python script and call it from Process Studio. It will convert all the data into a text file, which we can then read.
Python code for this is easily available on the Internet.
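
For illustration, here is a minimal sketch of what such a script might look like. It assumes the third-party pypdf library (installed with `pip install pypdf`; BSD-licensed as far as I know), which was not named in this thread, and `merge_text` and `pdf_to_text` are hypothetical helper names. Because part of the PDF is interactive, the sketch also reads text form fields via `get_form_text_fields()`; other field types such as checkboxes or dropdowns would need `get_fields()` instead, and scanned PDFs would need OCR.

```python
import sys
from pathlib import Path


def merge_text(page_texts, form_values):
    """Combine per-page text and form-field values into one string."""
    # Drop empty pages (extract_text can yield "" for image-only pages).
    pages = [t.strip() for t in page_texts if t and t.strip()]
    # Keep only form fields that actually hold a value.
    fields = [f"{name}: {value}" for name, value in form_values.items()
              if value is not None]
    sections = ["\n\n".join(pages)] if pages else []
    if fields:
        sections.append("[Form fields]\n" + "\n".join(fields))
    return "\n\n".join(sections)


def pdf_to_text(pdf_path, txt_path):
    """Read a PDF without opening a viewer and write its text to a file."""
    # Assumption: pypdf is installed; imported here so the helper above
    # stays usable without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    page_texts = [page.extract_text() or "" for page in reader.pages]
    # Values typed into interactive text fields are not part of the page
    # content, so they are read separately.
    form_values = reader.get_form_text_fields() or {}
    Path(txt_path).write_text(merge_text(page_texts, form_values),
                              encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    pdf_to_text(sys.argv[1], sys.argv[2])
```

It could then be run as `python pdf_to_text.py input.pdf output.txt` (the script file name is just an example), and the resulting text file read back in the process.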

------------------------------
Jatin Kalra
Manager
Genpact
Noida UP
------------------------------