Table in PDF

Neel1
MVP
Hello Guys,

Please help me with all the possible ways to read a table from a PDF file.

Thanks,

------------------------------
Neeraj Kumar
Software Engineer
------------------------------
3 REPLIES

Denis__Dennehy
Level 15
Hi Neeraj,

My first recommendation is to make sure you have the right tool for the job. Adobe Acrobat Pro has options to extract the information in a PDF to formats like XML or Excel, and you can also spy a PDF document using the AA or UIA accessibility technologies built into tools like Blue Prism. If the structure of the PDF document is random and inconsistent (not designed with accessibility and data extraction in mind), another option might be an ICR/OCR tool like ABBYY or Decipher.

The hardest option is brute-force extraction: copying all the text and hacking at the data you get, and/or using Blue Prism's Surface Automation functionality with the Tesseract OCR engine. It is possible to extract data this way, but I once lost a week of my life doing it, and the end result, although it worked as a POC, was not scalable or robust enough for me to want to roll it out to a large number of similar documents.

Den


------------------------------
Denis Dennehy
Head of Professional Services, EMEA
Blue Prism Ltd
Europe/London
------------------------------

I totally agree with @Denis__Dennehy. I also wasted a lot of time on one of my projects using the Surface Automation option with Tesseract OCR, and using regular expressions on the text after copying all the contents of a PDF file to the clipboard or into a data item.
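
For anyone wondering what the copy-all-plus-regex route looks like, here is a minimal Python sketch. The sample text, the assumed column layout (description, quantity, amount) and the `parse_rows` name are all made up for illustration. Real PDFs rarely copy out this cleanly, which is a large part of why this approach tends to be fragile.

```python
import re

# Hypothetical text as it might land on the clipboard after "select all, copy".
SAMPLE_TEXT = """Invoice 1001
Item Qty Amount
Widget A 2 19.98
Large blue gadget 10 1,250.00
Total 1,269.98
"""

# Assumed row shape: free-text description, integer quantity, decimal amount.
ROW_PATTERN = re.compile(
    r"^(?P<item>.+?)\s+(?P<qty>\d+)\s+(?P<amount>\d[\d,]*\.\d{2})$"
)


def parse_rows(text):
    """Return one dict per line that matches the assumed row shape."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append({
                "item": match.group("item"),
                "qty": int(match.group("qty")),
                # Strip thousands separators before converting to a number.
                "amount": float(match.group("amount").replace(",", "")),
            })
    return rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE_TEXT):
        print(row)
```

Lines that do not fit the pattern (headers, totals) are silently skipped, so any change in the document layout means revisiting the expression.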

In the past we have used the iText Sharp DLL VBOs to get tabular data out of PDFs, provided they are definitely digital. (Again, there are some license restrictions, if I am correct, as it is under a GNU-based license.)
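
The reason digital PDFs are easier is that they carry a text layer, so a library can report each word together with its position on the page, and a table can be rebuilt by grouping words that sit on the same line. Below is a rough Python sketch of that idea only; it is not the iText Sharp VBO. The `Word` tuple, the coordinates and the `row_tolerance` value are invented, `y` is assumed to be measured from the top of the page, and in practice the positions would come from whichever PDF library you use.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word (assumed units: points)
    y: float  # vertical position, assumed to grow down the page


def group_into_rows(words: List[Word], row_tolerance: float = 2.0) -> List[List[str]]:
    """Cluster words with nearly equal y into rows, then order each row left to right."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.y):
        # Compare against the first word of the current row to decide membership.
        if rows and abs(rows[-1][0].y - word.y) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [[w.text for w in sorted(row, key=lambda w: w.x)] for row in rows]


# Hypothetical word positions for a two-row table.
SAMPLE_WORDS = [
    Word("Qty", 200, 100.4),
    Word("Item", 50, 100.0),
    Word("Widget", 50, 115.2),
    Word("2", 200, 115.0),
    Word("Amount", 300, 100.1),
    Word("19.98", 300, 115.3),
]

if __name__ == "__main__":
    for row in group_into_rows(SAMPLE_WORDS):
        print(row)
```

This sketch treats every word as its own cell, so multi-word cells would still need to be merged by looking at the horizontal gaps between words.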

We have also used ABBYY FlexiCapture 12 Distributed, and it works wonderfully for both scanned and digital PDFs, depending on the scan quality and on how you create the Document Definitions and layouts in the tool. (You will definitely need some knowledge of the ABBYY tool.)

I also came across an interesting VBO a few weeks back and tested it on a sample digital invoice; the results were great and it is pretty easy to use. You can find the VBO on the Digital Exchange at the following link: PDF to Excel Converter

------------------------------
Hope it helps you, and if it resolves your query, please mark it as the best answer so that others having the same problem can find the answer easily.

Regards,
Devneet Mohanty
Intelligent Automation Consultant
Blueprism 6x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------

Thanks a lot @devneetmohanty07 and @Denis__Dennehy for the informative replies. I will try out a few of the options mentioned.

------------------------------
Neeraj Kumar
Software Engineer
------------------------------