PDF extraction with checkbox field
02-02-23 04:03 AM
Can you assist me in finding a solution to extract data from a PDF without relying on external applications, as my organization requires the use of only Blue Prism approved objects and native tools?
I have attempted using Global Send Keys, but it doesn't work well for capturing data from the PDF, which includes text boxes, multi-line fields, and checkboxes. The field positions may also be rearranged in the future, so position-based references are not suitable for data extraction. The PDF can have more than 3 pages.
Is there any alternative method that allows for capturing data, including checkbox values, in a more efficient manner?
A sample of the fields is shown below.
------------------------------
Amrutha Sivarajan
------------------------------
02-02-23 07:16 AM
Two suggestions that might help you:
- I think there's an object for PDFs on the DX (Digital Exchange)
- Last week someone with a similar challenge was advised to try opening the PDF in Word
------------------------------
Happy coding!
---------------
Paul
Sweden
------------------------------
Paul, Sweden
(By all means, do not mark this as the best answer!)
21-02-23 05:41 AM
Hi Paul
Can you please provide the DX link of the object?
Many thanks in advance
------------------------------
Manish Rawat
Project Manager
Mercer
New Delhi
------------------------------
21-02-23 11:15 AM
Hi Manish,
I wrote '...I think...' to imply that I am not sure, as we do not use any DX objects in our shop.
That said, my '...I know...' is based on earlier posts on this subject in this community, so some googling on your side will probably unearth clues as to where to find any such DX object.
------------------------------
Happy coding!
---------------
Paul
Sweden
------------------------------
Paul, Sweden
(By all means, do not mark this as the best answer!)
22-02-23 08:50 AM
Hi Amrutha,
Last year I had a process where we had to extract some data from PDF files. I took an alternative approach: I converted the PDF files to Excel files and then read the cell values with the Excel utility.
You can also try this method.
------------------------------
Sahil Chankotra
------------------------------
22-02-23 01:34 PM
Hi Amrutha,
Last year, I worked on an automation where I had to update and extract data from PDF forms. I used C# code and the iTextSharp DLL for this use case.
Please find below the details -
Inputs - filePath(Text)
Outputs - outputText(Text), Success(Flag), Message(Text)
Code -
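// Uses StringBuilder (System.Text) and PdfReader (iTextSharp.text.pdf)
Success = true;
Message = "";
outputText = "";
StringBuilder text = new StringBuilder();
PdfReader pdfReader = null;
try
{
    // Open the PDF and walk through its fillable form (AcroForm) fields
    pdfReader = new PdfReader(filePath);
    var fields = pdfReader.AcroFields.Fields;
    foreach (var key in fields.Keys)
    {
        // GetField returns the current value of the named field as text
        var value = pdfReader.AcroFields.GetField(key);
        // Write each field as "name----value;"
        text.Append(key+"----"+value+";");
    }
    outputText = text.ToString();
}
catch (Exception ex)
{
    // Report the failure to the calling process instead of throwing
    Success = false;
    Message = ex.Message;
}
finally
{
    // Always release the file handle
    if (pdfReader != null)
    {
        pdfReader.Close();
    }
}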
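You will get the details as a single string in the outputText data item. Split it on the ; character to get one entry per field, then split each entry on ---- to separate the field name from its value (this is the format produced by text.Append(key+"----"+value+";") in the code).
You also need to reference the following DLLs in the object's Code Options (and add iTextSharp.text.pdf and System.Text under the namespace imports):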
- C:\Program Files\Blue Prism Limited\Blue Prism Automate\itextsharp.dll
- C:\Program Files\Blue Prism Limited\Blue Prism Automate\BouncyCastle.Crypto.dll
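If you would rather do the splitting in code than with the split-text action, here is a minimal sketch of that step, assuming the "name----value;" format produced above. The class name FieldTextParser, the IsChecked helper, and the field names in the usage note are hypothetical and not part of the original object. IsChecked also assumes that an unchecked box comes back as an empty string or "Off" and that any other value means checked, so please verify that against your own form. A value that itself contains ; would break the split, so a different delimiter may be needed for multi-line fields.

```csharp
using System;
using System.Collections.Generic;

public static class FieldTextParser
{
    // Splits "name----value;name----value;" into a name -> value dictionary.
    public static Dictionary<string, string> Parse(string outputText)
    {
        var result = new Dictionary<string, string>();
        if (string.IsNullOrEmpty(outputText))
        {
            return result;
        }

        string[] pairs = outputText.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string pair in pairs)
        {
            int sep = pair.IndexOf("----", StringComparison.Ordinal);
            if (sep < 0)
            {
                continue; // not a name/value entry, so skip it
            }
            string name = pair.Substring(0, sep);
            string value = pair.Substring(sep + 4); // 4 = length of "----"
            result[name] = value;
        }
        return result;
    }

    // Assumption: unchecked boxes report "" or "Off"; anything else counts as checked.
    public static bool IsChecked(string value)
    {
        return !string.IsNullOrEmpty(value)
            && !string.Equals(value, "Off", StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, FieldTextParser.Parse("Name----Amrutha;Agree----Yes;Newsletter----Off;") gives three entries, and IsChecked is true for the "Agree" value and false for the "Newsletter" value. Because the lookup is by field name, it keeps working if the fields are moved around on the page.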
Please let me know if you need any additional information.
------------------------------
KirtiMaan Talwar
Consultant
Deloitte
------------------------------
IA Consultant
Deloitte USI
23-02-23 03:32 AM
Thank you Sahil.
I tried your approach, but unfortunately Excel reads some fields as images and does not return structured data. The PDF to Excel conversion gives me a mix of image and text values.
------------------------------
Amrutha Sivarajan
------------------------------
23-02-23 03:43 AM
Thanks a lot for your detailed explanation. I truly appreciate your effort.
I would like to try out the method you have suggested. If you don't mind, can you share the authentic URLs for downloading the DLLs?
I had tried using BP objects from the Digital Exchange and worked on a few Python scripts to read the PDF. Since the PDF is an editable form, they are unable to read the field values and can read only the field labels.
------------------------------
Amrutha Sivarajan
------------------------------
23-02-23 03:45 AM
Thanks Paul for your suggestion.
I tried a few objects from the DX and tried converting the PDF into Word and Excel. Neither approach is able to extract the data: because the PDF is an editable form, the information is read either as images or as blank values.
------------------------------
Amrutha Sivarajan
------------------------------
23-02-23 07:55 AM
Hi @Amrutha Sivarajan ,
Did you try opening the PDF file in Chrome or any other browser? Opening a file in a browser sometimes helps with spying the relevant elements, and you can then try reading the checkbox values.
------------------------------
Manpreet Kaur
Manager
Deloitte
*If you find this post helpful mark it as Best Answer
------------------------------
