Need an idea for extracting specific information

Anonymous

Hi,
I have extracted all the text from an editable PDF using Application Modeller and Send Global Keys. Now I need to extract some specific information from that text. Can anyone tell me how I can extract it?

Note that I'm new to Blue Prism, so a detailed explanation of the solution would be appreciated.

PDF document: [screenshot not included]
Extracted text:

Packing List
Date: 06/04/2018
FROM:
Shipping Company
123 Memory Lane
Madison, Wi
53203
TO:
Valued Customer
4343 Main St
Suite 100
Willowcrest, NC
27007
Quanity Ordered
Quantity Shipped
Product Number
Description
Unit Weight
2
1
0045657
Gizmo #1
1 lbs.
2
2
9007652
Gadget #2
20 lbs.
Total Items Shipped: 3
Comments:
Please retain this packing list for return or exchange.
Ship Date: 07/04/2018

I need the following information: [screenshot not included]



LakshmiNarayan3
Level 6

Possible solutions are string manipulation, or regex via the Utility - Strings VBO.
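The string-manipulation option boils down to cutting out the text between two known markers. Below is a minimal Python sketch of that idea, assuming the extracted text is available as one plain string; `between` is a hypothetical helper for illustration, not an action of the Utility - Strings VBO.

```python
# Text as extracted from the PDF (the sample from the question, trimmed to the address part).
text = """Packing List
Date: 06/04/2018
FROM:
Shipping Company
123 Memory Lane
Madison, Wi
53203
TO:
Valued Customer
4343 Main St
Suite 100
Willowcrest, NC
27007
Quanity Ordered
"""

def between(source: str, start: str, end: str) -> str:
    """Hypothetical helper: return the text between `start` and the next `end` marker."""
    begin = source.index(start) + len(start)  # raises ValueError if `start` is missing
    finish = source.index(end, begin)         # look for `end` only after `start`
    return source[begin:finish].strip()

from_address = between(text, "FROM:", "TO:")
to_address = between(text, "TO:", "Quanity Ordered")  # spelled as it appears in the PDF
print(from_address.splitlines())
print(to_address.splitlines())
```

This relies on the same keywords as the regex approach, so it has the same dependency on every PDF using an identical layout.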

There are limitations, though: because the solution relies on keywords such as FROM and TO, all the PDFs must have the same format.

For the table data, it also depends on the number of columns (whether it is fixed or keeps changing) and on whether every cell contains data.

If any cells are empty, the solution will fail.

So you should be aware of the most likely cases and test with enough sample files.

You can use the following regex to capture the values in named groups (note that "Quanity Ordered" is spelled the way it appears in the PDF):

FROM:\s+(?<FromAddress>[\S\s]+?)\s+TO:\s+(?<ToAddress>[\S\s]+?)\s+Quanity Ordered[\S\s]+?Weight\s+(?<Table>[\S\s]+?)\s*Total Items Shipped:\s*(?<TotalItemsShipped>\d+)[\S\s]+?Ship Date:\s*(?<ShipDate>\d{2}\/\d{2}\/\d{4})

Here, the initial string is the text extracted from the PDF.

First, use the regex above to get the group values, including the table.

You will get the output as a collection.
The next step is to get the table data from the Table column.

You will get this output as a string.

In the same way you can get each of the other values, or you can use a Multiple Calculation stage to get all of them at once.
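If you want to try the regex outside Blue Prism first, here is a rough Python equivalent of these steps. It is a sketch under a few assumptions. Python writes named groups as `(?P<name>...)` instead of the .NET-style `(?<name>...)` used in this thread. The quantifiers are lazy (`+?`), and the total is matched with `\d+` so that a total of 10 or more is captured in full. Here `groupdict()` plays roughly the role of the output collection.

```python
import re

# Sample text from the question, as extracted from the PDF.
text = """Packing List
Date: 06/04/2018
FROM:
Shipping Company
123 Memory Lane
Madison, Wi
53203
TO:
Valued Customer
4343 Main St
Suite 100
Willowcrest, NC
27007
Quanity Ordered
Quantity Shipped
Product Number
Description
Unit Weight
2
1
0045657
Gizmo #1
1 lbs.
2
2
9007652
Gadget #2
20 lbs.
Total Items Shipped: 3
Comments:
Please retain this packing list for return or exchange.
Ship Date: 07/04/2018"""

# Same structure as the Blue Prism pattern, in Python's (?P<name>...) syntax.
pattern = re.compile(
    r"FROM:\s+(?P<FromAddress>[\S\s]+?)\s+TO:\s+(?P<ToAddress>[\S\s]+?)\s+"
    r"Quanity Ordered[\S\s]+?Weight\s+(?P<Table>[\S\s]+?)\s*"
    r"Total Items Shipped:\s*(?P<TotalItemsShipped>\d+)[\S\s]+?"
    r"Ship Date:\s*(?P<ShipDate>\d{2}/\d{2}/\d{4})"
)

match = pattern.search(text)
if match is None:
    raise ValueError("text does not follow the expected packing-list layout")

fields = match.groupdict()  # {group name: captured text}
for name, value in fields.items():
    print(f"{name}: {value!r}")
```

The `Table` value is still a single string at this point; splitting it into rows is the next step.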

Next, use the regex below to get the table values:

(?<QO>\d+?)\r?\n(?<QS>\d+?)\r?\n(?<PN>\d{7})

You will get the values in a collection, and depending on your requirement you can loop through it to read them.
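Looping over the matches can be sketched in Python as below, assuming the table text has already been captured as one string. The row pattern is the one given above, rewritten in Python's `(?P<name>...)` syntax. The dictionary keys are hypothetical names chosen for illustration.

```python
import re

# Table text as captured by the Table group in the previous step.
table = "2\n1\n0045657\nGizmo #1\n1 lbs.\n2\n2\n9007652\nGadget #2\n20 lbs."

# Row pattern from the answer: quantity ordered, quantity shipped, 7-digit product number.
row_pattern = re.compile(r"(?P<QO>\d+?)\r?\n(?P<QS>\d+?)\r?\n(?P<PN>\d{7})")

rows = []
for m in row_pattern.finditer(table):  # one match per table row
    rows.append({
        "Quantity_Ordered": int(m.group("QO")),
        "Quantity_Shipped": int(m.group("QS")),
        "Product_Number": m.group("PN"),  # kept as text to preserve leading zeros
    })

for row in rows:
    print(row)
```

Description and Unit Weight are not captured by this pattern; they would need extra groups if you require them.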


Note that, as mentioned earlier, this approach has limitations.
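An empty cell does not raise an error. The row simply fails to match and silently disappears. One possible safeguard, sketched in Python, is to compare the shipped quantities with the "Total Items Shipped" value from the same document. The `rows_look_complete` function is a hypothetical helper, not something from Blue Prism. This check only catches problems that change the shipped quantities: an empty Description or Unit Weight cell would still slip through.

```python
import re

# Row pattern from the answer, in Python's (?P<name>...) syntax.
ROW = re.compile(r"(?P<QO>\d+?)\r?\n(?P<QS>\d+?)\r?\n(?P<PN>\d{7})")

def rows_look_complete(table: str, total_items_shipped: int) -> bool:
    """Hypothetical sanity check: shipped quantities must add up to the stated total."""
    shipped = [int(m.group("QS")) for m in ROW.finditer(table)]
    return sum(shipped) == total_items_shipped

good = "2\n1\n0045657\nGizmo #1\n1 lbs.\n2\n2\n9007652\nGadget #2\n20 lbs."
# Same table, but the Quantity Shipped cell of the first row is empty,
# so the first row no longer matches and only the second row is counted.
bad = "2\n\n0045657\nGizmo #1\n1 lbs.\n2\n2\n9007652\nGadget #2\n20 lbs."

print(rows_look_complete(good, 3))
print(rows_look_complete(bad, 3))
```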

Note: you can rename the groups as you require, but the names must not contain spaces (e.g. instead of <QO> you can name it <Quantity_Ordered>).
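A quick way to see the naming rule in action is the Python check below. An underscore is accepted in a group name, while a space makes the pattern invalid. This uses Python's `re` module with its `(?P<name>...)` syntax. The assumption is that the .NET engine behind Blue Prism rejects spaces in the same way, as the note above states.

```python
import re

# Underscores are fine in group names...
ok = re.compile(r"(?P<Quantity_Ordered>\d+)")
print(ok.match("2").group("Quantity_Ordered"))

# ...but a space in the group name makes the pattern invalid.
try:
    re.compile(r"(?P<Quantity Ordered>\d+)")
except re.error as err:
    print("rejected:", err)
```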

Hope this helps. If you still face any issues, please explain them.


Regards


LakshmiNarayan3
Level 6

Hi

I hope the solution works. If you still face any issues, or the output is not in the expected format, please provide the expected output.

Based on that, I can suggest a better solution.

Regards

Anonymous

Thanks for the solution. It worked for me.