28-04-23 09:33 AM
Hi,
I have extracted all the text from an editable PDF using Application Modeller and Send Global Keys. Now I need to extract some specific pieces of information from that text. Can anyone tell me how to do this?
Note that I am new to Blue Prism, so it would be great if you could explain the solution in detail.
PDF document:
Extracted text:
Packing List
Date: 06/04/2018
FROM:
Shipping Company
123 Memory Lane
Madison, Wi
53203
TO:
Valued Customer
4343 Main St
Suite 100
Willowcrest, NC
27007
Quanity Ordered
Quantity Shipped
Product Number
Description
Unit Weight
2
1
0045657
Gizmo #1
1 lbs.
2
2
9007652
Gadget #2
20 lbs.
Total Items Shipped: 3
Comments:
Please retain this packing list for return or exchange.
Ship Date: 07/04/2018
I need the following information:
28-04-23 10:06 AM
Possible solutions are string manipulation, or regex through the Utility - Strings VBO.
There are still limitations: both approaches rely on keywords such as FROM and TO, so all the PDFs must have the same format.
For the table data, it also depends on the number of columns (fixed or changing) and on whether every cell contains data.
If any cell is empty, the solution will fail,
so you should cover as many cases as possible with a sufficient number of test files.
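To illustrate the keyword-based string manipulation idea outside Blue Prism, here is a minimal Python sketch. The helper name `between` is hypothetical and not part of any Blue Prism VBO; it simply returns the text found between two keywords, and returns nothing when a keyword is missing, which is exactly the case where this approach breaks down.

```python
# Sample text copied from the question; real PDFs may differ in layout.
TEXT = """Packing List
Date: 06/04/2018
FROM:
Shipping Company
123 Memory Lane
Madison, Wi
53203
TO:
Valued Customer
4343 Main St
Suite 100
Willowcrest, NC
27007
Quanity Ordered
Quantity Shipped
Product Number
Description
Unit Weight
2
1
0045657
Gizmo #1
1 lbs.
2
2
9007652
Gadget #2
20 lbs.
Total Items Shipped: 3
Comments:
Please retain this packing list for return or exchange.
Ship Date: 07/04/2018
"""


def between(text, start, end):
    """Return the trimmed text between two keywords, or None if either is missing."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j].strip()


from_address = between(TEXT, "FROM:", "TO:")
# "Quanity Ordered" is spelled as it appears in the extracted text.
to_address = between(TEXT, "TO:", "Quanity Ordered")
ship_date = between(TEXT, "Ship Date:", "\n")

print(from_address)
print(to_address)
print(ship_date)
```

Because the keywords are hard-coded, a PDF with a different label (or a corrected spelling of "Quanity") would make the lookup return nothing, so it is worth checking for that before using the values.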
You can use the following regex to get the named group values:
FROM:\s+(?<FromAddress>[\S\s]+)\s+TO:\s+(?<ToAddress>[\S\s]+)Quanity Ordered[\S\s]+Weight\s+(?<Table>[\S\s]+)Total Items Shipped:\s*(?<TotalItemsShipped>\d+)[\S\s]+Ship Date:\s(?<ShipDate>\d{2}\/\d{2}\/\d{4})
Here the input string is the text extracted from the PDF.
First, use the regex above to get the group values, including the table.
You will get the output as a collection.
The next step is to read the table data from the Table column of that collection.
You will get it as a string.
You can get the other values the same way, or use a Multi Calculation stage to get all the values at once.
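If you want to try the first regex outside Blue Prism before wiring it into a process, here is a rough Python sketch against the sample text from the question. Two things differ from the Blue Prism (.NET) version and are my assumptions: Python writes named groups as `(?P<Name>...)` instead of `(?<Name>...)`, and the total uses `\d+` so that totals with more than one digit are captured in full. The dictionary stands in for the one-row collection that Blue Prism returns.

```python
import re

# Sample text copied from the question; real PDFs may differ in layout.
TEXT = """Packing List
Date: 06/04/2018
FROM:
Shipping Company
123 Memory Lane
Madison, Wi
53203
TO:
Valued Customer
4343 Main St
Suite 100
Willowcrest, NC
27007
Quanity Ordered
Quantity Shipped
Product Number
Description
Unit Weight
2
1
0045657
Gizmo #1
1 lbs.
2
2
9007652
Gadget #2
20 lbs.
Total Items Shipped: 3
Comments:
Please retain this packing list for return or exchange.
Ship Date: 07/04/2018
"""

# Same pattern as in the post, rewritten with Python's (?P<Name>...) syntax.
PATTERN = re.compile(
    r"FROM:\s+(?P<FromAddress>[\S\s]+)\s+TO:\s+(?P<ToAddress>[\S\s]+)"
    r"Quanity Ordered[\S\s]+Weight\s+(?P<Table>[\S\s]+)"
    r"Total Items Shipped:\s*(?P<TotalItemsShipped>\d+)[\S\s]+"
    r"Ship Date:\s(?P<ShipDate>\d{2}/\d{2}/\d{4})"
)

match = PATTERN.search(TEXT)
# Trim each group, since the broad [\S\s]+ groups can end with a line break.
fields = {k: v.strip() for k, v in match.groupdict().items()} if match else {}

for name, value in fields.items():
    print(name, "=", repr(value))
```

If `match` comes back empty, one of the keywords was not found, which usually means the PDF does not follow the expected layout.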
Next, use the regex below to get the table values:
(?<QO>\d+?)\r?\n(?<QS>\d+?)\r?\n(?<PN>\d{7})
You will get the values in a collection, so you can loop through it to pick out the values you need.
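Here is a small Python sketch of the table step, again only as an illustration outside Blue Prism. It assumes the Table string looks like the sample from the question and uses Python's `(?P<Name>...)` group syntax in place of the .NET `(?<Name>...)` form. Note that this pattern only captures the first three columns (quantity ordered, quantity shipped, product number); the description and unit weight would need extra groups.

```python
import re

# Table text as it would come out of the Table group for the sample PDF.
TABLE = "2\n1\n0045657\nGizmo #1\n1 lbs.\n2\n2\n9007652\nGadget #2\n20 lbs.\n"

# Same pattern as in the post, rewritten with Python's (?P<Name>...) syntax.
ROW = re.compile(r"(?P<QO>\d+?)\r?\n(?P<QS>\d+?)\r?\n(?P<PN>\d{7})")

# One dictionary per table row, similar to one collection row per match.
rows = [m.groupdict() for m in ROW.finditer(TABLE)]

for row in rows:
    print(row["QO"], row["QS"], row["PN"])

# Simple cross-check: the shipped quantities should add up to the
# "Total Items Shipped" value on the packing list (3 in the sample).
total_shipped = sum(int(row["QS"]) for row in rows)
print("Total shipped:", total_shipped)
```

Comparing the summed quantities with the total printed on the packing list is a cheap way to notice when a row was missed, for example because a cell was empty.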
Note that, as mentioned earlier, there are limitations.
Note: you can rename the groups as required, but the names must not contain spaces (e.g. <QO> could be named <Quantity_Ordered>).
I hope this helps. If you still face any issues, please describe them.
Regards
01-05-23 07:06 AM
Hi,
I hope the solution I posted works. If you still face any issues, or the output is not in the expected format, please share the expected output
so that I can suggest a better solution.
Regards
02-05-23 11:28 AM
Thanks for the solution. It worked for me.