Handling unstructured data - Decipher 1.2
16-09-21 05:17 AM
I have attached an image of the source data; the expected result is shown in the table.
I have colored the values in the image with the same colors used in the table below, for easier understanding.
I have trained on more than 30 documents and the results are not satisfactory.
- Columns L1, L2 and L3 are extracted correctly, even across multiple pages.
- But columns L5 and L6 are not extracted correctly for almost all lines.
- As you can see in the image, the data is fairly unstructured,
- because the L5 and L6 values appear in different rows of the image.
- I have added the keywords that appear next to the L5 and L6 values in the image, but the results are still not good.
My questions:
- Can I enable NLP to handle this kind of unstructured data?
- Since it uses a graphics card, will any performance issues arise?
- Or can you suggest another method to achieve good results?
------------------------------
Krishna Elapavuluri RPA Solution Lead
TEchnology Consultant
DXC.technology
Asia/Kolkata
------------------------------
3 REPLIES
16-09-21 08:27 AM
Hi Krishna,
1. NLP is not intended for structured or semi-structured documents, and while I appreciate this may seem "unstructured", it would fall into one of those two categories. This is because the information is laid out in a consistent structure, unlike a truly unstructured document such as a contract or agreement.
2. The GPU requirement is to speed up the processing as the NLP model uses a neural network and a regular CPU would take far too long to process the data.
3. Without anchors (e.g. sample headers) near the respective data, it will be very difficult to train consistently. Your best bet is to try using Format Expressions (Regex) for each of the fields. Sometimes you may need to gather more data than needed and format it later in Blue Prism. e.g. get the number 10 and KPL, then remove KPL later.
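A minimal sketch of point 3, written in Python purely for illustration (in Blue Prism the clean-up would be done with its own calculation functions or a code stage). It assumes the captured value looks like a number followed by the unit KPL; the pattern and the function name `extract_quantity` are hypothetical and not part of Decipher or Blue Prism:

```python
import re
from typing import Optional

# Hypothetical pattern: a number (optionally with a decimal part) followed by "KPL".
# The unit acts as an anchor, so a bare number elsewhere in the text is not matched.
QTY_WITH_UNIT = re.compile(r"\b(\d+(?:[.,]\d+)?)\s*KPL\b")


def extract_quantity(captured: str) -> Optional[str]:
    """Return the numeric part of a captured 'number + KPL' value, or None."""
    match = QTY_WITH_UNIT.search(captured)
    return match.group(1) if match else None


# Capture more than needed ("10 KPL"), then drop the unit afterwards.
quantity = extract_quantity("10 KPL")
```

Capturing the unit together with the number gives the expression something to anchor on; the unit is then discarded in a later step.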
Thanks
------------------------------
Ben Lyons
Product Consultant
Blue Prism
UK
------------------------------
16-09-21 01:46 PM
Hi Ben,
As you suggested in #3, I am already doing a lot of enrichment in the BP process. The problem is that the row holding the expected value keeps changing (it may be the 4th, 6th or 9th row of a given order line).
This causes Decipher to pick up the wrong value, because it remembers the row position from the last training. To be more precise:
Assume I trained on document D1 and picked the data from the 4th row (with sample header VALMISTAJAN TUOTENUMERO) of an order line (say, for line 30).
For the next order, D2, assume the same header is in the 6th row: Decipher does not read the 6th row, it still reads the 4th row.
I hope I have explained this clearly 🙂
------------------------------
Krishna Elapavuluri RPA Solution Lead
TEchnology Consultant
DXC.technology
Asia/Kolkata
------------------------------
17-09-21 09:52 AM
Hi Krishna,
You might be better off reading the entire section of text, then using formulas/functions in Blue Prism to extract the specific data.
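As a rough sketch of that idea, in Python for illustration only (Blue Prism would use its own text functions or a code stage): once the whole order-line block has been read as text, the value can be located by its header label rather than by its row number. The function name `value_after_label`, the assumption that the value follows the label on the same row, and the sample blocks below are all invented for this example:

```python
import re
from typing import Optional


def value_after_label(block: str, label: str) -> Optional[str]:
    """Find `label` on any row of `block` and return the text after it on that row."""
    # Optional colon and whitespace after the label, then the rest of the row.
    pattern = re.compile(re.escape(label) + r"\s*:?\s*(\S.*)?$", re.IGNORECASE)
    for line in block.splitlines():
        match = pattern.search(line)
        if match and match.group(1):
            return match.group(1).strip()
    return None


# Invented sample blocks: the same header sits on row 4 in one and row 6 in the other.
d1 = "row 1\nrow 2\nrow 3\nVALMISTAJAN TUOTENUMERO: ABC-123\nrow 5"
d2 = "row 1\nrow 2\nrow 3\nrow 4\nrow 5\nVALMISTAJAN TUOTENUMERO XYZ-789"

value_d1 = value_after_label(d1, "VALMISTAJAN TUOTENUMERO")
value_d2 = value_after_label(d2, "VALMISTAJAN TUOTENUMERO")
```

Because the search is driven by the label, it does not matter whether the header appears in the 4th, 6th or 9th row of the order line.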
Regards
------------------------------
Ben Lyons
Product Consultant
Blue Prism
UK
------------------------------
