Handling unstructured data - Decipher 1.2

KrishnaElapavul
Level 6

I have attached two images, the source data (image 1) and image 2:

10391.png
10392.png

The expected result is shown in the table.

I have colored the values in the image with the same colors as in the table below, for better understanding.

I have trained on more than 30 documents and the results are not satisfactory.

  1. Columns L1, L2 and L3 are extracted correctly, even across multiple pages.
  2. Columns L5 and L6, however, are not extracted properly for almost all lines.
  3. As you can see in the image, the data is fairly unstructured.
  4. This is because the L5 and L6 column values appear in different rows in the image.
  5. I have added the keywords that appear with the L5 and L6 values in the image, but the results are still not good.

My Questions

  1. Can I enable NLP to handle this kind of unstructured data?
  2. As it uses a graphics card, will any performance issues arise?
  3. Or can you suggest another method to achieve good results?

 



------------------------------
Krishna Elapavuluri RPA Solution Lead
TEchnology Consultant
DXC.technology
Asia/Kolkata
------------------------------
3 REPLIES

BenLyons
Staff
Hi Krishna,

1. NLP is not intended for structured or semi-structured documents, and while I appreciate this may seem "unstructured" it would fall in one of the first 2 categories. This is because the information is laid out in a consistent structure and not unstructured like a contract or agreement.
2. The GPU requirement is to speed up the processing as the NLP model uses a neural network and a regular CPU would take far too long to process the data.
3. Without anchors (e.g. sample headers) near the respective data, it will be very difficult to train consistently. Your best bet is to try using Format Expressions (Regex) for each of the fields. Sometimes you may need to gather more data than needed and format it later in Blue Prism. e.g. get the number 10 and KPL, then remove KPL later.
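As a rough illustration of point 3 (capture more than you need, then trim it afterwards), here is a minimal Python sketch. It is not Decipher's Format Expression syntax and not a Blue Prism action; the sample text, the pattern and the `extract_quantity` helper are all assumptions made for the example. The idea is that the unit "KPL" acts as the anchor, and only the number in front of it is kept.

```python
import re

# Assumption: the quantity is always written as a number followed by the unit "KPL".
# The unit serves as the anchor; only the number in group 1 is kept.
QTY_WITH_UNIT = re.compile(r"\b(\d+(?:[.,]\d+)?)\s*KPL\b")


def extract_quantity(text):
    """Return the number that precedes 'KPL' as a string, or None if absent."""
    match = QTY_WITH_UNIT.search(text)
    if match is None:
        return None
    return match.group(1)


# Hypothetical text captured for one order line.
sample = "ABC-123 10 KPL 4,50"
quantity = extract_quantity(sample)
```

In Blue Prism itself the trimming step would be done with the product's own string functions rather than Python; the sketch only shows the capture-then-clean pattern.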

Thanks

------------------------------
Ben Lyons
Product Consultant
Blue Prism
UK
------------------------------
Ben Lyons Senior Product Specialist - Decipher SS&C Blue Prism UK based

Hi Ben,
As you suggested in #3, I am already doing a lot of enrichment in the BP process. The problem here is that the row in which the expected value appears keeps changing (it may be in the 4th, 6th or 9th row of a given order line).
This causes Decipher to pick up the wrong value, because it remembers the last trained row position. To be more precise:
Assume I trained on document D1 and picked the data from the 4th row (sample header VALMISTAJAN TUOTENUMERO) of an order line (say, for 30 lines).
For the next order, D2, assume the same header is in the 6th row. Decipher will not read the 6th row; it still reads the 4th row.
I hope I have explained this clearly 🙂


------------------------------
Krishna Elapavuluri RPA Solution Lead
TEchnology Consultant
DXC.technology
Asia/Kolkata
------------------------------

Hi Krishna,

You might be better off reading the entire section of text, then using formulas/functions in Blue Prism to extract the specific data.
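One way to picture this suggestion is the Python sketch below. It assumes the whole order-line block has been captured as multi-line text and that the value sits on the same row as its header, after an optional colon. Neither assumption is confirmed in this thread, and the sample blocks and the `value_after_header` helper are hypothetical. Because every row is scanned for the header, it does not matter whether the header is in the 4th or the 6th row.

```python
import re

HEADER = "VALMISTAJAN TUOTENUMERO"


def value_after_header(block, header=HEADER):
    """Scan every row of the block and return the text following the header.

    Returns None if the header is missing or nothing follows it on that row.
    """
    pattern = re.compile(re.escape(header) + r"\s*:?\s*(\S.*)?$", re.IGNORECASE)
    for row in block.splitlines():
        match = pattern.search(row)
        if match:
            value = (match.group(1) or "").strip()
            return value or None
    return None


# Hypothetical order-line blocks: the header is in row 4 of D1 and row 6 of D2.
d1 = "row 1\nrow 2\nrow 3\nVALMISTAJAN TUOTENUMERO: AB-1234\nrow 5"
d2 = "row 1\nrow 2\nrow 3\nrow 4\nrow 5\nVALMISTAJAN TUOTENUMERO: XY-9"

d1_value = value_after_header(d1)
d2_value = value_after_header(d2)
```

The same search-by-header logic would need to be rebuilt with Blue Prism's own text functions or a code stage; the sketch only shows why searching by header removes the dependence on row position.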

Regards

------------------------------
Ben Lyons
Product Consultant
Blue Prism
UK
------------------------------