How to improve accuracy in case of dynamic rows

kiranb25 · 03-07-23

Hi,

I have a PO document (PDF) where the address spans a variable number of rows: it may be 3, 4 or 5 lines. In the DFD we have set the address field to multiple lines and trained it on a sufficient number of files (~100) covering all formats. But every time a new batch is loaded, Decipher reads the correct address for only a few documents (about 40%). For the others, it extracts either fewer or more rows than the address contains.

E.g., Decipher reads only 3 lines of a 5-line address block, or 4 to 5 lines of a 3-line address.

Refer to the screenshot below:

Any suggestions to improve the accuracy?

Thanks,

Kiran.

------------------------------
kiran b
------------------------------

BenLyons · 05-07-23

Hi Kiran,

Decipher should be able to handle that type of dynamic field. What version are you using? And what are the DFD settings for the field? Are you using a Capture ML model?

Thanks

------------------------------
Ben Lyons
Senior Product Specialist - Decipher
SS&C Blue Prism
UK based
------------------------------


kiranb25 · 05-07-23

Hi Ben,

I appreciate your response. I am using Decipher version 2.2. The DFD settings for this field are Assignable and Multiline, and the Miscellaneous parameters are set to multiple. I am not capturing any ML model for this document type.

As of now, Decipher identifies the correct address in very few documents.

I will try capturing an ML model for this document type.

Any other suggestions?

Thanks,

Kiran

------------------------------
kiran B
------------------------------

BenLyons · 06-07-23

Hi Kiran,

I wouldn't create a capture model just yet, as that's more for optimisation than initial training.

Why are you using a misc parameter? This doesn't look like something that's required for reading this field.

Have you followed the document training best practices from our online help pages?

Thanks

------------------------------
Ben Lyons
Senior Product Specialist - Decipher
SS&C Blue Prism
UK based
------------------------------


kiranb25 · 06-07-23

Hi Ben,

Thanks for the suggestion.

Coming back to my first question: is there anything, apart from training more files according to the best practices, that would improve accuracy when extracting the dynamic row data?

Thanks,

Kiran

------------------------------
kiran B
------------------------------

BenLyons · 26-07-23

Hi Kiran,

If for each layout the address appears in the same place, you could set it as a static field using the misc parameter "StrictPosition=On". You'll need to restart your training and create a region large enough to capture the longest address.

Essentially, this parameter treats the field as if it never moves and reads anything that's in the box, though this is specific to each layout, i.e. vendor A's address doesn't have to be in the same place as vendor B's.
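Because a StrictPosition region sized for the longest address will also pick up whatever sits below a shorter address, one option is to trim the extracted value afterwards. The sketch below is a hypothetical post-processing step run outside Decipher (the function name `trim_address` is illustrative, not a Decipher feature), assuming US addresses whose last line ends in a 5-digit ZIP code, optionally ZIP+4.

```python
import re

# Hypothetical post-processing (not part of Decipher): a line counts as the
# end of the address if it ends with a 5-digit ZIP code, optionally ZIP+4.
ZIP_LINE = re.compile(r"\b\d{5}(?:-\d{4})?\s*$")


def trim_address(raw: str) -> str:
    """Keep lines up to and including the first line that ends in a ZIP code.

    If no line ends in a ZIP code, the cleaned input is returned unchanged so
    the caller can flag it for manual review.
    """
    # Drop blank lines and surrounding whitespace from the extracted block.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        if ZIP_LINE.search(line):
            return "\n".join(lines[: index + 1])
    return "\n".join(lines)


extracted = "ACME Corp\n12 Main St\nSpringfield, IL 62704\nShip To PO: 4500012345"
print(trim_address(extracted))
```

This is only a heuristic: it cannot recover lines that Decipher did not read, and it would cut too early if an earlier address line happened to end in five digits.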

Thanks

------------------------------
Ben Lyons
Senior Product Specialist - Decipher
SS&C Blue Prism
UK based
------------------------------


kiranb25 · 09-08-23

Hi Ben,

The layout changes with the address (the number of rows), so after setting "StrictPosition=On", Decipher reads extra lines when the address has only 3 rows. We have also tried other options such as "Semantic=On", assuming it would consider the outline and stop reading the unwanted data, but we are still not able to get the correct result.

I would appreciate any other suggestions.

FYI, the PDFs look like this:

4 rows:

3 rows:

5 rows:

After training on the above formats, we load a batch containing PDFs in all three address formats. Our target is to read only the address (Ship To) correctly.

Scenario 1: For some files with a 3-line address, Decipher also reads the Ship To PO values, treating the address as 4 or 5 lines.

Scenario 2: For files with a 4-line address, it reads either only 3 lines, or also the Ship To PO value, treating the address as 5 lines.

Scenario 3: For files with a 5-line address, it reads either 3 or 4 lines.

I hope that explains the issue.

Note: We have contacted the Decipher support team, but the issue persists.

Thanks,

Kiran.

------------------------------
kiran B
------------------------------

BenLyons · 09-08-23

Hi Kiran,

Are all these addresses known? Could you create a list of them?

Thanks

------------------------------
Ben Lyons
Senior Product Specialist - Decipher
SS&C Blue Prism
UK based
------------------------------


kiranb25 · 09-08-23

Hi Ben,

Most of the addresses are in the US. We cannot create a list, since they belong to customers and new addresses can appear.

Can we use a regex to identify the address as a whole?

Note: The addresses in the screenshots above have been altered.

Thanks,

Kiran

------------------------------
kiran B
------------------------------

BenLyons · 10-08-23

Hi Kiran,

I've got a regex that validates the full address, but unfortunately I haven't been able to get the region training to consistently identify the full address.

But you can try using "([\s\S]*[\r?\n ]*){3,5}[0-9]{5}$".
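As written, `[\s\S]*` matches any characters, including line breaks, so in most regex engines the `{3,5}` repetition does not actually restrict the number of lines, and inside `[\r?\n ]` the `?` is a literal question mark; in practice the pattern accepts almost any text that ends in five digits. The sketch below shows a stricter, hypothetical alternative tested in Python's `re` module (Decipher's regex flavour may differ): 3 to 5 non-empty lines, the last ending in a US ZIP code. Like the quoted pattern, it only validates an extracted value; it does not change how Decipher locates the region.

```python
import re

# Hypothetical stricter pattern: 2-4 complete, non-empty lines followed by a
# last line that ends in a 5-digit ZIP code (optionally ZIP+4), i.e. 3-5 lines.
ADDRESS_3_TO_5_LINES = re.compile(
    r"(?:[^\r\n]+\r?\n){2,4}"       # lines before the last one, each with its line break
    r"[^\r\n]*\b\d{5}(?:-\d{4})?"   # last line, ending in the ZIP code
)


def is_valid_address(text: str) -> bool:
    """True if the whole value is a 3-5 line address ending in a ZIP code."""
    return ADDRESS_3_TO_5_LINES.fullmatch(text.strip()) is not None


print(is_valid_address("ACME Corp\n12 Main St\nSpringfield, IL 62704"))  # True
print(is_valid_address("ACME Corp\nSpringfield, IL 62704"))  # False: only 2 lines
print(is_valid_address("A\nB\nC\nD\nE\nNew York, NY 10001"))  # False: 6 lines
```

A value that has picked up a trailing "Ship To PO" line fails the check because its last line no longer ends in a ZIP code, which could be used to route such documents to manual review.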

I've raised this query with the development team and hope they will come back with some further advice shortly.

Thanks

------------------------------
Ben Lyons
Senior Product Specialist - Decipher
SS&C Blue Prism
UK based
------------------------------


SS&C Blue Prism Community
