Training Decipher - per layout or mix of documents?

Evert_B_RoboRana · ‎12-05-22

Hi guys,

Simple question for which the best practices/documentation didn't provide me a conclusive answer.
We're automating an invoice process and I'm wondering which is the best approach.

Do I train Decipher with a few documents for each supplier/layout, or do I train it on a mixed set of documents?

The issue I have with the first approach is that our client has over 500 suppliers.
Training documents for each of these will be too time-consuming, and even if we limit ourselves to the 50% of suppliers that send us the most invoices per year, we're still looking at about 200 suppliers.

What's the best approach for this?

Tejaskumar_Darji · ‎12-05-22

Hello Evert,

Does the set of fields you want to extract remain almost the same for all 500 vendors?

Evert_B_RoboRana · ‎13-05-22

Hi Tejaskumar,

Yes, the number of fields as defined in the DFD remains the same for all vendors.

Tejaskumar_Darji · ‎18-05-22

Since the fields are the same across all vendors, I would say take a batch of 50 invoices from different vendors and see how it goes.
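One way to put such a mixed batch together is sketched below. This is plain Python run outside Decipher, not a Decipher feature; the function name, the (vendor, filename) pairs and the round-robin selection are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def mixed_batch(invoices, size=50, seed=0):
    """Pick up to `size` invoices, cycling through vendors so that
    no single vendor layout dominates the batch.

    `invoices` is a list of (vendor, filename) pairs (hypothetical shape).
    """
    rng = random.Random(seed)

    # Group the files per vendor and shuffle each group.
    by_vendor = defaultdict(list)
    for vendor, filename in invoices:
        by_vendor[vendor].append(filename)
    for files in by_vendor.values():
        rng.shuffle(files)

    vendors = sorted(by_vendor)
    rng.shuffle(vendors)

    # Round-robin: take one invoice per vendor per pass until the batch is full
    # or every vendor has run out of files.
    batch = []
    while len(batch) < size and vendors:
        for vendor in list(vendors):
            if len(batch) == size:
                break
            batch.append((vendor, by_vendor[vendor].pop()))
            if not by_vendor[vendor]:
                vendors.remove(vendor)
    return batch
```

With more vendors than the batch size, every invoice in the batch comes from a different vendor; with fewer, the vendors are represented as evenly as their file counts allow.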

Also, ensure you have proper Sample Headers set up in the DFD so Decipher can grab the data accurately irrespective of the field position on the invoice.

Include all possible sample headers that you come across.

For e.g.

Do let us know how it goes for you.

Ben.Lyons1 · ‎18-05-22

Great sharing of advice here guys.

I'd just like to add one small note. Sample Headers aren't case sensitive, so you only need to list "Invoice Date" once and it will match "Invoice Date", "invoice date" and "INVOICE DATE" alike. They will all work the same 👍
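If you collect candidate headers from lots of invoices before typing them into the DFD, a small helper like the one below can drop the entries that differ only by letter case. It is a plain Python sketch with a made-up function name; it does not touch Decipher itself.

```python
def dedupe_headers(headers):
    """Keep the first spelling of each header, ignoring letter case."""
    seen = set()
    unique = []
    for header in headers:
        key = header.casefold()  # case-insensitive comparison key
        if key not in seen:
            seen.add(key)
            unique.append(header)
    return unique


candidates = ["Invoice Date", "invoice date", "INVOICE DATE", "Inv. Date"]
print(dedupe_headers(candidates))
```

Only genuinely different wordings, such as "Inv. Date", survive as separate entries.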

Thanks

Ben

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based

Chaithanya340 · ‎27-05-22

Hey Evert Boelaert,

Even if you consider 200 suppliers, can you add all suppliers to one batch?
As far as I know there is a limit of 50 suppliers per batch. How do you manage all the suppliers with a limit like this? Can someone shed some light on this?

Ben.Lyons1 · ‎27-05-22

Hi All,

As far as I'm aware there's no actual limit on the number of documents/files per batch, but we do recommend a lower count for performance and manual verification reasons. Performance will of course vary depending on your infrastructure, so this can be tested locally to see what works best for your environment. I would still consider it from the point of view of the person who may be manually verifying batches: a batch of 50+ could be more onerous than a batch of 10 or 15 documents (at least during training).
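If the files are being prepared by a script before they are submitted, splitting them into smaller batches could look like the sketch below. The function name and the default of 15 are illustrative assumptions, not Decipher settings.

```python
def chunk(documents, batch_size=15):
    """Split a list of documents into consecutive batches of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        documents[start:start + batch_size]
        for start in range(0, len(documents), batch_size)
    ]


files = [f"invoice_{n:03d}.pdf" for n in range(50)]  # hypothetical file names
batches = chunk(files)
print([len(batch) for batch in batches])
```

Fifty files with a batch size of 15 give three full batches and a final batch of five.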

How many vendors should you train?

Well, that's a tricky question, but one we can help you answer. The best practice is in the process of being updated for release with v2.2. In the meantime, it's not so much about how many vendors, more about how different the layouts are. If every vendor uses the same headers for fields, the tables are all in the exact same layout and the image quality is high, then you won't need to train many in Development.

Decipher is designed to learn as it goes in Production, but you'll want to be sure the DFD configuration is as near to perfect as possible because changes in Production are not advisable. Additionally, you would potentially have difficulty manually verifying thousands of documents in Production, so you'd want your Training Data to be able to handle enough documents automatically that it doesn't create a lot of extra work.

Finally you'll want to be confident that all the auto-captured data is correct, so you'll need to be sure that during UAT no false positive values are sent back to the Blue Prism process. This will be an opportunity to utilise the many validation features available in Decipher, such as Format Expressions, Formulas and Validation Lists.
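To give a feel for what these three kinds of check do, here is a rough analogy in plain Python. It is not Decipher configuration syntax: the field names, the date pattern, the currency list and the net + tax rule are all assumptions chosen for illustration.

```python
import re
from decimal import Decimal

DATE_FORMAT = re.compile(r"\d{2}/\d{2}/\d{4}")  # stands in for a format expression
ALLOWED_CURRENCIES = {"EUR", "GBP", "USD"}      # stands in for a validation list


def validation_errors(fields):
    """Return a list of problems found in one set of captured field values."""
    errors = []
    if not DATE_FORMAT.fullmatch(fields["invoice_date"]):
        errors.append("invoice_date: unexpected format")
    if fields["currency"] not in ALLOWED_CURRENCIES:
        errors.append("currency: not in validation list")
    # Stands in for a formula: the total must equal net plus tax.
    net, tax, total = (Decimal(fields[name]) for name in ("net", "tax", "total"))
    if net + tax != total:
        errors.append("total: net + tax does not match")
    return errors
```

A document with an empty error list could pass straight through, while anything else would be held for manual verification rather than returned to the process as a false positive.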

Does that help?

Thanks

Ben

zdenek.kabatek · ‎30-05-22

Hi, Ben,

I have a question for you. If you train Decipher in Development and then take the training model to PROD, how can this be achieved when you can't choose which training model to migrate and have to migrate everything? So if I am already running a few DFDs in PROD and now want to add one more that is pre-trained in DEV, how can this be achieved with version 2.1?

I guess it is not possible but I would be very happy to hear otherwise.

Regards

Zdenek

Ben.Lyons1 · ‎30-05-22

Hi Zdenek,

A good question and I'm currently working on the best practice materials, which will include detailed guidance on this topic.

But I'm happy to share some of the processes discussed.

The first point to note is that Decipher will intelligently manage any duplicate training entries. So if you export your full Training Data from Dev and it contains training for live documents, when you import this to Production, Decipher will work out what the most current Training Data is. This is on a document layout level, so it won't be a problem if you've trained some new document layouts in Dev to be deployed to Production (though there is a lot more guidance on this concept coming).

However, the first thing to do would be to test this idea in your UAT environment. Start by deleting your UAT Training Data, then export your Production Training Data and import that into UAT. Next, export your Dev Training Data and import that into UAT. Now in UAT you will have your Prod & Dev Training Data combined, simulating your expected performance when going live.
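As a mental model only, the import sequence above together with the layout-level handling of duplicates can be pictured like this. The sketch assumes that "most current" means the entry with the latest training date per layout; how Decipher actually decides this is not described here, and the dictionary shape and names are made up.

```python
from datetime import date


def merge_training(*exports):
    """Combine training exports, keeping the newest entry for each layout."""
    merged = {}
    for export in exports:
        for layout, entry in export.items():
            current = merged.get(layout)
            if current is None or entry["trained_on"] > current["trained_on"]:
                merged[layout] = entry
    return merged


# Hypothetical exports: vendor_b exists in both, vendor_c is new in Dev.
prod = {
    "vendor_a": {"trained_on": date(2022, 5, 20), "source": "prod"},
    "vendor_b": {"trained_on": date(2022, 4, 1), "source": "prod"},
}
dev = {
    "vendor_b": {"trained_on": date(2022, 5, 25), "source": "dev"},
    "vendor_c": {"trained_on": date(2022, 5, 26), "source": "dev"},
}

# UAT starts empty, then receives the Prod export followed by the Dev export.
uat = merge_training(prod, dev)
print({layout: entry["source"] for layout, entry in uat.items()})
```

In this picture the live layout that was not touched in Dev keeps its Production training, while the retrained and the new layouts come from Dev.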

We are working on adding functionality to manage the Training Data, but this will not be part of v2.2.

Thanks

Ben
