If we have multiple invoices to process using Decipher, do we need to create multiple DFDs?
18-06-21 07:34 PM
In our use case, we need to deal with 80 to 120 different types of invoices.
Do we need to use one common DFD, or do we need to create a different DFD for each invoice type? Which approach will give us better performance?
#Decipher #DecipherIDP #DecipherDFD #Documentformdefination
21-06-21 08:49 AM
You'll be pleased to learn you don't need a DFD for every invoice layout!
However, one DFD may not be enough either. Here's how you can find out.
1. Identify the fields you would like to read. Perhaps start with 5 different layouts.
2. It's OK to include fields that appear in only 50% (or fewer) of invoices; they just won't be mandatory.
3. Start your training (ML model disabled). You will start to see Decipher learn how to find these fields. (To help Decipher, you can specify Sample Headers, i.e. text that appears near the required field. You can specify multiple per field, but try to keep these to a minimum, as Decipher can still locate the field without them.)
4. Once you're comfortable that these are working as expected, move on to further sets of layouts.
5. If during this training you find a layout that is not training as expected because its format is significantly different, it might be worth creating a separate DFD for it.
6. You can include all of these documents in a single batch and use a classification model to route them to the different DFDs. (You can find more info on this in our online help pages.)
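For steps 1 and 2, it can help to tally which fields actually appear across your sample layouts before you build the DFD. The snippet below is only a rough sketch of that bookkeeping in plain Python. It does not use any Decipher API, and the supplier names, field names, and the `split_fields` helper are all made up for illustration.

```python
from collections import Counter

def field_coverage(layouts):
    """Return the fraction of sample layouts in which each field appears."""
    counts = Counter(f for fields in layouts.values() for f in set(fields))
    total = len(layouts)
    return {field: counts[field] / total for field in counts}

def split_fields(layouts, mandatory_threshold=1.0):
    """Split fields into mandatory (coverage >= threshold) and optional."""
    coverage = field_coverage(layouts)
    mandatory = sorted(f for f, c in coverage.items() if c >= mandatory_threshold)
    optional = sorted(f for f, c in coverage.items() if c < mandatory_threshold)
    return mandatory, optional

# Hypothetical survey of 5 supplier layouts (illustrative data only).
sample_layouts = {
    "supplier_a": ["invoice_number", "invoice_date", "total", "po_number"],
    "supplier_b": ["invoice_number", "invoice_date", "total"],
    "supplier_c": ["invoice_number", "invoice_date", "total", "vat_number"],
    "supplier_d": ["invoice_number", "invoice_date", "total", "po_number"],
    "supplier_e": ["invoice_number", "invoice_date", "total"],
}

mandatory, optional = split_fields(sample_layouts)
print("Mandatory candidates:", mandatory)
print("Optional candidates:", optional)
```

Fields that show up in every sampled layout are candidates for mandatory fields in the DFD; the rest can still be included but left optional, as described in step 2.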
As for the second part of your question: a single DFD could be the most efficient option, because you then don't need a classification model to separate the documents. However, if a single DFD requires substantial training and an enabled ML model, that could reduce the efficiency. If multiple DFDs result in smaller training sets and reach the desired success rate without enabling the additional ML model, then that could be the more efficient option.
Thanks
Ben
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based
21-06-21 09:00 AM
As long as the documents follow a consistent structure, or the fields you want to extract are present in all of them, you should ideally be able to work with one DFD.
Essentially, it's when the business rules for information extraction start to conflict that you should look at classification for better results.
21-06-21 03:02 PM
Hi @Ben.Lyons1
Could you please let us know why the ML model should be disabled before training Decipher?
In our use case, I enabled the ML model immediately after creating the Document Type. Is this the reason why Decipher is not working as expected for me?
Also, could you please share your recommendations on enabling or disabling the ML model? We have 80 to 120 different kinds of invoices from different suppliers, and we need to extract the same information from all of them.
21-06-21 03:12 PM
When carrying out your initial training, it's not necessary to enable the additional ML model. Decipher has a native training mechanism that builds the Training Data and this is often powerful enough to get great results without enabling the additional ML model.
When reading a document Decipher uses this Training Data in conjunction with the information specified in the DFD (e.g. Sample Headers, Formulas). Training Data is a mixture of the data types and locations found in similar documents.
I would recommend only enabling the extra ML model once you've reached a high degree of success/consistency. This may mean switching it on when it's ready to run in production/unattended.
That said, if you're getting the results you need with it off, you can leave it off and you will get faster performance.
Thanks
Ben
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based