02-01-24 10:38 PM
Hi,
I am struggling with one more DFD issue in parallel, and because the problem is urgent I cannot create a new ticket and wait for a response.
I have to extract two things: the invoice number and the title of the invoice (the word "Invoice").
I created 2 fields:
InvNo with sample headers like 'Invoice', 'Invoice number', etc.
and
InvTitle as a selectable field with the options 'Invoice' and 'Other', with 'Other' as the default.
My problem is that the InvNo field "takes" the "Invoice" title as its header to find the invoice number and then the "Invoice" value is not assigned to the "InvTitle" field.
How can I make this work?
03-01-24 07:56 AM
Hi Wojciech,
Is this based on the same DFD and sample document shared with customer support?
Thanks
03-01-24 10:54 AM
Hi, I created a new ticket for that: 276451.
03-01-24 10:59 AM
The sample document is any invoice with only one occurrence of the word 'invoice', e.g. 'Invoice no 123/123'.
I have to extract the value 'Invoice' to one field and '123/123' to another.
The problem is that the sample header of the invoice number field is a list of 'Invoice', 'Invoice no' and similar words. This header "takes" the word 'Invoice' for itself, so it cannot be assigned to the invoice title field.
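To make the expected result concrete, here is a minimal Python sketch that is independent of Decipher. The regular expression and the function name are assumptions made up for illustration only; they just show that the same word 'Invoice' should end up as the InvTitle value while the text after it becomes InvNo.

```python
import re

# Hypothetical pattern (not a Decipher setting): the word "Invoice",
# an optional label such as "no", "no." or "number", then the number itself.
PATTERN = re.compile(
    r"\b(?P<title>Invoice)\b(?:\s+(?:no\.?|number)(?=[\s:]))?\s*:?\s*(?P<number>\S+)",
    re.IGNORECASE,
)

def split_title_and_number(text):
    """Return the title word and the invoice number, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return {"InvTitle": match.group("title"), "InvNo": match.group("number")}

print(split_title_and_number("Invoice no 123/123"))
```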
03-01-24 01:27 PM
Hi Wojciech,
I would avoid using overly common keywords, especially where they may appear in multiple places on a document. Reusing them in multiple fields is highly likely to cause an issue. I would remove Invoice, and all simple variations, from the Invoice Number field.
You don't need to specify all keywords for a field; Decipher will learn the position on the document. This would just mean Decipher won't automatically recognise the field on an untrained document.
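As a rough way to spot such overlaps before changing the DFD, the check can be sketched in Python. This is only an illustration: the dictionary below is a hypothetical stand-in for the field definitions, not the DFD format, and the function name is made up.

```python
def find_shared_terms(fields):
    """Map each lower-cased term to the fields using it, keeping only
    terms that appear in two or more fields."""
    usage = {}
    for name, terms in fields.items():
        for term in terms:
            usage.setdefault(term.strip().lower(), set()).add(name)
    return {term: sorted(names) for term, names in usage.items() if len(names) > 1}

# Hypothetical field definitions based on the example in this thread:
# sample headers for InvNo, selectable values for InvTitle.
fields = {
    "InvNo": ["Invoice", "Invoice number", "Invoice no"],
    "InvTitle": ["Invoice", "Other"],
}

print(find_shared_terms(fields))  # {'invoice': ['InvNo', 'InvTitle']}
```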
Alternatively you could use Specific Versions if this suggestion is problematic, though it's not recommended to use this for a high number of variations and may still only work as needed after some training.
Thanks
03-01-24 04:18 PM
Hi,
In our case there are more than 2000 invoice templates with different placement of the invoice number. Using a sample header with keywords like 'Invoice' ensures satisfactory effectiveness. Without those keywords, effectiveness drops dramatically.
So do I understand correctly that the sample header of the invoice number field "steals" the keyword 'Invoice' from the invoice title field?
Is it possible to use the exact same value as a sample header for one field and as the extracted value for another?
03-01-24 04:27 PM
Hi Wojciech,
Removing the keywords shouldn't prevent Decipher from learning the layout of each invoice template. Without keywords Decipher will learn the layout through training in the data verification stage.
Having keywords is great because it gives Decipher something to start with when it hasn't seen a particular layout before, but without them it will use its rules-based ML training. With each trained document, Decipher will learn and improve. I've done many use cases without any keywords, though you may need to reset your training data. Check out our best practice guide.
If you're unsure of how best to approach this I recommend seeing if you can get help with a knowledge support session. A member of our professional services team will be able to help get you up and running.
Thanks
03-01-24 04:41 PM
Ben, I am familiar with the best practice guide, but our case isn't that simple.
First of all, we have around 6000 invoices to process with at least 2000-3000 templates, and every month there is a big batch of new invoices with new templates. So even with machine learning "on", getting good results without keywords in sample headers will not be possible.
Second, there is no option in production to correct Decipher's choices after processing, and we cannot set up manual data capture verification for the production process. Or maybe I am not familiar with a "correction after processing" option? So how would Decipher learn from wrong choices? I was thinking about collecting incorrectly processed invoices, uploading them to the test environment to manually confirm data capture with ML turned on, and then uploading the training data to the prod environment, but that again is not efficient and does not protect against processing errors in production.
So, from my point of view, the only way to get the correct invoice number is to use those sample headers, but then I cannot use a value assigned to a sample header as the value of another field. Is this "stealing" that I described normal Decipher behavior?
04-01-24 08:16 AM
Hi Wojciech,
I appreciate the complexity of your use case (as best I can without the full detail). Where we've had customers with similar volumes of new invoice layouts, it was generally found that they worked better with minimal generic keywords. I would also start with the additional ML in the Document Type disabled; it would only improve an already well-functioning set of training data.
If a new layout arrives in Decipher, it will only be able to auto-process if it's highly confident, else it will be held for manual verification. The training data (rules-based model) could only do this with keywords and previous training of this layout. If the additional ML is enabled it will use the training of unrelated invoices and could give you a false positive, incorrectly marking it as high confidence.
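The routing described above can be pictured as a simple threshold check. This is a minimal Python sketch under stated assumptions: the threshold value, the per-field confidence scores, and the function name are hypothetical, and it does not reflect how Decipher actually computes or applies confidence.

```python
def route_document(field_confidences, threshold=0.9):
    """Auto-process only when every field clears the (hypothetical) threshold;
    otherwise hold the document for manual verification."""
    if field_confidences and all(c >= threshold for c in field_confidences.values()):
        return "auto-process"
    return "manual verification"

# A layout with one low-confidence field is held back for a person to check.
print(route_document({"InvNo": 0.95, "InvTitle": 0.40}))
```

A false positive in this picture is a field whose score clears the threshold even though the extracted value is wrong, so the document is auto-processed without review.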
It's not a case of "stealing" the field, if you tell a person to look for the word "Invoice" for two fields, how will they know which is the correct field without previous training of the layout? Again it's recommended not to use words which are likely to be repeated in a document or could have different use across your different vendors/layouts.
All IDP tools need training and many do not use pre-defined keywords, so it is possible to process this volume without them; you will just need to verify 2 or 3 documents per layout (some of this can be done in production, once the DFD is complete). Additionally, from Decipher 2.2 (current release), the training data will build its own keywords for each field per layout (this is stored in the training data and is not user accessible). These keywords can be completely different from layout to layout and will help Decipher if the field moves within the layout.
I really do recommend reaching out to our professional services for support with such a complex use case; you may be entitled to some time with them included in your support agreement.
Thanks
04-01-24 01:46 PM
Regarding "It's not a case of "stealing" the field, if you tell a person to look for the word "Invoice" for two fields, how will they know which is the correct field without previous training of the layout? Again it's recommended not to use words which are likely to be repeated in a document or could have different use across your different vendors/layouts." - I totally disagree. The point is not to assign the same value to two different fields, or even to decide which field it should be assigned to (by the way, misc. params should handle that; another ticket has already been created for it). The point is that the word "Invoice" is a sample header for one field and a value for another, and Decipher does not allow that.
As for "If a new layout arrives in Decipher, it will only be able to auto-process if it's highly confident, else it will be held for manual verification. The training data (rules-based model) could only do this with keywords and previous training of this layout. If the additional ML is enabled it will use the training of unrelated invoices and could give you a false positive, incorrectly marking it as high confidence." - I am not sure how to set this. Is it a QC option? Can you elaborate or link the guide/documentation about it?
I have already booked a help session for Monday. Thanks for your assistance, but I still cannot understand whether what I call the "stealing case" is normal behavior.