26-02-24 10:56 PM
Hi Team -
I am trying to configure a document that has two main formatting variations depending on the sender. This is due to our external partners using different submission services to fill out the document and send it back to us. These services seem to change the formatting from the initial template.
I can successfully configure one version, but then Decipher struggles when exposed to the other version. I am concerned that exposing it to both versions will simply confuse the training and ultimately lead to it not being able to handle either with consistent accuracy. Additionally, I am not sure if it is viable to create an entirely separate document type/DFD for each version since they are so similar.
I have found an option to create multiple versions of the same DFD depending on the information present in a given field. This seems promising, but it is unclear to me whether these different versions will actually respond separately to training and learn to adjust the location of their regions independently of one another to account for the variations in formatting.
Has anyone had success handling variations in formatting of the same document? Either through DFD variations or another means?
27-02-24 02:31 PM
Hi Nathan,
Decipher's main ML training uses a common pool, which is shared between all DFDs (this can be set to DFD-specific in 2.3). So you may not see any benefit in training the variations in a second DFD.
Each page uploaded to Decipher is checked against the training data for a matching format. The match threshold defaults to 60% but can be adjusted to suit your use case. It might be worth seeing whether the variations can be told apart by increasing this threshold with the Misc Parameter "TemplateMinMatchPercentage". This can be added to any field and set to an integer, e.g. TemplateMinMatchPercentage=70.
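To picture what a minimum match percentage does, here is a rough Python sketch. It is not Decipher's actual matching logic, which is not described in this thread. The function name, the score dictionary and the "at or above the threshold" comparison are all assumptions made for illustration.

```python
from typing import Dict, Optional


def select_trained_format(
    match_scores: Dict[str, float],
    min_match_percent: int = 60,
) -> Optional[str]:
    """Pick the best-matching trained format, or None if nothing qualifies.

    match_scores is a hypothetical mapping of trained format name to a
    similarity score in percent. Decipher does not expose its scores this
    way; this only illustrates the idea of a threshold.
    """
    if not match_scores:
        return None
    # Take the trained format with the highest similarity to the page.
    best_name, best_score = max(match_scores.items(), key=lambda item: item[1])
    # Assumption: training is reused only at or above the threshold.
    if best_score >= min_match_percent:
        return best_name
    return None


# A variant page that resembles the main layout at 65% (made-up number).
variant_scores = {"main_layout": 65.0}
print(select_trained_format(variant_scores, min_match_percent=60))
print(select_trained_format(variant_scores, min_match_percent=70))
```

With the made-up score of 65, the main layout's training is reused at a threshold of 60 but not at 70. That is the point where the variant would be treated as a format of its own and could be trained separately.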
Train the main document format, then upload one of the slightly different ones to see if the training is being used. If it is, then increase the match percentage and restart the batch at the Capture stage.
Let me know how you get on.
Thanks
01-03-24 10:12 PM
Hi Ben! Thank you for your insight. I do have a clarifying question. I understand using the main document to train initially, then upping the TemplateMinMatchPercentage value until it does not apply the training to the variant document. However, I am not clear on how I would proceed from there. Would I draw new regions on the variant to establish a new format to be applied? If so, would it be correct to say that in this scenario the goal is to establish two different formats that both fall under the same DFD?
04-03-24 01:27 PM
Hi Nathan,
You will know the previous training is no longer being applied to the variant layout when the previously trained regions stop being retrieved automatically. So if all the regions are still mapped correctly, return the batch without making any changes or saving, amend the DFD parameter and restart the batch at the Capture stage. Repeat this until fields are being missed because the training isn't being applied.
At this point assign the missing fields to regions and submit the batch. Then upload a batch with each layout variant to test that the training has worked.
The DFD should be able to handle multiple variants of the same document type, assuming the fields to be retrieved are the same, irrespective of the actual layout.
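The repeat-until steps above amount to a simple loop, sketched here in Python. In practice each check is done by hand: amend TemplateMinMatchPercentage, restart the batch at the Capture stage, and see whether the regions are still filled in automatically. The function name, the callable standing in for that manual check, and the step size of 5 are assumptions for illustration, not part of Decipher.

```python
from typing import Callable, Optional


def find_separating_threshold(
    training_still_applied: Callable[[int], bool],
    start: int = 60,
    step: int = 5,
    limit: int = 100,
) -> Optional[int]:
    """Raise the threshold until the old training stops being applied.

    training_still_applied(threshold) is a hypothetical stand-in for the
    manual check: restart the batch at Capture with that value and see
    whether the previously trained regions are still retrieved.
    """
    threshold = start
    while threshold <= limit:
        if not training_still_applied(threshold):
            # Old training no longer matches: train the variant here.
            return threshold
        threshold += step
    # The variant matched at every value tried, so it cannot be separated.
    return None


# Pretend the variant resembles the main layout at 72% (made-up number).
print(find_separating_threshold(lambda threshold: 72 >= threshold))
```

Once a value is found, the variant's fields are assigned to regions and the batch is submitted, so that both layouts are trained under the same DFD.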
Thanks
03-04-24 03:09 PM
Hey Ben! Wanted to let you know I was able to use this misc parameter to successfully train different models to account for variations. I appreciate the help!