Does the class verification step train the classification model?

XavierGruchet · 19-08-22

Hello,

I have an important question, as I am confused about the classification model. If I understand correctly, it first needs to be trained by pushing separate batches of documents for each document type. Once that is done, we can change the setting, mark it as trained, and update the model.

After that, any future documents go to class verification by default, where we can amend or confirm the document type. Does the model continue to be trained during this process? Or is the training finished, meaning that corrections made by humans at this stage won't be taken into account?

If the training also happens at the class verification stage, can we skip pushing batches by document type and train the classification model using only the class verification interface? The model could then be trained over time without the need to create batches of documents.

Thanks

Ben.Lyons1 · 22-08-22

Hi Xavier,

At this time, classification training is a one-off process. Once you've uploaded all training batches and successfully created your classification model, it will not continue to learn from additional documents.

Documents that require manual verification at the classification stage will not update the model. If you need the model to reflect these corrections, a new model will need to be created.

Thanks

Ben

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based

XavierGruchet · 22-08-22

Thanks Ben, really clear.

A few additional questions:
- If we have a new document type, or want to retrain one of the existing document types, do we have to train everything again, or is it possible to make incremental changes?
- If a document doesn't correspond to any of the document types we have trained so far, is it possible to have it classified under a document type such as "Other", as an exception?

Thanks

XavierGruchet · 22-08-22

A couple more questions: when pushing documents by document type in this one-off training process, is it possible to push PNG documents? And when PDF documents are pushed, does it matter whether the PDF is in text format or image format (text format meaning that we can copy and paste the text directly from the PDF)? If not, does that mean the document is converted directly to an image, so that the classification model relies only on image-to-text capabilities?

Ben.Lyons1 · 23-08-22

Hi Xavier,

If you need to update the training because of a new document type, or because a document type is not performing as desired, you will need to create a new model or retrain the existing model in full. Retraining is a little difficult though, as you can't simply delete the model and keep the name, nor can you simply upload new documents as training batches. In some respects it's best to start fresh.

You can have a document type to represent exceptions; you don't need to provide any examples or link it to a DFD. This would mean any documents that don't meet the confidence requirements can be manually assigned to it. You may want to include a DFD and set up some fields for the exception detail, which will help your BP process understand it. Make the fields unassignable and they won't create any document training.

PNG documents are supported in Decipher for both classification and data extraction. You can also use all types of PDF, JPG and BMP. Classification models are trained on the common locations of key text values and string types. The text is read via OCR extraction, or via vector-based extraction for PDFs with selectable text.
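
To illustrate the last point, here is a minimal sketch of how the text source might be chosen per file. It is only an illustration of the rule described above, not Decipher's actual code: the function name, the `has_selectable_text` flag and the returned labels are all hypothetical.

```python
from pathlib import Path

# File types named in this reply. The set itself is an illustrative
# assumption, not an exhaustive list taken from the product.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".bmp"}


def extraction_method(filename, has_selectable_text=False):
    """Return "vector" or "ocr" for a file (hypothetical helper).

    has_selectable_text is supplied by the caller; detecting a text
    layer inside a PDF is outside the scope of this sketch.
    """
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or filename}")
    # Only PDFs can carry selectable text; images always go through OCR.
    if extension == ".pdf" and has_selectable_text:
        return "vector"
    return "ocr"
```

For example, `extraction_method("scan.png")` and `extraction_method("scan.pdf")` both return `"ocr"`, while `extraction_method("invoice.pdf", has_selectable_text=True)` returns `"vector"`.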

Thanks

Ben

XavierGruchet · 23-08-22

Thanks Ben. Regarding "This would mean any documents that don't meet the confidence requirements can be manually assigned to it": what do you mean by "manually assigned"? In the Classification Verification interface, if a document doesn't meet the confidence requirements of any of the trained document types, what will the default assignment be? Is it possible to make the untrained document type "Other" the default assignment, or does the document always need to be assigned to it manually?

Ben.Lyons1 · 23-08-22

Hi Xavier,

In the classification verification screen, you can select any of the document types from a dropdown to assign the appropriate one. Where the classification is not 100% confident in a single document type, you'll notice a percentage next to each possible selection.

There's no default value; the classification model will assign the document type with the highest confidence (even if that confidence is quite low). The confidence threshold for automatic verification is 95% by default, but this can be changed in the document type configuration.

Any document that doesn't meet the appropriate threshold for a document type will be held for manual verification.
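
As a rough sketch of the behaviour described above (highest confidence wins, then a per-type threshold decides whether a human has to verify), something like the following captures the logic. The names are hypothetical and this is not Decipher's API; it also assumes that a confidence exactly equal to the threshold counts as meeting it.

```python
from dataclasses import dataclass

# The 95% default mentioned above, expressed as a fraction.
DEFAULT_THRESHOLD = 0.95


@dataclass
class ClassificationResult:
    document_type: str
    confidence: float
    needs_manual_verification: bool


def route_document(scores, thresholds=None):
    """Pick the highest-confidence type and decide if it needs manual review.

    scores: mapping of document type name to confidence (0.0 to 1.0).
    thresholds: optional per-type overrides of DEFAULT_THRESHOLD.
    """
    thresholds = thresholds or {}
    # There is no default type: the best score wins, however low it is.
    best_type = max(scores, key=scores.get)
    confidence = scores[best_type]
    threshold = thresholds.get(best_type, DEFAULT_THRESHOLD)
    # Assumption: a confidence equal to the threshold is accepted.
    return ClassificationResult(best_type, confidence, confidence < threshold)
```

With scores of `{"Invoice": 0.62, "Purchase Order": 0.31}`, the document is assigned to "Invoice" but flagged for manual verification. An untrained exception type such as "Other" never appears in the scores, so it can only be chosen by a person on the verification screen.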

Thanks

Ben

SS&C Blue Prism Community