Decipher OCR struggling with trailing 5

denzain · ‎19-01-22

We are using Decipher to read remittance data (the document has a few fields and a table), but in many cases it misreads the digit 5 in the table section when it is the last digit of a number. For example, 12343.65 can be read as 12343.66 or 12343.68. An erroneous 6 is much more frequent than an erroneous 8. The misread does not happen every time.
I am working around the problem by checking whether the sum of the table column matches the total read from a field. But this causes many documents to get stuck for verification, because the check stops the bad data from going through. Is there anything that can be done to make the OCR read a trailing 5 correctly in the table section?
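
For anyone reproducing the totals check outside Decipher, here is a minimal sketch. It assumes the column values and the total arrive as strings; the function name and sample values are made up for illustration and are not part of any Decipher API.

```python
from decimal import Decimal

def column_matches_total(amounts, total):
    """Return True when the table column sums exactly to the total field."""
    # Decimal avoids binary floating-point rounding on currency values.
    column_sum = sum((Decimal(a) for a in amounts), Decimal("0"))
    return column_sum == Decimal(total)

# Hypothetical values: in the first call the trailing 5 of row one was read as 6.
print(column_matches_total(["12343.66", "100.00"], "12443.65"))
print(column_matches_total(["12343.65", "100.00"], "12443.65"))
```

The first call reports a mismatch and the second a match. Comparing Decimal values rather than floats keeps a one-cent misread from being hidden by rounding.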

------------------------------
Kalash Sharma
Europe/London
------------------------------

HarpreetKaur · ‎19-01-22

Hello Kalash,

Since the OCR is reading the digit '5' as '6' or '8', the first things I would check are:
1. If the doc is digital, do a copy-paste check yourself and see which digit gets picked up. If it is the incorrect one, the OCR will pick up the incorrect value as well. This is usually an indication that the quality of the document needs to be improved. Also check whether any boundary lines sit too close to that field, as they might be distorting the data capture.
2. If it is a scanned document, go to the data verification screen, hover your mouse over the rectangular block Decipher has drawn around that field and see what value the OCR is reading. Again, if it's the incorrect one, chances are that the doc quality is not up to the required standard.

Another thing you could try is to compare the documents where Decipher picks up the correct values with the ones where it gets them wrong. That could give you some indication of the subtle variations between them.
As a precaution against passing through false positives, there isn't any self-correcting mechanism within Decipher. You are on the right track in setting up those validations, so that the erroring docs are caught in Decipher rather than passed on with incorrect values.
The correction will happen with time, once Decipher builds on its rules-based training and eventually the Capture model. Until then, you will need to spend some time improving the doc quality and manually handling the docs that get caught in Decipher.
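
To make manual handling of the caught docs quicker, the validation could also point the verifier at the rows most likely to be wrong. The following is only a sketch of that idea, not a Decipher feature. It assumes amounts are strings with two decimal places, that exactly one row was misread, and that the misreads are the ones reported in this thread (5 read as 6, or 5 read as 8). All names are hypothetical.

```python
from decimal import Decimal

# How much the column sum grows for each misread reported in the thread:
# 5 read as 6 adds 0.01, 5 read as 8 adds 0.03.
MISREAD_OFFSETS = {"6": Decimal("0.01"), "8": Decimal("0.03")}

def suspect_rows(amounts, total):
    """Return indices of rows whose trailing 6 or 8 would explain the mismatch.

    Only the single-misread case is considered: a row is a suspect when
    replacing its last digit with 5 would make the column sum equal the total.
    """
    diff = sum((Decimal(a) for a in amounts), Decimal("0")) - Decimal(total)
    suspects = []
    for index, amount in enumerate(amounts):
        offset = MISREAD_OFFSETS.get(amount[-1])
        if offset is not None and offset == diff:
            suspects.append(index)
    return suspects

# Hypothetical remittance table where the first row should read 12343.65.
print(suspect_rows(["12343.66", "250.16", "99.98"], "12693.79"))
```

In this example the check narrows the problem to the two rows ending in 6 but cannot tell which of them is wrong, so the result is a hint for the human verifier rather than an automatic correction.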

Regards
Harpreet

------------------------------
Harpreet Kaur Product Consultant
------------------------------

denzain · ‎20-01-22

Thanks for replying, Harpreet. It is valuable.

You also said that "correcting will happen with time, once Decipher builds on its rules based training and eventually the Capture model."

So is the learning ability also present and active when we manually correct the OCR values read into the table section of the DFD? Please can you confirm, so that I can relay that to the customers?

--

Thanks and Regards,
Kalash

HarpreetKaur · ‎20-01-22

The learning capability is present in the Machine Learning capture models within Decipher.
Once Decipher starts giving you satisfactory results from the rules based training, that is usually a good time to create an ML model for those doc types.
This is where Decipher automatically adds to its existing knowledge. Depending on the number of docs you set when creating the model (the usual recommendation is 1000), Decipher takes the time to learn from all the activity that has happened on those 1000 docs and includes it in its ML model.
The rules based training will always be the foundation of your doc processing but ML models are what enhance it further.

Regards
Harpreet

------------------------------
Harpreet Kaur Product Consultant
------------------------------

denzain · ‎07-04-22

Hi Harpreet,

Based on the Decipher documentation, 'Machine Learning' is not applicable to data coming from tables. Please see the highlighted text below. My problem relates to the trailing digit 5 read in the table. Perhaps I should have been clearer. Please could you confirm whether there is any option other than improving the visual quality of the PDF or the totals check I am currently doing?

------------------------------
Kalash Sharma
IA Consultant
Europe/London
------------------------------

Ben.Lyons1 · ‎11-04-22

Hi Kalash,

If the confidence of a value is high, Decipher is unlikely to change the result over time. This comes down to the two confidence values it uses when retrieving data: the first is the OCR confidence and the second is the capture confidence. If the OCR confidence is high, the value won't be changed automatically. So, as Harpreet suggested, in a digitally created PDF the confidence will be 100% even where the value is incorrect.

Please can you share a screen shot for the DFD and that part of the document?

Thanks

------------------------------
Ben Lyons
Senior Product Specialist - Decipher
Blue Prism
UK based
------------------------------

denzain · ‎13-04-22

Ben,

We raised a ticket 207044 for this. The information is attached to the ticket.

Regards,
Kalash

------------------------------
Kalash Sharma
IA Consultant
Europe/London
------------------------------

Ben.Lyons1 · ‎13-04-22

Hi Kalash,

Thanks for sending that in.

Our development team are on the case and we are hopeful of having an update in v2.2 (due in June 2022) to better handle these types of fields. Once I get a chance to test it, I'll be able to confirm.

Thanks

------------------------------
Ben Lyons
Senior Product Specialist - Decipher
Blue Prism
UK based
------------------------------
