Decipher IDP’s configuration setting 'Data capture per page timeout'

JD_CPU · ‎11-10-24

Good Afternoon Everyone,

https://docs.blueprism.com/bundle/decipher-idp-2-3/page/user-guide/configuration.htm#Timeout

States that:

OCR per page timeout is "The timeout in seconds to OCR a page."
Data capture per page timeout is "The timeout in seconds for Data Capture processing a single page."

What effect do these two timeout values actually have on the results provided by rules-based training?

We have a situation where a rules-based .td file from Decipher IDP 2.2 yields similar but less accurate results when used with the September release of Decipher IDP 2.3.2.

I understand that rules-based training captures data from the following elements, referred to as 'hints', defined in the DFD:
Keywords
Data types
Lists
Regex
Formula
Location (after user feedback)

But what actually happens DURING the process of capturing data when Decipher exceeds the 'Data capture per page timeout' period?

Does Decipher simply display the best results found prior to the timeout rather than the best results possible if provided additional time with a larger timeout period? Similarly, would increasing RAM improve the results provided if/when Decipher ever exceeds the Data capture per page timeout period? Is there a way for an Administrator to tell if/when a submitted batch exceeds a 'Data capture per page timeout' or an 'OCR per page timeout'?

Much Appreciated,

JD

Ben.Lyons1 · ‎14-10-24

Hi JD,

The timeouts are there to protect your machine from getting stuck on a page forever, due to either a system error or a corrupted page. Theoretically these limits should never be hit, assuming your machine has adequate resource (CPU and RAM).

I don't believe any results would be provided from the respective client if the timeout is hit.
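To illustrate the all-or-nothing behaviour described above, here is a minimal generic sketch of a per-page timeout guard. It is not Decipher's actual code: `capture_page`, `capture_with_timeout`, and the field names are hypothetical, and it simply assumes that a page which exceeds its timeout returns no result rather than a partial one.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def capture_page(page, work_seconds):
    """Hypothetical stand-in for data capture on a single page."""
    time.sleep(work_seconds)  # simulate processing time
    return {"page": page, "fields": ["invoice_no", "total"]}


def capture_with_timeout(page, work_seconds, timeout_seconds):
    """Return the page result, or None if the per-page timeout is hit.

    Assumption: no partial results are kept when the timeout expires.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(capture_page, page, work_seconds)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        return None  # timeout hit: nothing is returned for this page
    finally:
        # Do not block the caller waiting for the stuck page to finish.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    print(capture_with_timeout(1, 0.01, 1.0))   # finishes within the limit
    print(capture_with_timeout(2, 0.5, 0.05))   # exceeds the limit
```

Under this assumption, a larger timeout would only change the outcome for pages that were actually hitting the limit; pages that finish in time produce the same result regardless of the setting.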

Every update to Decipher improves the Capture engine's data extraction, so I would expect to see an improvement in performance between those two versions. If this is not your experience, please raise a support ticket with as much detail as you can provide so that we can investigate.

Thanks

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based


JD_CPU · ‎19-11-24

The support ticket was just closed. We learned that Decipher Release 2.3.2 is sometimes unable to accurately update/improve upon a rules-based training .td file that was created with Decipher 2.2.1. Our solution was to simply Lock the DFD immediately upon importing the .td file into 2.3.2.

If this was NOT done, as soon as we made a correction during manual data validation and pressed "SUBMIT", every batch pushed afterwards was MUCH less accurate at selecting the correct fields. Additional training on 100+ documents of a similar type using Release 2.3.2 did NOT improve results.

Specifically, in our case, only the first row of a table would ever be automatically identified by Decipher. Apparently, this result is similar to a previously known issue, documented internally as "DS-2310". DS-2310 was known to affect a previous version of Decipher, and 2.3.2 was supposed to have already remedied it. But as BP Support was able to reliably reproduce this behavior using Decipher version 2.3.2, they now realize that additional work remains to be done in this regard.

As long as the DFD remains locked you should still see the same results in 2.3.2 as you did when using 2.2.1.

Another possible workaround/solution proposed by BP Support was to try the new misc parameter TemplateMinMatchPercent = 75.

In theory, you should only have to place this misc parameter on any single row of your DFD table to affect the entire table. We never tried/tested the TemplateMinMatchPercent parameter, as Locking the DFD prevents 2.3.2 from ever attempting to improve (and thereby corrupting) the existing good results of our .td file that was created by the previous version of Decipher.

Just wanted to post this here in case it helps others in the future. It took considerable time and effort to correctly identify the exact cause of this re-surfaced known issue so that BP Developers are now able to reproduce the error on their own.



SS&C Blue Prism Community
