19-08-24 04:59 PM
I have been troubleshooting why certain fields are not being read. Today I determined that the fields fail on the initial batch load. If I return the batch, restart it at the OCR step in Batch Admin, and get back to data verification (there is no issue with class verification), the fields now read as expected.
I found the logs on the machine running the client services, but nothing stands out as an issue.
20-08-24 08:28 AM
Hi @douglas.h.burke ,
It sounds like you have a PDF that includes both vector and non-vector data. This means that some of the text is embedded as metadata and can be read without using OCR; you can often select this text when opening the document in Adobe Reader.
Other parts of the data, however, are 'flattened' and can't be selected.
Decipher can extract the vector data without using OCR, and in some situations this causes the OCR stage to be skipped. That is why you could read the data after returning the batch to the OCR stage.
This was more common in older versions, but Decipher 2.2 introduced functionality that automatically extracts the vector data and also checks for additional data using OCR, providing a blend of both in data verification. (see feature detail below)
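To make the difference concrete, here is a minimal conceptual sketch in Python of the two behaviours described above. It is not Decipher's code or API: `FieldResult`, `vector_only` and `blended` are hypothetical names, and it assumes each field has an optional embedded (vector) text value and an optional OCR value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldResult:
    name: str
    vector_text: Optional[str]  # hypothetical: text from the PDF's embedded (selectable) layer
    ocr_text: Optional[str]     # hypothetical: text recognised by OCR on the rendered page


def _clean(text: Optional[str]) -> Optional[str]:
    # Treat None and whitespace-only strings as "nothing found"
    if text is None or not text.strip():
        return None
    return text.strip()


def vector_only(field: FieldResult) -> Optional[str]:
    # Models the symptom: OCR is skipped, so flattened regions come back empty
    return _clean(field.vector_text)


def blended(field: FieldResult) -> Optional[str]:
    # Models the blended behaviour: prefer embedded text, fall back to OCR
    return _clean(field.vector_text) or _clean(field.ocr_text)


if __name__ == "__main__":
    fields = [
        FieldResult("invoice_number", "INV-1001", "INV-1OO1"),
        FieldResult("stamped_total", None, "$250.00"),
    ]
    for f in fields:
        print(f.name, vector_only(f), blended(f))
```

In this model, a flattened field such as `stamped_total` only gets a value when the OCR fallback runs, which matches the symptom of fields appearing only after the batch is sent back through the OCR step.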
What version of Decipher are you currently using?
Thanks
20-08-24 01:13 PM
@Ben.Lyons1 I am using version 2.3. Here is the screenshot from the About section showing the full version.
20-08-24 02:51 PM
That's strange; 2.3 should get both the vector and OCR data.
Is the document a PDF? Are you able to select the text when it is opened in Adobe Reader?
What happens if you convert it to a JPEG and upload it to Decipher?
What are the languages/regions set to in the Document Type and Batch Type?
20-08-24 05:27 PM
Yes, it is a PDF, and the field text is selectable when opened in Adobe Reader.
Converting the file to JPEG and uploading it gave the same result on the initial load, and reprocessing from the OCR step was worse in that it still did not read the text.
The Batch Type and Document Type only have English as the primary language, with no secondary language.
21-08-24 08:01 AM - edited 21-08-24 08:01 AM
That doesn't sound like Decipher is performing properly. I would recommend raising a support ticket so we can investigate.
2 weeks ago
G'Afternoon @douglas.h.burke 😀 Did you end up opening a ticket, and has this item been resolved for you yet? I have a somewhat similar issue with a PDF document. We are using Decipher IDP version 2.3.2 (Release notes). My ticket was opened Oct 16th. I would love to hear if/how you were able to remedy the issue you described here.
Thanks in advance,
JD
2 weeks ago - last edited 2 weeks ago
Yes, I did open a ticket, but it was really the update from 2.3 to 2.3.2 that resolved my issue. After the update, some additional configuration was still needed in the document form definition (DFD) to refine performance and get all fields to identify consistently. We were lucky that the documents we are processing are structured PDFs, so we turned on "Strict Position" in the Misc parameter for most fields in the DFD.
We also learned that anything spanning more than a single line should be defined as multi-line. This might be obvious to others, but we are new to OCR and DFD creation, so it was not obvious to us. Our PDFs were digitally signed, and while a signature is not a traditional editable multi-line input or paragraph field, the digital signing details are stamped onto the document across multiple lines. My recommendation: when defining regions in the data verification step, if you scale a region larger than a single line of the font size, set that field as multi-line in the DFD.
I would also recommend keeping in mind that if you have already processed multiple samples, the model could be bad. After our update we created DFDs containing only the fields we had issues with, which narrowed down what we needed to troubleshoot. Once we had a better understanding of how the fields needed to be set in the DFD, we created a new DFD so we could train the model with all the correct field settings.
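As a rough illustration of the multi-line rule of thumb above, here is a minimal Python sketch. It is not a Decipher setting or API; `needs_multiline` is a hypothetical helper, region height and font size are assumed to be in points, and the 1.2 line-spacing factor is an assumed typical leading value.

```python
def needs_multiline(region_height_pt: float, font_size_pt: float,
                    line_spacing: float = 1.2) -> bool:
    """Rule of thumb: if a region is taller than one line of text, mark the field multi-line.

    line_spacing is an assumed leading factor (line height / font size), not a Decipher value.
    """
    single_line_height = font_size_pt * line_spacing
    return region_height_pt > single_line_height


if __name__ == "__main__":
    # A digital-signature block drawn over several lines of 8 pt text
    print(needs_multiline(region_height_pt=40, font_size_pt=8))
    # A single-line 10 pt field with a tight region
    print(needs_multiline(region_height_pt=11, font_size_pt=10))
```

The comparison is deliberately simple: any region taller than one line height is treated as multi-line, which errs on the side of capturing wrapped text such as signature details.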