Data Verification - Incorrect Data Extraction

Stephen__Guest · ‎27-02-24

Has anyone had the same issue with a zero being read as the letter 'O' in fields that contain a mixture of letters and numbers?

michaeloneil · ‎05-03-24

Can you tell us more about the data itself, such as where it is taken from (Excel, etc.)? It might help us understand what's causing the issue.

#MVP

Stephen__Guest · ‎05-03-24

Hi @Michael ONeil

The documents are in PDF format and the field is read as an Invoice Reference. For example, invoice ref INV00126 could be captured at the verification stage as INVO0126.

michaeloneil · ‎05-03-24

Ah, extracting from a PDF can be awkward. Are you using OCR to identify and extract the information? Have you tried using 'Get text' and regex to extract it?

#MVP

stuart.mar · ‎06-03-24

Hi Michael,

Regex works well when the field is structured; however, we see issues with O/0 and I/1 in the PDFs we process in Decipher, as they contain a free-form "Reference" field provided by clients (alphanumeric, no standard length, and sometimes containing dashes). Could you please elaborate on the "Get text" you mentioned? I can't find anything on it in the Decipher documentation, and it sounds like something we should look into.

Ben.Lyons1 · ‎07-03-24

Hi Stuart,

This could be due to the document resolution (not necessarily the same thing as document quality). Decipher uses Tesseract OCR to read the text, and Tesseract is optimised for 300 dpi, which is an important factor in how it reads various fonts. A font rendered at 300 dpi will look slightly different from one rendered at 250 dpi, and this can cause similar characters to be mistaken for one another (though it may also be due to a poor-quality scan).

If possible, I would recommend using a Format Expression, as Decipher can use this to better verify characters prone to this type of 'mistaken identity'. In this case, perhaps the following expression would work: "(INV[0-9]{5})". If this would cause issues for other invoices, you could set it up in a Specific Version.
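As a rough illustration of what that expression accepts and rejects, here is a minimal Python sketch. It runs outside Decipher and assumes Python's `re` syntax, so Decipher's own handling of Format Expressions may differ. The helper names `is_valid_ref` and `repair_ref` and the O/0, I/1 look-alike mapping are hypothetical and only there for the example.

```python
import re
from typing import Optional

# Pattern quoted in the post above: "INV" followed by exactly five digits.
INVOICE_REF = re.compile(r"INV[0-9]{5}")

# Hypothetical mapping of letters that OCR commonly confuses with digits.
LOOKALIKES = str.maketrans({"O": "0", "I": "1"})


def is_valid_ref(value: str) -> bool:
    """Return True if the whole value matches the invoice reference pattern."""
    return INVOICE_REF.fullmatch(value) is not None


def repair_ref(value: str) -> Optional[str]:
    """Swap look-alike letters for digits in the numeric part only.

    The three-character prefix is left untouched so the "I" in "INV"
    is not rewritten. Returns None if the result still does not match.
    """
    prefix, digits = value[:3], value[3:]
    candidate = prefix + digits.translate(LOOKALIKES)
    return candidate if is_valid_ref(candidate) else None


print(is_valid_ref("INV00126"))  # True
print(is_valid_ref("INVO0126"))  # False
print(repair_ref("INVO0126"))    # INV00126
```

This only helps where the layout is known in advance. For a free-form reference field there is no fixed position at which a letter can safely be treated as a digit, so a swap like this would be guesswork.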

Thanks

Ben

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based
