Data Verification - Incorrect Data Extraction
27-02-24 04:26 PM
Has anyone had the same issue with the digit zero being read as the letter 'O' in fields that contain a mixture of letters and numbers?
05-03-24 10:57 AM
Can you tell us more about the data itself, such as where it is taken from (Excel, etc.)? It might help us understand what's causing the issue.
05-03-24 12:19 PM
The documents are in PDF format and the field is read as an Invoice Reference. For example, invoice ref INV00126 could be captured in the verification stage as INVO0126.
05-03-24 02:26 PM
Ah, extracting from a PDF can be awkward. Are you using OCR to identify and extract the information? Have you tried using 'Get text' and regex to extract the information?
06-03-24 05:42 PM
Hi Michael,
Regex works well when the field is structured. However, we see issues with O/0 and I/1 in the PDFs we process in Decipher, because they contain a free-form "Reference" field provided by clients (alphanumeric, no standard length, and sometimes containing dashes). Please could you elaborate on the "Get text" you mentioned? I can't find anything on it in the Decipher documentation, and it sounds like something we should look into.
07-03-24 07:59 AM
Hi Stuart,
This could be due to the document resolution (not necessarily the same thing as document quality). Decipher uses Tesseract OCR to read the text, which is optimised for 300 dpi, and this is an important factor in how it reads various fonts. A font rendered at 300 dpi has a slightly different appearance from the same font rendered at 250 dpi, and that difference can cause similar characters to be mistaken for one another. (It may also be due to a poor-quality scan.)
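As a rough illustration of the resolution point, the sketch below works out the pixel dimensions a page image would need in order to sit at 300 dpi. The function name and the example page size are made up for illustration, and whether resampling before capture actually improves recognition is an assumption you would need to test on your own documents; re-scanning at 300 dpi is likely to be the more reliable route.

```python
def size_at_target_dpi(width_px: int, height_px: int,
                       source_dpi: int, target_dpi: int = 300) -> tuple[int, int]:
    """Return the pixel size a page image needs to be at target_dpi.

    Hypothetical helper: the physical page size stays the same, so the
    pixel dimensions scale by target_dpi / source_dpi.
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    new_width = round(width_px * target_dpi / source_dpi)
    new_height = round(height_px * target_dpi / source_dpi)
    return new_width, new_height


# Example: a page scanned at 250 dpi, 2000 x 3000 pixels (illustrative figures).
new_size = size_at_target_dpi(2000, 3000, source_dpi=250)
```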
If possible, I would recommend using a Format Expression, as Decipher can use this to better verify characters prone to this type of 'mistaken identity'. In this case, perhaps the following expression would work: "(INV[0-9]{5})". If this would cause issues for other invoices, you could set it up in a Specific Version.
Thanks
Ben
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based
