OCR region marking by Decipher

KrishnaElapavul · 26-05-21

Hi all,
I would like to understand how Decipher marks OCR regions when processing PDFs.

Is each line marked as one region? That is not always what happens.
Is each word marked as one region (with words detected from the spacing between them)? That is not always what happens either. What logic does Decipher use to mark these regions?

Please share your views

------------------------------
Krishna Elapavuluri
Technology Consultant
DXC.technology
Asia/Kolkata
------------------------------

Ben.Lyons1 · 27-05-21

Hi Krishna,

The OCR stage doesn't define what is marked in the document; that is done during the capture stage.

The OCR engine works out what is text and which characters it is or could be. The capture client then combines that information with the training data to decide what is highlighted as a region. Without training, it works from a general idea based on the spacing and layout of the text; after training, it updates how some of the regions are separated.

E.g. without training it may outline "Invoice No: 0123456" as a single region, but once trained it may separate the two if you have previously selected "0123456".
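
To make the idea concrete, here is a rough Python sketch of that two-step behaviour. It is an illustration only and not Decipher's actual algorithm: the `Word` class, the `mark_regions` function, the `max_gap` threshold and the `learned_values` set are all hypothetical names, and real training would learn far more than an exact text match.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word on a single line (hypothetical structure)."""
    text: str
    x0: float  # left edge of the word
    x1: float  # right edge of the word


def mark_regions(words, max_gap, learned_values=()):
    """Group words into regions by spacing, then split out learned values.

    Assumption: a horizontal gap wider than max_gap starts a new region,
    and "training" is modelled as a set of previously selected texts.
    """
    # Step 1: spacing-based grouping (the "untrained" behaviour).
    groups, current = [], []
    for word in sorted(words, key=lambda w: w.x0):
        if current and word.x0 - current[-1].x1 > max_gap:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)

    # Step 2: give any previously selected value its own region.
    regions = []
    for group in groups:
        run = []
        for word in group:
            if word.text in learned_values:
                if run:
                    regions.append(" ".join(w.text for w in run))
                    run = []
                regions.append(word.text)
            else:
                run.append(word)
        if run:
            regions.append(" ".join(w.text for w in run))
    return regions


line = [
    Word("Invoice", 0, 50),
    Word("No:", 55, 75),
    Word("0123456", 80, 140),
    Word("Total", 300, 340),
]

# Untrained: spacing alone keeps the label and value together.
print(mark_regions(line, max_gap=10))
# Trained: the previously selected value becomes its own region.
print(mark_regions(line, max_gap=10, learned_values={"0123456"}))
```

With the sample line above, the first call groups "Invoice No: 0123456" into one region, and the second call separates "0123456" from its label, which mirrors the example in this reply.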

Regards

------------------------------
Ben Lyons
Product Consultant
Blue Prism
UK
------------------------------
