Decipher IDP fails to recognize multiline text in PDF

Chaithanya340 · 24-02-22

Hello there,
We have trained Decipher to extract data from a PDF document containing tabular data. Because the text sits very close to the table borders, Decipher is not capturing multiline text within the grid cells properly. We have also provided an ML model and trained the same document more than 20 times, yet it is still not capturing the required data. Can anyone suggest how to improve the accuracy of the model? Also, the "Last Trained" value for the ML model always shows N/A. Is there any specific reason for that?

Ben.Lyons1 · 24-02-22

Hi Chaithanya,

Unfortunately, your images are too small for me to see the detail needed to understand your use case, but I will advise as best I can.

If the string you're trying to read can be validated by a Format Expression (Regex), a Formula or a List, this will help Decipher recognise the text more consistently. However, text close to a table border remains a challenge.
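To illustrate the principle behind a Format Expression or List check, here is a minimal Python sketch outside Decipher. The field types, the pattern `[A-Z]{2}-\d{6}` and the currency list are hypothetical examples, and Decipher's own Format Expression syntax and behaviour may differ. The idea is that a value which fails the check can be treated as a misread (for example, a table border picked up as `|`) rather than accepted.

```python
import re

# Hypothetical format expression for an invoice-number field (an assumption
# for illustration): two capital letters, a dash, then six digits.
INVOICE_NO = re.compile(r"[A-Z]{2}-\d{6}")

# Hypothetical list of allowed values for a currency field.
ALLOWED_CURRENCIES = {"GBP", "USD", "EUR"}


def is_valid_invoice_no(value: str) -> bool:
    """Return True only if the whole extracted string matches the format."""
    return INVOICE_NO.fullmatch(value.strip()) is not None


def is_valid_currency(value: str) -> bool:
    """Return True only if the extracted string is one of the listed values."""
    return value.strip().upper() in ALLOWED_CURRENCIES


# Text read too close to a table border can pick up stray characters,
# such as a vertical rule read as "|" or "I", or lose a character.
candidates = ["AB-123456", "|AB-123456", "AB-12345", "AB-123456 I"]
for text in candidates:
    print(f"{text!r}: {is_valid_invoice_no(text)}")
```

Only the first candidate passes; the other three are flagged, which is the behaviour a validation rule gives you over accepting whatever text was read.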

I would advise against training the same document sample multiple times, as Decipher will begin to strongly associate not only the field names but also the values as part of the document. So when you process a new sample, Decipher may be slower to recognise the new values.

I would also advise against training an ML model on such a small number of documents, as it will not be sufficient to create an adequate model.

Please refer to our best practice guide for more information.

Thanks

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based

SS&C Blue Prism Community