
Couldn't capture data from a certain region in Decipher IDP

Sangjun
Level 5

Hello, everyone,

I’m currently working on automating the processing of invoice documents using Decipher IDP.

However, I’ve encountered an issue where data extraction fails for a specific field on the document.

[Screenshot: que1.jpg]

As shown in the screenshot, a bounding box is drawn around the invoice number, but the data itself isn’t extracted.

[Screenshot: que2.jpg]

Interestingly, if I manually click the Refresh Region button, the correct value is extracted with 100% accuracy.
(In other words, the field is correctly read only when someone intervenes to press the Refresh Region button.)

[Screenshot: que3.jpg]

I’d like to know if there’s a way to resolve this issue.

I set up the DFD field definition as follows:

[Screenshot: que4.jpg, DFD field definition]

ID: FT_1_USER_FIELD
Format: Text
Flags: Assignable, Required, AutoCalculate
Format Expression: ^\d{3}-?\d{2}-?\d{5}$   (expression for the tax number)
Dependent Items: FT_1_USER_FIELD
Formula: STRREPLACE(FT_1_USER_FIELD, " ", "")   (intended to remove spaces)
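The definition above combines two steps: the formula strips spaces and the format expression validates the result. Below is a minimal Python sketch of that logic, assuming the formula is applied before the expression is checked (the actual order inside Decipher is not confirmed in this thread). `clean_and_validate` and the sample values are hypothetical; only the regular expression is taken from the definition.

```python
import re

# Same pattern as the Format Expression in the field definition above.
TAX_NUMBER = re.compile(r"^\d{3}-?\d{2}-?\d{5}$")


def clean_and_validate(raw: str) -> tuple[str, bool]:
    """Remove spaces (as STRREPLACE(value, " ", "") would), then test the pattern."""
    cleaned = raw.replace(" ", "")
    return cleaned, TAX_NUMBER.fullmatch(cleaned) is not None


for raw in ["123-45-67890", "123 - 45 - 67890", "123-45-6789"]:
    print(repr(raw), "->", clean_and_validate(raw))
# '123-45-67890' -> ('123-45-67890', True)
# '123 - 45 - 67890' -> ('123-45-67890', True)
# '123-45-6789' -> ('123-45-6789', False)
```

In this sketch only the ordinary space character is removed; a non-breaking space or tab in the captured text would survive and still make the match fail.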

 

If anyone has encountered and solved a similar problem, I’d greatly appreciate your advice. Thank you!


Ben.Lyons1
Staff

Hi @Sangjun,

I would recommend removing the formula and deselecting the AutoCalculate flag, as there may be a more efficient way to remove the unwanted space.

You can use the misc parameter "RegexMode" and set it to "2". This enables a fuzzy matching method that can automatically remove unwanted spaces.

In my example, which uses the expression [A-Z]{2}[0-9]{5}, you can see that the space causes an error.

[Screenshot: validation error caused by the space]

I then set the Regex Mode to 2.

[Screenshot: Regex Mode set to 2]

When I retried the same document, Decipher removed the space this time.

[Screenshot: the same document with the space removed]
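The failure in Ben's example comes down to the regex itself: [A-Z]{2}[0-9]{5} has no position for a space, so a value read as "AB 12345" can never match. The Python sketch below assumes the expression must match the whole field value and uses a made-up sample. Stripping whitespace first stands in for the effect Ben describes RegexMode=2 having on spaces; it is not Decipher's actual fuzzy-matching algorithm.

```python
import re

PATTERN = re.compile(r"[A-Z]{2}[0-9]{5}")


def matches_whole(value: str) -> bool:
    # Assumption: the format expression must match the entire field value.
    return PATTERN.fullmatch(value) is not None


captured = "AB 12345"  # hypothetical OCR output with a stray space
print(matches_whole(captured))                    # False
print(matches_whole("".join(captured.split())))   # True
```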

Make sure you test this with other documents, as other characters may also be substituted (e.g. "o" can be changed to "0").
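The warning about substitutions can be illustrated with a deliberately naive normalizer. The sketch below is hypothetical and does not reproduce Decipher's fuzzy matching: `naive_fuzzy` simply removes whitespace and maps every letter O to the digit 0. That repairs a misread digit in "AB 12O45", but it also corrupts a value whose second letter really is O, which is why testing across several documents matters.

```python
import re

PATTERN = re.compile(r"[A-Z]{2}[0-9]{5}")


def naive_fuzzy(value: str) -> str:
    """Hypothetical normalizer: drop whitespace, then turn every O/o into 0."""
    compact = "".join(value.split())
    return compact.replace("O", "0").replace("o", "0")


for raw in ["AB 12O45", "BO 12345"]:
    fixed = naive_fuzzy(raw)
    print(raw, "->", fixed, bool(PATTERN.fullmatch(fixed)))
# AB 12O45 -> AB12045 True   (misread digit repaired)
# BO 12345 -> B012345 False  (genuine letter O turned into a digit)
```

A more careful normalizer would only substitute characters in positions where the pattern expects a digit. Whatever RegexMode=2 does internally, checking its output on a range of real documents is the practical safeguard.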

Also, when using a formula like this, it's best to use the special variable SELF, e.g. STRREPLACE(SELF, " ", "").

Kind Regards

Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based

Dear Ben Lyons,

Even after applying the method you suggested, the issue has not been resolved.

When I created a new DFD specification and tested it with only the StrictPosition=On option, the rectangular region was still drawn, but the tax number inside it was still not captured.

The resolution of the PDF file seems fine, and I believe the text is embedded as vector data rather than a scanned image, so I am unsure why this issue is occurring.

Sangjun
Level 5

In any case, I am now using the method you suggested: no formula, no AutoCalculate flag, and 'RegexMode=2' set in the misc parameters to remove the spaces.

Hi @Sangjun,

I have encountered issues similar to the one you describe. Below are the steps I usually follow; in some cases, Decipher picked up the fields automatically after I performed them. You can give it a try.

1. Take a backup of the training data.
2. Delete the existing training data.
3. Train the document again. When starting the training process, select only the “Assignable” flag and remove the formulas and miscellaneous parameters.

Regards,

Athiban

Ben.Lyons1
Staff

Hi @Sangjun,

I agree that it might be useful to restart your training, as a new DFD will reuse the same training data (unless the segregation option has been selected).

It can also happen that the hyphen "-" in the document is not the same Unicode character that the PDF extraction or OCR engine reads. It could be worth trying the expression ^\d{3}.?\d{2}.?\d{5}$ instead.

That said, the first step would certainly be to test it without any expression.
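The point about look-alike hyphens can be checked outside Decipher. The Python sketch below compares the original expression, the suggested ^\d{3}.?\d{2}.?\d{5}$, and a hypothetical third option that accepts only hyphen-like characters (ASCII hyphen-minus, U+2010 to U+2015, and U+2212 minus sign). The sample strings are invented, and whether Decipher's expression engine supports the same \u escapes is an assumption you would need to verify.

```python
import re

STRICT = re.compile(r"^\d{3}-?\d{2}-?\d{5}$")   # original expression
LOOSE = re.compile(r"^\d{3}.?\d{2}.?\d{5}$")    # suggested: any single separator
# Hypothetical alternative: allow only hyphen-like separators
# (ASCII hyphen-minus, U+2010..U+2015 hyphens/dashes, U+2212 minus sign).
DASHES = re.compile(r"^\d{3}[-\u2010-\u2015\u2212]?\d{2}[-\u2010-\u2015\u2212]?\d{5}$")

samples = {
    "ASCII hyphen": "123-45-67890",
    "U+2010 hyphen": "123\u201045\u201067890",
    "U+2013 en dash": "123\u201345\u201367890",
    "letter separator": "123x45x67890",
    "11 digits": "12345678901",
}

for label, text in samples.items():
    print(f"{label:16} strict={bool(STRICT.match(text))} "
          f"loose={bool(LOOSE.match(text))} dashes={bool(DASHES.match(text))}")
```

The trade-off shows up in the last two samples: because "." matches any character, the suggested expression also accepts a letter as a separator, and it accepts an 11-digit string by letting ".?" consume a digit. That is fine for diagnosing the hyphen problem, while a narrower character class keeps validation stricter.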

Thanks