Regex Crash problem

LuigimariaCerto
Level 3
Good Morning,

I'm developing a process that extracts multiple values from a PDF.
I extract these values with the "Extract Regex Values" action of Utility - Strings (BP v. 6.6.0.15260).
Every time I execute this block, BP hangs indefinitely and I have to kill it with Task Manager.
The same problem occurs with AVOX.Regex (Digital Exchange), but only when I extract multiple regex values.

The regex pattern: "(?<X>\d{10})( |\n)(?<Y>\d{1,4})( |\n)((\w{1,8}|\d{1,8}|(:|,|.| |:.))*)*( |\n)(?<Z>\d{1,4})( |\n)(?<K>\d{1,4})"


Is there a solution to my problem? 

Thank you Community!

------------------------------
Luigimaria
------------------------------
3 REPLIES 3

Hi, Luigimaria,

Can you provide a string that should return something? I am using 6.4.2 and it does not hang. I am using the NEOOPS Regex VBO, but I doubt it behaves differently from the AVOX or BP ones. I will install 6.6 shortly and let you know the results, but it would help to have a string that returns some value when the regular expression is applied.

Regards,

------------------------------
Zdeněk Kabátek
Head of Professional Services
NEOOPS
http://www.neoops.com/
Europe/Prague
------------------------------

Hi, Luigimaria,

I tested it on 6.6 and it works: it does not hang, but it does not return any data either, as I don't want to reverse engineer what the regex is supposed to return ;). If you provide an input string that contains the correct values, I can test it for you.

Regards,

------------------------------
Zdeněk Kabátek
Head of Professional Services
NEOOPS
http://www.neoops.com/
Europe/Prague
------------------------------

Morning Zdeněk,

Thank you for the tests. I found out that the regex pattern wasn't well implemented.
If you test the following string (the quotes are not mandatory), BP crashes.
The reason is that the pattern keeps backtracking, effectively looping without end, because the Z pattern is never found (regex101.com reports catastrophic backtracking).
It happens when the comment is too long and the Z value (3) is incorporated in the comment "PANEL 123MM SPRAY PAINTED":
"1212341234 PANEL 123MM SPRAY PAINTED2"

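The blow-up can be illustrated without running the pattern at all. In the middle group ((\w{1,8}|\d{1,8}|...)*)*, a run of word characters can be consumed as chunks of 1 to 8 characters in many different ways, and a backtracking engine may try each of them before giving up when Z is not found. A minimal Python sketch, assuming we only look at the \w{1,8} alternative (count_splits is a hypothetical helper, not part of any Blue Prism VBO; it ignores the other alternatives and the outer *, so it likely understates the real work):

```python
from functools import lru_cache

MAX_CHUNK = 8  # upper bound of \w{1,8} in the original pattern


@lru_cache(maxsize=None)
def count_splits(run_length: int) -> int:
    """Count the ways to cut a run of `run_length` word characters
    into consecutive chunks of 1..MAX_CHUNK characters."""
    if run_length == 0:
        return 1  # exactly one way to split an empty run: no chunks
    return sum(
        count_splits(run_length - size)
        for size in range(1, min(MAX_CHUNK, run_length) + 1)
    )


if __name__ == "__main__":
    # The count roughly doubles with every extra character in the run.
    for n in (5, 8, 16, 24, 32):
        print(n, count_splits(n))
```

A 5-character word such as PANEL already has 16 possible splits and an 8-character run has 128, which is why a long comment makes the search appear to hang.
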
So I want this instead:
"1212341234 5 PANEL 123MM YELLOW-GRAY PATTERN 3 2"
The red spaces are mandatory.

Anyway, the solution is the following: "(?<X>\d{10})( |\n)(?<Y>\d{1,4})( |\n).*( |\n)(?<Z>\d{1,4})( |\n)(?<K>\d{1,4})". If nothing is found, the PDF is rejected.
So I think some checks need to be implemented in a future update to handle these unlucky cases.
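
For anyone who wants to try the fixed pattern outside Blue Prism, here is a minimal Python sketch run against the two sample strings from this thread. Two assumptions: Python writes named groups as (?P<X>...) instead of the .NET form (?<X>...), and extract_values is a hypothetical helper name, not a Blue Prism action.

```python
import re
from typing import Optional

# Same pattern as above, with named groups in Python's (?P<name>...) syntax.
FIXED_PATTERN = re.compile(
    r"(?P<X>\d{10})( |\n)(?P<Y>\d{1,4})( |\n).*( |\n)"
    r"(?P<Z>\d{1,4})( |\n)(?P<K>\d{1,4})"
)


def extract_values(text: str) -> Optional[dict]:
    """Return the X, Y, Z and K values, or None if the text must be rejected."""
    match = FIXED_PATTERN.search(text)
    if match is None:
        return None
    return {name: match.group(name) for name in ("X", "Y", "Z", "K")}


if __name__ == "__main__":
    good = "1212341234 5 PANEL 123MM YELLOW-GRAY PATTERN 3 2"
    bad = "1212341234 PANEL 123MM SPRAY PAINTED2"
    print(extract_values(good))  # X=1212341234, Y=5, Z=3, K=2
    print(extract_values(bad))   # None, so the PDF would be rejected
```

The unescaped . inside the original middle group already accepted any character except a newline, so .* appears to accept the same text while using a single quantifier instead of nested ones.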

------------------------------
Luigimaria
------------------------------