DFD definition for multiline REGEX
14-12-22 09:32 AM
Dear Community,
In my organization we are using Decipher in some projects, and we have run into a gap in our knowledge.
We are trying to extract a phrase from PDFs. This phrase can be split across 2, 3, 4 or 5 lines, depending on the document structure.
We can define a regular expression that accounts for this, but it seems that Decipher is not able to apply the regex across multiple lines, so it doesn't recognise our intended phrase.
Examples:


We need a DFD with a regex that extracts the phrase - image 1: "DEMARCACIÓN DE CARRETERAS DEL ESTADO EN CATALUÑA" - image 2: "DEMARCACIÓN DE CARRETERAS DEL ESTADO EN CASTILLA Y LEÓN OCCIDENTAL"
We have this regex working in other languages: (DEMARCACION)[\s\n\r]*(DE)[\s\n\r]*(CARRETERAS)[\s\n\r]*(DEL)[\s\n\r]*(ESTADO EN)[\s\n\r]*(CATALUÑA|CASTILLA-LA MANCHA), but the multi-line limitation in Decipher doesn't allow us to succeed with it.
Please, if you have any idea how to get around this issue, it would help a lot. I have seen other posts regarding this problem, but they haven't helped in our case.
Best regards, have a nice day!
------------------------------
Arturo Garcia
------------------------------
4 REPLIES
15-12-22 09:43 AM
Hi Arturo,
Is the phrase always near the heading "DIRECCION GENERAL DE CARRETERAS"? If so, that heading may be useful for locating it without needing the regex.
Your regex also doesn't appear to use the correct characters, e.g. "DEMARCACION" should be "DEMARCACIÓN". Have you tried it this way?
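To illustrate the accent point, here is a minimal sketch using Python's `re` module as a stand-in. Decipher's own regex engine may behave differently, and the sample text is made up for the example. It shows that an unaccented literal does not match accented text, while a character class that accepts both forms does.

```python
import re

# Sample text as OCR might return it; "\u00d3" is the accented letter Ó.
text = "DEMARCACI\u00d3N DE CARRETERAS"

# Unaccented literal: plain "O" is a different character from "Ó",
# so this search finds nothing.
plain = re.search("DEMARCACION", text)

# A character class that accepts either O or Ó tolerates both spellings.
tolerant = re.search("DEMARCACI[O\u00d3]N", text)

print(plain)
print(tolerant.group(0))
```

Using a class like this may be worth trying if the OCR output is not consistent about accents.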
Thanks
------------------------------
Ben Lyons
Senior Product Specialist - Decipher
Blue Prism
UK based
------------------------------
Ben Lyons
Principal Product Specialist - Decipher
SS&C Blue Prism
UK based
15-12-22 09:51 AM
Hello Ben,
Thank you for the reply.
Actually yes, "DIRECCION GENERAL DE CARRETERAS" is always near the phrase.
We have tried "DEMARCACIÓN" and all kinds of variations. We have also defined less restrictive regexes, and it looks like Decipher is not able to extract information that is split across lines...
For example, if we create an image that has two lines:
"hello
world"
And we define a regex that accepts all characters (including spaces and line breaks) and words, it only gives us the word "hello"... not both.
Awaiting your response... Thank you.
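For comparison, this is how the "hello / world" case behaves in Python's `re` module. It is only a sketch of general regex behaviour, on the assumption that the OCR text reaches the regex as one string containing a newline; it says nothing about how Decipher passes text to its engine.

```python
import re

# Assumed OCR output: two lines joined by a newline character.
ocr_text = "hello\nworld"

# "." does not match a newline by default, so the match stops at the line end.
single_line = re.search(r".+", ocr_text).group(0)

# [\s\S] matches any character, newlines included, so both lines are captured.
across_lines = re.search(r"[\s\S]+", ocr_text).group(0)

# The DOTALL flag makes "." behave the same way.
dotall = re.search(r".+", ocr_text, flags=re.DOTALL).group(0)

print(repr(single_line))
print(repr(across_lines))
print(repr(dotall))
```

If a pattern like `[\s\S]+` still returns only the first line, the text may be reaching the regex one line at a time rather than as a single string.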
------------------------------
Arturo Garcia
------------------------------
15-12-22 09:57 AM
Hi Arturo,
Hmm, that shouldn't be a problem. I've had success using multi-line regex.
I assume you've seen this thread, where I demonstrate that it's possible? https://community.blueprism.com/discussion/bug-using-regex-in-format-expression#bma20a0e34-b004-41b6-8465-07818380d4cd
Thanks
------------------------------
Ben Lyons
Senior Product Specialist - Decipher
Blue Prism
UK based
------------------------------
15-12-22 10:04 AM
I think a newline is missing after ESTADO; each example shows EN at the start of a new line.
Original:
(DEMARCACION)[\s\n\r]*(DE)[\s\n\r]*(CARRETERAS)[\s\n\r]*(DEL)[\s\n\r]*(ESTADO EN)[\s\n\r]*(CATALUÑA|CASTILLA-LA MANCHA)
Suggested, with ESTADO and EN split so a line break is allowed between them:
(DEMARCACION)[\s\n\r]*(DE)[\s\n\r]*(CARRETERAS)[\s\n\r]*(DEL)[\s\n\r]*(ESTADO)[\s\n\r]*(EN)[\s\n\r]*(CATALUÑA|CASTILLA-LA MANCHA)
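The difference between the two patterns can be checked outside Decipher. The sketch below uses Python's `re` module and a made-up sample string in which EN starts a new line; the line breaks in the sample are an assumption about the OCR output, not taken from the actual documents.

```python
import re

# The two patterns from this reply, copied unchanged.
original = r"(DEMARCACION)[\s\n\r]*(DE)[\s\n\r]*(CARRETERAS)[\s\n\r]*(DEL)[\s\n\r]*(ESTADO EN)[\s\n\r]*(CATALUÑA|CASTILLA-LA MANCHA)"
suggested = r"(DEMARCACION)[\s\n\r]*(DE)[\s\n\r]*(CARRETERAS)[\s\n\r]*(DEL)[\s\n\r]*(ESTADO)[\s\n\r]*(EN)[\s\n\r]*(CATALUÑA|CASTILLA-LA MANCHA)"

# Hypothetical OCR output with EN at the start of a new line.
sample = "DEMARCACION DE\nCARRETERAS DEL ESTADO\nEN CATALUÑA"

# "(ESTADO EN)" requires a literal space, so a newline there breaks the match.
original_match = re.search(original, sample)

# With ESTADO and EN as separate groups, any whitespace between them is allowed.
suggested_match = re.search(suggested, sample)

# Rebuild the phrase on one line from the captured words.
phrase = " ".join(suggested_match.groups())

print(original_match)
print(phrase)
```

As a side note, `\s` already covers `\n` and `\r` in Python, so `[\s\n\r]*` could be shortened to `\s*` there; whether that holds in Decipher's engine is not confirmed in this thread.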
------------------------------
Ben Lyons
Senior Product Specialist - Decipher
Blue Prism
UK based
------------------------------
