<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic RE: Bug using regex in format expression in Product Forum</title>
    <link>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66750#M19355</link>
    <description>Hi Ben,&lt;BR /&gt;&lt;BR /&gt;Thanks for your quick response and your help. I tried the new regex you provided, and it worked.&lt;BR /&gt;&lt;BR /&gt;I still have a question about how the Multiline flag affects extraction.&lt;BR /&gt;Does the flag help Decipher to choose more than one line of data? After training ~20 documents with the Multiline flag, Decipher still takes only the first line of the data.&lt;BR /&gt;&lt;BR /&gt;How can I force Decipher to always extract more than one line if I know one specific field is always multiline? One of the fields I want to extract always has three lines. My original idea was to use a regex that includes as many "\n" as I expect the field to have. Any help with this?&lt;BR /&gt;&lt;BR /&gt;In some document types, each line contains different data and I want to keep the "\n" to know which line is which, but in other cases extracting all the data in one line is 100% OK. Do you still recommend using the Multiline flag in these cases?&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="9003.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/9181i4FC5459869D8B999/image-size/large?v=v2&amp;amp;px=999" role="button" title="9003.png" alt="9003.png" /&gt;&lt;/span&gt;&lt;BR /&gt;Thank you very much for your help,&lt;BR /&gt;&lt;BR /&gt;Oroel Ipas.&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Oroel Ipas&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
    <pubDate>Tue, 15 Nov 2022 11:30:00 GMT</pubDate>
    <dc:creator>OroelIpas</dc:creator>
    <dc:date>2022-11-15T11:30:00Z</dc:date>
    <item>
      <title>Bug using regex in format expression</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66747#M19352</link>
      <description>Hi,&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;I want to report incorrect behavior in Decipher when working with regular expressions.&lt;BR /&gt;&lt;BR /&gt;I am working with Decipher to extract information from ID documents, so I covered the sensitive information in all the screenshots I attach. Here is one example document:
&lt;DIV class="media" style="overflow: hidden; zoom: 1;"&gt;
&lt;DIV class="media" style="overflow: hidden; zoom: 1;"&gt;
&lt;DIV class="media" style="overflow: hidden; zoom: 1;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="9017.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/9200i13693EABD94CD5B3/image-size/large?v=v2&amp;amp;px=999" role="button" title="9017.png" alt="9017.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;I want to extract the name of the person (all the words covered in white), so the header of the field is "NOMBRE". To prevent Decipher from extracting the alphanumeric code covered in blue, I wrote this regex: &lt;BR /&gt;&lt;SPAN style="background-color: #f9f2f4; color: #c7254e; font-family: Monaco, Menlo, Consolas, 'Courier New', monospace; font-size: 12.6px;"&gt;([A-ZÁÉÍÓÚÜÑ]+[\n ]+[A-ZÁÉÍÓÚÜÑ]+)([\n ]+[A-ZÁÉÍÓÚÜÑ]+)*&lt;BR /&gt;&lt;/SPAN&gt;The regex makes Decipher extract something that has &lt;SPAN style="text-decoration: underline;"&gt;two or more words&lt;/SPAN&gt; (with all the Spanish characters, but not allowing numbers), separated by spaces or newlines.&lt;BR /&gt;&lt;BR /&gt;As shown in the first screenshot, Decipher has not extracted the second line of the name (it is a multiline field), so I manually reshaped the box of the field. After doing this, the validation of the field fails and the box turns red even though &lt;STRONG&gt;the data should fit the regex.&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;
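A minimal sketch of which strings the pattern above accepts, assuming Python's re module as a stand-in (Decipher's own regex engine and validation logic may behave differently). The helper name is_valid_name and the sample names are hypothetical, not taken from any real document:&lt;BR /&gt;
&lt;PRE&gt;
```python
import re

# Pattern copied from the post: two or more uppercase words (Spanish letters
# included, digits excluded) separated by spaces or newlines.
NAME_PATTERN = re.compile(
    r"([A-ZÁÉÍÓÚÜÑ]+[\n ]+[A-ZÁÉÍÓÚÜÑ]+)([\n ]+[A-ZÁÉÍÓÚÜÑ]+)*"
)


def is_valid_name(text):
    """Return True when the whole string fits the pattern (hypothetical helper)."""
    return NAME_PATTERN.fullmatch(text) is not None


# Made-up sample values for illustration only.
print(is_valid_name("MARÍA JOSÉ"))            # two words on one line
print(is_valid_name("MARÍA JOSÉ\nGARCÍA"))    # name spanning two lines
print(is_valid_name("MARÍA"))                 # a single word
print(is_valid_name("MARÍA JOSÉ\nABC123"))    # digits on the second line
print(is_valid_name("MARÍA JOSÉ\r\nGARCÍA"))  # lines joined by "\r\n"
```
&lt;/PRE&gt;
In this sketch only the first two calls return True. The "\r\n" case fails because "\r" is not part of the separator class [\n ].&lt;BR /&gt;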
&lt;DIV class="media" style="overflow: hidden; zoom: 1;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="9018.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/9197iEF0A8F3AF66BC824/image-size/large?v=v2&amp;amp;px=999" role="button" title="9018.png" alt="9018.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;
&lt;BR /&gt;The way I found to fix this is:&lt;BR /&gt;
&lt;OL&gt;
&lt;LI&gt;Click inside the field&lt;/LI&gt;
&lt;LI&gt;Modify the data inside (e.g. remove a character)&lt;/LI&gt;
&lt;LI&gt;Click out of the field&lt;/LI&gt;
&lt;LI&gt;Now the field turns green showing the data format is valid&lt;/LI&gt;
&lt;LI&gt;Click inside the field and undo the modification I made (introduce the character I deleted)&lt;/LI&gt;
&lt;/OL&gt;
With these steps, &lt;STRONG&gt;&lt;STRONG&gt;the field contains the same data, but Decipher now sees its format as valid.&lt;BR /&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;
&lt;DIV class="media" style="overflow: hidden; zoom: 1;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="9019.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/9198iEC7C7B6918FB9310/image-size/large?v=v2&amp;amp;px=999" role="button" title="9019.png" alt="9019.png" /&gt;&lt;/span&gt;&lt;BR /&gt;Can someone explain this behavior?&amp;nbsp;&lt;BR /&gt;It is not a big deal during manual data verification, but it could become a big problem if the same regex validation bug occurs when running Decipher in autonomous mode.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance for any help&lt;/DIV&gt;&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Oroel Ipas&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Wed, 09 Nov 2022 09:15:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66747#M19352</guid>
      <dc:creator>OroelIpas</dc:creator>
      <dc:date>2022-11-09T09:15:00Z</dc:date>
    </item>
    <item>
      <title>RE: Bug using regex in format expression</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66748#M19353</link>
      <description>Hi Oroel,&lt;BR /&gt;&lt;BR /&gt;I've just spent some time looking into this behaviour using a similar field. I recall previously having some difficulty with multi-line fields and Regex, but it was possible.&lt;BR /&gt;&lt;BR /&gt;It's due to an interaction between the multi-line flag and Regex for extracting over multiple lines. I deselected the multi-line flag and changed the "+" quantifier after the new line character classes to "*", as zero occurrences needs to be an option for it to work.&lt;BR /&gt;&lt;BR /&gt;([A-ZÁÉÍÓÚÜÑ]+[\n ]+[A-ZÁÉÍÓÚÜÑ]+[\n ]*)([\n ]*[A-ZÁÉÍÓÚÜÑ]+)*&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Single line field&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="8992.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/9167i2FAE26CF94493370/image-size/large?v=v2&amp;amp;px=999" role="button" title="8992.png" alt="8992.png" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;Multiline field&lt;/STRONG&gt;&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="8993.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/9168i5E0E87394C5E9223/image-size/large?v=v2&amp;amp;px=999" role="button" title="8993.png" alt="8993.png" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;BR /&gt;Give this a try, if you haven't already, and let me know how you get on. This issue has already been raised with the development team.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Ben Lyons&lt;BR /&gt;Senior Product Specialist - Decipher&lt;BR /&gt;Blue Prism&lt;BR /&gt;UK based&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Wed, 09 Nov 2022 16:26:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66748#M19353</guid>
      <dc:creator>Ben.Lyons1</dc:creator>
      <dc:date>2022-11-09T16:26:00Z</dc:date>
    </item>
    <item>
      <title>RE: Bug using regex in format expression</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66749#M19354</link>
      <description>Hi Oroel,&lt;BR /&gt;&lt;BR /&gt;I have an update from the development team on this matter.&lt;BR /&gt;&lt;BR /&gt;There is a deliberate difference in how Decipher handles multi-line fields compared with single-line fields, which affects how the Regex match is used. This is by design and supports the extraction of multi-line fields.&lt;BR /&gt;&lt;BR /&gt;There's a simple change that can be made to your Regex to enable its use with a multi-line field: just add "\r" before "\n", as they will appear together.&lt;BR /&gt;&lt;BR /&gt;E.g. &lt;SPAN&gt;([A-ZÁÉÍÓÚÜÑ]+[\r\n ]+[A-ZÁÉÍÓÚÜÑ]+[\r\n ]*)([\r\n ]*[A-ZÁÉÍÓÚÜÑ]+)*&lt;BR /&gt;&lt;BR /&gt;And using the same example from above.&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="8997.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/9169iD0D144BAB6639484/image-size/large?v=v2&amp;amp;px=999" role="button" title="8997.png" alt="8997.png" /&gt;&lt;/span&gt;&lt;BR /&gt;Let me know how you get on.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Ben Lyons&lt;BR /&gt;Senior Product Specialist - Decipher&lt;BR /&gt;Blue Prism&lt;BR /&gt;UK based&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Fri, 11 Nov 2022 14:04:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66749#M19354</guid>
      <dc:creator>Ben.Lyons1</dc:creator>
      <dc:date>2022-11-11T14:04:00Z</dc:date>
    </item>
    <item>
      <title>RE: Bug using regex in format expression</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66750#M19355</link>
      <description>Hi Ben,&lt;BR /&gt;&lt;BR /&gt;Thanks for your quick response and your help. I tried the new regex you provided, and it worked.&lt;BR /&gt;&lt;BR /&gt;I still have a question about how the Multiline flag affects extraction.&lt;BR /&gt;Does the flag help Decipher to choose more than one line of data? After training ~20 documents with the Multiline flag, Decipher still takes only the first line of the data.&lt;BR /&gt;&lt;BR /&gt;How can I force Decipher to always extract more than one line if I know one specific field is always multiline? One of the fields I want to extract always has three lines. My original idea was to use a regex that includes as many "\n" as I expect the field to have. Any help with this?&lt;BR /&gt;&lt;BR /&gt;In some document types, each line contains different data and I want to keep the "\n" to know which line is which, but in other cases extracting all the data in one line is 100% OK. Do you still recommend using the Multiline flag in these cases?&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="9003.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/9181i4FC5459869D8B999/image-size/large?v=v2&amp;amp;px=999" role="button" title="9003.png" alt="9003.png" /&gt;&lt;/span&gt;&lt;BR /&gt;Thank you very much for your help,&lt;BR /&gt;&lt;BR /&gt;Oroel Ipas.&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Oroel Ipas&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Tue, 15 Nov 2022 11:30:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66750#M19355</guid>
      <dc:creator>OroelIpas</dc:creator>
      <dc:date>2022-11-15T11:30:00Z</dc:date>
    </item>
    <item>
      <title>RE: Bug using regex in format expression</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66751#M19356</link>
      <description>Hi Oroel,&lt;BR /&gt;&lt;BR /&gt;The multi-line flag mostly changes how the field is displayed in the verification screen and maintains the line breaks in the export. I don't believe it has a significant effect on the training, at least not more so than the region selection by the user.&lt;BR /&gt;&lt;BR /&gt;I'm not sure there is a way to specify a number of lines or force it in any way, though it might be worth trying to train 4 separate fields and combine them in a 5th field with a formula. Before doing this, export your training data and keep it somewhere safe, then delete the training data in the app, as this will speed up the new training.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Ben Lyons&lt;BR /&gt;Senior Product Specialist - Decipher&lt;BR /&gt;Blue Prism&lt;BR /&gt;UK based&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Tue, 15 Nov 2022 14:43:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Bug-using-regex-in-format-expression/m-p/66751#M19356</guid>
      <dc:creator>Ben.Lyons1</dc:creator>
      <dc:date>2022-11-15T14:43:00Z</dc:date>
    </item>
  </channel>
</rss>

