<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic RE: Data Extraction from Text in Product Forum</title>
    <link>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51815#M6801</link>
    <description>&lt;P&gt;Hey &lt;a href="https://community.blueprism.com/t5/user/viewprofilepage/user-id/42567"&gt;@ChakkravarthiPR&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;You can write custom code in a "Code" stage for this (C# or VB.NET).&lt;BR /&gt;&lt;BR /&gt;I have used regex capture groups to split up a string like the one you are getting.&lt;BR /&gt;&lt;BR /&gt;The only requirement for building the regex is that the pattern of the input string must always stay the same.&lt;BR /&gt;&lt;BR /&gt;You can use &lt;A href="https://regex101.com/"&gt;Regex101.com&lt;/A&gt; to build and test the regex pattern.&lt;BR /&gt;(Note that you can also strip out the unwanted characters that appear in your input string.)&lt;BR /&gt;&lt;BR /&gt;Regards&lt;/P&gt;
&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Rushabh Dedhia&lt;BR /&gt;Senior Consultant - Team Lead&lt;BR /&gt;WonderBotz LLC&lt;BR /&gt;Ahmedabad&lt;BR /&gt;+91 9428860307&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
    <pubDate>Thu, 19 May 2022 11:31:00 GMT</pubDate>
    <dc:creator>RushabhDedhia</dc:creator>
    <dc:date>2022-05-19T11:31:00Z</dc:date>
    <item>
      <title>Data Extraction from Text</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51814#M6800</link>
      <description>&lt;P&gt;Hi Everyone,&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;I have a data item (data type Text) whose value is extracted from a PDF. After extraction, the text looks like this:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Maverick Sample (May 12, 2022, 13:29 CDT) Maverick Sample&lt;/P&gt;
&lt;P&gt;Maverick&lt;/P&gt;
&lt;P&gt;Sample&lt;/P&gt;
&lt;P&gt;04/30/2022&lt;/P&gt;
&lt;P&gt;1 2 3 4 5 6 7 8&lt;/P&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;P&gt;111 AAA st 1111111111&lt;/P&gt;
&lt;P&gt;London YY 78979&lt;/P&gt;
&lt;P&gt;sample@email.com&lt;/P&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;P&gt;05/12/2022&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;I need to extract and save it to a data item as follows:&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Signature: Maverick Sample (May 12, 2022 13:29 CDT) Maverick Sample&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;First Name:&lt;/STRONG&gt; Maverick&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Last Name:&lt;/STRONG&gt; Sample&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Date:&lt;/STRONG&gt; 04/30/2022&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Employee ID:&lt;/STRONG&gt; 1 2 3 4 5 6 7 8&lt;/P&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Address: &lt;/STRONG&gt;111 AAA st&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Phone No:&lt;/STRONG&gt; 1111111111&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;City:&lt;/STRONG&gt; London&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;State:&lt;/STRONG&gt; YY&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Zip Code:&lt;/STRONG&gt; 78979&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Email:&lt;/STRONG&gt; sample@email.com&lt;/P&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;P&gt;4&lt;/P&gt;
&lt;P&gt;Separation Date: 05/12/2022&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;The stray "4" lines that appear between the fields come from the PDF when the file is read as text. Any help would be highly appreciated.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;a href="https://community.blueprism.com/t5/user/viewprofilepage/user-id/1843"&gt;@devneetmohanty07&lt;/a&gt; FYI&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Chakkravarthi PR&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Thu, 19 May 2022 09:18:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51814#M6800</guid>
      <dc:creator>ChakkravarthiPR</dc:creator>
      <dc:date>2022-05-19T09:18:00Z</dc:date>
    </item>
    <item>
      <title>RE: Data Extraction from Text</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51815#M6801</link>
      <description>&lt;P&gt;Hey &lt;a href="https://community.blueprism.com/t5/user/viewprofilepage/user-id/42567"&gt;@ChakkravarthiPR&lt;/a&gt;,&lt;BR /&gt;&lt;BR /&gt;You can write custom code in a "Code" stage for this (C# or VB.NET).&lt;BR /&gt;&lt;BR /&gt;I have used regex capture groups to split up a string like the one you are getting.&lt;BR /&gt;&lt;BR /&gt;The only requirement for building the regex is that the pattern of the input string must always stay the same.&lt;BR /&gt;&lt;BR /&gt;You can use &lt;A href="https://regex101.com/"&gt;Regex101.com&lt;/A&gt; to build and test the regex pattern.&lt;BR /&gt;(Note that you can also strip out the unwanted characters that appear in your input string.)&lt;BR /&gt;&lt;BR /&gt;Regards&lt;/P&gt;
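&lt;P&gt;A minimal sketch of both ideas (dropping the unwanted lines, then reading values through regex groups), written in Python rather than a Blue Prism Code stage so it can be run anywhere. The sample text is copied from the question; the helper name drop_noise_lines and the rule that a noise line is a line holding a single digit are assumptions made for illustration.&lt;/P&gt;
&lt;PRE&gt;
```python
import re

# Sample text copied from the question (one field per line).
SAMPLE_TEXT = "\n".join([
    "Maverick Sample (May 12, 2022, 13:29 CDT) Maverick Sample",
    "Maverick",
    "Sample",
    "04/30/2022",
    "1 2 3 4 5 6 7 8",
    "4",
    "111 AAA st 1111111111",
    "London YY 78979",
    "sample@email.com",
    "4",
    "4",
    "05/12/2022",
])


def drop_noise_lines(text):
    """Remove lines that hold exactly one digit (assumed to be PDF noise)."""
    kept = [line for line in text.splitlines()
            if not re.fullmatch(r"\d", line.strip())]
    return "\n".join(kept)


cleaned = drop_noise_lines(SAMPLE_TEXT)

# Three capture groups per date: month, day, year (assuming MM/DD/YYYY).
dates = re.findall(r"(\d{2})/(\d{2})/(\d{4})", cleaned)

print(cleaned)
print(dates)
```
&lt;/PRE&gt;
&lt;P&gt;The same two steps should carry over to a C# or VB.NET Code stage, since the regex constructs used here are common to both engines.&lt;/P&gt;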
&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Rushabh Dedhia&lt;BR /&gt;Senior Consultant - Team Lead&lt;BR /&gt;WonderBotz LLC&lt;BR /&gt;Ahmedabad&lt;BR /&gt;+91 9428860307&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Thu, 19 May 2022 11:31:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51815#M6801</guid>
      <dc:creator>RushabhDedhia</dc:creator>
      <dc:date>2022-05-19T11:31:00Z</dc:date>
    </item>
    <item>
      <title>RE: Data Extraction from Text</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51816#M6802</link>
      <description>In DX there is a VBO with RegEx functionality that will save you from having to create a custom VBO for RegEx handling. I had started my own VBO and ended up replacing it with the one below.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://digitalexchange.blueprism.com/dx/entry/3593/solution/avoregex" target="test_blank"&gt;https://digitalexchange.blueprism.com/dx/entry/3593/solution/avoregex&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Ramón Requena López&lt;BR /&gt;RPA Developer&lt;BR /&gt;Magenta Telekom&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Thu, 19 May 2022 12:27:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51816#M6802</guid>
      <dc:creator>RamónRequena_L1</dc:creator>
      <dc:date>2022-05-19T12:27:00Z</dc:date>
    </item>
    <item>
      <title>RE: Data Extraction from Text</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51817#M6803</link>
      <description>Use this.&lt;BR /&gt;&lt;BR /&gt;Please note that it is customised for the text you provided; any change in the text pattern will throw an exception.&lt;BR /&gt;&lt;BR /&gt;Add System.IO under Code Options on the main page.&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Atyant Srivastava&lt;BR /&gt;Team lead&lt;BR /&gt;Personal&lt;BR /&gt;Asia/Kolkata&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Thu, 19 May 2022 12:46:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51817#M6803</guid>
      <dc:creator>AtyantSrivastav</dc:creator>
      <dc:date>2022-05-19T12:46:00Z</dc:date>
    </item>
    <item>
      <title>RE: Data Extraction from Text</title>
      <link>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51818#M6804</link>
      <description>&lt;P&gt;@Chakkravarthi&amp;nbsp;PR,&lt;BR /&gt;&lt;BR /&gt;You could also use BP's Utility - Strings VBO, which has a "Regex Replace" action that can do the trick.&lt;/P&gt;
&lt;P&gt;As &lt;A class="user-content-mention" data-sign="@" data-contactkey="b8926dc9-0781-4bb7-a0a0-d58aa97139db" data-tag-text="@Rushabh Dedhia" href="https://community.blueprism.com/network/profile?UserKey=b8926dc9-0781-4bb7-a0a0-d58aa97139db" data-itemmentionkey="66388ef3-4272-4587-aea2-c2cd8baf8aae"&gt;@Rushabh Dedhia&lt;/A&gt;&amp;nbsp;pointed out, you should be sure that the PDF will always produce the data in the layout you are expecting. If the PDF is generated from a form with proper validation, this is usually not a problem. If the data deviates from that format in a minority of cases, you can always do a quick check ("Test Regex Match" action) to see whether the Regex pattern matches and throw an exception if it doesn't.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;DIV class="media" style="overflow: hidden; zoom: 1;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="20462.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/20607i8446F9F5FE3E8C4A/image-size/large?v=v2&amp;amp;px=999" role="button" title="20462.png" alt="20462.png" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;STRONG&gt;Before&lt;BR /&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;
&lt;DIV class="media" style="overflow: hidden; zoom: 1;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="20463.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/20606i6FB9E9893716C857/image-size/large?v=v2&amp;amp;px=999" role="button" title="20463.png" alt="20463.png" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;STRONG&gt;&lt;STRONG&gt;After&lt;BR /&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;
&lt;DIV class="media" style="overflow: hidden; zoom: 1;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="20464.png"&gt;&lt;img src="https://community.blueprism.com/t5/image/serverpage/image-id/20609i8FBEA0F1F24C7C4B/image-size/large?v=v2&amp;amp;px=999" role="button" title="20464.png" alt="20464.png" /&gt;&lt;/span&gt;&lt;BR /&gt;&lt;STRONG&gt;Search Pattern:&lt;/STRONG&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;PRE&gt;(.*)[\r\n]+(.*)[\r\n]+(.*)[\r\n]+(\d{2}/\d{2}/\d{4})[\r\n]+([\d\s]*?)[\r\n]+.*?[\r\n]+(.+)\s(\S+)[\r\n]+(.*)\s(.*)\s(\S+)[\r\n]+(.*\@.*\..*)[\r\n]+.*?[\r\n]+.*?[\r\n]+(\d{2}/\d{2}/\d{4})&lt;/PRE&gt;
&lt;STRONG&gt;Replacement&amp;nbsp;Pattern:&lt;/STRONG&gt;
&lt;PRE&gt;Signature: $1&lt;BR /&gt;First Name: $2&lt;BR /&gt;Last Name: $3&lt;BR /&gt;Date: $4&lt;BR /&gt;Employee ID: $5&lt;BR /&gt;Address: $6&lt;BR /&gt;Phone No: $7&lt;BR /&gt;City: $8&lt;BR /&gt;State: $9&lt;BR /&gt;Zip Code: $10&lt;BR /&gt;Email: $11&lt;BR /&gt;Separation Date: $12&lt;/PRE&gt;
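&lt;P&gt;For anyone who wants to try the pattern outside Blue Prism, here is a minimal Python sketch. It uses the search pattern above unchanged and the sample text from the question; the function name label_fields is hypothetical, and raising ValueError when the text does not match stands in for the "Test Regex Match" check followed by an exception. It builds the labelled values from the captured groups instead of a $1-style replacement string, because Python's replacement syntax differs from .NET's.&lt;/P&gt;
&lt;PRE&gt;
```python
import re

# Search pattern quoted from the post, split over three lines for readability.
SEARCH_PATTERN = (
    r"(.*)[\r\n]+(.*)[\r\n]+(.*)[\r\n]+(\d{2}/\d{2}/\d{4})[\r\n]+"
    r"([\d\s]*?)[\r\n]+.*?[\r\n]+(.+)\s(\S+)[\r\n]+(.*)\s(.*)\s(\S+)[\r\n]+"
    r"(.*\@.*\..*)[\r\n]+.*?[\r\n]+.*?[\r\n]+(\d{2}/\d{2}/\d{4})"
)

# One label per capture group, in the order of the replacement pattern.
LABELS = [
    "Signature", "First Name", "Last Name", "Date", "Employee ID", "Address",
    "Phone No", "City", "State", "Zip Code", "Email", "Separation Date",
]

# Sample text copied from the question (one field per line).
SAMPLE_TEXT = "\n".join([
    "Maverick Sample (May 12, 2022, 13:29 CDT) Maverick Sample",
    "Maverick",
    "Sample",
    "04/30/2022",
    "1 2 3 4 5 6 7 8",
    "4",
    "111 AAA st 1111111111",
    "London YY 78979",
    "sample@email.com",
    "4",
    "4",
    "05/12/2022",
])


def label_fields(text):
    """Return a dict of label to captured value, or raise if the layout differs."""
    match = re.search(SEARCH_PATTERN, text)
    if match is None:
        # Mirrors the advice to test the match first and throw an exception.
        raise ValueError("input text does not match the expected layout")
    return dict(zip(LABELS, match.groups()))


fields = label_fields(SAMPLE_TEXT)
for label, value in fields.items():
    print(label + ": " + value)
```
&lt;/PRE&gt;
&lt;P&gt;Returning a dictionary rather than one formatted string is a design choice for this sketch: each value could then be written to its own data item.&lt;/P&gt;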
&lt;BR /&gt;&lt;BR /&gt;------------------------------&lt;BR /&gt;Micheal Charron&lt;BR /&gt;Senior Manager&lt;BR /&gt;RBC&lt;BR /&gt;America/Toronto&lt;BR /&gt;------------------------------&lt;BR /&gt;</description>
      <pubDate>Thu, 19 May 2022 13:01:00 GMT</pubDate>
      <guid>https://community.blueprism.com/t5/Product-Forum/Data-Extraction-from-Text/m-p/51818#M6804</guid>
      <dc:creator>MichealCharron</dc:creator>
      <dc:date>2022-05-19T13:01:00Z</dc:date>
    </item>
  </channel>
</rss>

