Hi Harsh,
For your use case, the best approach is to use regular expressions. To extract the keywords ending in pdf, doc and docx, I would suggest using a separate regular expression for each one instead of a single combined expression. You can use the following regular expressions:
- For PDF : (?>docx|doc|\b)(?<PDF>\w+\.pdf)
- For DOC : (?>pdf|docx|\b)(?<DOC>\w+\.doc)(?!x)
- For DOCX : (?>pdf|doc|\b)(?<DOCX>\w+\.docx)
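If you want to try the three patterns outside Blue Prism first, here is a rough Python sketch. Two things in it are my own adjustments and not part of the expressions above: Python's `re` module writes named groups as `(?P<NAME>...)`, and it only supports atomic groups `(?>...)` from version 3.11, so the sketch uses a plain non-capturing group `(?:...)` instead. For the sample string used in this thread the results are the same, but the patterns are not guaranteed to behave identically on every input. The helper name `extract_first` is hypothetical.

```python
import re

# Python adaptations of the three expressions (assumption: (?:...) stands in
# for the atomic group (?>...), and (?P<NAME>...) replaces (?<NAME>...)).
PATTERNS = {
    "PDF": r"(?:docx|doc|\b)(?P<PDF>\w+\.pdf)",
    "DOC": r"(?:pdf|docx|\b)(?P<DOC>\w+\.doc)(?!x)",
    "DOCX": r"(?:pdf|doc|\b)(?P<DOCX>\w+\.docx)",
}


def extract_first(group_name, text):
    """Return the first value captured by the named group, or "" if none."""
    match = re.search(PATTERNS[group_name], text)
    return match.group(group_name) if match else ""


target = "L1234ty.pdfL1244re.docxL1221ytr.doc"
results = {name: extract_first(name, target) for name in PATTERNS}
print(results)
```

Each pattern picks out only its own file type: the `(?!x)` lookahead in the DOC pattern is what stops it from matching the `.doc` part of a `.docx` name.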
Now, in order to use these regular expressions, I would suggest using the 'Extract Regex Values' action from the 'Utility - Strings' VBO. To use this action, you first need a regular expression, which we have prepared above.
Next, we require a target string, which is the text from which the keywords need to be extracted. In our case it is:
L1234ty.pdfL1244re.docxL1221ytr.doc
Finally, we also require a collection, which by default has two columns called 'Name' and 'Value', both of type 'Text'. You can name this collection anything you want; in my case I have named it 'RegexReturn'. When you create the collection, it must contain the name of the regex group as an initial value. Since we are working on the regular expression for PDF first, we use the group name defined in that regular expression:
(?>docx|doc|\b)(?<PDF>\w+\.pdf)
which is 'PDF'.
Create the collection with a single initial row whose Name is 'PDF' and whose Value is left blank.
Once the collection is ready, assign the same collection as the output of the 'Extract Regex Values' action as well.
Upon successfully executing the action, the Value field is populated against the group name that we set. For the target string above, the 'PDF' row receives L1234ty.pdf.
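To make the Name/Value idea concrete, here is a small Python stand-in for what the action does with the collection. This is only an illustration under my own assumptions, not the VBO's actual code: the function name `extract_regex_values` and the list-of-dicts shape for the collection are hypothetical, and the pattern is a Python adaptation (`(?P<PDF>...)` for the named group, `(?:...)` in place of the atomic group).

```python
import re


def extract_regex_values(pattern, target, named_values):
    """Fill the Value of each row whose Name is a named group in the match.

    Illustrative stand-in only; it mimics the Name/Value behaviour described
    in this answer and is not the real Blue Prism implementation.
    """
    match = re.search(pattern, target)
    filled = []
    for row in named_values:
        value = ""
        if match and row["Name"] in match.groupdict():
            value = match.group(row["Name"]) or ""
        filled.append({"Name": row["Name"], "Value": value})
    return filled


# The collection starts with the group name as its initial value.
regex_return = [{"Name": "PDF", "Value": ""}]
pdf_pattern = r"(?:docx|doc|\b)(?P<PDF>\w+\.pdf)"
regex_return = extract_regex_values(
    pdf_pattern, "L1234ty.pdfL1244re.docxL1221ytr.doc", regex_return
)
print(regex_return)
```

The same collection goes in and comes out, which mirrors using 'RegexReturn' as both the input and the output of the action.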
You need to repeat the same steps to create two more actions for the keywords DOC and DOCX, where the only changes are the regular expression and the group name. These two actions also need their own separate collections, each holding the respective group name as its initial value.
NOTE: This solution has a shortcoming. By default the action extracts only the first occurrence, so if a file name ending in .pdf occurs multiple times, only the first one is returned. For example, suppose the test string had been L1234ty.pdfL1244re.docxL1221ytr.docL123.pdf. Here there are two file names ending in .pdf, and with the above approach you would only be able to extract L1234ty.pdf, which occurs first.
Hence, I would only suggest using this action directly if the PDF keyword occurs just once, as it does in the original test string you provided.
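The first-occurrence behaviour is easy to reproduce in a quick Python check. Note that this is only an analogy: `re.search` in Python also stops at the first match, which is the behaviour described above for the action, and the pattern is a Python adaptation of the PDF expression (`(?P<PDF>...)` and `(?:...)` are my substitutions).

```python
import re

# Python adaptation of the PDF expression (assumption, see note above).
pdf_pattern = re.compile(r"(?:docx|doc|\b)(?P<PDF>\w+\.pdf)")

# Test string with two names ending in .pdf.
test_string = "L1234ty.pdfL1244re.docxL1221ytr.docL123.pdf"

# A single search stops at the first match, so L123.pdf is never reached.
first_only = pdf_pattern.search(test_string).group("PDF")
print(first_only)
```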
Enhanced Solution: If you want multiple occurrences to be picked up, you can remove the extracted text from the original text and then run the action again. The action runs in a loop: first extract the text, then check whether the extracted text is blank. If it is not blank, an occurrence was found in the current iteration, so store the extracted value in a separate result collection and replace the extracted text in the original text with a blank value. If the extracted text comes back blank in any iteration, no more occurrences are available and you can exit the loop.
The same steps are followed separately for each keyword.
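The loop described above can be sketched in Python as follows. Treat it as a sketch of the idea and not as the Blue Prism workflow itself: the function name `extract_all` is hypothetical, the pattern is a Python adaptation (`(?P<PDF>...)` and `(?:...)` in place of the .NET syntax), and removing only the first occurrence of the extracted text per iteration is my own choice so that a repeated file name would still be counted each time it appears.

```python
import re


def extract_all(pattern, group_name, text):
    """Repeat extract -> store -> blank out until nothing more is extracted."""
    results = []
    remaining = text
    while True:
        match = re.search(pattern, remaining)
        extracted = match.group(group_name) if match else ""
        if extracted == "":
            # Blank result: no more occurrences, so leave the loop.
            break
        results.append(extracted)
        # Replace the extracted text with a blank value (first occurrence only).
        remaining = remaining.replace(extracted, "", 1)
    return results


pdf_pattern = r"(?:docx|doc|\b)(?P<PDF>\w+\.pdf)"
test_string = "L1234ty.pdfL1244re.docxL1221ytr.docL123.pdf"
pdf_names = extract_all(pdf_pattern, "PDF", test_string)
print(pdf_names)
```

Because every pass removes a non-empty piece of text, the loop always ends, and the same function can be called again with the DOC and DOCX patterns and group names.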
Hope it helps, and if it resolves your query, please mark it as the best answer so that others having the same problem can find the answer easily.
Regards,
Devneet Mohanty
Intelligent Process Automation Consultant
Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com