Hey Tom,
Sorry couldn't reply you sooner as I got stuck with some other work at my end. However, I went through the data you provided and came up with a solution based on that. However, it's bit complex workflow though I will try to explain to my best. First let's break our use case into two parts.
1) In the data there are few delimiters which may or may not occur which we are currently identifying as '&' and '<d>'. (
This is comparatively easy to play around !!)
2) Again, we see some repeating names as well which are separated by spaces which is bit complicated to handle if no unique instances can be found before hand which means if let say I have a value called 'Catherine Jane De Roos Miss Ashley Schofield'. Now in order for the solution to work we must have a unique row before hand saying either 'Catherine Jane De Roos' or 'Miss Ashley Schofield' so that we can go ahead and get both of these values separately. If these unique values are not there in any of the pre-existing rows then for us to segregate these two names is practically not possible since we logically can't separate a name via spaces or determine the actual name by just looking at it. Some kind of a lookup needs to be in place before hand. (
This one practically depends on the data we get, but the catch is if we can get unique values by splitting the data as we see in case 1, first then we can surely derive some logic to separate the names!)
So first I will show you the result of the solution that I got:
Inputs:
Outputs:
Pretty much what we want to see right?
Now diving into the solution, the first half of the solution looks something like below:
1) Here, I first loop through the input collection where I got a single column of the Named Occupants with me, in my case I call it '
Output Collection'. Please don't go by the name of the collection and consider it as the input of this solution only.
2) Within the loop I first use the 'Split Text' action and provide the input as the input collection name, the field name I want to split and '&' character as the character to be considered for splitting.
3) I store the value in 'Split Values' collection. Next, I check if the row count of the collection is greater than 1, if yes then loop through the 'Split Values' collection and check if the current named occupant exist in the 'Temp Collection'.
4) This 'Temp Collection' we would create at the design time having the same field structure as our input collection ('Named Occupants' field of text type) with no rows or data. If the current Split Value exists in the 'Temp Collection' skip the iteration and check for the next value in 'Split Collection' otherwise we add the row to 'Temp Collection' and set the current value of 'Split Values' collection to the added row field of the 'Temp Collection'. Once the iteration in 'Split Values' complete we proceed to the next iteration for the 'Output Collection'.
5) If the row count of 'Split Collection' for '&' operator was not greater than 1, we would move with the next Split character '<d>' and repeat the same steps as above.
All these steps will continue till we process for all rows in the Output Collection and finally we would call the 'Keep Unique Values In Collection' action which I shared with you at the very beginning of this thread to remove any duplicates found.
Now you get something like this in your 'Temp Collection';
So you can see we almost got it, however due to the second issue that I mentioned at the starting of this post there are two rows with unwanted name of another row in this collection.In order to get rid of the second problem, now we would apply the second part of the solution: as shown below:
6) In this part of the solution, I would iterate through the '
Temp Collection' and first get the count of our output collection for the solution which is '
Final Collection'. This collection also will have the same schema as the '
Temp Collection' ('Named Occupants' field of text type) but will not have any rows or data. If the count is zero, it means that it is the very first record we are checking for '
Final Collection' so simply add a row to this collection and set the 'Named Occupant' field value of '
Temp Collection' to the newly added row of the '
Final Collection' and we proceed with the next value of the '
Temp Collection'
7) Now if the row count of the '
Final Collection' is not zero (
From second iteration of 'Temp Collection' we will always have this condition as True), we will iterate through the '
Final Collection' and will retrieve the length of the current 'Named Occupant' field from the '
Final Collection' and '
Temp Collection'.
8) Now we will be first checking which of the length is greater among the two so that we can decide how to use the InStr function to check the presence of the 'Named Occupant' field in either of the collection. If the length of 'Named Occupant' field in '
Final Collection' (I have mentioned this as 'Inner Occupant' in my code) is greater than the length of 'Named Occupant' field in '
Temp Collection' (I have mentioned this as 'Outer Occupant' in my code) then we would check if the Inner Occupant is included within the Outer Occupant by using the following expression:
InStr([Final Collection.Named Occupant],[Temp Collection.Named Occupant]) > 0
If this expression is True then we can say that the Inner Occupant is included within the Outer Occupant and we would remove the Outer Occupant from the Inner Occupant using the following expression:
Replace([Final Collection.Named Occupant],[Temp Collection.Named Occupant], "")
and then add a new row to the '
Final Collection' and set the replaced value to the '
Final Collection'. We would then remove the empty rows from the '
Final Collection' and proceed with the next iteration in the '
Temp Collection'. In case above expression was False, then we would skip the current iteration for '
Final Collection' and move with the next row in the '
Final Collection'.
9) If the length of 'Named Occupant' field in '
Final Collection' is not greater than the length of 'Named Occupant' field in '
Temp Collection' as mentioned in Step 8, then we would check if the Outer Occupant is included within the Inner Occupant by using the following expression:
InStr([Temp Collection.Named Occupant],[Final Collection.Named Occupant]) > 0
If this expression is True then we can say that the Outer Occupant is included within the Inner Occupant and we would remove the Inner Occupant from the Outer Occupant using the following expression:
Replace([Temp Collection.Named Occupant],[Inner Collection.Named Occupant], "")
and then add a new row to the '
Final Collection' and set the replaced value to the '
Final Collection'. We would then remove the empty rows from the '
Final Collection' and proceed with the next iteration in the '
Temp Collection'. In case above expression was False, then we would skip the current iteration for '
Final Collection' and move with the next row in the '
Final Collection'.
The variables that are used in the solution are:
I hope my explanation was clear as I can't share the bprelease file directly, though please take your time and go through it. Do let me know if you have any questions. Would be glad to help out!
---------------------------------------------------------------------------------------------------------------------------------------
Hope this helps you out and if so, please mark the current thread as the 'Answer', so others can refer to the same for reference in future.
Regards,
Devneet Mohanty,
SS&C Blueprism Community MVP 2024,
Automation Architect,
Wonderbotz India Pvt. Ltd.