Remove duplicates from collection with key field
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
17-07-20 08:57 AM
I have an action that will remove duplicates but this only works if everything in the row is duplicated.
InputCollection.AcceptChanges() OutputCollection = InputCollection.Defaultview.ToTable(True) InputCollection = Nothing
Is there a way to modify this so that i can specify a column to be used to check for the duplicate entry and then remove the row if it is leaving just unique entries?
------------------------------
James Gregory
RPA Developer
Europe/London
------------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
17-07-20 04:46 PM
Dim v As New System.Data.DataView(Collection_In)
Collection_Out = v.ToTable(True, Column_Names_CSV.Split(","))
On the other hand, this technique is also a handy way to get a subset of columns from a collection.
------------------------------
John Carter
Professional Services
Blue Prism
------------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
20-07-20 08:50 AM
I don't think I explained properly what the issue is.
There is a form whereby the submit button doesn't disable when pressed and so duplicate requests are being received.
Each request has a unique ID but the reference number and name will be the same.
I cant use the standard "Remove Duplicates" as the ID is different for each request:
ID | Ref | Name
1 | abc | John
2 | abc | John
3 | xyz | Mike
So I want to have something where I can same, remove duplicates where Ref is not unique which would output the collections as:
ID | Ref | Name
1 | abc | John
3 | xyz | Mike
------------------------------
James Gregory
RPA Developer
Europe/London
------------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
20-07-20 01:19 PM
I've had a quick play around with the existing Utility - Collection Manipulation and have modified the Remove Null Rows action to do what I think you are after. Here's the code:
'Set local variables dim iRow as integer dim count as integer dim rows as integer Try 'Initialise row counts and prepopulate output table rows = Collection_In.Rows.Count - 1 Collection_Out = Collection_In iRow = 0 'Loop through each row in the input collection Do While iRow < rows 'Reset counter count = 0 'Select all rows in the output collection which match the specified value in the input collection For Each row2 As DataRow in Collection_Out.Select( String.Format("[{0}] ='{1}' or trim([{0}]) = '{1}'", Key_Field,Collection_In.Rows(iRow).Item(Key_Field))) 'If this row is the first found, move on to the next one, otherwise delete it If count > 0 then 'Delete duplicated row row2.Delete() 'Reduce the total number of rows in the output collection rows = rows - 1 End If 'Increase the count of duplicates count = count + 1 Next 'Increase the count of rows being checked iRow = iRow + 1 Loop Success = True Catch ex as Exception Error_Message = ex.ToString() Success = False End Try
The basic premise is to copy the input collection to the output collection, then loop through the input collection and search the output for duplicates. Then, delete any duplicates from the output and adjust the count values to suit. I've tested this against a small dataset and it seems to work as expected. I've also attached the full object (including my test dataset in the action) to set you on your way.
Hopefully this helps, but feel free to ask for clarification if anything isn't clear or quite right!
Kind regards,
------------------------------
Sam Stone
CoE Build Lead
Blue Prism
------------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
17-02-22 02:28 PM
Thanks for helping and taking this on in the first place 🙂
System.Data.SyntaxErrorException: Syntax error: Missing operand after 's' operator.
at System.Data.ExpressionParser.Parse()
at System.Data.DataExpression..ctor(DataTable table, String expression, Type type)
at System.Data.DataTable.Select(String filterExpression)
------------------------------
Bernadette Stroeder
S&P Global
------------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
20-02-22 10:13 AM
Are you using the code above?
It's looks like a problem with this line:
For Each row2 As DataRow in Collection_Out.Select( String.Format("[{0}] ='{1}' or trim([{0}]) = '{1}'", Key_Field,Collection_In.Rows(iRow).Item(Key_Field)))
Do you have any " ' " character in your name field?
You are concatenating, so, if any of the values that you are going to replace with the String.Format() function contains " ' " you will break the SQL Syntax, for example:
"[field] ='value's' or trim([field]) = 'value's'"
Let us know if this solves the problem.
Hope this helps you!
See you in the community,
Bye 🙂
------------------------------
Pablo Sarabia
Architect
Altamira Assets Management
Madrid
634726270
------------------------------
