cancel
Showing results for 
Search instead for 
Did you mean: 

Remove duplicate rows based on 3 column names

AmruthaSivaraja
Level 4

Hi All,

Can anyone help me on custom code to remove duplicate rows from a collection based on 3 column names. The input collection has 6 columns in total. i wanted to remove duplicates based on 3 columns.

Thanks in Advance



------------------------------
Amrutha Sivarajan
------------------------------
1 BEST ANSWER

Best Answers

Mukeshh_k
MVP

Hello Amrutha Sivarajan,

You can add an action in one of your extended objects for collection manipulations, make sure you have correct external references for Dll and their Namespaces listed in Initialise stage, please follow below approach:

  1. Add an action in existing collection manipulation extended objects or create a new object if not already.13875.png13876.png13877.png
  2. Write Code as below to act on 3 column based duplicate removal in collections:

    System.GC.Collect()
    'filteredCollection=RawData.clone()
    'filteredCollection=RawData.DefaultView.ToTable(true, columnToCheck)
    dim count as integer=rawData.rows.count-1
    dim i as integer =0
    dim rowIndex as integer
    for  rowIndex = rawData.rows.count-2 to 0 step rowIndex-1
        if rawData.rows(rowIndex).item(columnToCheck1)=rawData.rows(rowIndex+1).item(columnToCheck1) and 
            rawData.rows(rowIndex).item(columnToCheck2)=rawData.rows(rowIndex+1).item(columnToCheck2) and 
            rawData.rows(rowIndex).item(columnToCheck3)=rawData.rows(rowIndex+1).item(columnToCheck3)

            rawData.rows(rowIndex+1).delete
        end if
    next
    filteredCollection=rawData.copy()


    13878.png
  3. Check required Dll and Namespaces -if you find any errors (Snippet below)13879.png

Let me know if you find any difficulties implementing this.



------------------------------
Kindly up vote this as "Best Answer" if it adds value or resolves your query in anyway possible, happy to help.

Regards,

Mukesh Kumar - Senior Automation Developer

NHS England, United Kingdom, GB
------------------------------
Regards,

Mukesh Kumar

View answer in original post

6 REPLIES 6

LakshmiNarayan3
Level 6

Hi Amrutha

Can you try the below code 

Dim dt As DataTable = [your DataTable]
Dim columnNames() As String = inputColumnName.Split(","c) 'replace with your input parameter that contains the column names
Dim distinctDT As DataTable = dt.DefaultView.ToTable(True, columnNames)

above code from chat gpt and also refer below thread for better understanding

Delete Repetitive Values in a collection or Excel | Blue Prism Product

Hope this helps

Regards

Lakshmi Narayana



------------------------------
Lakshmi Narayana
------------------------------

Hi Amrutha,

Avoidance of duplicates can be achieved in two ways:

  • Removing already existing duplicate values 
  • Avoiding getting in duplicate values 

While Lakshmi describes how to tackle the first method, you may also consider to have a look at the second method. 

If your collection data was derived from an XL, you might consider extracting the data using OLEDB using an SQL DISTINCT statement that, if constructed correctly, will only populate the collection with unique data. 

If you data comes from other sources, you might consider checking if a row to be added to the collection perhaps already exists there. And then not adding it if it appears to be a duplicate.



------------------------------
Happy coding!
---------------
Paul
Sweden
------------------------------
Happy coding!
Paul, Sweden
(By all means, do not mark this as the best answer!)

Mukeshh_k
MVP

Hello Amrutha Sivarajan,

You can add an action in one of your extended objects for collection manipulations, make sure you have correct external references for Dll and their Namespaces listed in Initialise stage, please follow below approach:

  1. Add an action in existing collection manipulation extended objects or create a new object if not already.13875.png13876.png13877.png
  2. Write Code as below to act on 3 column based duplicate removal in collections:

    System.GC.Collect()
    'filteredCollection=RawData.clone()
    'filteredCollection=RawData.DefaultView.ToTable(true, columnToCheck)
    dim count as integer=rawData.rows.count-1
    dim i as integer =0
    dim rowIndex as integer
    for  rowIndex = rawData.rows.count-2 to 0 step rowIndex-1
        if rawData.rows(rowIndex).item(columnToCheck1)=rawData.rows(rowIndex+1).item(columnToCheck1) and 
            rawData.rows(rowIndex).item(columnToCheck2)=rawData.rows(rowIndex+1).item(columnToCheck2) and 
            rawData.rows(rowIndex).item(columnToCheck3)=rawData.rows(rowIndex+1).item(columnToCheck3)

            rawData.rows(rowIndex+1).delete
        end if
    next
    filteredCollection=rawData.copy()


    13878.png
  3. Check required Dll and Namespaces -if you find any errors (Snippet below)13879.png

Let me know if you find any difficulties implementing this.



------------------------------
Kindly up vote this as "Best Answer" if it adds value or resolves your query in anyway possible, happy to help.

Regards,

Mukesh Kumar - Senior Automation Developer

NHS England, United Kingdom, GB
------------------------------
Regards,

Mukesh Kumar

Hi Mukesh

Thanks for your solution 

I have tested but their is thing that it is checking for consecutive rows I guess what if their is one row at index like 1 and same is repeated at row index 10 

I have implemented and tested it but it is not able to remove duplicate rows which are not in consecutive positions

 Can you please check if I am doing any wrong

Regards



------------------------------
Lakshmi Narayana
------------------------------

Hi Mukesh

Got the answer from your reply in other thread as first need to sort the collection with any one of the column in the three columns

Please ignore the above

Regards



------------------------------
Lakshmi Narayana
------------------------------

Glad you were able to sort that out  ! There was a similar query but for one column value duplicate - Row removal. 

The only thing that changes in the code is the AND condition operator for column/columns to check.



------------------------------
Kindly up vote this as "Best Answer" if it adds value or resolves your query in anyway possible, happy to help.

Regards,

Mukesh Kumar - Senior Automation Developer

NHS England, United Kingdom, GB
------------------------------
Regards,

Mukesh Kumar