cancel
Showing results for 
Search instead for 
Did you mean: 

Remove Duplicates from excel

NupurSood
Level 5
Hello All

Am looking a a code that is able to remove all duplicates from excel directly. I do not want to take the data in a collection and then remove duplicates because of huge volume. The de-duping must happen on the basis of all columns of excel and not just one defined column.

Thanks in advance for your guidance

Regards
Nupur

------------------------------
Nupur Sood
Research Associate
S&P
Asia/Kolkata
------------------------------
31 REPLIES 31

@NupurSood I think part of the problem is your "Columns" entry. The method expects a Variant array of column indexes. However, Variant is no longer a supported type, so I think you can create an array of Object type (which will just be the column indices). Here's an example:

Dim varData(2) As Object
varData(0) = "1"
varData(1) = "2"

ws.Range("A1:C100").RemoveDuplicates(Columns:=varData, Header:=2)
​
 ​
Cheers,

------------------------------
Eric Wilson
Director, Partner Integrations for Digital Exchange
Blue Prism
------------------------------

sonuiiml
Level 5

Hi Nupur,
You can use the below code 

Dim wb,ws, dup as object

try

wb= getworkbook(handle,workbook)
ws= getworksheet(handle,workbook, worksheet)

dup= ws.application.range(Input_Range).removeduplicates(columns:=1)

success= True

Catch e as exception
success = False
Message = e.message
Finally
wb=nothing
ws=nothing
dup=nothing
end try

You have to supply inputs for 

1. Workbook as text
2. Worksheet as text 
3. input_Range as text Eg. for full column it will be "B:B" for anything else "A1:B8"

4. handle as number 

be mindful of the column number you are entering in the line 

dup= ws.application.range(Input_Range).removeduplicates(columns:=1)

'here in this example its column A

1 refers to column A 
2 refers to columns B 

Good luck 



------------------------------
Susamay Halder Consultant
Consultant
Bruce Power
+1(437)217-1086
------------------------------

Thanks Eric

I get the below shown error with the Array defined as suggested. 
26207.png
26208.png
Regards


------------------------------
Nupur Sood
Research Associate
S&P
Asia/Kolkata
------------------------------

Hello @NupurSood,

Do you want to remove the duplicates based on one specific column value or you wanted to compare the full rows and then remove the data?

Since the code stage will vary based on the requirement. Please confirm and i can share the code with you. :)​​​



------------------------------
Ritansh Jatwani
Consultant
EY
Gurgaon
------------------------------

Hello @Ritansh Jatwani

I am looking at comparing full row please​

Thanks 
Nupur

------------------------------
Nupur Sood
Research Associate
S&P
Asia/Kolkata
------------------------------

@NupurSood It seems to be a rather weird conflict between VBA and VB.NET. It doesn't help that the documentation for the Excel interop library is vague as well. What I've found is similar to what Susamay posted below. It seems if you pass in a single column index things work fine. Alternatively, you can pass in an array of indices if you know them ahead of time. Here are examples of both approaches that I've verified on my machine:

ws.Range(myRange).RemoveDuplicates(Columns:=New Object() {1, 2}, Header:=1)

or 

ws.Range(myRange).RemoveDuplicates(1,1)

The first example shows how you could pass in an array of indices. I've tried pulling that array creation out and assigning it to an actual Object variable, but the same data type exception is thrown. Very weird. The second example shows passing in a single column index.

Cheers,

​​

------------------------------
Eric Wilson
Director, Partner Integrations for Digital Exchange
Blue Prism
------------------------------

ArchanaR
Level 2
Hi Nupur, 

Were you able to get a custom code that works for this scenario?

------------------------------
Archana R
Consultant
Deloitte
Europe/London
------------------------------

Hi

I previously created a vbo to do this but it does work on a named worksheet and range so you would need to do it for each worksheet but the range you can set to whatever you need to be. See below:

Dim wb, ws As Object
Dim excel, sheet, range As Object

wb = GetWorkbook(Handle, Workbook)
ws = GetWorksheet(Handle, Workbook, Worksheet)

wb.Activate()
ws.Activate()
excel = ws.Application
sheet = excel.ActiveSheet
range = sheet.Range(Reference)
range.Select()

'Range("B2178:F2187").Select
ws.Range(Reference).RemoveDuplicates (Columns:=Column_number, Header:= 2)

------------------------------
Michael ONeil
Technical Lead developer
Everis Consultancy
Europe/London
------------------------------

Remove duplicate values
  1. Select the range of cells that has duplicate values you want to remove. Tip: Remove any outlines or subtotals from your data before trying to remove duplicates.
  2. Click Data > Remove Duplicates, and then Under Columns, check or uncheck the columns where you want to remove the duplicates. ...
  3. Click OK.

Follow these steps:
  1. Select the range of cells, or ensure that the active cell is in a table.
  2. On the Data tab, click Remove Duplicates (in the Data Tools group).
  3. Do one or more of the following: television wall mount at wallmountedhub...
  4. Click OK, and a message will appear to indicate how many duplicate values were removed, or how many unique values remain.
If you want to remove duplicate data from a list you can use Remove Duplicates function in excel directly, then the unique data will be saved in the list, the duplicate one will be removed. But if you want to remove both duplicates from your list, this function cannot help you.


------------------------------
rwe sbokas
------------------------------

Hi, Nupur,

if you want one row code solution I can offer you in C# (you can change the language in VBO in Code Option I guess you know where)

Output_Collection = Input_Collection.DefaultView.ToTable( /*distinct*/ true);

Put above code in code stage. Input parameter is Input Collection and output parameter is Output_Collection. That returns only rows which are unique and compares all the columns.

Regards,

Zdenek

------------------------------
Zdeněk Kabátek
Head of Professional Services
NEOOPS
http://www.neoops.com/
Europe/Prague
------------------------------