cancel
Showing results for 
Search instead for 
Did you mean: 

Remove Duplicates from excel

NupurSood
Level 5
Hello All

Am looking a a code that is able to remove all duplicates from excel directly. I do not want to take the data in a collection and then remove duplicates because of huge volume. The de-duping must happen on the basis of all columns of excel and not just one defined column.

Thanks in advance for your guidance

Regards
Nupur

------------------------------
Nupur Sood
Research Associate
S&P
Asia/Kolkata
------------------------------
31 REPLIES 31

Hi, Nupur, 

ahh, I just realized that I already proposed this and you said it is 50.000 rows in Excel. What about using OLE DB and SQL query using DISTINCT keyword. That should be the fastest way getting what you need. Of course, you need to specify all the columns which you want to compare but you can get headers from the excel and build the SQL command from those headers.

Regards,

Zdenek

------------------------------
Zdeněk Kabátek
Head of Professional Services
NEOOPS
http://www.neoops.com/
Europe/Prague
------------------------------

Hello Eric,

I used this ws.Range(myRange).RemoveDuplicates(Columns:=New Object() {1, 2}, Header:=1) and its working for me. But the problem is I have 21 columns to look for, not just 2 columns. So I had to do this and the code is working. But I hardcoded this. Just wondering if there is an alternative solution for this? 

26221.png



------------------------------
Keerthana Jegannathan IPA Developer
------------------------------

Hi Keerthana,

You can modify your code a bit to accept one more additional parameter called as columnindices of type 'Text' where we will be passing comma separated column index values and then use the following code:

26229.png

26230.png

26231.png

Try

   message = ""

   Dim wb as Object = _
   GetWorkBook(handle, workbook)

   Dim ws as Object = _
   GetWorkSheet(handle, workbook, worksheet)

   Dim columnIndicesLst As List(Of Object) = New List(Of Object)

   For Each index In columnindices.Split(","c)

      columnIndicesLst.Add(1)

   Next

   Dim dup As Object = ws.Application.Range(Input_Range).RemoveDuplicates(columns:= columnIndicesLst.ToArray(), Header:=1)

   result = True

Catch ex As Exception

  result = False
  message = ex.Message

End Try

------------------------------
----------------------------------
Hope it helps you and if it resolves you query please mark it as the best answer so that others having the same problem can track the answer easily

Regards,
Devneet Mohanty
Intelligent Automation Consultant
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------
------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please provide a big thumbs up so that the others members in the community having similar problem statement can track the answer easily in future.

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Technical Business Analyst,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------

Hi Susamay,

Thank you for the above code. It removes the duplicates from the column. 
What can we do to remove duplicates from column A based on the Date mentioned in column B, so that the recent data remains and older is deleted?

For example if input is:
26242.png 
Then after removing duplicates output should be:
26243.png
Anyone please suggest the solution.

One approach is -
First sort Column B in descending order and then use the above remove duplicates code.
For sorting I cannot use collections as the data is very huge ~ 80000 records.

Thanks,
Shweta



------------------------------
Shweta Dharmadhikari
RPA developer
Accenture Solutions Pvt Ltd
Asia/Kolkata
------------------------------

Hi Devneet,

Can you please let me know the sample example of columnindices input parameter.

Regards,
Shweta

------------------------------
Shweta Dharmadhikari
RPA developer
Accenture Solutions Pvt Ltd
Asia/Kolkata
------------------------------

Hi @ShwetaDharmadhi,

Hope you are doing well. Please find the below example for a better understanding. 

NOTE: I would also suggest you Shweta to create a new thread if you want quick responses as these threads are an year old and most often people tend to miss any updates being done on these threads.

I created an input file for  our example as below:

26257.png

As you can see, we have three columns named as Name, Section and Date Of Sign In. Now let say I only want to remove duplicates for the unique combination of names and section in my table, then I can use the 'Remove Duplicates' action which I have created in the following way:

26258.png

Here, I have defined the range as "A:C" Now within this range, I know 1st column is name and 2nd column is section so my column indices will be 0 and 1 since indices start from 0 for any array. So I can pass the column indices as "0,1"

Upon the execution, I get the below result:

26259.png


------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please mark it as the 'Best Answer' so that the others members in the community having similar problem statement can track the answer easily in future

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Sr. Consultant - Automation Developer,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------
------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please provide a big thumbs up so that the others members in the community having similar problem statement can track the answer easily in future.

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Technical Business Analyst,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------

Thank you @devneetmohanty07 , this resolved my query!
Also, I have posted a new query on "Delete rows based on a column value from excel having huge data"
Can you please have a look at that too...
 ​

------------------------------
Shweta Dharmadhikari
RPA developer
Accenture Solutions Pvt Ltd
Asia/Kolkata
------------------------------

Hi @devneetmohanty07,
Follow-up doubt to the solution:

I have a table with data in columns A to H. Some of the columns do have blank values.
Duplicates are to be removed from column F​, based on date mentioned in column B. In case of duplicates, the row having recent date in Column B should stay.
As per the solution above, I am passing parameters as input range - "A:H" & column indices "5" since it is the only column based on which duplicates need to be removed.

Now, what is happening is 2 abnormal outputs:
1. Some unique F valued rows are getting deleted.
2. Some duplicate F valued rows are staying back. (both of them)

Am I missing anything here? Request you to guide me further.

Regards,
Shweta

------------------------------
Shweta Dharmadhikari
RPA developer
Accenture Solutions Pvt Ltd
Asia/Kolkata
------------------------------

Hi Shweta,

When you say "Duplicates are to be removed from column F​, based on date mentioned in column B." I assume you mean its a grouping kind of a logic, so ideally you should be passing both F and B column indices which would be "1,5" 

Also, since you mention recent dated row needs to stay, first sort the rows with respect to column B and then try to use this action with column indices as "1,5". If still the issue remains, if possible send us some screenshot hiding any client sensitive data or you can prepare a dummy case and show us what is exactly happening once.


------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please mark it as the 'Best Answer' so that the others members in the community having similar problem statement can track the answer easily in future

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Sr. Consultant - Automation Developer,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------
------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please provide a big thumbs up so that the others members in the community having similar problem statement can track the answer easily in future.

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Technical Business Analyst,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------

Hai @devneetmohanty07 I am getting the error
Internal: Could not execute code stage: Parameter count mismatch.​

------------------------------
DevendraNath Chowdary Maganti
------------------------------