cancel
Showing results for 
Search instead for 
Did you mean: 

Delete Repetitive Values in a collection or Excel

vinodchinthakin
Level 9
Hi All,
I have a following scenario where I want to remove duplicate/repetitive values in a column.
In below example, I want to remove complete rows of text Raju which is repeated more than once in Column Name and output should look as shown below.
What logic I can apply other than conventional solution like using a loop. If the Input has 50K records or high number of records where using a loop might not be a perfect solution.
Do we have a any action for that in collection or in Excel VBO or any developed code in VB.net or C#?

Input Collection : 
Name    Age    Location
Raju       23        Mumbai
Ravi       34         Bangalore
Raju       27         Hyderabad
Ramesh 45         Delhi

Output Collection : 
Name    Age    Location
Ravi       34         Bangalore
Ramesh 45         Delhi


------------------------------
vinod chinthakindi
------------------------------
1 BEST ANSWER

Best Answers

Hi Vinod,

You can create a new business object and add the below External References ('System.Data.DataSetExtensions.dll', 'System.Core.dll') and Namespace Imports ('System.Data.DataSetExtensions', 'System.LINQ') on the Page Description stage of your Initialize action for the LINQ queries to work properly. Also, ensure that the language is selected as 'Visual Basic':

18924.png
Once you have the updated code options as shown above, create a new action named 'Get Non Repeating Records' and pass two input arguments, Input Collection (Collection) and Field Name (Text). Based on the field name that you provide the duplicate records will be fetched from the Input Collection. Also set an Output parameter as Output Collection (Collection) for this action as shown below:

18925.png

Add the code stage and use the below code with the input and out arguments as show:

Output_Collection = (From row In Input_Collection _
Group row By a = row(Field_Name).Trim.ToString Into grp = Group _
Where grp.Count=1 _
Select grp.ToList).SelectMany(Function(x) x).CopyToDataTable()

18926.png18927.png
18928.png
The run results are as follows:

Input Arguments:

18929.png

Output Result:

18930.png

You can publish the action and test the same from Process Studio. Let us know if this helps you out 🙂


------------------------------
----------------------------------
Hope it helps you and if it resolves you query please mark it as the best answer so that others having the same problem can track the answer easily

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant
Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------
------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please provide a big thumbs up so that the others members in the community having similar problem statement can track the answer easily in future.

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Technical Business Analyst,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------

View answer in original post

13 REPLIES 13

Hi Vinod,

If the input is an excel file you can simply use the OLEDB VBO and execute a query like 'SELECT * FROM [<Sheet_Name>$] WHERE [Name] IN (SELECT [Name] FROM [<Sheet_Name>$] GROUP BY [Name] HAVING COUNT (*) = 1)'

Just replace the <Sheet_Name> with the excel sheet consisting of your dataset.

For collection, you can try to use a LINQ query like:

var Collection_Out = Collection_In.GroupBy(item => item.Name).Where(g => g.Count() == 1).Select(g => g.Single());

Do let me know if you want to know how to create a LINQ VBO, I have posted multiple threads on this as well otherwise I can show detailed steps as well.

------------------------------
----------------------------------
Hope it helps you and if it resolves you query please mark it as the best answer so that others having the same problem can track the answer easily

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant
Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------
------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please provide a big thumbs up so that the others members in the community having similar problem statement can track the answer easily in future.

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Technical Business Analyst,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------

Thanks Dev for quick response. I will try to implement your solution and will let you know once am done with it,

------------------------------
vinod chinthakindi
------------------------------

Hi Dev, I have tried your code
var Collection_Out = Collection_In.GroupBy(item => item.Name).Where(g => g.Count() == 1).Select(g => g.Single());
When I tried above code in C# getting following error. Can you name the dll which is required here.
Description: Compiler error at line 1: 'DataTable' does not contain a definition for 'GroupBy' and no accessible extension method 'GroupBy' accepting a first argument of type 'DataTable' could be found (are you missing a using directive or an assembly reference?)

Can you also provide same code in VB.Net so that I can accumulate in the code stage in existing collection VBO.

Thanks



------------------------------
vinod chinthakindi
------------------------------

Hi Vinod,

You can create a new business object and add the below External References ('System.Data.DataSetExtensions.dll', 'System.Core.dll') and Namespace Imports ('System.Data.DataSetExtensions', 'System.LINQ') on the Page Description stage of your Initialize action for the LINQ queries to work properly. Also, ensure that the language is selected as 'Visual Basic':

18924.png
Once you have the updated code options as shown above, create a new action named 'Get Non Repeating Records' and pass two input arguments, Input Collection (Collection) and Field Name (Text). Based on the field name that you provide the duplicate records will be fetched from the Input Collection. Also set an Output parameter as Output Collection (Collection) for this action as shown below:

18925.png

Add the code stage and use the below code with the input and out arguments as show:

Output_Collection = (From row In Input_Collection _
Group row By a = row(Field_Name).Trim.ToString Into grp = Group _
Where grp.Count=1 _
Select grp.ToList).SelectMany(Function(x) x).CopyToDataTable()

18926.png18927.png
18928.png
The run results are as follows:

Input Arguments:

18929.png

Output Result:

18930.png

You can publish the action and test the same from Process Studio. Let us know if this helps you out 🙂


------------------------------
----------------------------------
Hope it helps you and if it resolves you query please mark it as the best answer so that others having the same problem can track the answer easily

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant
Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------
------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please provide a big thumbs up so that the others members in the community having similar problem statement can track the answer easily in future.

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Technical Business Analyst,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------

Awesome Dev. Its works well.!
I need a help where I have a code C# needs to convert to VB.Net. Can you help me out. The code helps in to "Find Row and Column Numbers by Value"

DT = new DataTable();
DT = new DataTable();
DT.Columns.Add("Column Name");
DT.Columns.Add("RowIndex");DT.Columns.Add("ColumnIndex");
int rowindex=0;
int columnindex=1;
foreach (DataColumn Column in in_DT.Columns)
{
DataRow[] rows = in_DT.Select(Column.ColumnName+" Like '%"+valuetofind+"%'");
if (rows.Length>0)
{ foreach (DataRow row in rows)
{
rowindex = in_DT.Rows.IndexOf(row)+1;
DataRow dr = DT.NewRow(); dr["Column Name"]=Column.ColumnName;
dr["RowIndex"]=rowindex; dr["ColumnIndex"]=columnindex;
DT.Rows.Add(dr);
}
} else {rowindex = 0;
} columnindex=columnindex+1;
}

Note : Also Can you suggest how to convert code from C# to VB, is there a simple way to understand the syntax or any tool which helps in conversion. If possible can you share/make a video/document on it.

------------------------------
vinod chinthakindi
------------------------------

Hey Vinod,

I went through your code and have converted the same into VB .NET. You can extend the 'Utility - Collection Manipulation' VBO by creating the action 'Find Row And Column Numbers By Value' with the following input and output parameters:

Input Parameters:

in_DT (Collection): The input collection consisting of the data where the operation needs to be performed.

in_ValueToFind (Text): The text value which needs to be looked up for in the input collection.

Output Parameters:

out_DT (Collection) : The output collection consisting of the Column Name, Row Index and Column Index where the values are found.

Workflow:

18938.png

Code Stage Parameters:

18939.png

Code:

18940.png

Dim DT As New DataTable()
Dim rows() As DataRow

DT.Columns.Add("Column Name")
DT.Columns.Add("RowIndex")
DT.Columns.Add("ColumnIndex")

Dim rowindex As Integer = 0
Dim columnindex As Integer = 1


For Each Column As DataColumn In in_DT.Columns

    rows = in_DT.Select(Column.ColumnName + " Like '%" + valuetofind + "%'")

    If rows.Length > 0 Then

        For Each row As DataRow In rows

             rowindex = in_DT.Rows.IndexOf(row)+1
             Dim dr As DataRow = DT.NewRow()
             dr("Column Name")= Column.ColumnName
             dr("RowIndex") = rowindex
             dr("ColumnIndex") = columnindex
             DT.Rows.Add(dr)

         Next

    Else

         rowindex = 0

    End If

    columnindex = columnindex + 1

Next

out_DT = DT


Test Results:

Input Arguments:

18941.png

Output Collection:

18942.png

VB .NET and C# have few notable differences to state explicitly which I can tell you upfront:

- In VB .NET in case you need to declare variables using Dim and As notations whereas C# directly uses the typical class declaration.
- In VB .NET, you don't have any semicolons whereas in C# it is mandatory
- In VB .NET, you define scopes using For Each....Next, If....End If whereas in C# you use curly braces.
- In both languages the way the arrays are defined differ syntactically a bit

To me personally, C# seems easier to use as I am majorly from a Java based background and C# is a lot similar to Java, but you might have seen that mostly I use VB .NET for my objects, reason being is since last one year or so most of my automations have been using VBA or VB Script in some sort so I have kind of developed this habit of writing syntaxes on it more frequently.

In the end I would say it depends on your personal preference which language you choose and converting logic is not much difficult if you know what object both languages are using. With internet resources being so abundant in nature it has become even easier. But coming to the part if there is any converter which translates C# into VB .NET or vice versa, I am not really aware of this yet but perhaps someone can shed some light over it. I would be more than happy to give it a try as it would reduce my pain point by a bunch to be honest lol 😄

------------------------------
----------------------------------
Hope it helps you and if it resolves you query please mark it as the best answer so that others having the same problem can track the answer easily

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant
Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------
------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please provide a big thumbs up so that the others members in the community having similar problem statement can track the answer easily in future.

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Technical Business Analyst,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------

Thanks Dev. Code works Perfect!
The reason why I have asked to convert to VB script as most of the Assets published by BP or DX team are developed using VB script. It will be easy for me to incorporate new actions in existing VBO using same coding language.

Even I have used some opensource web tools for conversion but seems lot of editing is required. For Example https://converter.telerik.com/
Thanks for providing valid inputs, will work around.

------------------------------
vinod chinthakindi
------------------------------

GabrielaEsquive
Level 3
Does anyone know how can this be achieved but to review all columns to identify the duplicated values? 

Thanks in advance

------------------------------
Gabriela Esquivel
------------------------------

Hi @Gabriela Esquivel,

Thanks for reaching out to community. Wanted to check what exactly you mean by review all columns to identify duplicated values? From what I perceive it is that you want non-repeating records ​based on all the column values just except any one specific field name. If that is your requirement you can achieve the same by adding a simple action in the 'Collection Manipulation' VBO:

18962.png
18963.png
18964.png

Code:

Output_Collection = Input_Collection.DefaultView.ToTable(True)​


------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please mark it as the 'Best Answer' so that the others members in the community having similar problem statement can track the answer easily in future

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Sr. Consultant - Automation Developer,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------
------------------------------
----------------------------------
Hope it helps you out and if my solution resolves your query, then please provide a big thumbs up so that the others members in the community having similar problem statement can track the answer easily in future.

Regards,
Devneet Mohanty
Intelligent Process Automation Consultant | Technical Business Analyst,
WonderBotz India Pvt. Ltd.
Blue Prism Community MVP | Blue Prism 7x Certified Professional
Website: https://devneet.github.io/
Email: devneetmohanty07@gmail.com

----------------------------------