cancel
Showing results for 
Search instead for 
Did you mean: 

Issue related Excel Data Comparison

cs
Level 4
1 BEST ANSWER

Helpful Answers

@cs,

Below is an example of the Linq query I put together to look for the difference between the two files based on what I can see in your screenshots as well as the output. Notice I'm having to specifically compare columns by name because, as I mentioned, your screenshot seems to show that the columns have various differences between their column names.

var differences =
    (from drPMD in PMD.AsEnumerable()
     join drPFS in PFS.AsEnumerable() on drPMD.Field<string>(@"G/L Acct") equals drPFS.Field<string>(@"G/L acct") into match
	 from drPFS in match.DefaultIfEmpty()
     where (drPFS == null) || (drPMD.Field<string>(@"Curr") != drPFS.Field<string>(@"Acct cu")) ||
								(drPMD.Field<string>(@"FStGro") != drPFS.Field<string>(@"Fld stat")) ||
								(drPMD.Field<string>(@"Sort") != drPFS.Field<string>(@"Sort"))
	 
     select drPMD).ToList();

Differences = differences.CopyToDataTable();
​

28923.png
Cheers,

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

View answer in original post

28 REPLIES 28

ewilson
Staff
Staff
Hello @csandms1@gmail.com csandms1@gmail.com,

What you're describing sounds like a type of SQL JOIN. A situation where you have two tables and you want to get a collection of rows from those tables based on some condition that exists, or doesn't, between them.

I came across an article on Microsoft's Developer Network that seems related and includes an example of how to compare two DataTables (that's what a Collection is under the hood) and return a third. Perhaps this solution will cover your needs?

https://social.msdn.microsoft.com/Forums/en-US/502ef961-2644-41d6-84be-464d28fcc24d/compare-two-datatables-and-return-3rd-with-difference?forum=aspwebforms

Cheers,
  ​

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

 

So you want to flag rows that contain the same column values but are not in the same overall order between the two spreadsheets too? In your example above you call out the 3rd row of Excel1 where column B = 24 but the 3rd row of Excel2 has column B=26. The only issue here is that the rows from #3 onward are in a different order between the two collections. Would those then be flagged in your differences collection?

28557.png
The following code will perform a comparison of two DataTables and return a DataTable that contains the actual differences between them. However, this is looking for actual differences in DataRow objects. If the rows are simply in a different order, they don't count as a difference. That's just a sorting issue. The only way a DataRow is actually different is if it contains completely different field values.

28558.png
28559.png
28561.png
28562.png
Here's the actual code:

var differences = excel2.AsEnumerable().Except(excel1.AsEnumerable(), DataRowComparer.Default);
Differences = differences.Any() ? differences.CopyToDataTable() : new DataTable();​


FYI - Your username on the Community probably shouldn't contain an "@" sign as it screws up the mention feature a bit. 


Cheers,
​​

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

Original Message:
Sent: 01-27-2022 12:12
From: Eric Wilson
Subject: Issue related Excel Data Comparison

Hello @cscsandms1@gmail.com csandms1@gmail.com" data-itemmentionkey="b8bdb7f6-0929-4b19-bb49-8ee8b79feecd" biobubblekey="mentionb81700d3-e5fb-4ff6-b924-c1464624e2cf" href="https://community.blueprism.com/network/profile?UserKey=b81700d3-e5fb-4ff6-b924-c1464624e2cf" data-can-remove="True">@csandms1@gmail.com csandms1@gmail.com,

What you're describing sounds like a type of SQL JOIN. A situation where you have two tables and you want to get a collection of rows from those tables based on some condition that exists, or doesn't, between them.

I came across an article on Microsoft's Developer Network that seems related and includes an example of how to compare two DataTables (that's what a Collection is under the hood) and return a third. Perhaps this solution will cover your needs?

https://social.msdn.microsoft.com/Forums/en-US/502ef961-2644-41d6-84be-464d28fcc24d/compare-two-datatables-and-return-3rd-with-difference?forum=aspwebforms

Cheers,
  ​

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

Original Message:
Sent: 01-26-2022 20:06
From: csandms1@gmail.com csandms1@gmail.com
Subject: Issue related Excel Data Comparison

Hello to all 🙂
I have a scenario.....
1, Compare 2 excel files, each file contains 18 columns and approx. 7000 rows.
2, Columns are the same 18 in both files but rows may be different.
3, I need to compare these 2 files and find out the difference then take that difference as an output copy into excel.
Now, created a process and it partially worked in one scenario eg. if the columns and rows are the same then it compares row to row and extracts the difference correctly.
BUT when there is a difference in rows in both files eg. in one file 6000 and in 2nd file 5000 then the result is completely wrong.

A          B          C          D          E                         A          B          C          D          E

11       22        33        AB         55                      11       22        33        XY         56       

11       23       33         44         SS                      11       23       33         GG          JJ             Till here it's fine and result are ok        

11       24        33        44         55                     11       26        33        44         55            BUT from here onwards results are showing wrong        

11       25        33        44         55                     11       27        33        44         55                     

11       26        33        44         55                      11       24        33        44         55                     

11       27        33        44         55                      11       25        33        44         55                     

I have created 2 collections for these 2 excel files and then merged these 2 collections into 1 collection.
And then compare row by row for each column, till first 2 rows the result is fine but afterward it's giving the wrong result.

Is there any way out to compare these files like..... eg. the value (24 )in the 3rd row of the 2nd column (B) in the first file, if it won't find the value  (24) in the 3rd row of the 2nd column in 2nd file then it should move directly to 4th row of 2nd file until it gets the same value till the end? AND if it won't find the value in the 2ns file till the end then it should come back to the 4th row of the first file and start the comparison again in the same way and if it found any difference then catch it and save?

I hope some code stage will work BUT honestly, I don't have any idea as of now.

If you supporter/helper need any other information then please feel free to ask me.
Will be highly grateful for any reply !!!
Thank you



------------------------------
csandms1@gmail.com csandms1@gmail.com
------------------------------​

Hello @cs,

No need to apologize. I'm simply a bit thick-headed sometimes. 🙂

So it looks like you're doing some General Ledger comparison. From your example I'm making the following assumptions:
  • Column CoCode will always be the same, so we can effectively ignore it.
  • Column G/L Acct becomes our primary key, and all comparisons will be based off it.
  • The column definitions (basically the schema) between the two files (i.e. Collections) will be the same.
Given these assumptions I still think Linq is going to be your best bet, but it will be a slightly more complex query.​

I've set up another example below. I'm using smaller Collection definitions than what your actual files reflect, but the solution should still apply.

Collection1 - Master File
28566.png
Collection2
28567.png
Code Stage

28568.png
28569.png
var differences =
	(from drColl2 in Collection2.AsEnumerable()
	join drColl1 in Collection1.AsEnumerable() on drColl2.Field<string>(@"G/L Acct") equals drColl1.Field<string>(@"G/L Acct") into match
	from drColl1 in match.DefaultIfEmpty()
	where drColl1 == null || !DataRowComparer.Default.Equals(drColl2, drColl1)
	select drColl2).ToList();

Differences = differences.CopyToDataTable();

The above code basically performs a SQL Join on the two collections, using G/L Acct as the primary key, and returns any rows where:

  • the values of any of the other columns, in Collection2, do not match between rows, of Collection1, that have a matching G/L Acct.
  • any rows, from Collection2, that do not have a matching G/L Acct in Collection1. 
28570.png
If you only wanted to perform the comparison on certain columns you could revise the above WHERE clause to look at specific columns:
var differences =
    (from drColl2 in Collection2.AsEnumerable()
     join drColl1 in Collection1.AsEnumerable() on drColl2.Field<string>(@"G/L Acct") equals drColl1.Field<string>(@"G/L Acct") into match
	 from drColl1 in match.DefaultIfEmpty()
     where drColl1 == null || drColl2.Field<string>(@"Acct Curr") != drColl1.Field<string>(@"Acct Curr")
     select drColl2).ToList();
​
Differences = differences.CopyToDataTable();
 
28571.png
Hope this helps.

Cheers,




------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

Original Message:
Sent: 01-31-2022 18:03
From: Eric Wilson
Subject: Issue related Excel Data Comparison

Hello @cs,

No need to apologize. I'm simply a bit thick-headed sometimes. 🙂

So it looks like you're doing some General Ledger comparison. From your example I'm making the following assumptions:
  • Column CoCode will always be the same, so we can effectively ignore it.
  • Column G/L Acct becomes our primary key, and all comparisons will be based off it.
  • The column definitions (basically the schema) between the two files (i.e. Collections) will be the same.
Given these assumptions I still think Linq is going to be your best bet, but it will be a slightly more complex query.​

I've set up another example below. I'm using smaller Collection definitions than what your actual files reflect, but the solution should still apply.

Collection1 - Master File
28583.png
Collection2
28584.png
Code Stage

28585.png
28586.png
var differences =	(from drColl2 in Collection2.AsEnumerable()	join drColl1 in Collection1.AsEnumerable() on drColl2.Field<string>(@"G/L Acct") equals drColl1.Field<string>(@"G/L Acct") into match	from drColl1 in match.DefaultIfEmpty()	where drColl1 == null || !DataRowComparer.Default.Equals(drColl2, drColl1)	select drColl2).ToList();Differences = differences.CopyToDataTable();

The above code basically performs a SQL Join on the two collections, using G/L Acct as the primary key, and returns any rows where:

  • the values of any of the other columns, in Collection2, do not match between rows, of Collection1, that have a matching G/L Acct.
  • any rows, from Collection2, that do not have a matching G/L Acct in Collection1. 
28587.png
If you only wanted to perform the comparison on certain columns you could revise the above WHERE clause to look at specific columns:
var differences =    (from drColl2 in Collection2.AsEnumerable()     join drColl1 in Collection1.AsEnumerable() on drColl2.Field<string>(@"G/L Acct") equals drColl1.Field<string>(@"G/L Acct") into match	 from drColl1 in match.DefaultIfEmpty()     where drColl1 == null || drColl2.Field<string>(@"Acct Curr") != drColl1.Field<string>(@"Acct Curr")     select drColl2).ToList();​Differences = differences.CopyToDataTable();
 
28588.png
Hope this helps.

Cheers,




------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

@CSANDMS1,

Let me clarify a bit. When I said we can ignore the CoCode, what I meant is that the comparison you perform will always be against a Master File and Comparison File that have the same CoCode. I was not implying that the CoCode was not important, or that the solution above couldn't be run in a loop.

You can always add a small bit of logic at the front end to first verify that both files are for the same CoCode​, or you can extend the Linq query to include CoCode as part of the join (if there's a chance that one  or both of the files could contain multiple different CoCodes).

Cheers,

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

Original Message:
Sent: 02-01-2022 12:12
From: Eric Wilson
Subject: Issue related Excel Data Comparison

@cs,

Let me clarify a bit. When I said we can ignore the CoCode, what I meant is that the comparison you perform will always be against a Master File and Comparison File that have the same CoCode. I was not implying that the CoCode was not important, or that the solution above couldn't be run in a loop.

You can always add a small bit of logic at the front end to first verify that both files are for the same CoCode​, or you can extend the Linq query to include CoCode as part of the join (if there's a chance that one  or both of the files could contain multiple different CoCodes).

Cheers,

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

@CSANDMS1,

Apologies for the delayed response. I was out of the office today. There are two DLL's you'll want to reference:

  • System.Linq.dll
  • System.Data.DataSetExtensions.dll

For Namespace Imports, you'll want to include System.Linq.

Cheers,


------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------