cancel
Showing results for 
Search instead for 
Did you mean: 

Issue related Excel Data Comparison

cs
Level 4
28 REPLIES 28

Original Message:
Sent: 03-24-2022 22:59
From: Eric Wilson
Subject: Issue related Excel Data Comparison

@cs,

The first, of the two, compile errors you have there is because you're defining an output variable in the Code1 stage, on the page Community Solution, called differences, but you're also defining an actual C# variable in the source code named differences. 
28857.png

​Notice, I've changed the Output parameter name to Differences with a capital 'D'. There are a million different ways you could change this to resolve the name conflict.
28858.png

I'm thinking once you resolve that name conflict, it may resolve the second error too.

Cheers,

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

Original Message:
Sent: 03-24-2022 16:49
From: Eric Wilson
Subject: Issue related Excel Data Comparison

What version of Blue prism are you using? I tried importing your VBO, but BP threw an "unrecognized GUID format" error. Are you using a localized version of the software (different language)?

Cheers,


------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

Original Message:
Sent: 03-11-2022 17:24
From: Eric Wilson
Subject: Issue related Excel Data Comparison

@cs,

There's one DLL reference I forgot. You need to add System.Core.dll to your External References.

Cheers,
​​

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

Original Message:
Sent: 03-10-2022 12:41
From: Eric Wilson
Subject: Issue related Excel Data Comparison

@cs,

On the differences error, did you define an Output data item names differences too? If so, that's the cause. Basically you have an Output variable with the same name as the internal var differences variable you defined in the code. If you refer back to my example, I used a capital "D" on the output data item. It makes a difference, no pun intended, because of case sensitivity.

28859.png
For the first error, I'm not sure on that one. Would you post a screenshot of your Code Options page?

Cheers,

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

Original Message:
Sent: 03-03-2022 23:28
From: Eric Wilson
Subject: Issue related Excel Data Comparison

@cs,

Apologies for the delayed response. I was out of the office today. There are two DLL's you'll want to reference:

  • System.Linq.dll
  • System.Data.DataSetExtensions.dll

For Namespace Imports, you'll want to include System.Linq.

Cheers,


------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

Original Message:
Sent: 02-01-2022 12:12
From: Eric Wilson
Subject: Issue related Excel Data Comparison

@cs,

Let me clarify a bit. When I said we can ignore the CoCode, what I meant is that the comparison you perform will always be against a Master File and Comparison File that have the same CoCode. I was not implying that the CoCode was not important, or that the solution above couldn't be run in a loop.

You can always add a small bit of logic at the front end to first verify that both files are for the same CoCode​, or you can extend the Linq query to include CoCode as part of the join (if there's a chance that one  or both of the files could contain multiple different CoCodes).

Cheers,

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

Original Message:
Sent: 01-31-2022 18:03
From: Eric Wilson
Subject: Issue related Excel Data Comparison

Hello @cs,

No need to apologize. I'm simply a bit thick-headed sometimes. :)

So it looks like you're doing some General Ledger comparison. From your example I'm making the following assumptions:
  • Column CoCode will always be the same, so we can effectively ignore it.
  • Column G/L Acct becomes our primary key, and all comparisons will be based off it.
  • The column definitions (basically the schema) between the two files (i.e. Collections) will be the same.
Given these assumptions I still think Linq is going to be your best bet, but it will be a slightly more complex query.​

I've set up another example below. I'm using smaller Collection definitions than what your actual files reflect, but the solution should still apply.

Collection1 - Master File
28860.png
Collection2
28861.png
Code Stage

28862.png
28863.png
var differences =	(from drColl2 in Collection2.AsEnumerable()	join drColl1 in Collection1.AsEnumerable() on drColl2.Field<string>(@"G/L Acct") equals drColl1.Field<string>(@"G/L Acct") into match	from drColl1 in match.DefaultIfEmpty()	where drColl1 == null || !DataRowComparer.Default.Equals(drColl2, drColl1)	select drColl2).ToList();Differences = differences.CopyToDataTable();

The above code basically performs a SQL Join on the two collections, using G/L Acct as the primary key, and returns any rows where:

  • the values of any of the other columns, in Collection2, do not match between rows, of Collection1, that have a matching G/L Acct.
  • any rows, from Collection2, that do not have a matching G/L Acct in Collection1. 
28865.png
If you only wanted to perform the comparison on certain columns you could revise the above WHERE clause to look at specific columns:
var differences =    (from drColl2 in Collection2.AsEnumerable()     join drColl1 in Collection1.AsEnumerable() on drColl2.Field<string>(@"G/L Acct") equals drColl1.Field<string>(@"G/L Acct") into match	 from drColl1 in match.DefaultIfEmpty()     where drColl1 == null || drColl2.Field<string>(@"Acct Curr") != drColl1.Field<string>(@"Acct Curr")     select drColl2).ToList();​Differences = differences.CopyToDataTable();
 
28867.png
Hope this helps.

Cheers,




------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

Original Message:
Sent: 01-27-2022 17:50
From: Eric Wilson
Subject: Issue related Excel Data Comparison

So you want to flag rows that contain the same column values but are not in the same overall order between the two spreadsheets too? In your example above you call out the 3rd row of Excel1 where column B = 24 but the 3rd row of Excel2 has column B=26. The only issue here is that the rows from #3 onward are in a different order between the two collections. Would those then be flagged in your differences collection?

28868.png
The following code will perform a comparison of two DataTables and return a DataTable that contains the actual differences between them. However, this is looking for actual differences in DataRow objects. If the rows are simply in a different order, they don't count as a difference. That's just a sorting issue. The only way a DataRow is actually different is if it contains completely different field values.

28869.png
28870.png
28871.png
28872.png
Here's the actual code:

var differences = excel2.AsEnumerable().Except(excel1.AsEnumerable(), DataRowComparer.Default);Differences = differences.Any() ? differences.CopyToDataTable() : new DataTable();​


FYI - Your username on the Community probably shouldn't contain an "@" sign as it screws up the mention feature a bit. 


Cheers,
​​

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

Original Message:
Sent: 01-27-2022 15:27
From: csandms1@gmail.com csandms1@gmail.com
Subject: Issue related Excel Data Comparison

Dear Mr. Wilson,
First of all, thank you so much for the instant reply. Really, it's an honor!!!
Just some additional points would like to highlight...
1, Is there any possibility that we can complete this via Blueprism?
2, In my case,  First excel is the master sheet, where the data is correct. And wants to compare with 2nd excel where some rows are missed OR the data in the cells in the same row is different. (as mentioned up and highlighted in red and blue color).

If any other information is needed then please feel free to ask me any time.

Thanks

------------------------------
csandms1@gmail.com csandms1@gmail.com

Original Message:
Sent: 01-27-2022 12:12
From: Eric Wilson
Subject: Issue related Excel Data Comparison

Hello @cscsandms1@gmail.com csandms1@gmail.com" data-itemmentionkey="b8bdb7f6-0929-4b19-bb49-8ee8b79feecd" biobubblekey="mentionb81700d3-e5fb-4ff6-b924-c1464624e2cf" href="https://community.blueprism.com/network/profile?UserKey=b81700d3-e5fb-4ff6-b924-c1464624e2cf" data-can-remove="True">@csandms1@gmail.com csandms1@gmail.com,

What you're describing sounds like a type of SQL JOIN. A situation where you have two tables and you want to get a collection of rows from those tables based on some condition that exists, or doesn't, between them.

I came across an article on Microsoft's Developer Network that seems related and includes an example of how to compare two DataTables (that's what a Collection is under the hood) and return a third. Perhaps this solution will cover your needs?

https://social.msdn.microsoft.com/Forums/en-US/502ef961-2644-41d6-84be-464d28fcc24d/compare-two-datatables-and-return-3rd-with-difference?forum=aspwebforms

Cheers,
  ​

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

@cs,

​Ok, so what I think is happening here is that the Linq check is comparing every column in the row. If there are any differences, then it grabs it, so my assumption is every one of your rows differs between the two Excels in some way.​ If you only want to compare specific columns within the row then there's slightly more complex comparison logic that you have to use. That was shown in my example from 2022/01/31 that was marked as "Best Answer".

var differences =
	(from drColl2 in Collection2.AsEnumerable()
	join drColl1 in Collection1.AsEnumerable() on drColl2.Field<string>(@"G/L Acct") equals drColl1.Field<string>(@"G/L Acct") into match
	from drColl1 in match.DefaultIfEmpty()
	where drColl1 == null || !DataRowComparer.Default.Equals(drColl2, drColl1)
	select drColl2).ToList();

Differences = differences.CopyToDataTable();​

The above code basically performs a SQL Join on the two collections, using G/L Acct as the primary key, and returns any rows where:
  • the values of any of the other columns, in Collection2, do not match between rows, of Collection1, that have a matching G/L Acct.
  • any rows, from Collection2, that do not have a matching G/L Acct in Collection1. 
If you want to compare against more specific columns then you'd extend the above code to include those additional column definitions.

Cheers,

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

Original Message:
Sent: 03-25-2022 15:36
From: Eric Wilson
Subject: Issue related Excel Data Comparison

@cs,

​Ok, so what I think is happening here is that the Linq check is comparing every column in the row. If there are any differences, then it grabs it, so my assumption is every one of your rows differs between the two Excels in some way.​ If you only want to compare specific columns within the row then there's slightly more complex comparison logic that you have to use. That was shown in my example from 2022/01/31 that was marked as "Best Answer".

var differences =	(from drColl2 in Collection2.AsEnumerable()	join drColl1 in Collection1.AsEnumerable() on drColl2.Field<string>(@"G/L Acct") equals drColl1.Field<string>(@"G/L Acct") into match	from drColl1 in match.DefaultIfEmpty()	where drColl1 == null || !DataRowComparer.Default.Equals(drColl2, drColl1)	select drColl2).ToList();Differences = differences.CopyToDataTable();​

The above code basically performs a SQL Join on the two collections, using G/L Acct as the primary key, and returns any rows where:
  • the values of any of the other columns, in Collection2, do not match between rows, of Collection1, that have a matching G/L Acct.
  • any rows, from Collection2, that do not have a matching G/L Acct in Collection1. 
If you want to compare against more specific columns then you'd extend the above code to include those additional column definitions.

Cheers,

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

@cs,

Are there specific columns that you're trying to compare, or are you looking for ANY difference where the G/L Account is the same between the two files? Part of the issue here is that it seems your two files have different column names. It's hard to tell from screenshots that don't show the entire column names. If the columns have different name, and they're not specifically joined, Linq won't compare them.

Cheers,


------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

@cs,

Below is an example of the Linq query I put together to look for the difference between the two files based on what I can see in your screenshots as well as the output. Notice I'm having to specifically compare columns by name because, as I mentioned, your screenshot seems to show that the columns have various differences between their column names.

var differences =
    (from drPMD in PMD.AsEnumerable()
     join drPFS in PFS.AsEnumerable() on drPMD.Field<string>(@"G/L Acct") equals drPFS.Field<string>(@"G/L acct") into match
	 from drPFS in match.DefaultIfEmpty()
     where (drPFS == null) || (drPMD.Field<string>(@"Curr") != drPFS.Field<string>(@"Acct cu")) ||
								(drPMD.Field<string>(@"FStGro") != drPFS.Field<string>(@"Fld stat")) ||
								(drPMD.Field<string>(@"Sort") != drPFS.Field<string>(@"Sort"))
	 
     select drPMD).ToList();

Differences = differences.CopyToDataTable();
​

28923.png
Cheers,

------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------

Hello Mr. Willson,
The code is working fine. As I'm using this code in between the Loop because the comparison that I have to do is for multiple CCs e.g. more than 100.
When this process is starting then it's working fine for first at least 10 CCs and giving the result perfectly fine for each and every CCs. BUT then this error appears.
28943.pngInfo........ Both comparison files in form of collections are fine like the fisrt 10 CCs where comparison done perfectly fine but as it moves to next CC then error appears BUT no change in settings or in anything.
If any other info is needed then please feel free to ask anytime.

Thanks and Regards,

------------------------------
cs
------------------------------

@cs,

Can you show me your loop set up? The last VBO you shared didn't have the Code stage within the loops. There was a loop for PFS Files and then a second loop for PMD Files. The Code stage was called afterwards. Did you just add a loop around the Code stage? If so, which Collection is the controlling collection of the loop?

Cheers,


------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------


Original Message:
Sent: 03-30-2022 15:20
From: Eric Wilson
Subject: Issue related Excel Data Comparison

@cs,

Can you show me your loop set up? The last VBO you shared didn't have the Code stage within the loops. There was a loop for PFS Files and then a second loop for PMD Files. The Code stage was called afterwards. Did you just add a loop around the Code stage? If so, which Collection is the controlling collection of the loop?

Cheers,


------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange

@cs,

Is it the Collections titled PFS and PMD, within the loop, that are being passed into the Code stage, or is it the ones titled Amended PFS and Amended PMD?​ 

29031.pngHave you considered putting a Collections::Remove All Rows step in at the end of the loop to ensure those Collections are reset before going back to the Code stage?

You might also add an if conditional statement around the Linq code to ensure the datat tables (aka Collections) actually contain rows before trying to execute the Linq code. That way you don't hit the exception.

I also noticed you're using the original Linq code without the specific column mappings. Is that what you intended to do?

Cheers,



------------------------------
Eric Wilson
Director, Integrations and Enablement
Blue Prism Digital Exchange
------------------------------