11-07-22 01:03 PM
Hello all.
I've been tasked with tracking the differences between two Excel files for reporting purposes. My first thought was to read them into collections through the Excel VBO and then use Remove Duplicates via Collection Manipulation, but this had a poor success rate. My second thought was to build a nested loop over the two collections (which hold the data from the two spreadsheets) and compare the values of each row (there are 20 columns per row to check against). If there is a match, remove the row and move on to the next entry; if there is no match, write those values to a new collection - this would be my 'unique data'. Here's the flow, in case I haven't described that very well:
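In rough code terms, the loop does something like this (just a sketch, assuming both collections arrive in a code stage as DataTables with the same 20 columns in the same order; collectionA, collectionB and uniqueData are placeholder names):

' Sketch only (VB.NET, as in a Blue Prism code stage). collectionA and
' collectionB are the two input DataTables; uniqueData is the output.
' Assumes both collections share the same columns in the same order.
Dim uniqueData As DataTable = collectionA.Clone()

For Each rowA As DataRow In collectionA.Rows
    Dim matchIndex As Integer = -1

    ' Look for a row in the other collection with identical values in every column
    For b As Integer = 0 To collectionB.Rows.Count - 1
        Dim allEqual As Boolean = True
        For c As Integer = 0 To collectionA.Columns.Count - 1
            If Not Object.Equals(rowA(c), collectionB.Rows(b)(c)) Then
                allEqual = False
                Exit For
            End If
        Next
        If allEqual Then
            matchIndex = b
            Exit For
        End If
    Next

    If matchIndex >= 0 Then
        ' Match found: remove it from the other collection so it cannot be matched again
        collectionB.Rows.RemoveAt(matchIndex)
    Else
        ' No match anywhere in the other collection: this row is 'unique data'
        uniqueData.ImportRow(rowA)
    End If
Next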
Now this works perfectly well in terms of accuracy. However, at the moment it only seems feasible for fairly small spreadsheets because of the time taken to execute. Admittedly, these timings are from stepping over in Debug mode, but for two collections of 117 rows the comparison takes about a minute; 278 rows takes about 6 minutes, and 400 rows took about 15 minutes. I've just given up waiting on a collection of 827 rows after about an hour. The biggest collection it will ultimately need to parse is about 1,800 rows. That clearly isn't feasible with this growth in processing time, which looks roughly quadratic rather than exponential, since every row in one collection is compared against every row in the other.
Are there any existing code stages that might make this more manageable whilst maintaining accuracy? The other thought that occurred to me is to split the collections into batches of around 100 rows, collect the unique data from each batch, and then re-process those 'unique data' collections to get down to a final data set, although that feels very inefficient as well.
Open to all suggestions!
11-07-22 02:11 PM
UniqueRowsCollection = DuplicatesCollection.DefaultView.ToTable(True, "ColA", "ColB", "ColC")
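ToTable with the first argument set to True returns only the distinct rows across the columns you name, so the de-duplication happens in memory rather than row by row. A sketch of how that one-liner could sit in a code stage - the input/output names and column names are placeholders, and you'd list all 20 columns rather than just three:

' Sketch of a VB.NET code stage body. "DuplicatesCollection" (input) and
' "UniqueRowsCollection" (output) are placeholder collection names.
' List every column that should count towards a row being a duplicate.
Dim keyColumns As String() = {"ColA", "ColB", "ColC"} ' ...extend to all 20 columns
UniqueRowsCollection = DuplicatesCollection.DefaultView.ToTable(True, keyColumns)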