cancel
Showing results for 
Search instead for 
Did you mean: 

Merge Collections based on Key Field

ChrisHodgson1
Level 4
Hi, I'm looking to merge two collections with different columns that have one column with data that will match between them. Looping through both into a new collection does work, but takes far too long with collections that have 1000+ lines. Is there a way of merging them in a way similar to a SQL "Join" query? If so, does anybody have any example code? Thanks in advance, Chris
12 REPLIES 12

david.l.morris
Level 15
Hi Chris, Are you sure 'Merge Collection' isn't what you need? Of course, you'd need to remove/delete the key column from one of the collections before the merge or perhaps rename it. As long as both collections have the same number of rows and exactly the same keys, then it will work just fine. You may want to do some kind of validation afterward to sanity check it. If you're saying that the columns do not have the same rows, in that one has keys that the other does not have, then what I'm suggesting isn't what you need. Respectfully, Dave

Dave Morris, 3Ci at Southern Company

david.l.morris
Level 15
I forgot to mention you'd need to sort both collections on their key columns before merging as well.

Dave Morris, 3Ci at Southern Company

ChrisHodgson1
Level 4
Hi David, Unfortunately, I am attempting to merge two collections with different columns/rows, i.e.: Table 1 ID / Name / Age AL55W / John / 21 AL89R / James / 30 AL64W / Jack / 45 Table 2 ID / Orders / Frequency AL55W / 50 / Weekly AL89R / 75 / Weekly AL64W / 100 / Monthly Both collections contain the column ""ID"", but each collection contains different data in all other columns. Effectively, I'm looking to merge these collections based on the ""ID"" column without resorting to looping through 1000+ lines on each collection as it's taking far too long. Also, sometimes Table 2 will not have the ""ID"" values from Table 1, so these would either need to be removed or returned with blank values in the joined columns.  Thanks for taking the time to respond. Chris

david.l.morris
Level 15
Hi Chris, I see what you mean. I couldn't tell from the original post whether it was the same number of rows in each collection. You may be aware of this already, but I wanted to mention to completely ignore the processing time while in DEBUG mode (Process/Object Studio). It is significantly slower than in Control Room. I did a 'quick' test where I operated on 1500 rows, and it took 4 minutes in Control Room. That's not even accounting for the nested loop I imagine you'd need in your use case. Maybe LINQ is what you need in a code stage, since that's like SQL, and I imagine it's optimized for this kind of thing. Dave

Dave Morris, 3Ci at Southern Company

ChrisHodgson1
Level 4
Hi David, We've tried in Control Room also. You're correct that it's a lot faster, but our tables are increasingly large so this seems to get slower over time. I will look into LINQ as that may help. Thanks, Chris

AbhishekKV
Level 2
Hi Chris, Please give a try with OLEDB VBO. It should help you join the tables based on ID.   Abhishek

david.l.morris
Level 15
The only issue with OLEDB is that it's expecting you to connect to a source outside of BP. I suppose it could still be faster to write both tables to Excel and then query them both to join on ID. But at that point, why not just get someone to write a reusable action using VB/C# code?

Dave Morris, 3Ci at Southern Company

StuartRedmore
Level 3
HI Chris, I managed to acheive this recently by using a .net code stage. I added the collections to a data set, Created a data relation for the data set. Then loop throughed ""collection 1"" finding its children and added data to the row. I can post the code if you require. Regards

ChrisHodgson1
Level 4
Hi Stuart, That sounds like it could be useful. I'd love to have a look at it if you don't mind posting the code. Thanks for your response. Chris