Issue with collection for very large excel or json

MayankGoyal2
Level 8
Hi, I have an Excel file with 100k+ rows and 16 columns. I imported it into a Blue Prism collection; the import succeeded and the collection shows the row count, but I am not able to open the collection. Please suggest the best way or data structure to store a large volume of data in Blue Prism at run time.


------------------------------
Mayank Goyal
------------------------------
1 BEST ANSWER

Hello Mayank,

It could be that the data collection is simply too large to load all at once. If you are already running into size issues when loading the data, there are potentially other problems that you will run into downstream as well. For example, if you are loading the data into a work queue, you may encounter a network timeout because the Add to Work Queue operation takes too long.

One best practice is to chunk the large input file. This would not only fix this problem (which could be Blue Prism being unable to allocate enough system memory for the whole data collection), but also improve the overall resilience and performance of the final business process.

------------------------------
Wing Ling Leung
Senior Product Consultant
Blue Prism Professional Services
------------------------------
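
A minimal sketch of the chunking idea, written in Python as a pre-processing step outside Blue Prism. It reads a CSV stream and yields fixed-size batches of rows, each paired with the header, so that each batch could be loaded into a collection or work queue separately. The function names (`iter_chunks`, `split_csv`), the chunk size, and the ten-row sample are all hypothetical; nothing here is a Blue Prism API.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List


def iter_chunks(rows: Iterable[List[str]], chunk_size: int) -> Iterator[List[List[str]]]:
    """Yield successive lists of at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def split_csv(source, chunk_size: int):
    """Read a CSV stream and yield (header, chunk_of_rows) pairs."""
    reader = csv.reader(source)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    for chunk in iter_chunks(reader, chunk_size):
        yield header, chunk


# Hypothetical stand-in for the large export: 10 data rows, 2 columns.
sample = io.StringIO("id,name\n" + "".join(f"{i},row{i}\n" for i in range(10)))
chunks = list(split_csv(sample, chunk_size=4))
chunk_sizes = [len(rows) for _, rows in chunks]
print(chunk_sizes)  # [4, 4, 2]
```

Because the reader is consumed lazily, only one chunk of rows is held in memory at a time when the chunks are processed in a loop rather than collected into a list. The right chunk size depends on the machine and the process, so it is best treated as a value to tune.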

6 REPLIES

MayankGoyal2
Level 8
Any .NET data structure that can handle a large volume of data from Excel, CSV, etc. and perform operations on it would be helpful.

------------------------------
Mayank Goyal
------------------------------

@Wing Ling Leung - Thanks for your response. A follow-up question on the same thread: if I use a collection in a loop within a process to import data, I want to ensure that once I delete all rows from the collection, the allocated memory is released. I believe developers use datatable.clear(), datatable.dispose(), and datatable = null to ensure memory is released once the data in a data table is no longer needed. How can I ensure the same for collection variables in my process, for effective memory management in BP when lots of data is used and exchanged between processes and objects?
Is there anything specific we have to use for garbage collection in Blue Prism?

------------------------------
Mayank Goyal
------------------------------

@AmiBarrett - Please have a look and share your input. When a process involves multiple collections that are passed between the process and objects, how can we ensure that a collection's memory is released once its work is done? Will deleting all rows of the collection help once its data is no longer needed in the process?
I also want to understand how memory is consumed when collections are passed between a process and objects. Does the collection occupy separate memory in the process and in the object, or is the collection created once and only a reference passed from the process to the object?

I am relating this to how .NET passes a reference type such as DataTable: only the reference is copied, so memory is consumed just once and any operation on the datatable in function2 is reflected in function1's datatable. However, I believe it works differently in BP. Kindly suggest -

function1() {
    function2(datatable)    // call function2, passing the datatable
}

function2(datatable1) {
    // operations such as deleting rows or fields here are reflected in function1's datatable
}

------------------------------
Mayank Goyal
------------------------------
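
The two models being contrasted in the question above, a shared reference versus an independent copy, can be sketched in a language-neutral way. The Python example below is only an illustration of that distinction: the list of dictionaries stands in for a table, `delete_all_rows` is a hypothetical helper, and the thread does not confirm which model Blue Prism follows when a collection crosses a process/object boundary.

```python
import copy


def delete_all_rows(table):
    """Remove every row in place, like the deletions described in function2."""
    table.clear()


# Model 1: the callee receives a reference to the caller's table.
shared = [{"id": 1}, {"id": 2}, {"id": 3}]
delete_all_rows(shared)
print(len(shared))  # 0: the caller sees the deletion, and only one table exists

# Model 2: the callee receives an independent copy of the caller's table.
original = [{"id": 1}, {"id": 2}, {"id": 3}]
delete_all_rows(copy.deepcopy(original))
print(len(original))  # 3: the caller's table is untouched
```

Under the second model the data exists twice for as long as both copies are alive, which is why the answer to this question matters for memory use with large collections.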

Hi Mayank,

I think this KB article has some of what you are looking for. It includes a method to explicitly call the system's GC. Just be very careful about when and how often you call it, as that may cause unintended side effects. If you do plan on using this method, please Google "gc.collect often" to see some good discussions on when to use this capability. In the right conditions, we have seen this call significantly reduce memory usage (and more quickly) after a complete run of a specific process in Blue Prism.

http://portal.blueprism.com/customer-support/support-center#/path/1141772192


------------------------------
Wing Ling Leung
Senior Product Consultant
Blue Prism Professional Services
------------------------------

@Wing Ling Leung - Thanks for your response. As per my understanding, a lot of articles on Google suggest not calling gc.collect explicitly and letting it run automatically.

------------------------------
Mayank Goyal
------------------------------