Work Queues, Collections - Limitations - design question

Venkata_Sreedha
Level 4
Hi, I have a requirement to process about 400K records held in an Excel file. Of the 400K records, there could be about 75K records (based on some eligibility criteria) for which I need to go to a web portal, obtain the required information, and update the details in the portal. For the rest of the records, which did not satisfy the eligibility criteria, no updates are necessary.

There are two ways I can accomplish this task:

1) Work queues (multi-bot):
a) Read all 400K records into a collection and load them into a work queue.
b) Use a multi-bot architecture to process the records and update the details in Excel.

2) Single bot:
a) Read each record from the Excel file and identify whether the record needs to be updated.
b) If yes, get the details from the web portal and update them in the Excel file.
c) Repeat the above two steps until the end of the file.

I almost certainly need to develop the bot using the first approach (multi-bot architecture), unless there is something that cannot be achieved with it. In this regard, my questions are:

a) Is there any limit on the size of a collection in Blue Prism? Would Blue Prism be able to read 400K records into a collection?
b) I know work queues can handle large amounts of data, but would 400K records be a challenge for a work queue to handle?
c) Are there any pitfalls I need to look out for in this design?

Any suggestions/comments/feedback on the approaches mentioned above are welcome. If you have any thoughts on a different approach, please let me know. Your inputs would be invaluable to me in designing the bot.
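For reference, the single-bot loop (approach 2) can be sketched in a few lines of Python. This is only an illustration, not Blue Prism logic: the `eligible` flag and `fetch_from_portal` callable are hypothetical stand-ins for the real eligibility criteria and the portal interaction.

```python
# Sketch of the single-bot loop (approach 2). Each record is modelled as a
# dict; "eligible" is a placeholder for the real eligibility criteria.

def process_workbook(records, fetch_from_portal):
    """Walk the records once; update only those that meet the criteria."""
    updated = 0
    for record in records:
        if record.get("eligible"):                    # placeholder check
            record["details"] = fetch_from_portal(record["id"])
            updated += 1
    return updated

# Tiny illustration: 5 records, of which 3 are eligible.
sample = [{"id": i, "eligible": i % 2 == 0} for i in range(5)]
count = process_workbook(sample, lambda rid: f"details-{rid}")
```

The obvious drawback, as the question implies, is that one bot walks all 400K rows sequentially, which is why the multi-bot design looks more attractive.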
2 REPLIES

John__Carter
Staff
Theoretically a collection has unlimited size, but in practice you should avoid 'mega collections' - they can drain PC resources and bandwidth. BP isn't a DB or MI tool, and you should not imagine it has infinite capacity for data handling; you have to be realistic. A third option to consider is reading the Excel file via an OLEDB query that targets only the 75K rows/columns you need. Depending on the amount of data in each row, even 75K could be too big, and you may have to experiment with reading in chunks. Likewise, pushing 75K items into the DB in one go might stretch things too far. Once you have the records in a queue, the multi-bot design is easy - the internal queue logic is such that a queue item cannot be locked by more than one instance. So to improve your overall completion time, all you need to do is run more instances.
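The filter-then-chunk pattern described above can be sketched as follows. This is a hedged illustration, not Blue Prism code: `is_eligible` stands in for the real eligibility criteria, and `queue_add` stands in for whatever mechanism adds a batch of items to the work queue.

```python
# Sketch of chunked queue loading: filter down to the eligible subset first
# (the OLEDB idea), then push to the queue in fixed-size batches rather than
# one mega load. "queue_add" is a hypothetical stand-in for the queue action.

def load_in_chunks(records, is_eligible, queue_add, chunk_size=10_000):
    """Filter records, then enqueue them batch by batch; return count loaded."""
    eligible = [r for r in records if is_eligible(r)]
    for start in range(0, len(eligible), chunk_size):
        queue_add(eligible[start:start + chunk_size])   # one batch per call
    return len(eligible)
```

The chunk size is the knob to experiment with: small enough that each batch stays within the machine's and the database's comfort zone, large enough that the loader isn't making thousands of round trips.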

NarayanShrawagi
Level 6
@vsnalam   One approach I can think of: design two processes. One process loads the data into the queue, and another process works on the loaded data. Break the data down into smaller chunks, e.g. 10K records per Excel sheet. The loader process takes one sheet at a time, loads its data into the queue, and stops. You can also implement an environment lock so that no two instances of the process are allowed to load the same sheet - the process locks the sheet/Excel file, and once it is loaded, the sheet is archived or moved to another location.
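The loader pattern above can be sketched like this. It is only an illustration under stated assumptions: `sheets` is a dict of sheet name to records, `claim_sheet` mimics an environment lock with an in-process lock, and `queue_add`/`archive` are hypothetical stand-ins for the queue action and the archive/move step.

```python
# Sketch of the loader process with a per-sheet "environment lock".
# The first loader instance to claim a sheet wins; others skip it.

import threading

_claimed = set()
_claim_lock = threading.Lock()

def claim_sheet(name):
    """Atomically claim a sheet; return False if another instance owns it."""
    with _claim_lock:
        if name in _claimed:
            return False
        _claimed.add(name)
        return True

def load_sheets(sheets, queue_add, archive):
    """Load every unclaimed sheet into the queue, then archive it."""
    loaded = []
    for name, records in sheets.items():
        if not claim_sheet(name):
            continue                    # another loader instance owns this sheet
        queue_add(records)              # push this sheet's records to the queue
        archive(name)                   # move/delete the sheet once loaded
        loaded.append(name)
    return loaded
```

In Blue Prism itself the claim would be an environment lock (or the physical move of the file), but the shape is the same: claim, load, archive, so a crashed or duplicate loader can never double-load a sheet.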
Narayan