Memory issues when dealing with large collections - maximum size?

john.hammond
Level 6
Good Afternoon all.

We have a process that pulls JSON down from an API and converts the response back into a file. However, over the past few days, we've noticed a number of slightly larger files (the first was an email with various attachments totalling around 22 MB, the second a large PNG file of 12.2 MB). Both of these have caused the process to hang while extracting the raw JSON from the payload. Attempts to view the log have also returned the error 'An error occurred while retrieving the logs: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.' (although this may be unrelated - the process itself runs for about 17 hours a day, so the log file can get rather large).

As an aside, on the machine the process has been running on during these hangs, the listener has crashed (hovering over its taskbar icon with the mouse caused it to disappear). This has obviously resulted in the process being terminated.

So, I have a couple of questions. Firstly, is there a maximum file size that a data item/collection can hold within Blue Prism? It should be noted that we're running BP 6.6, which I believe is 32-bit, although that shouldn't in itself cause any particular issues in accessing memory. Secondly, are there any suggestions for mitigating this issue? My first thought is a separate API call that pulls down an ID for the document along with its file size, with a decision stage in place to not download files greater than x MB. Any other ideas or suggestions welcome.
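
(For illustration only - a rough Python sketch of that pre-check idea, done outside Blue Prism. The /documents/{id}/metadata endpoint and the "sizeBytes" field are hypothetical placeholders; the real API will differ.)

    import requests

    MAX_BYTES = 10 * 1024 * 1024  # e.g. skip anything over 10 MB

    def should_download(base_url, doc_id):
        # Fetch only the lightweight metadata for the document first.
        meta = requests.get(f"{base_url}/documents/{doc_id}/metadata", timeout=30)
        meta.raise_for_status()
        size = meta.json().get("sizeBytes", 0)
        # Decision stage equivalent: only pull the payload if it is small enough.
        return size <= MAX_BYTES

    if should_download("https://api.example.com", "12345"):
        print("OK to download")
    else:
        print("Skip: file exceeds size threshold")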

------------------------------
John Hammond
------------------------------
4 REPLIES

steven.boggs
Staff
Hi John,

I will let others add their feedback on potential mitigation suggestions, API workarounds and other best practices for preventing out-of-memory events and SQL timeouts when working with large collections. In the meantime, I wanted to make sure you have access to the best-practice guidance in our Knowledge Base article "How do I avoid Out Of Memory issues?", specifically the section (and sub-sections) under 'Process or Object design considerations'. The article "How do I fix a SQL Server 'Timeout expired' error?" contains some steps you could implement for this scenario as well.

To address your question directly, there is no specific maximum file size for data items/collections at which these types of scenarios occur, as numerous factors (environment, collection type, the size of your Processes/Objects, etc.) can contribute to them. Our general guidance is to break the data up into more manageable "chunks" where possible, implement regular garbage collection and wait stages in your Process design, and ensure the database is healthy enough to handle large sets of data by following the guidance in our Maintaining a Blue Prism Database Server documentation.
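
(As an illustration of the "chunks" point only - the sketch below, in Python rather than Blue Prism, streams a large response straight to disk so that only a small buffer is ever held in memory. The URL and chunk size are placeholders.)

    import requests

    def download_to_file(url, out_path, chunk_size=1024 * 1024):
        # Stream the response instead of loading the whole payload into one string.
        with requests.get(url, stream=True, timeout=60) as resp:
            resp.raise_for_status()
            with open(out_path, "wb") as fh:
                for chunk in resp.iter_content(chunk_size=chunk_size):
                    fh.write(chunk)  # only ~1 MB held in memory at a time

    download_to_file("https://api.example.com/documents/12345/content", "attachment.bin")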

------------------------------
Steve Boggs
Senior Software Support Engineer
Blue Prism
Austin, TX
------------------------------

Hello,

If possible, you can add some filters directly to your API call to limit the data returned.
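
(A small Python sketch of what that could look like; the "fields" and "maxSize" query parameters are hypothetical - check which filters your API actually supports.)

    import requests

    resp = requests.get(
        "https://api.example.com/documents",
        params={"fields": "id,name,sizeBytes", "maxSize": 10 * 1024 * 1024},
        timeout=30,
    )
    resp.raise_for_status()
    documents = resp.json()  # only the trimmed-down fields come back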

------------------------------
Thanks & Regards,
Tejaskumar Darji
Sr. RPA Consultant-Automation Developer
------------------------------

Thank you for your response, Steve.

On further exploration, this seems to be a more complicated issue than I first thought. Our process is hitting an error state whilst extracting a payload from an API call. I don't necessarily think that the way the process is set up is the most memory-conserving approach, but there may be something else happening that I don't fully understand. Say an API call receives a 10 MB string of Base64 data; our process essentially triplicates this for further work down the line, and it is at this particular step that the process enters a warning state. I'm not entirely convinced that the warning itself is a bad thing; however, what is happening simultaneously is that the listener on the worker machine terminates during this step, and as a result the Live server can no longer communicate with the worker machine.
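
(For a rough sense of the numbers involved - assuming Blue Prism, as a .NET application, holds the text data item as a single UTF-16 string at about 2 bytes per character - the back-of-the-envelope Python below shows why three copies of a 10 MB Base64 payload add up quickly. These figures are estimates, not measurements.)

    raw_mb = 10                      # original payload size in MB
    b64_chars_mb = raw_mb * 4 / 3    # Base64 inflates the data by ~33%
    utf16_mb = b64_chars_mb * 2      # ~2 bytes per character in a UTF-16 string
    copies = 3                       # the process holds three copies of the string
    total_mb = utf16_mb * copies     # roughly 80 MB resident for one 10 MB payload
    print(f"~{total_mb:.0f} MB in memory for a {raw_mb} MB payload held {copies}x")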

So I think my question really is: what is the listener doing whilst Blue Prism itself is dealing with data in memory, and why might it be terminating? The process itself continues running, in that it remains in a warning state, but the connection to the resource PC is lost. In essence, I think the process is (or has completed the task and is) waiting for further instructions. That said, I'm not sure about the inner workings of Blue Prism - is the XML of the process communicated on an action-by-action basis, or is a 'block' of actions communicated in chunks at a time?

------------------------------
John Hammond
------------------------------

Hi John,

Thank you for the additional details; I agree this doesn't seem to be a straightforward scenario. To address your question directly, it's likely that the listener is bound by its default timeout and that Blue Prism may be trying to process the data all at once, exceeding this timeout value and thus putting the Process in a warning state. You may be able to get around this by experimenting with changing (or disabling warnings in) the Default Session Warning Time area of System > Settings, or by overriding the system-wide default warning at the individual stage level(s), to see if this allows more time for the data to be processed. These can be adjusted in the settings discussed in this Knowledge Base article here. A Warning status is not stored as a value in the database like other Process statuses; rather, it is calculated in the code itself from the values in the BPASession table (if the lastupdated time + warningthreshold time is earlier than the current time, the session is in a 'Warning' state), so this may turn out to be a viable workaround.
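
(A small Python sketch of that Warning calculation as described above; the lastupdated and warningthreshold names are taken from the description of the BPASession table, and the exact schema and units may differ in your environment.)

    from datetime import datetime, timedelta

    def is_in_warning(lastupdated, warningthreshold_seconds, now=None):
        now = now or datetime.utcnow()
        # The session shows as 'Warning' once lastupdated + threshold falls behind the clock.
        return lastupdated + timedelta(seconds=warningthreshold_seconds) < now

    # Example: last updated 20 minutes ago with a 10-minute threshold -> Warning.
    print(is_in_warning(datetime.utcnow() - timedelta(minutes=20), 600))  # True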

If this suggestion doesn't work, and/or you require more input from our Product team about how Blue Prism may be behaving in this particular scenario, we would suggest opening a ticket with Support so we can investigate further than may be possible in a Community thread. If you are able to attach the Process and the sample data that's causing this behaviour to the ticket so we can attempt to reproduce it, we may be able to determine whether there's another workaround we could suggest, or at least rule out a potential product defect, by getting feedback from our Product team on what's happening under the hood in Blue Prism during this scenario. Alternatively, if this comes down to a Process review, we could help connect you with our Professional Services team to get their best-practice design tips for working with this large data set according to your requirements.

------------------------------
Steve Boggs
Senior Software Support Engineer
Blue Prism
Austin, TX
------------------------------