
Process is taking upwards of 37 hours to run - Can I make this more efficient?

DavidWood
Level 3
Hello all!

I have a process that collects two sets of data from a Sybase data server into two collections (dataset 1 and dataset 2).

I then compare the two sets of data to highlight the values common to both, the values unique to dataset 1, and the values unique to dataset 2.

The issue is that this is taking so long that it's almost not worth automating. I was wondering if anyone had any advice on how to streamline this to make it quicker. I've attached some screenshots of my process to highlight what it's doing.

It seems that it is looping through every item in dataset 1 against the current row of dataset 2. Obviously, if I have more than 7000 items in both collections, this is going to take a hell of a long time.

Any advice is greatly appreciated!! (Even if it's not possible, at least I can tell my boss that :P)

26557.png


------------------------------
David Wood
------------------------------

NrupalJoshi
Level 5
I am no expert, but would it be possible to write the collections to Excel and use VB code to separate the data as you described above? I think that would be much quicker.

------------------------------
Hetal Rathore
------------------------------
Rup Joshi

Hi David,

Here are a couple of ideas:

1) Loop through collection 1, and use a calc stage to set the current row as a filter text item. Then use the Filter Collection action in the Utility – Collection Manipulation VBO. Filter collection 2 based on the filter from collection 1. Some helpful guidance on filter syntax is here:
https://portal.blueprism.com/customer-support/support-center#/path/Automation-Design/Studio/Visual-Business-Objects/1194312962/What-is-the-syntax-for-an-expression-used-by-the-Filter-Collection-action...

You would only be looping through one collection instead of two. It's an improvement, but still probably not ideal for larger datasets.

2) Perform SQL-style joins. Here is a Utility Collection Booster Asset on the Digital Exchange:
https://digitalexchange.blueprism.com/dx/entry/78038/solution/spgmi--utility-collection-booster
Use Inner Join to get the common values, and use two Outer Joins to get the values present in one collection but not the other (switching out the left and right collections).
I have not had the opportunity to test this solution, so I would be interested to hear how it performs.
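
To make the join idea in 2) concrete, here is a minimal sketch in Python rather than Blue Prism, purely to illustrate the logic. It is not the Collection Booster's implementation, and the `portfolio` column name and sample rows are assumptions made up for the example. The point is that indexing the key values once lets each row be checked with a single lookup instead of a pass over the other collection.

```python
def compare_by_key(left, right, key):
    """Split two lists of row dicts into common / left-only / right-only rows.

    Building a set of key values costs one pass per collection, and each
    membership test is then a hash lookup rather than an inner loop.
    """
    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}

    common = [row for row in left if row[key] in right_keys]          # inner join
    left_only = [row for row in left if row[key] not in right_keys]   # left anti-join
    right_only = [row for row in right if row[key] not in left_keys]  # right anti-join
    return common, left_only, right_only


# Hypothetical sample data; "portfolio" is an assumed column name.
dataset1 = [{"portfolio": "A"}, {"portfolio": "B"}, {"portfolio": "C"}]
dataset2 = [{"portfolio": "B"}, {"portfolio": "C"}, {"portfolio": "D"}]

common, only1, only2 = compare_by_key(dataset1, dataset2, "portfolio")
print(common)  # rows whose key appears in both datasets
print(only1)   # rows only in dataset 1
print(only2)   # rows only in dataset 2
```

With roughly 7000 rows per side, this does on the order of 14,000 lookups instead of 49 million row-by-row comparisons. The same logic could probably be ported to a code stage, but I have not tried that.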

------------------------------
Patrick Aucoin
Senior Product Consultant
Blue Prism
------------------------------

John__Carter
Staff
Hi David - if the datasets are the result of SQL queries, could you create a query that does the 'common portfolios' join for you and returns the third data set as the result? That would be more a case of 'right tool for the right job'. Although BP can do this sort of matching, munching through large data sets can be painful. That said, 37 hours sounds odd to me:
  • Do you have full logging on? It's recommended to disable logging in areas where a process is just number crunching and recording logs of that activity serves no purpose.
  • Is the machine properly spec'ed, or is it struggling for memory/cpu/bandwidth/whatever?
  • Did you run the process from control room? Running via the diagram is considerably slower. Sorry if these are dumb questions.
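
As a rough sanity check on why 37 hours looks high, here is a back-of-envelope calculation. It assumes exactly 7000 rows in each collection and a full nested loop; the real figures will differ, so treat the result as an order of magnitude only.

```python
# Assumed figures: 7000 rows per collection, full nested loop, 37-hour run.
rows_1 = 7000
rows_2 = 7000
runtime_hours = 37

comparisons = rows_1 * rows_2            # every row against every row
runtime_seconds = runtime_hours * 3600
ms_per_comparison = runtime_seconds / comparisons * 1000

print(f"{comparisons:,} comparisons")                      # 49,000,000 comparisons
print(f"{ms_per_comparison:.2f} ms per comparison")        # 2.72 ms per comparison
```

A few milliseconds per iteration is a lot for a simple equality check, which is consistent with per-stage overhead such as logging dominating the run time rather than the comparison itself.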


------------------------------
John Carter
Professional Services
Blue Prism
------------------------------

david.l.morris
Level 14
Just want to echo what John mentioned. In my opinion, your first course of action should be to turn off the logging for all the stages that are inside the loop. I personally set all the stage logging inside loops to be Errors Only. I'll often change the color of the text of those stages to something abnormal and leave a note on the page as well so that other developers know to keep the logging to Errors Only for those stages.

Just turning off the stage logging typically causes processes I've developed to be 10x-50x faster. It sounds crazy, but try it and you'll likely find similar results.

------------------------------
Dave Morris
Cano Ai
Atlanta, GA
------------------------------

Dave Morris, 3Ci at Southern Company

Hi,

Pulling data out of database tables and then going through all of it in loops sounds a bit strange to me. I might be wrong, but database systems are built and optimized for exactly this kind of data manipulation.

So maybe just write the right SQL statements and bring already-prepared data into your process?
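
Something along these lines is what I have in mind. This is only a sketch: it uses an in-memory SQLite database from Python as a stand-in for the Sybase server, and the table names (`dataset1`, `dataset2`) and the `portfolio` column are made up for the example. The join syntax is plain standard SQL, but please check it against your own schema and server.

```python
import sqlite3

# In-memory SQLite stands in for the real database; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset1 (portfolio TEXT);
    CREATE TABLE dataset2 (portfolio TEXT);
    INSERT INTO dataset1 VALUES ('A'), ('B'), ('C');
    INSERT INTO dataset2 VALUES ('B'), ('C'), ('D');
""")

# Values present in both tables.
common = conn.execute("""
    SELECT d1.portfolio
    FROM dataset1 d1
    INNER JOIN dataset2 d2 ON d1.portfolio = d2.portfolio
    ORDER BY d1.portfolio
""").fetchall()

# Values only in dataset1: left join, keep rows with no match.
only_in_1 = conn.execute("""
    SELECT d1.portfolio
    FROM dataset1 d1
    LEFT JOIN dataset2 d2 ON d1.portfolio = d2.portfolio
    WHERE d2.portfolio IS NULL
    ORDER BY d1.portfolio
""").fetchall()

# Values only in dataset2: same pattern with the tables swapped.
only_in_2 = conn.execute("""
    SELECT d2.portfolio
    FROM dataset2 d2
    LEFT JOIN dataset1 d1 ON d2.portfolio = d1.portfolio
    WHERE d1.portfolio IS NULL
    ORDER BY d2.portfolio
""").fetchall()

conn.close()
print(common, only_in_1, only_in_2)
```

The process would then receive three small, ready-made result sets instead of two large collections to loop over.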

BR,



------------------------------
Mindaugas Breskus
Software engineer
Swedbank
Europe/Vilnius
------------------------------