
Error Messages in a log are unacceptable. Tell me why I am wrong.

ChadDokmanovich
Level 3

We've been doing RPA for a couple of years now and we have a decent number of automated processes. The one thing I notice across all of our production runs is that there are tons of error messages. As I see it, unless the error is due to the genuinely unexpected unavailability of some dependent application (something that is, or should be, very rare), an error message indicates a problem that likely needs to be fixed by changing Blue Prism code. Whether the fix is extending the wait time before or after performing a step, waiting for or checking an element's existence before attempting to interact with it, or handling an unexpected application scenario such as a popup that was never seen before, all of these things require a code change.

 

Sometimes coders retry things in a loop X times before giving up. Again, leaving aside the possibility of a system being offline unexpectedly, is there ever a valid reason to do this instead of adjusting the wait times where you check for the existence of an element? If retrying X times is sometimes genuinely necessary (and I doubt that it is with our generally light processes, which consume few resources on the bot), I would think that a well-written process should only log an error if the final attempt failed.
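The "only log the final failure" idea above can be sketched in a few lines. This is a hypothetical illustration in Python, not Blue Prism code; the names (`attempt_with_retries`, `action`) and the retry budget are invented for the example:

```python
import time

MAX_ATTEMPTS = 3   # hypothetical retry budget
WAIT_SECONDS = 2   # hypothetical wait between attempts

def attempt_with_retries(action, max_attempts=MAX_ATTEMPTS, wait_seconds=WAIT_SECONDS):
    """Retry an action, surfacing an error only if the FINAL attempt fails.

    Intermediate failures are swallowed (or could be logged at a debug
    level), so the production log stays quiet unless the work truly failed.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as exc:  # stand-in for a recovery/exception stage
            last_error = exc
            if attempt < max_attempts:
                time.sleep(wait_seconds)  # pause before the next try
    # Only the final failure is raised and therefore logged.
    raise RuntimeError(f"failed after {max_attempts} attempts") from last_error
```

The point of the sketch is the placement of the error: transient failures inside the loop never reach the log, so an error record means the work actually failed.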

Instead, I see errors everywhere, and at least in RPA this seems to be accepted as the norm.

 
Does RPA really need to be like this? Do I need to realign my expectations, or is this achievable with good code that has been reworked until it is bulletproof?

Instead of code that is being reworked and improved, I just see a growing base of bots bringing our Prod environment to a crawl while the errors continue to grow. The performance of the Control Room is so bad that no one can even see the logs except me, because I query the database directly, which is the only way to view logs without Blue Prism going belly up when you open the Control Room.

That said, the Admin team is slowly trying to identify and reduce excessive logging, but they still haven't addressed the large and fast-growing log table, now well over 100 million records. It's unfortunate that Blue Prism runs into memory issues if you try to archive more than a couple of handfuls of error messages at a time. I think we should delete them right in the database and be done with it.
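When people do purge log tables directly, it is usually done in small batches so each transaction stays short and the table isn't locked for the whole delete. A rough sketch of the batching idea, using SQLite purely as a stand-in (the real Blue Prism log table name, batch size, and cutoff are assumptions; check with Blue Prism Support before touching the production schema):

```python
import sqlite3

BATCH_SIZE = 10_000  # assumed batch size; tune for your server

def purge_old_logs(conn, cutoff, batch_size=BATCH_SIZE):
    """Delete log rows older than `cutoff` in small batches.

    Batching keeps each transaction short, so the delete does not
    hold long locks or blow up the transaction log in one pass.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM session_log WHERE rowid IN "
            "(SELECT rowid FROM session_log WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit each batch separately
        total += cur.rowcount
        if cur.rowcount < batch_size:
            break  # the last (partial or empty) batch means we're done
    return total
```

The same loop shape applies to SQL Server with `DELETE TOP (N)`; the key design choice is committing per batch rather than deleting everything in one transaction.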

I think we are growing too fast and on a very shaky foundation: both the tool and, probably, our code too, judging largely by all of the errors I see from the product and from our processes.

Your thoughts?









------------------------------
Chad Dokmanovich
IT Architect
Fin IT
America/New_York
------------------------------
5 REPLIES

PritamPoojari
Level 6
Hi

Developers add lots of logging to help debug issues; these logs should be reduced as you move from UAT to Production. As you have realized, it is causing the database to grow quite rapidly.
On top of that, you need to run Blue Prism housekeeping jobs periodically to archive or get rid of old logs.

Regarding retry loops vs. waits: I am not a BP developer, but I think it is standard practice to attempt an action a few times before giving up. Wait times can vary between environments and depend on system responsiveness, etc. A BP dev can probably correct me 😉

Regards




------------------------------
Pritam Poojari
Solution Architect
Capgemini
Europe/London
------------------------------

Anonymous
Not applicable
Chad, you raise a number of excellent points.

As Pritam mentioned, one way to combat the excessive logs is to adjust which stages use logging as a deployment moves through the pipeline. In practice, however, I've certainly seen our team struggle to determine how much logging is "enough" to ensure our operations team can quickly identify and resolve the cause of a new error in a production process.

There is also some truth, in my humble opinion, to the idea that RPA creates an unusually high volume of error messages by design. Some of this is part of the value proposition; for example, partial automation of a process where a small percentage of use cases are workflowed to human experts by design. Some of it, however, seems to be part of the design of the platform. You may find it interesting that, where we've had RPA developers with a coding background, their early automated processes tended to perform very poorly. They, as good coders, attempted to handle all possible errors, and it counter-intuitively resulted in processes that were unstable. I don't know if this means you need to abandon your quest for good code, but I would certainly be interested in what you learn on your journey.

You mention the "try x times" loops in particular as being puzzling. It's a bit outside my core realm, but my understanding is that my team generally chooses this approach where performance of the underlying application is highly variable. Instead of increasing the time to completion of every task to allow for an extreme wait time, I've seen solutions use a "try x times" design with more variable wait stages, so that under normal conditions the total processing time remains low but the automation also remains stable when the underlying application's performance declines.
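One way to read that design: a short base wait plus escalating retries keeps the common case fast while still tolerating slow days. A hypothetical sketch (Python, not Blue Prism; `element_exists` and the timings are invented for illustration):

```python
import time

def wait_for_element(element_exists, base_wait=1.0, max_attempts=5):
    """Poll for an element with escalating waits between checks.

    Under normal conditions the element appears on an early check, so
    total time stays low; when the application is slow, later attempts
    wait longer (1s, 2s, 4s, ...) before finally giving up.
    """
    for attempt in range(max_attempts):
        if element_exists():
            return True
        time.sleep(base_wait * (2 ** attempt))  # exponential backoff
    return False
```

Contrast this with a single long wait stage: the backoff version finishes quickly on a healthy system but still covers the worst case, which matches the trade-off described above.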

I'm concerned by what you say about your Control Room, though. If you haven't been able to archive your production database, Blue Prism Support has some SQL trim scripts that will certainly help manage this. We're in the process of designing a full archiving solution that doesn't rely on the in-product features because, to your point, the in-product solution isn't reliably handling archiving the way we'd like to see. We're running 60 machines and more than 190 processes in our production environment, but we aren't encountering the same degree of issues in Control Room as you are. If you're running a bigger shop, I'd love to know at what point you started to encounter issues.

Open to chatting further if you'd find it useful to delve into specifics, but very interested to hear what others have to say on this topic.

------------------------------
Heather Ruhl
Solution Architect
ATB Financial
America/Edmonton
------------------------------

Hi,
You can change the log level while running in QA or in Prod, for example to log errors only, or no logs at all.
If you need to analyze an issue, you can enable logging from the Control Room.

Regards
Sutirtha Gupta



We have about 100 bots in Prod and our system is on its knees. It takes 7 minutes for the Control Room screen to even begin to display. Admin users have direct connections to the database, and their performance is only a little better. Obviously, direct connections are not ideal, and most users have to connect in 3-tier mode. Errors abound and productivity is on the floor.

How many BOTs do you have?

------------------------------
Chad Dokmanovich
IT Architect
Fin IT
America/New_York
------------------------------

Anonymous
Not applicable
We run 60 bots, but generally our number of concurrent sessions is closer to 10-24. The number of interactive clients/concurrent users is higher. We've definitely seen a progressive slowdown that seems to be accelerating over time, but not to the extent you're facing. We're also seeing more issues with v6.6 than with our current version (6.2.1). It sounds like we're seeing the early signs of the storm that you're already in.

Support directed me to this knowledge base: http://portal.blueprism.com/customer-support/support-center#/path/1332229212  They also suggested the number of unique tags we have could be part of the issue, and we're looking at addressing this.

If you're seeing that degree of performance issues, I can appreciate why you're looking at sharply reducing logging. We're experimenting with a few solutions to reduce the size of the application database in use. I'm hoping we won't have to resort to a multi-team approach, but I'm not ruling it out either.

If you like, I'll update you on what we learn as we work on resolving this.

------------------------------
Heather Ruhl
Solution Architect
ATB Financial
America/Edmonton
------------------------------