September 19, 2024

The Art of Maintenance, Part 2: Platform Maintenance

Read the second installment of our three-part series on the art of maintenance.


Alizee
Staff

Welcome back to our three-part series on “The Art of Maintenance”! 🎨 In this installment, we’re delving into the subject of platform maintenance — the foundational work that keeps your infrastructure running smoothly.

We’ll explore:

  • How proactive maintenance is key to avoiding disruptions
  • Why collaboration with key stakeholders is crucial
  • The best practices you can implement to ensure your platform remains stable and resilient

Let’s get started!

 

What Is Platform Maintenance?

🚀 Platform maintenance is exactly what it says on the tin: the maintenance of your platform and its infrastructure. It mostly applies to our Enterprise customers (otherwise, it’s included in the SS&C | Blue Prism® Cloud fees).

It’s quite literally the foundations that will keep your automation program standing.
Let’s have a look at what it entails! 🚀

 

Proactive Maintenance

Effective platform maintenance is proactive (our fellows in manufacturing would probably go further and say it needs to be predictive). It’s about ensuring that the frequency and severity of issues are minimal or non-existent.

Platform maintenance can’t quite happen in isolation within the Centre of Excellence (CoE). It requires engagement with several stakeholders around your business and building tight-knit relationships with them to be in sync with your organization’s bigger picture.

Having a seat at the CAB and other tables

This really starts with ensuring you have representatives from IT on your automation Change Advisory Board (CAB), and that you’re invited to conversations taking place in other forums that may discuss technology changes and business changes. These are a gold mine of information.

Having visibility over these changes will help inform your automation roadmap and your maintenance roadmap. It also enables a much better mutual planning process, so you can keep your platform operational and resilient before issues materialize.

Articulating the platform architecture

Having a clear understanding of the platform architecture is extremely important so you can have the conversations required to keep your platform operational. You need to be able to articulate:

  • The components that comprise the automation platform
  • The role of each of these components
  • The efforts required to maintain each component in an operational state
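
One way to keep those three points in a shareable form is a small component inventory. The sketch below is only an illustration: the `Component` class, the `undocumented` helper and the example entries are all hypothetical and should be replaced with your own platform's components.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One platform component, its role, and the work needed to keep it running."""
    name: str
    role: str
    maintenance_tasks: list = field(default_factory=list)


def undocumented(components):
    """Return the names of components missing a role or any maintenance task."""
    return [c.name for c in components if not c.role or not c.maintenance_tasks]


# Illustrative entries only (assumptions, not a reference architecture).
inventory = [
    Component("database", "stores configuration, queues and logs",
              ["archive logs", "review known issues"]),
    Component("application server", "brokers connections to the database",
              ["apply OS patches"]),
    Component("digital worker pool", "", []),  # not yet documented
]

print(undocumented(inventory))
```

Anything the helper returns is a component you can't yet articulate to your stakeholders, which makes it a useful checklist before a CAB conversation.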

We all know how maintenance is usually perceived in organizations, and we recommend using the maintenance value proposition we introduced in the first blog of this series to frame your ask and explain the cost of doing versus the cost of not doing anything, rather than simply stating that maintenance is there to avoid a standstill.

Enabling your maintenance stakeholders

The value proposition itself won’t get you all the way to a stable platform, and you’ll also need some stakeholder enablement to ensure everyone has a clear view of the components that comprise your platform and the context of why it's important.

Example scenario:

If you went to somebody in your infrastructure team and talked about a change you needed to make, would they understand:

  • The context of that as a service in your organization?
  • The potential impact of not doing that in terms of how it's going to affect your external customers and the capabilities that you're trying to deliver?

--

That’s the foundation you need to achieve to get people on the same page.

Implementing change controls

Once you’ve got a seat at the table, what’s the state of change control for your technology estate?

Example scenario:

If a DBA wanted access to your production database for activity X, would they understand:

  • That you need to be consulted as a stakeholder?
  • That a conversation needs to happen about what they want to do, why, and when might be an appropriate time to do it?
  • Whether there’s a documented configuration for that database that should be adhered to?
  • What’s going to change and why, if you’ve updated the documentation?

This is to ensure any changes are mutually understood by all parties and scheduled to be performed safely and at an optimal time.

--

These are the basics that need to be put in place (and documented) to ensure you don’t suffer random changes to different components of your platform – and the issues that will materialize off the back of such changes.

All changes should be discussed and documented in a controlled way that minimizes risk.

If, despite all reasonable measures being in place, a risk materializes into an issue, we have a change control process to go back to in the aftermath of the change. We can understand where the issue stemmed from and isolate it and its root cause. We can respond much more quickly, and with better information, than we otherwise could have.

 

Typical Maintenance Activities and How To Go About Them

SS&C Blue Prism software version and platform upgrades

The good old platform upgrade.

Anecdotally, it’s not uncommon for customers to consider upgrades (for genuine reasons) only once we give an end-of-life notification, since it’s the “last chance”.

While this is a frequent scenario, it’s not good practice. Referencing our “maintenance mindset” principles, just because something is working doesn’t mean it will continue to work (the Manifest V2 deprecation announcement was a perfect example of that).

If you’re not on the latest version of our software, you’re in essence paying some sort of technical debt – an opportunity cost, because you’re not getting the value that you could get from the latest bug fixes, new features or architecture options, which all present opportunities:

  • Broadening the scope of addressable use cases and your pipeline.
  • Offering new architecture options (e.g., high availability, ASCR, etc.).
  • Eliminating frustrations and workarounds linked to bugs.
  • Enhancing efficiency and productivity.

Sometimes the need for an upgrade appears for other unexpected reasons, such as:

  • A related technology stack announces an end-of-life, such as that of the Manifest V2 framework used by Chromium extensions.
  • A vulnerability may be detected and drive the need for a patch (which will be an upgrade).

This is why we keep going on about taking a proactive approach to maintenance and having a 360° view of why it’s a good and important thing to do. You continuously plan for your next upgrade; you know which version you’ll deploy next and why. And the closer you are to the latest version, the easier it is to react and respond to scenarios like these.

So, let’s all turn that into a proactive exercise rather than waiting for an end-of-life notification and working out what we're going to do from there 😊.

Operating systems

Operating system patching and upgrades are another common platform maintenance challenge, but one that can be easily addressed with the principles we laid out in the first section of this blog.

Here are some questions to ask to assess your readiness in that area:

  • Have you got a clear view of your path from OS version X to version Y?
  • Is there a structured plan around it?
  • Is that factored into your platform maintenance roadmap?
  • Is the patching team aware of peak processing times in the realm of automation?
  • Have you kept abreast of known issues for your version of the platform?

As we all know, Microsoft tends to operate a regular patching system. Your organization will likely have a patching cadence to deploy those patches to servers and virtual machines.

It goes without saying that if not planned correctly, this would have an impact on your digital workers; the last thing you’d want would be for a subset of your digital workers to go offline at a peak processing time on your platform.

There are plenty of options that can work to minimize the risk and impact around patching, including:

  • Updating different pools of digital workers at different times.
  • Standardizing component images to improve rollout efficiency and reduce maintenance overhead (with the added benefit of simplifying and accelerating troubleshooting whenever something goes wrong with a digital worker).
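
To make the first option concrete, here is a minimal sketch of staggering patch windows across pools while steering clear of peak processing times. Everything in it is an assumption for illustration: the `Window` class, the `assign_patch_windows` helper, the pool names and the times are hypothetical, and a real schedule would come from your patching team's tooling.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Window:
    """A daily time window within one day, start inclusive and end exclusive."""
    start: time
    end: time

    def overlaps(self, other: "Window") -> bool:
        return self.start < other.end and other.start < self.end


def assign_patch_windows(pools, candidate_windows, peak_windows):
    """Give each pool its own patch window that avoids every peak window.

    Candidate windows are assumed not to overlap one another.
    Raises ValueError when there are too few off-peak windows to stagger all pools.
    """
    safe = [w for w in candidate_windows
            if not any(w.overlaps(p) for p in peak_windows)]
    if len(safe) < len(pools):
        raise ValueError("not enough off-peak windows to stagger every pool")
    # zip pairs each pool with a different window, so pools are never patched together
    return dict(zip(pools, safe))


# Hypothetical example data
peaks = [Window(time(8, 0), time(11, 0)), Window(time(17, 0), time(19, 0))]
candidates = [
    Window(time(1, 0), time(3, 0)),
    Window(time(9, 0), time(10, 0)),   # falls inside a peak, so it is skipped
    Window(time(13, 0), time(15, 0)),
    Window(time(21, 0), time(23, 0)),
]
plan = assign_patch_windows(["pool-a", "pool-b", "pool-c"], candidates, peaks)
for pool, window in plan.items():
    print(pool, window.start.isoformat(timespec="minutes"),
          "-", window.end.isoformat(timespec="minutes"))
```

Because each pool gets a different window, only part of your digital workforce is offline at any one time, and none of it during the peaks you've declared.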

Databases

The SS&C Blue Prism database is certainly the most talked about topic in the realm of maintenance. It’s both a recurring topic and almost a rite of passage among our customers.

Odds are, at some point, you’ve done something horrendous to your database and caused a production outage.

If your database is performing well, your platform will generally be too. But that only works if you’re thorough with your archiving processes.

By default, work queue items store data in the database. But configured correctly, they also earn you a little more flexibility. The platform lets you enable or disable logging, or log errors only, and you can set this on any given action stage. When you process a lot of transactions, your database can start filling up quickly. Whatever your logging needs, SS&C Blue Prism can accommodate them. But depending on those needs, you’ll also want to make sure you have:

  • Sound archiving practices since you can’t keep log data forever in your production database.
  • Regular maintenance, using scripts to clean the database (which we can provide where needed 😊 – just ask your customer success advisor).
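
As a rough illustration of what a retention-based archiving step looks like, the sketch below moves rows older than a cutoff into an archive table using an in-memory SQLite database. The table names (`demo_log`, `demo_log_archive`), the columns and the 30-day retention period are all made up for the example; this is not the SS&C Blue Prism schema, and for a real environment you should rely on the supported archiving features and scripts rather than hand-written SQL.

```python
import sqlite3
from datetime import datetime, timedelta


def archive_old_logs(conn, cutoff):
    """Move log rows older than `cutoff` into the archive table; return rows moved."""
    cutoff_text = cutoff.isoformat()
    with conn:  # one transaction: the copy and the delete succeed or fail together
        conn.execute(
            "INSERT INTO demo_log_archive (id, logged_at, message) "
            "SELECT id, logged_at, message FROM demo_log WHERE logged_at < ?",
            (cutoff_text,),
        )
        deleted = conn.execute(
            "DELETE FROM demo_log WHERE logged_at < ?", (cutoff_text,)
        )
        return deleted.rowcount


# Hypothetical demo tables and data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_log (id INTEGER PRIMARY KEY, logged_at TEXT, message TEXT)")
conn.execute("CREATE TABLE demo_log_archive (id INTEGER PRIMARY KEY, logged_at TEXT, message TEXT)")

now = datetime(2024, 9, 19, 12, 0, 0)
rows = [
    (1, (now - timedelta(days=120)).isoformat(), "old session"),
    (2, (now - timedelta(days=45)).isoformat(), "older than a month"),
    (3, (now - timedelta(days=2)).isoformat(), "recent session"),
]
conn.executemany("INSERT INTO demo_log VALUES (?, ?, ?)", rows)

moved = archive_old_logs(conn, now - timedelta(days=30))
print(moved)  # 2
```

Copying and deleting inside a single transaction is the point of the sketch: a failure part-way through leaves the production table untouched rather than losing log data.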

To this point, we also recommend checking our practical guide to processing personal data with RPA for tips and tricks on logging, specifically for privacy by design.

 

Reactive Maintenance Is NOT the Key

All of the above should show that you really can’t afford to operate reactively: doing so is synonymous with outages, issues, bad press and frustration across all parts of your organization.

At no point do you want to be diagnosing high-severity incidents that could have been avoided had you done the proactive work upfront.

So, we’ll say it once more: Be proactive about platform maintenance.

In our next blog, we’ll dive into process maintenance. 🏊