How to mitigate Load Balancer Fail-over risk

ashish.kushwaha_2185 · ‎30-01-19

Hi All,

We are planning to set up disaster recovery for our environment by placing a load balancer between the pool of app servers and the resource machines. The load balancer will remove the dependency on a single app server and runtime resource and ensure greater availability.

What are the chances of the load balancer itself failing? Do we need to have multiple load balancers in place? If so, can they take over handling requests without any manual intervention?

Thanks & Regards,
Ashish Kushwaha
RPA Solution Architect

pranavred · ‎06-02-19

I'm not an expert on the architecture, but I think a single load balancer would be enough, since it is already the fail-over plan. That's how we set it up.

On a side note, what software did you use for load balancing? We initially didn't use any software and relied on DNS round robin to alternate requests between the app servers, but it fails every time a scheduler is run.

Pranav
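For readers unfamiliar with DNS round robin, here is a minimal sketch of the rotation itself. It is only an illustration, not a diagnosis of the scheduler failure described above. The function name and the IP addresses are made up for the example; the point is that each lookup returns the same address list in a rotated order, with no knowledge of whether a server is healthy or which server a client used last time.

```python
from collections import deque


def round_robin_answers(addresses, queries):
    """Simulate the answers a round-robin DNS server would give.

    Hypothetical helper for illustration: each query returns the full
    address list, rotated by one position relative to the previous answer.
    """
    ring = deque(addresses)
    answers = []
    for _ in range(queries):
        answers.append(list(ring))
        ring.rotate(-1)  # move the first address to the end for the next query
    return answers


# Placeholder app server addresses (assumptions, not from the thread).
APP_SERVERS = ["10.0.0.1", "10.0.0.2"]

# Clients typically connect to the first address in each answer.
first_choices = [answer[0] for answer in round_robin_answers(APP_SERVERS, 4)]
```

With two servers, successive lookups alternate between them, so two related requests are not guaranteed to land on the same app server.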

AmiBarrett · ‎06-02-19

Anything has a chance of failure, and the more complex you make a diagram, the more potential points of failure you introduce. Stability depends on the type of solution. I would agree with Pranavred's round-robin setup, because (at least in my opinion) DNS is more reliable than a piece of software running on one box. If that box goes down, both application servers are up the creek (as it were). Likewise, as they said, a single load-balancing solution should be all that is needed per environment.

I've also seen scheduling issues with load-balancing solutions, as Pranavred reports. We've run into problems where the runtime resource was connected to the wrong controller, but because the connection had already been established and the IP resolved, the runtime resource didn't see an issue.

By far the most reliable solution I've seen involves a single application server, with the resources connected directly to the DB (or DB cluster, as the case may be).

One other thing to be wary of with this approach is something we identified recently in our 5x environment. It seems that the servers are constantly polling the DB for status requests of the VMs, to the point that the throughput has become flooded and is preventing 'Get Next Item' stages from functioning properly. (This is with 48 VMs across two controllers, split 38 and 10 respectively.) So keep an eye on your DB traffic as you add more application servers. Likewise, be wary of how many development machines you hook up to any one environment. We got around that (among other things) by writing our own controller software, but I suppose that's neither here nor there for the sake of the original thread topic.
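On the original question of failing over without manual intervention, one common pattern is to probe a list of candidate endpoints and use the first one that answers. The sketch below is a generic illustration of that idea, not something Blue Prism provides. The names `tcp_probe` and `pick_first_healthy`, the host names, and the port are all assumptions made up for the example. Note that a plain TCP check only shows that something is listening on the port, not that the application behind it is working correctly.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Endpoint = Tuple[str, int]  # (host, port)


def tcp_probe(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def pick_first_healthy(
    endpoints: Iterable[Endpoint],
    probe: Callable[[Endpoint], bool] = tcp_probe,
) -> Optional[Endpoint]:
    """Return the first endpoint that passes the probe, or None if all fail."""
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    return None


# Hypothetical candidates in priority order (placeholder hosts and port).
CANDIDATES = [
    ("appserver-1.example", 8199),
    ("appserver-2.example", 8199),
]
```

Calling `pick_first_healthy(CANDIDATES)` before each new connection, rather than once at start-up, is what avoids the stale-connection situation described above, where a client keeps using an address it resolved earlier. The `probe` parameter is injectable so the selection logic can be exercised without touching the network.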

SS&C Blue Prism Community
