Hi
@Tejaskumar_Darji Interesting question and a good opportunity for discussion 🙂
We have similar problems in our company: not only occasional connection losses, but also performance problems such as degradation or processing issues.
Our solution was to create external alerts that check the status of the platform every 5 minutes. Nagios worked really well for this, because you can implement alerts and trigger actions directly on the server depending on the result.
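To give an idea of what such an external check can look like, here is a minimal sketch of a Nagios-style plugin in Python. It is not our production script. It only tests whether a TCP port on a machine accepts connections and reports the result through the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names `check_tcp` and `main`, and the host and port you pass in, are assumptions for illustration.

```python
import socket
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host: str, port: int, timeout: float = 5.0) -> Tuple[int, str]:
    """Return (exit_code, message) for a TCP reachability check."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


def main(argv: List[str]) -> int:
    """Parse 'HOST PORT' arguments, print one status line, return the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_resource.py HOST PORT")
        return UNKNOWN
    try:
        port = int(argv[1])
    except ValueError:
        print(f"UNKNOWN - port must be an integer, got {argv[1]!r}")
        return UNKNOWN
    code, message = check_tcp(argv[0], port)
    print(message)
    return code

# When installed as a plugin, the script would end with:
#     sys.exit(main(sys.argv[1:]))
```

Nagios runs the script on the interval configured for the service (5 minutes in our case) and reads the exit code to decide the state of the alert.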
The more complex task here is designing the alerts. We have two types:
- Is the robot alive? -> We check the status of the Runtime Resource directly. If it is disconnected, we wake up the Resource PC again.
- Any process issue? -> For this alert we implemented an external log system. We store the result of each item in a database. If there is a high volume of errors in the last hour, we send an alert to the maintenance team for a manual check.
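The second alert can be sketched roughly as follows. This is only an illustration under several assumptions: SQLite stands in for whatever database you use, the table `item_log` and its columns are hypothetical, and the threshold of 20 errors is an arbitrary number, since what counts as a "high volume" depends on your process.

```python
import sqlite3

WINDOW_SECONDS = 3600   # "the last hour"
ERROR_THRESHOLD = 20    # assumed value; tune it to your own volumes


def create_log_table(conn: sqlite3.Connection) -> None:
    """Create the hypothetical external log table, one row per processed item."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item_log ("
        " process TEXT NOT NULL,"
        " status TEXT NOT NULL,"        # e.g. 'completed' or 'error'
        " finished_at REAL NOT NULL)"   # Unix timestamp in seconds
    )


def log_item(conn: sqlite3.Connection, process: str, status: str,
             finished_at: float) -> None:
    """Record the result of one item; the process would call this after each item."""
    conn.execute(
        "INSERT INTO item_log (process, status, finished_at) VALUES (?, ?, ?)",
        (process, status, finished_at),
    )


def recent_error_count(conn: sqlite3.Connection, now: float) -> int:
    """Count items that ended in error within the last WINDOW_SECONDS."""
    row = conn.execute(
        "SELECT COUNT(*) FROM item_log"
        " WHERE status = 'error' AND finished_at >= ?",
        (now - WINDOW_SECONDS,),
    ).fetchone()
    return row[0]


def needs_manual_check(conn: sqlite3.Connection, now: float,
                       threshold: int = ERROR_THRESHOLD) -> bool:
    """True when the error volume is high enough to alert the maintenance team."""
    return recent_error_count(conn, now) >= threshold
```

The monitoring job would call `needs_manual_check(conn, time.time())` on each run and raise the alert when it returns `True`.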
Hope this helps you!
See you in the community, bye
🙂
------------------------------
Pablo Sarabia
Architect
Altamira Assets Management
Madrid
------------------------------