System Monitoring? Work or Art or The Devil's Playground?

I have witnessed IT service metamorphose from reactive firefighting to a modern, proactive approach that ensures our good customers' productivity is maximised. In 2019 it is indeed a very good time to be an end user; however, the implementation of proactive solutions can be to the detriment of your IT teams and, ultimately, the sustainability of a high-quality service. I am in no way implying that proactive solutions are a negative thing; if I were, the men in white coats would surely have me off to a room of comfy walls, leaving my career in tatters along the way.

The challenge with proactive solutions is having a robust and exquisitely tuned monitoring system. For larger organisations this stretches beyond having strong operational resources to respond correctly to alerts; it requires incredibly strong collaboration across business and technical roles, delivery managers and application specialists, to name a few. Unless all key stakeholders understand their responsibility for ensuring their monitoring solutions are precisely tuned to their particular services, servers and applications, you risk drowning even the most robust of operations teams in a sea of critical alerts, which invariably leads to loss of control and system failure.

A good monitoring solution should go through an extensive implementation phase for each application or service. Key stakeholders commonly drift away from services once they are in production, focusing on their next project, a future update or anything other than looking back. It is therefore imperative that monitoring design and implementation for their service is included in the project deliverables. It is also essential that “good monitoring implementations” are extremely well defined, so that all parties understand their responsibilities.

In my experience, strong monitoring solutions depend far more on the right investment of time at the beginning; every hour invested pre-production can save hundreds of hours post-production. The following are purely suggestions from my own experience for ensuring your organisation does not get taken in by the “snake oil salesmen” and their promises of system tranquillity.

Let’s assume we have just installed some bare-bones monitoring solutions and are ready to start integrating them into our services. It is imperative that agents are not deployed straight away, otherwise that alert tidal wave will soon arrive.


1: Identify Business Critical Services

While we ultimately want everything monitored, it is essential that monitoring deployments are controlled, and it is most logical to focus efforts on the areas most critical to the organisation. Once these are identified, the key stakeholders need to be actively involved, and it must be made clear to them that investing their time now will bring significant cost and time savings later.

2: Project Team Alignment

Monitoring design is a major part of service delivery. It is not something the business should allow to be bolted on to projects as an afterthought; it should be a key discussion during project planning and be given ample attention through every project phase and the entire service lifecycle.

3: Operational\Support Team Sign Off

Please involve your key operations teams and ensure they get to sign off on accepting the monitoring solution for the specific service. That very nice chap in operations who saves your bacon understands his responsibilities and has carefully organised his team based on the workload. Unexpectedly noisy “non-issue” services are detrimental to that team, and ultimately your service will suffer from “the boy who cried wolf” scenario. Involving members of support teams is essential; their day-to-day experience gives them insight that will prove valuable to great monitoring implementations.

4: Change Management

Change Management should adapt its process to ensure monitoring solutions are considered during any change. While the priority may be the successful implementation, adequate planning and consideration for the monitoring solutions will contribute to a more successful change. Situations where one technical resource is trying to bring a service down for maintenance while another is desperately trying to bring it back up because of service alerts are best avoided with some careful planning.
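One way to picture the planning described above is to record approved maintenance windows and check incoming alerts against them before anyone is paged. The following is only a minimal sketch under my own assumptions: the `MaintenanceWindow` class, the `is_suppressed` function and the service names are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MaintenanceWindow:
    service: str      # hypothetical service name agreed in the change record
    start: datetime
    end: datetime


def is_suppressed(service, raised_at, windows):
    """Return True if the alert falls inside an approved window for its service."""
    return any(
        w.service == service and w.start <= raised_at <= w.end
        for w in windows
    )


# Illustrative data only: one approved overnight change on a made-up service.
windows = [
    MaintenanceWindow("payroll-db", datetime(2019, 6, 1, 22, 0), datetime(2019, 6, 2, 2, 0)),
]

print(is_suppressed("payroll-db", datetime(2019, 6, 1, 23, 30), windows))  # True
print(is_suppressed("payroll-db", datetime(2019, 6, 2, 9, 0), windows))    # False
print(is_suppressed("intranet", datetime(2019, 6, 1, 23, 30), windows))    # False
```

Most monitoring tools offer their own maintenance or downtime scheduling; the point of the sketch is simply that the window has to be agreed and recorded as part of the change, not remembered on the night.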

5: TUNE till it purrs…

Monitoring implementation is not a case of install the agent, set up the application specifics and walk away; it is an ongoing process that should follow some specific steps:

Test Environment – All service monitoring should start in a test environment; this is where you tune and configure the alerts specifically for your own implementation of a particular solution. Many monitoring applications include application-specific settings or add-ons. These are “cookie cutter” and generally verge on crazed paranoia. For the first week to ten days, a great deal of focus should go into tuning the alerts and getting rid of the ones you simply do not need to see.

Alert Handling – Service delivery managers, I am looking at you here. You own the service, so it is your responsibility to give input on how alerts relating to your services are managed. Provide work instructions where necessary.

Production Analysis – Operational support teams have a large responsibility in managing system alerts; they should keep an eye open for repeat offenders and those dreaded non-issues. It is essential that operational teams implement key strategies for managing monitoring alerts and maintain a relationship with the service delivery managers and technical resources for the specific services and applications.
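Spotting repeat offenders does not need anything elaborate. As a rough sketch, assuming the alert history can be exported as a list of records with `service` and `check` fields (hypothetical field names; real exports will differ), a simple count per service and check already shows where the tuning effort should go:

```python
from collections import Counter


def repeat_offenders(alerts, min_count=3):
    """Return ((service, check), count) pairs that fired at least min_count times, noisiest first."""
    counts = Counter((a["service"], a["check"]) for a in alerts)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]


# Illustrative data only; the services and checks are made up.
alert_history = [
    {"service": "intranet", "check": "disk_space"},
    {"service": "intranet", "check": "disk_space"},
    {"service": "payroll-db", "check": "cpu"},
    {"service": "intranet", "check": "disk_space"},
    {"service": "mail", "check": "queue_length"},
]

for (service, check), count in repeat_offenders(alert_history):
    print(f"{service}/{check}: {count} alerts")
# prints "intranet/disk_space: 3 alerts"
```

The `min_count` threshold is an arbitrary choice here; a sensible value depends on how long a period the history covers and how noisy the estate is.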

Monitoring Integration – Everything can be monitored, and commonly it is. There are monitors monitoring the monitors; it must feel a little Orwellian for those poor services that can barely have a hiccup without demands for “digital papers, please”. I could write an entire article on the intricacies of integrating different monitoring solutions; it is imperative that they are in tune with each other to provide a beautiful harmony in their alerting relationship. You know you have it perfect when performance monitoring identifies issues with a disk subsystem and links them to an alert about a failed disk in a RAID array before the helpdesk receives a single call. Great monitoring integration can eliminate significant time lost to troubleshooting.
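To illustrate the disk example, here is a minimal sketch of one possible way to link alerts from different monitoring tools: group alerts for the same host that arrive within a few minutes of each other. The field names (`host`, `source`, `time`, `message`), the ten-minute window and the `correlate` function are all my own assumptions for illustration; real integrations usually rely on the correlation features of the tools themselves.

```python
from datetime import datetime, timedelta


def correlate(alerts, window=timedelta(minutes=10)):
    """Group alerts on the same host that occur within `window` of the previous alert in the group."""
    groups = []
    last_group = {}  # host -> most recent group opened for that host
    for alert in sorted(alerts, key=lambda a: a["time"]):
        group = last_group.get(alert["host"])
        if group and alert["time"] - group[-1]["time"] <= window:
            group.append(alert)
        else:
            group = [alert]
            groups.append(group)
            last_group[alert["host"]] = group
    return groups


# Illustrative data only; hosts, sources and messages are made up.
alerts = [
    {"host": "fs01", "source": "performance", "time": datetime(2019, 6, 3, 9, 4),
     "message": "disk latency high"},
    {"host": "fs01", "source": "hardware", "time": datetime(2019, 6, 3, 9, 0),
     "message": "RAID disk failed"},
    {"host": "web02", "source": "availability", "time": datetime(2019, 6, 3, 9, 5),
     "message": "packet loss"},
]

groups = correlate(alerts)
for group in groups:
    sources = ", ".join(a["source"] for a in group)
    print(f"{group[0]['host']}: {sources}")
```

With the sample data, the hardware and performance alerts for the file server end up in one group, so whoever picks up the latency alert sees the failed disk alongside it.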

Virtual Monitoring Teams – For larger organisations, or those with large numbers of applications and services, it is highly advantageous to have a specific monitoring team. This can be a virtual team made up of key resources. During initial implementation there will be significant co-ordination and time demands, and having a co-ordinated, dedicated team will greatly improve the chances of successful implementations.


Reap the rewards

System monitoring is so much more than a test of service availability; it can be both a source of information and a tool to aid in resolving issues. Using monitoring solutions to resolve singular issues is one thing, but being able to use the alert history of a service to understand the pattern of events, the timings and the related elements that contributed to a specific issue is an incredibly powerful tool to have in your technical arsenal. Used effectively, it lets you prevent untold numbers of future service outages and unchain yourself from further reactionary firefighting.


Great system monitoring solutions are something we are incredibly passionate about at Plan2IT. Being plagued by IT issues and overburdened IT teams is exceedingly costly for any business and demoralising for management, support and end users. Contact us via our website Plan2IT or email us at enquiries@plan2it.co.uk.