Sunday 14 August 2011

On Monitoring

Tonight it occurred to me that monitoring is more important than we probably give it credit for.
The way I see it, it should serve three purposes:

1) Alerting - this is obvious but sometimes ineffective. As an example, that email that comes in at 3am to tell you that a critical job has failed is only going to be read when you wake up in the morning. So not only does an alert have to be accurate and timely, it also needs to be delivered appropriately, through a channel that will actually reach you when it matters.

2) Provide situational information. So you get the dreaded phone call from the night shift that something is broken. You dial in and open your monitoring tools, and you should immediately be greeted with both what went wrong (or is going wrong) and what is causing it. I guess it is a little utopian to believe that both cause and effect would be shown that easily, but the goal is that there is at least enough info to diagnose the problem directly from the metrics the monitoring is displaying to you.

3) Trending. I guess most people would classify this under a different category to monitoring, but I believe that trending and performance prediction are an integral aspect of any monitoring solution. All the data you need for your capacity planning should be captured by your monitoring solution, and is probably already stored in its database right now - the question is, can you access it easily?
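To make the trending point a bit more concrete, here is a rough sketch of the kind of capacity question that monitoring data should be able to answer. It assumes you have already pulled one disk-usage sample per day out of your monitoring database; the function name `days_until_full` and the sample figures are made up for illustration, and a straight-line fit is only one simple way to project growth.

```python
from typing import Optional, Sequence


def days_until_full(used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Estimate days from the last sample until usage reaches capacity.

    Assumes one sample per day, oldest first (hypothetical input format).
    Fits an ordinary least-squares line through the samples and
    extrapolates it. Returns None when usage is flat or shrinking.
    """
    n = len(used_gb)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Day indexes are 0, 1, ..., n-1, so their mean is (n - 1) / 2.
    mean_x = (n - 1) / 2
    mean_y = sum(used_gb) / n

    # Least-squares slope = covariance(x, y) / variance(x).
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_gb))
    slope = sxy / sxx  # growth in GB per day

    if slope <= 0:
        return None  # no growth, so the trend line never reaches capacity

    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope  # day index where the line hits capacity
    return max(0.0, day_full - (n - 1))  # measured from the most recent sample


if __name__ == "__main__":
    # Made-up samples: a volume growing by 2 GB per day on a 128 GB disk.
    samples = [100.0, 102.0, 104.0, 106.0, 108.0]
    print(days_until_full(samples, capacity_gb=128.0))
```

If getting a list like `samples` out of your monitoring tool takes more than a simple query, that is a good sign the trending data is not as accessible as it should be.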

I have a fetish for monitoring, and believe that none of the third-party products I have seen adequately address all facets of SQL monitoring. Most do a reasonable job, and you could get by with almost any of them, but what are you inadvertently missing? To combat this, I prefer to run multiple solutions simultaneously on our high-value systems. Yes, this means greater overhead on these servers, but the combination of overlapping redundancy and the different perspectives given by different monitoring solutions can be extremely enlightening. Of course you can be hardcore and roll your own, but I'm too busy doing interesting things to be writing a million scripts that probably only cover 50% of the things that I need to know.
