One of my new projects is to get my head round VMware vCenter Operations Manager, or VCOPS as it’s more commonly known. I freely admit that performance is one of my weak areas. I’m pretty good at troubleshooting and resolving any number of configuration problems, but resolving performance problems isn’t one of my strengths. Why?
Well, I’ve spent a great many years living in the abstract world of the lab, whether as a trainer, author or now at VMware. In that time I didn’t get a whole lot of exposure to genuine performance issues; after all, a lab environment never experiences the same non-linear performance issues that you see in the real world. What did happen a lot of the time was that students would bring me their performance problems, and I would try to diagnose them from first principles. By first principles, I mean things such as:
- Over-use of SMP vCPU
- Disk intensive VMs placed on the same LUN/Volume/Spindles
- The wrong RAID level used
- Insufficient RAM allocated
- Misuse of various features in vSphere, such as poor resource pool design, inappropriate shares and so on
Despite my lack of exposure, my favourite parts of the Install & Configure/Fast-Track course were topics like Limits/Reservations/Resource Pools/DRS/ESXTOP and so on, mainly because these are really juicy topics that can be tricky to explain. I enjoyed the challenge.
So now that I’m looking at VCOPS I’m looking at this subject all over again, whilst bearing in mind that VCOPS isn’t merely a performance monitoring solution. It actively, or rather pro-actively, goes looking for “health problems”. There are two analogies for this. See yourself as Dr VI Admin MD at the vSphere Hospital for the Virtual Machine, if you like. What VCOPS gives you is the pre-emptive, pro-active diagnostic tools to analyse your patients (the VMs) and deal with their minor symptoms before they become really unwell. Another analogy I like is the “dashboard” in your car. Not only does it tell you speeds and feeds, there’s also a little red light that gets illuminated when you’re running low on oil. It’s better to receive pro-active, pre-emptive alerts than to be at the side of the road with a seized engine.
So one thing I’ve been looking at is different tools for generating a fake workload inside a VM for the four core resources of CPU, Memory, Disk and Network. I thought if I put together a compendium of tools in a single web-page, it might help someone looking to do the same thing.
A couple of really strong observations have come out of my early use of VCOPS. Firstly, vCenter and other 3rd party performance monitoring tools tend to just contain “thresholds” and offer a simple “traffic light” view of performance. You know, the same tedious green, yellow, red system where alarms are triggered at 75%, 90% and so on. That’s all well and good. The trouble is that these 3rd party tools, in the main, aren’t really showing deviations. So if an application grows in resource usage (say CPU) over a 6hr period from 10%, 20%, 30%, 40%, 50% and 70%, most of them won’t tell you anything until the VM has smashed through one of their pre-configured “thresholds”, such as 75%. By then I think you could argue that the problem has got out of hand. Wouldn’t it be far better to alert the administrator to the underlying, under-the-surface iceberg of a problem, rather than waiting for the tip of the iceberg to appear just above the water? I see this very much like a Doctor looking for diagnostic information about the health of a patient. When the Doctor monitors the patient, they look for tools that can show that a change is taking place, something other than normal.
The other thing I’ve liked about VCOPS is how it’s identified a number of problems in the build of my vSphere homelab. Sometimes those problems have been of my own making; other times VCOPS has identified problems in vSphere that have led me to a known issue documented in a KB article. That, for me, shows two benefits. Firstly, VCOPS isn’t just about performance, it’s about configuration, or should I say “Operations” (the clue is in the name of the product, after all!). Secondly, I can absolutely trust what it’s telling me: it doesn’t try to pretend that everything is right in the world when it isn’t.
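To make the threshold-versus-deviation point a bit more concrete, here is a minimal Python sketch using the CPU figures from the example above. It is purely illustrative and is not how VCOPS itself works; the function names, the baseline samples, the factor of 3 and the 1% floor on the spread are all assumptions of mine for the sake of the illustration.

```python
from statistics import mean, stdev


def static_threshold_alarm(samples, threshold=75.0):
    """Return the index of the first sample at or above a fixed threshold, or None."""
    for i, value in enumerate(samples):
        if value >= threshold:
            return i
    return None


def deviation_alarm(samples, baseline, k=3.0, min_spread=1.0):
    """Return the index of the first sample that sits more than k "spreads"
    above the baseline average, or None.

    The spread is the baseline's standard deviation, floored at min_spread
    so that a very flat baseline does not make the check hypersensitive.
    """
    limit = mean(baseline) + k * max(stdev(baseline), min_spread)
    for i, value in enumerate(samples):
        if value > limit:
            return i
    return None


# Hypothetical "normal" CPU usage (%) for the VM, hovering around 10%.
baseline = [9, 10, 11, 10, 9, 11, 10, 10]

# The growth pattern from the example: 10% to 70% over roughly six hours.
growth = [10, 20, 30, 40, 50, 70]

print(static_threshold_alarm(growth))     # None: 75% is never reached
print(deviation_alarm(growth, baseline))  # 1: the 20% sample is already abnormal
```

The fixed 75% alarm never fires on this data, whereas the deviation check flags the second sample, simply because 20% is a long way from what is normal for this particular VM.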
Anyway, less of my ramblings and on to the tools overview, which I’ve categorized as CPU/Memory, Disk/Network IO, and Application Tools.