KPIs and SLAs – More important than outcomes?

Some time ago I was working in a vSphere role and I got an escalation about an unexpected reboot of a host. Sure, we all encounter a PSOD if we’re unlucky. They’re certainly not a normal occurrence, but a PSOD is a stop screen, and in the default config it doesn’t result in a reboot. I started looking at the logs: vpxd.log, hostd.log, vpxa.log, and so on. There was no indication of any failure. The logs just stopped, then restarted when the host was booting up.

That’s neither normal nor expected. No problem though, keep looking and something will show up. I had a look at the host’s SEL (System Event Log) to see if anything was recorded there. The host had been rebooted by a user in a different team. Problem solved.

