KPIs and SLAs – More important than outcomes?

Some time ago I was working in a vSphere role and an escalation came through about an unexpected reboot of a host. Sure, we all encounter a PSOD if we’re unlucky. They’re certainly not a normal occurrence, but a PSOD is a stop screen, and the default config doesn’t result in a reboot. I started looking at the logs: vpxd.log, hostd.log, vpxa.log, and so on. There was no indication of any failure. The logs just stopped, then restarted when the host was booting up.

That’s neither normal nor expected. No problem though, keep looking and something will turn up. I had a look at the host’s SEL (System Event Log) to see if anything was recorded there. The host had been rebooted by a user in a different team. Problem solved.

Why are we slowly heading back to inaccessibility?

When the pandemic hit, businesses, understandably, panicked. Suddenly the majority of the workforce was remote, and that was a new thing to a lot of companies. How do you manage staff when they’re not sat at a desk you own? How do you know people are working if you can’t see them?

Zoom, WebEx and Teams enter stage left. For me, Teams has been a revolution – you can have a text chat with someone, then turn it into a phone call, a video call or a screen share with no real effort. Wonderful. Collaboration should be this easy, and in 2020 it was.

Industry events that used to be in-person only were suddenly, and hastily, thrown online. Yes, thrown. Putting on an online event is different from putting on an in-person one, and the organisers of a lot of the first events I attended didn’t have time to re-assess how to deliver their content. Those events were painful, to be honest, but it was all new, so nobody was really at fault.

There were a lot of clunky handovers from speaker to speaker, each waiting for the other to signal that they had finished and that it was the next person’s turn to speak. There was a surprisingly large number of backgrounds that featured guitars, too.

Do we want to solve problems, or say we’ve solved problems?

I saw the above tweet the other day. It’s fairly amusing, and later I saw a few people imply it was the process solution implemented as a result of the Facebook outage on 4th October 2021.

It resonated – how often do we implement solutions without solving problems? How often is a process introduced (‘Do not unplug’) that makes no attempt to solve the underlying problem (whatever it is that cannot be unplugged)?

All too often we focus on implementing a solution without taking a step back to ensure it actually solves the problem. If we ignore all the bad stuff we can only talk about the good stuff, right? A report that’s all green is better than a report that has red on it, right? All that matters is the report, right?

Ostrich management never ends well.