Early SRE Ethnographic Research

As a UX researcher, one of my jobs is to observe users and find problematic patterns in their behavior. My goal in identifying these patterns is to tease out the cause of the problem. As I mentioned in my previous blog post, I am learning about this new...

Happiness is a measured user journey

We mess with Jenkins configs.  We struggle with failing automated tests.  We finally get Kubernetes doing what we want it to do.  We wake up at 2am to restart a failing API gateway.  Why?  So the user gets the best experience possible.  How do we know if our efforts...

Wind-up Top Monitoring

Greetings from kaizenOps.io; my name is Nate and I’ve been in the application performance and availability space for 15+ years.  Though looking back, one could say I’ve been working in this space as far back as my first bona fide corporate job as a college intern for...

SREs, who are you?

My name is Kyoko. I am a user researcher for kaizenOps.io. The nature of my job is to learn, understand and sympathize with others – specifically, users.  I often meet very interesting people as ‘users,’ from general consumers to people in very specific technical...

SRE Pain Points

I want to spend a little time reflecting on the kaizenOps.io journey over the past few months.  As I mentioned at the outset of this blog, many years ago, I was touched by how much Lakshay’s life was impacted by his job.  Of course, everyone’s job impacts their life,...

Abnormal vs. Bad

Is your solution detecting actual business threats? Reflecting on the alert fatigue problem, I think much of it comes down to conflating abnormal metric values with bad user experiences.  Many monitoring products reinforce the confusion by making it easy...

Alert Fatigue

One of the issues that I’ve run across over the years is alert fatigue.  As the linked article points out, it’s not just a problem for SREs, but we’re definitely victims of it.  I can’t count the number of times the question, “Hey, what is that alert about?” is...

Welcome to kaizenOps.io

My name is Mark and I’ve been in the site reliability game for a while now – going on about fifteen years. A lot has changed since I was a fresh-faced consultant joining Wily Technology back in 2000: the rise of AWS, Docker, APM, Nagios, Slack, and on and on....
