People need to be blamed, and responsibility for actions taken (without covering asses)
I have no empathy for Fastly-the-company. I hate the fact that the Internet is centralized around CDNs. I wish this idea of 'but we _must_ run a CDN for our 1QPM blog!' would die in a fire. But I can still empathize with the Fastly engineers handling this shitstorm right now.
People must be held accountable to have good incentives to reduce such outages in the future.
I do agree though that we should always be compassionate and realistic with other humans.
How do you make sure that mistakes don't happen, then? Do you blame and fire people who make mistakes, and hope that the next person put in the same spot doesn't make a mistake? Or do you figure out what caused that person to make the mistake and ensure there are processes in place so that next time this is less likely to happen?
Extrinsic motivators like 'we will give you a bonus' or 'we will fire you' are surprisingly bad at getting people to not fuck things up.
v2. "The issue was caused by a previously unidentified pathway that caused a feedback loop and overloaded our servers in a cascading fashion (or whatever). We have implemented a fix for this and updated our testing and deployment processes to stop similar cascades."
Which solves the problem long term?
As an architect making product choices, v2 wins every time.
(With the caveat that if the cause reveals a fundamental problem with the larger processes/professionalism/culture of the company, especially around security, then I'm not buying that product, and I'm migrating away if we already use it.)
Holding specific people "accountable" for outages doesn't incentivize reducing outages; it incentivizes not getting caught for having caused the outage.
As a result, post-mortems turn into finger-pointing games instead of finding and resolving the root cause of the issue, which costs the company more money in the long run when a political scapegoat is found but the actual bug in the code is not.
I feel like this requires some nuance.
Don't blame an IC for introducing a bug or misconfiguration that led to the outage.
Do consider blaming (and firing!) management if, during the postmortem, it turns out that management stood in the way of fixing systemic problems.
Ultimately, rule #1 should be: don't blame somebody unless malice or gross negligence is proven. Rule #2 should be the assumption that ICs will not have done either. Rule #3 is that sometimes, individual responsibility is required.
Do a post-mortem, work out root causes, work as a unit to ensure this doesn't happen again.
Obviously if there are levels of gross negligence or misconduct discovered during post-mortem, that will need to be dealt with accordingly, but coming into this with an attitude of "we must find someone to blame and incur repercussions" isn't healthy at all.
We are humans - don't forget that.
> Notices will be posted here when we re-route traffic, upgrade hardware, or in the extremely rare case our network isn’t serving traffic. - status.fastly.com
The extremely rare case happened for an hour, which is a very long time in internet time.
- ignoring warnings
- acting against known-to-them best practices
- repeating a previous mistake
But, again, these are just indicators, not a checklist.
Interestingly, any of these can also happen due to stress, burnout, and a generally broken company/team culture. That includes a CYA culture where, if people don't do something fast, they will be blamed for it, and thus they need to move fast and break things.
"An atmosphere of blame risks creating a culture in which incidents and issues are swept under the rug, leading to greater risk for the organization."
The best way to tackle mistakes in a team is to ensure the process in place corrects them. The only way to do that is a post-mortem: learning from the mistake. If you blame it on some engineer who did it, that engineer will eventually be replaced by someone else, who may make the same mistake.
And we, especially companies, typically only learn if there is something at stake: stock price, a job, customers, liability, etc.
(Call me old fashioned, but what I learned from it, having no stake in the game, is we are truly demolishing the resilient, decentralised nature of the internet; or already have done so)
Post-mortems make far more interesting submissions IMO, but I suppose people up-vote 'yes down for me too'.
We do not have a system that adjusts to "oops".
A good leader will take the hit (and the repercussions) for their underlings, compensate customers where compensation can make it better (and offer to make it easy to use fallbacks if this happens again) -- and internally fix the problem so it can't happen again, without throwing anyone to the dogs.
What I think this syntactically invalid sentence is trying to say is:
People need to be blamed, and held responsible for actions taken.
Why do people need to be blamed? Why do we need to make someone the scapegoat? What does being held responsible look like?
Let's say we find some sacrificial engineer to pin this on:
* does the downtime magically disappear?
* does the engineer's suffering (say, losing their job or whatever) make your downtime meaningful? Will you somehow recoup your lost revenue from it?
* does the fact that there's a scapegoat mean that everyone else at fastly is perfect and it's ok to keep using them?
Empathy and responsibility are not mutually exclusive.
This. People talk about "HugOps", "empathy", and all that, but a worldwide incident affecting a huge number of time-critical customers (e.g. trading, HFT, cargo, food delivery) for an hour has catastrophic consequences.
I hope the engineers also understand the other side and why we are paying huge sums of cash for their service.