As this attitude was adopted, things shifted too far (at least in the opinion of industry groups, and in my observation): people underperforming to the point of negligence weren't blamed, and the corrective actions meant to prevent recurrences of the problems they caused ended up cumbersome and expensive without really improving safety. (And in this industry, everything relates back to safety.)
In recent years, things have shifted back towards a more pragmatic middle ground. There are tools to assess whether a problem was organizational (and it still almost always is) or whether there was some element of personal negligence involved. This follows an industry-wide trend of trying to fix the real problems that affect safety and operations, not over-engineer cumbersome corrective actions.
Every problem is organizational, even those caused by individuals, because it's the organization's job to recognize and remove those individuals where appropriate.
I enjoy organizations where individuals' expectations are clearly defined, and I prefer there to be consequences for missing those expectations, because I feel it increases the reliability of the team.
Well (to play devil's advocate just a bit) - isn't the ultimate goal of a robust process, in its end state, one in which people can not just underperform, but be replaced completely?
Given the above discussion about a nuke plant, think of all the complexity inside it:

- monitors
- alerts & triggers
- valves
- compressors and other rotating equipment
- fire safety
- electrical systems
All of those have to be checked, tested, maintained, and fixed on a periodic basis over their lifetimes.
An incompetent person (or group) will eventually be the cause of something.
Maybe 100 years in the future we'll have self-operating nuke plants, but doubtful in my lifetime because of the incredible scale of complexity.
It's a nightmare, because there's no room left for experimentation anymore. Everyone just sticks to the template, afraid to do more than required, never deleting unused code, etc. An attitude like this never ever helps!
These are some of the excuses they put up.
And then they sit 10 years or more with that bad stuff in there, and build even uglier ways around it.
But the time comes to actually do something about it. And what was once a one-day job becomes "we will hire a consultancy firm to guide us".
No, really, I appreciate risk management, but when it cripples your ability to make decisions, innovate, or otherwise ACT on information that could help you be a more efficient team, and the development team becomes a room full of people doing nothing but maintenance for years on end, people leave and companies fail.
I just watched that very thing happen this year to my company for exactly that reason. Someone with the word "senior" in their job title was so risk averse that the market caught up, passed us and started eating our lunch.
God help them, because I can't do it anymore, and the writing on the wall says they'll be closing up shop this fall. I'm out the door for good at the end of the week.
Outsourcing of blame - as a Service. Where's my VC???
When it is all said and done, if you fucked up, you should get some shit for it. However, this should be good-natured: YOU should be laughing at it, and everyone else laughing WITH you.
The discussions about everybody's mistakes should be open, with emphasis on everybody, but leave mockery out of it. Inform everyone when they make mistakes without mocking them or attacking their egos, and keep it factual. Don't assume everyone is friends with everybody, nor that everybody is happy; it isn't true. The line between laughing at me and laughing with me is thin and oftentimes muddled.
Relaxed laughing at mistakes is a result of good teamwork, but you don't get to good work by demanding that people accept being laughed at or mocked.
Publicly, i.e. in front of the team? No. It serves no purpose other than to stroke some egos and narrow the "Overton window" of development discussion and experimentation.
If you're intending that you (and the rest of your team) should learn from your mistakes, then I fully agree.
But recognizing this person as important is independent of loading them with guilt and financial, career-relevant, or social sanctions. Those sanctions actually make access to the important knowledge difficult, because no one will want to admit errors and share how they happened and what could have been done to prevent them...
Over the weekend, firmware patches were applied, and the server rebooted. After reboot, everything worked fine, so the tech marked the change successful and went home.
Well, apparently the NICs would work just fine, but not all settings were applied until you opened the UI provided by the vendor. When you opened the UI, the final settings would be applied, and the NICs would reboot, just long enough to kill TCP connections.
That loss of TCP connection killed the parent system, and then all the child systems also died when the parent died.
So who would you even blame there? The guy who set the tripwire? The guy who tripped on the tripwire? The guy who designed a system that could be brought down by a momentary loss of connection?
I'm lucky that my boss wasn't the type to point fingers, because I was the guy who was there when it happened, and it sure got a lot of attention.
The UI part suggests that it was Windows, and if it was, it's not quite that the NICs were down "just long enough" to kill TCP connections, since you normally need quite a lot of downtime to terminate a typical TCP session.
In Windows, if a NIC goes down, all the TCP connections that use the NIC get closed immediately. (Or at least this was the case a few years ago. I had a similar system with similar drawbacks deployed back then, though it was an automated warehouse, not an assembly plant.)
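For what it's worth, the usual defense on the client side is to treat a dropped connection as a routine event and reconnect with backoff, rather than letting the parent process die with the TCP session. A minimal Java sketch; the host, port, and line-oriented protocol are made up for illustration, not taken from the incident above:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.Socket;

    public class ResilientClient {
        public static void main(String[] args) throws InterruptedException {
            long backoffMs = 500;
            while (true) {
                // Hypothetical endpoint; the real system's protocol is unknown.
                try (Socket socket = new Socket("parent.example", 9000);
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(socket.getInputStream()))) {
                    backoffMs = 500; // reset backoff once we are connected again
                    String line;
                    while ((line = in.readLine()) != null) {
                        System.out.println("received: " + line);
                    }
                } catch (IOException e) {
                    // A NIC bounce surfaces here as a closed/reset connection;
                    // log it and retry instead of taking the process down.
                    System.err.println("connection lost: " + e.getMessage());
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, 30_000); // capped exponential backoff
            }
        }
    }

Nothing fancy; the point is just that a momentary NIC bounce becomes a logged retry instead of a cascading outage.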
> So who would you even blame there?
The idiots who designed the system to run on a non-industrial-grade operating system. Windows was never a good choice for controlling industrial installations.
It is not about some small and well-defined set of "idiots"; it is essentially an industry-wide design mistake.
Yes, that seems more likely.
I think Windows can be a decent platform for light industrial applications - which this system in particular was. The problem is all of the partners and suppliers were either stuck in the past, or had weird ideas.
The parent system was *nix based, but there was a flaw in a communications protocol that led to the channel bouncing between two boxes, and eventually bringing down the parent system.
My lesson from that was that you can have flaws on any system, no matter how solid the OS.
Every time someone breaks something, we get harder to break.
If the tasks were difficult that would be one thing, but I'm talking about stuff like committing code to prod that was clearly never even executed once.
If you have those two things, then someone is already motivated to learn from what happened and will probably never make that mistake again (which describes the large majority of engineers in my experience).
Unfortunately both very demoralizing and very common.
I just don’t really get it. Even when I was a junior, if I overheard “this thing is broken,” I was the first to pop up and say “oh, I bet that was me, let me have a look.”
Imagine if Apple came out and said “yeah, that blank root password bug, it was all because of John Smith and his crap patch that caused this.”
Outsiders don’t have the same perspective as insiders. If Charlie’s commit message read “implementing the really difficult thing we talked about,” the team might be aware of mitigating factors that Alice won’t know about. But even without those mitigating factors, all you’ve done is badmouth your own devs to the public. Additionally, you are not considering whether Charlie is an otherwise stellar developer who has never had a bad patch before. Alice may incorrectly presume that he’s only being called out because this is a habit of his.
First, it's generally best to praise publicly, and criticize in private.
Second, saying "@Users's commit screwed-the-pooch" blames but, frankly, may not be the whole picture. It's entirely possible that the commit caused the issue, but everything was done by the book in which case it's really an organizational failure.
Personally, I sympathize with your argument. I have no personal problem with Torvalds-style correction. I used to work under an asshole who would routinely threaten to have me fired. Personally, I prefer the blowhards, because you can always tell where you stand. Still, not everyone is wired this way, and part of leadership is recognizing that and playing to various folks' strengths and weaknesses.
The proper thing here is to acknowledge that there is an issue, but not assign blame. Go to the person you think is responsible in private, and let them admit the mistake in public if they want. Assigning them blame publicly shows a huge lack of respect, even if it was their fault, while admitting blame freely shows modesty.
Plus, what if it's not Charlie's fault, and his commit simply revealed the problem? Perhaps the actual issue is in a little used function deep down in the codebase, and his commit is just the first one to actually exercise that area the right way? Maybe this whole thing comes around to being Jim's fault instead.
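That failure mode is easy to sketch. A contrived Java example (names borrowed from this thread; the code is entirely hypothetical): the bug has been sitting in Jim's old helper all along, and Charlie's change is merely the first caller to hit the edge case.

    import java.util.List;

    public class LatentBug {
        // Jim's years-old helper: nobody ever passed it an empty list before.
        static int max(List<Integer> xs) {
            int best = xs.get(0); // latent bug: throws IndexOutOfBoundsException on empty input
            for (int x : xs) {
                best = Math.max(best, x);
            }
            return best;
        }

        public static void main(String[] args) {
            // Charlie's new feature is simply the first code path that can
            // produce an empty list. git blame points at Charlie; the bug is Jim's.
            System.out.println(max(List.of()));
        }
    }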
The old IBM story is worth mentioning in relation to this: http://www.mbiconcepts.com/watson-sr-and-thoughtful-mistakes...
*Obviously with the caveat that some people are repeat offenders who are careless or just not good employees
As an industry we don't have a response to a truly neglectful mistake yet.
The backups were crap, and the only reason it survived was that someone had taken a server home to work from.
When all was said and done, they never really found who did it, they just made organisational changes to ensure it didn't happen again. No blame game.
To the old NPL team, sorry about that. Culture is important.
Preventing goals means the strategy needs to ensure good ball possession and staying on offense, to reduce the burden on the defense, which in turn reduces the burden on the goalkeeper, the last line of defense.
If the last line of defense fails, that's not an individual failure but a team failure, coach included, since the coach selects who gets to play, when, and in what role.
Same in software: bad management passes the burden to developers, bad development passes the burden to testers, bad testing passes the burden to release management.
This is my favorite interview question to ask candidates:
"What is your all time biggest screw up, and how did you come back from it" - I then tell them the story of me loosing several hundred thousand dollars and the funny things that happened around it to set the tone. If you have been in tech for any length of time you have one of these stories (if not a few). I have heard some great ones by simply asking and it gives great insight into a candidate (humor, stress response, the things you have seen).
Of course I write enough stupid bugs myself that I'm bound to think this way.
Also, focusing on the code itself, for me at least, easily leads to thoughts like “this function is crap! What idiot wrote this!?”. Finding out who broke it leads to thoughts like “I see John introduced this buggy function. I should go check with him, maybe he had a good reason.”
Though these categories may seem like they are oriented towards individuals' actions, they may be used to determine where the risk lies in systems (and in people's use thereof) and how measures can be taken to avoid the same problems being repeated.
Much of the time, the complexity of systems (using the term in the widest possible sense) is underestimated, and automated integrity checks are not used as religiously as they might be.
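As a trivial example of the kind of automated check that tends to get skipped: recompute a checksum and compare it against a stored reference value, so corruption gets caught by the system rather than by whoever trips over it later. A sketch in Java (the file path and expected digest are placeholders; HexFormat needs Java 17+):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.util.HexFormat;

    public class IntegrityCheck {
        // Recompute a file's SHA-256 and compare it to a known-good value.
        static boolean verify(Path file, String expectedHex) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(Files.readAllBytes(file));
            return HexFormat.of().formatHex(digest).equalsIgnoreCase(expectedHex);
        }

        public static void main(String[] args) throws Exception {
            // Placeholder inputs; in practice this would run on a schedule.
            System.out.println(verify(Path.of("data.bin"), "expected-digest-here"));
        }
    }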
There are some things that I consider basic competence standards, like not storing passwords in plain text in any system you're building. I wouldn't fire an intern for getting that wrong but I also wouldn't let an intern near a production authentication system without some oversight.
If someone is a security engineer with a responsibility to know these kinds of things as part of their job role and certification, then if they'd implemented passwords-in-clear to cut corners somewhere, even if it's to meet a really important deadline, I'd be extremely unhappy. Of course I'd establish the general pattern of what had gone wrong first, and if it was a superior being abusive to the security engineer to get the product launched on time I'd still be really unhappy but not at the engineer.
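For reference, the baseline in question is cheap to meet: salted PBKDF2 via the JDK's built-in SecretKeyFactory, sketched below. The iteration count and storage format are illustrative choices, not a prescription for any particular system.

    import java.security.SecureRandom;
    import java.util.Base64;
    import javax.crypto.SecretKeyFactory;
    import javax.crypto.spec.PBEKeySpec;

    public class PasswordHashing {
        static final int ITERATIONS = 210_000; // illustrative; tune to your hardware
        static final int KEY_BITS = 256;

        // Returns "base64(salt):base64(hash)"; the password itself is never stored.
        static String hash(char[] password) throws Exception {
            byte[] salt = new byte[16];
            new SecureRandom().nextBytes(salt);
            byte[] derived = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS))
                    .getEncoded();
            return Base64.getEncoder().encodeToString(salt) + ":"
                    + Base64.getEncoder().encodeToString(derived);
        }
    }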
Occasionally one does follow the chain of causes back though and finds not the organisation's culture but an individual who really should have known better.
If unlucky dev #13 broke something because humans can no longer reason about the relevant part of the system, then it doesn't matter that #13 was the one who broke something. What really matters is that people get busy removing the sandtraps from their software.
However, many FLOSS projects run on the sheer joy and freedom that comes with maintaining a particular subsystem or area of the code. Most devs quickly come to understand the responsibilities associated with that. But in cases where that responsibility doesn't come naturally, "who broke it" becomes the focus. Addressing that issue will determine whether or not future breakages occur.
The best programmer versus the worst user, and every mix in between, will produce situations needing the kind of attention this article addresses.
I've been in this situation on both sides. "Of course it should be clear what this phrase means, how could they fuck this up?" ... and ... "I have no idea what this means; both choices could mean what I want, but either choice could land me on the wrong page of this bullshit 'choose my own adventure' that I'll have to repeat if I'm wrong".
I'm interested in finding out if I'm understanding this wrong, and/or hearing other thoughts.
If a programmer has a habit of sloppy code, or violates the team's standards in some ways, then a good leader will keep track of the fact that one person is responsible for a recurring pattern of mistakes.
I absolutely agree with Rachel By The Bay, that many bugs arise from the complexity of the situation, and it would be wrong to blame the person who just happens to trip over that bug. But a good leader should take action against anyone who repeatedly screws up, and who seems unwilling to improve.
I've written about this before. This is from "How To Destroy A Tech Startup In Three Easy Steps":
----------------------
Wednesday, July 15th, 2015
I got to work at 11:00 a.m. John announced that our demo had stopped working. Sipping my coffee, I logged into the server to find out what the problem was. I looked at the error log for the API app, but it seemed okay. Then I checked the error log for the NLP app.
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1955)
    at Celolot.nlp.Extractor.fuckBitchesGetMoney(Extractor.java:87)
What the hell was this?
“FuckBitchesGetMoney”?
What kind of name is that for a function?
A computer programmer can name their functions anything, but there are some “best practices” regarding names, and this particular function name violated all of them.
I asked Sital why he had given this name to his function. He looked at me straight, shrugged, and stated that the name was from the 1995 song by The Notorious B.I.G., “Get Money.” I replied that rap lyrics were not part of our naming conventions. He promised that he would change it.
Coming from anyone else, I might have interpreted the function name as an act of angry rebellion, but Sital was too forthright for that. Apparently, he thought the name was funny and went with it because he wanted to add some humor to his code. Never did he stop to think it might be unprofessional.
I looked through his code and found several other functions that had inappropriate names. I sent him a list and asked him to change their names to something standard.
A week later the function was still there. FuckBitchesGetMoney. Yet I don’t think that any of this was a deliberate act of rebellion. He was just oddly forgetful and disorganized.
https://www.amazon.com/Destroy-Tech-Startup-Easy-Steps/dp/09...
Also, there are many more stories to be told now!
It's always the same f*ing people that break it though!
Sometimes that's just the people who change things the most and work the hardest. It's harder to break anything when you don't actually change anything.
* Do they have too much access to systems?
* Is there something really wrong with the deployment system?
* What training can be provided?
All of that is more constructive than your comment, as cathartic as it may be.
Of course there are other possibilities - the people breaking things may be doing the hard bits that no one else dares to.
"Bravado is no excuse for lack of preparation." - Leeroy Jenkins
In reality, if something breaks and you are stupid enough to mention it, then (a) you are considered an a-hole for blaming <responsible-person-for-topic> even if you didn't, and (b) you become responsible for fixing it.
So your main job is to somehow make your stuff work despite all the other stuff that doesn't work and all the other people that silently try to stop you. The less you criticize, the better. What you get in return is that if you fuck up, people will try to avoid blaming you as well. Also, if you don't succeed at making anything happen, you get a little arrogant smile from your manager and a mediocre feedback round. But otherwise nothing happens.
The only change to that pattern happens when you piss off your manager or your manager's manager. Then suddenly each and every activity you do will be scrutinized, and if there's a problem, it will be used against you. The best hope they have is that you go away by yourself.
I'd recommend you satisfy their hope maximally by running the hell away from that dumpster fire of bullshit office politics.