The day I locked everyone out of the company intranet (opens in new tab)

(dancowell.com)

175 pointsdfcowell2y ago125 comments

125 comments

114 comments · 32 top-level

freetanga2y ago· 25 in thread

Always do a select with your criteria before doing a Delete or update.

Don’t ask me how I learned this.

At a previous place I worked, if they were working on the cli (eg in psql or similar) they'd always use these two steps, either of which would provide adequate protection:

1. Start a transaction before even thinking of writing the delete/update/etc (BEGIN; ...)

2. Always write the WHERE query out first, THEN go back to the start of the line and fill out the DELETE/UPDATE/etc.

It worked well, and it's a habit I've since tried to keep on doing myself as well.

_nalply2y ago

Once I wanted to do `rm -fr *~` to delete backup files, but the `~` key didn't register...

Now I have learnt to instinctively stop before doing anything destructive and double-check and double-check again! This also applies to SQL `DELETE` and `UPDATE`!

I know that `-r` was not neccessary but hey that was a biiiig mistake of mine!

tetha2y ago

For entirely critical systems, I by now rather generate a reviewable script and run that. Something along the lines of:

    find /backups/ -mtime '-30' -printf 'rm -f %p\n' > very_scary_deletes.sh

This gives you a static script with a bunch of rm's. You can read that, check it, give it to people to validate and when you eventually run it, it deletes exactly those files.

_nalply2y ago

That's clever, only do really dangerous things if you have a way to carefully review the steps.

Another way is sometimes: `echo *.txt~` and when you like the result, replace `echo` with `rm`.

1 more reply

ht852y ago

If you ever type really dangerous commands, it is good practice to prefix them with a space (or whatever your favorite shell convention is) to make sure they not saved in your history.

One of my "oopsies" involved aggressive <up> and <enter> usage and a previous `rm -rf *` running in /home...

CoastalCoder2y ago

I didn't know about the space thing!

Though for me, I think the greater risk would be from not having a record of what I'd run.

_shantaram2y ago

One time I was debugging the path resolver in a static site generator I was writing. I generated a site ~/foo, thinking it would do /home/shantaram/foo, but instead it made a dir '~' in the current directory. I did `rm -rf ~` without thinking. Command took super long, wondered what was going on, ctrl-c'd in horror... that was fun.

xp842y ago

I’m really curious: without cheating by using the GUI, what would be the proper way to delete such an obnoxiously-named directory? Would “$PWD/~” work?

1 more reply

laurensr2y ago

You remind me of that time I wanted to `rm -rf ./*` but the dot hadn't registered... I now avoid that statement.

lostmsu2y ago

I click in Explorer. Have to confirm too. Never been confused.

jojobas2y ago

>Don’t ask me how I learned this.

Joke's on you, we all learned it this exact way.

seanthemon2y ago

This gives my heart pains I can't explain to my therapist..

maccard2y ago

Luckily I learned from someone else. I did figure out start with a transaction the hard way though

fernandotakai2y ago

i can still feel the pain in the pit of my stomach when i saw the amount of rows affected.

thankfully my boss at the time was a amazing db admin and he helped me fixing my mistake.

gumballindie2y ago

Came here to say exactly that! I do a select count and then a limit against the returned count. At least it may reduce the blast radius.

menacingly2y ago

In most of my scenarios, I'd actually rather cause a catastrophic global change than a silent subtle corruption of a handful of rows

gumballindie2y ago

That may be fun in a trivial setup such as op’s but when millions of customers or billions of transactions are affected it’s a nightmare. A competent engineer runs queries against a local and then a uat db, verifies results and then on prod. But if you must do it in prod then it must be limited in scope.

1 more reply

sgarland2y ago

This can also bite you if your dataset is larger than the buffer pool (or whatever other RDBMS calls it), and the particular table you’re querying isn’t commonly accessed.

Turns out when you start loading millions of rows of useless data into memory, the useful data has to get kicked out, and that makes query latency skyrocket.

mjcohen2y ago

Same thing on a Unix/Linux level: When using find, I always do it first with -print until I see the files I want. Only then do I add the actual action I want.

tgv2y ago

Then start a transaction, and only commit when the row numbers match.

chrisandchris2y ago

That, and some DB tools (line JetBrains DataGrip) block UPDATEs and DELETEs without a WHERE condition.

iJesus2y ago

Gone are the days of hard deletes in my approach; it's exclusively soft deletes now!

okeuro492y ago

In the EU soft deletes can violate GDPR, so not always an available strategy.

vntok2y ago

Soft delete items immediately, purge the soft deleted rows automatically after 30 days or so.

lemper2y ago

how did you learn it?

fukawi22y ago· 14 in thread

I was working on an old old old "ERP" system written in D3 PICK. It's a database, programming language and OS all in one with roots in tracking military helicopter parts in the 1960's. I was working on it in the mid-2000s.

It had SQL like syntax for manipulating data, but it was interactive. So you would SELECT the rows from the table that you wanted, then those rows would be part of your state. You would then do UPDATE or DELETE without any kind of WHERE, because the state had your filter from the previous SELECT.

It has a fun quirk though - if your SELECT matched no rows, the state would be empty. So SELECT foo WHERE 1=2 would select nothing.

UPDATE and DELETE are perfectly valid actions even without a state...

Working late one night, I ran a SELECT STKM WHERE something that matched nothing, then before I realised I realised my state had no rows matched, I followed up with DELETE STKM.

Yep, the entire Stock Movements table four the last 20+ years of business were gone.

The nightly backup had not run, and I didn't want to lose an entire day of processing to roll back to the previous night.

I spent the entire night writing a program to recreate that data based on invoices, purchase orders, stocktake data, etc. I was able to recreate every record and got home about 9am. Lots of lessons learnt that night.

cvoss2y ago

> if your SELECT matched no rows, the state would be empty

> UPDATE and DELETE are perfectly valid actions even without a state

Some may call this a fun quirk :) but I'd call it a horrible mistake in the design of the system! It should have been conceptually obvious to the designer that an empty set of rows is a perfectly valid state and is fundamentally different from "no state".

bombolo2y ago

At university I had to use a programming language to do some specific calculations.

It was obviously designed by someone who had not taken any courses in formal languages, compilers and algorithms.

It had a fun quirk, the final result would change depending on which names you used for your variables.

When the phd student guiding our lab session told me to not use underscores in my variable names, I thought i'd humor her just to show her it was a stupid suggestion. But she knew more than me.

knodi1232y ago

I remember when I was an intern at a large telco company, and an older woman who was VP of something or other called me into her office to answer some questions about what parts we needed based on supplies I had just inventoried.

I told her one, and she didn't recognize the part number, so she said she'd look at the distributor's website. I watched her go to google.com, type in the url, and then I "helpfully" interrupted to explain that the address bar was where urls went, and that google was for searching for other things. She proceeded to click the "view cached version" in the google results, and told me that distributor's website was down and this was her workaround.

I learned a lot from that job. Including when to keep my mouth shut!

nikau2y ago

A lot of older systems were like that before they became more commodity items with guard rails.

Some of the early SAN disk arrays used to use an assembly like config interface with no sanity checks, type in the wrong command and you could wipe the whole array.

The company I worked for at the time had a blanket policy that any change had to be implemented by the vendor to shift liability for any mistakes.

fukawi22y ago

You're probably right, but back then it was assumed you knew what you were doing I guess. The old idiom of "Linux won't stop you shooting yourself in the foot" was even more true of this system.

dylan6042y ago

>It's a database, programming language and OS all in one with roots in tracking military helicopter parts in the 1960's. I was working on it in the mid-2000s.

A friend of mine worked for a company repairing transmissions on drilling platforms. To say it was a specialty shop is an understatement. However, they had been in business for 40 years. While not a computer system from the 60s, theirs was a custom written solution from the 80s running on a DOS 286 PC. It was stuck on the top shelf of a closet with a UPS. The developer had long been "permanently" unavailable for at least a decade, and they were too scared to loose the data in having it rebuilt. I asked how scared, and the answer was "it's the only computer in the building with a UPS". I'm laughing at the insanity just retelling the story

albert_e2y ago

You recreated a production database of 20 years data by hand overnight?

You deserve an award !!!

fukawi22y ago

Yep, the entire table. I figured since I had to write it to do the day at a minimum, doing the whole table would help me find corner cases and errors since it was a much larger range of data

testemailfordg22y ago

Only that day's data was added on top of backup from last night...

unnouinceput2y ago

he only recreated the ones not saved by the nightly backup, meaning he used the previous nightly backup and recreated from invoices the last ~23 hours.

fukawi22y ago

No, I did the entire table. That helped me find corner cases and errors in my logic having such a bigger range of source data.

3 more replies

vivekv2y ago

Whe. I first learned SQL on an oracle cli client I was told to type begin; then any other command. For a brief while until i learned about transactions, I assumed that it was part of the cli startup :)

EricBetts2y ago

PICK is giving be flashbacks of my tenure at ADP’s dealership services (I think it became CDK)

nivethan2y ago

I deal with D3 pick and the fear I have of doing select bp and catalog bp is insane. The entire system is a mess so wiping the md would be catastrophic. :)

croo2y ago· 10 in thread

My chooosen database explorer is Dbeaver. Horrible name but great app.

You can set colours for local/test/prod servers and a red colored tab will scream at you to be cautious. And with red color every edit will pop up an "are you sure?" question. And autocommit is off.

I sorta stopped making unrecoverable mistakes.

nucleardog2y ago

Autocommit being off took prod down for me.

Hop in the test database, run some SQL, update a bazillion records, run some selects to verify, open the test app and verify there. All good! Someone else checks… LGTM.

So I do the same thing in prod. Make changes, validate in the DB, looks good. Open the prod app… down. Requests are hanging.

Our software is shit so I’m not surprised. I start frantically digging through trying to figure out where and why it’s hanging. Eventually chase it down to some lock wait timeouts in SQL. Start looking at the database… no queries seem to be running against the table in question. Eventually dig a little deeper and find… the call was coming from inside the house! The locks are held by my connection. Go hit “commit” in dbeaver and prod comes right back up.

And that’s the boring story of the day I learned that dbeaver used transactions in prod connections by default.

reaperducer2y ago

My chooosen database explorer is Dbeaver

I just checked it out, and it looks promising.

Con: There are too many licensing options. I'm not even sure which one I would need.

Pro: They'll send out an invoice, like a real company. I work for a billion-dollar healthcare company, and if your company only takes PayPal, or Venmo, or something else that makes it look like two guys in a basement, it'll never get through our purchase approval process. Big companies do business with other big companies. Startups have to learn to interface with businesses on a business level if they ever want to grow out of being a startup.

fbdab1032y ago

Start with the Community Edition and see where it takes you.

loa_in_2y ago

As if two guys in a basement couldn't send out an invoice

Lvl999Noob2y ago

The parent isn't saying that they couldn't. They are saying that they shouldn't advertise their two-guy-ness in the financial parts of their company.

fbdab1032y ago

My only complaint about DBeaver is how powerful it is -I feel like I have only ever scratched 1% of the functionality it offers. I need a tutorial class on all of the goodies the platform can do.

All of the tutorials I seem to find online are basic level and/or mixed with introducing SQL.

jareklupinski2y ago

i'm using it on Mac, and it seems to want to update every time I turn it on... i get that having timely updates is great, but the way to update the app on a mac is:

click ok on the update dialog, wait until the download finishes, shut down the app, drag the new app over to Applications, click Replace, wait for the actual moving dialog to pop up, wait for the moving dialog to finish, minimize all your windows so you can see the desktop, unmount the dbeaver image, click dbeaver again

and you can get back to work! now if only I could remember all the details that were in my head when i opened the app in the first place...

dbeaver team: can you do what paint.net did please? that is, only bother me to update as i'm turning the app off, not on?

throitallaway2y ago

This is more of a Mac problem than anything, and why I'm such a big fan of package managers.

acomjean2y ago

DBeaver is great. I came from a different tool, and had to get used to persisting the changes. I’m totally use to it.

I run a local copy of the database I connect to remotely and I have to be careful about which database/table I’m connected too.

psnehanshu2y ago

I love DBeaver too.

cesaref2y ago· 8 in thread

Back in the 90s I remember a work colleague asking 'can you rollback a drop table?', to which I replied 'no', and all the blood drained from his face in seconds. It's one of those things you've heard happens to people, but until you see it, you can't quite believe it.

jdsalaro2y ago

> can you rollback a drop table

>_< , something similar happened to me when I did an rm -Rf * in the old pictures directory on my system ... Well ... It wasn't the old pictures directory, it was the backup directory with all pictures from my older phone !! To say that blood drained from my face would be an understatement, thankfully I was able to recover most of it.

The first thing I did after the recovery marathon was to alias rm so that it instead works by moving stuff to /tmp

eCa2y ago

In the late 90’s I wanted to try this Linux thing, so I followed a tutorial. First step:

    fdisk

Yes, that was my Windows partition going bye-bye.

CoastalCoder2y ago

So... success? ;)

2 more replies

euroderf2y ago

My boss was doing housecleaning in his PDP-11 (Unix v.6) account and did a "rm *" in the bin directory but OOPS it was the system bin.

noAnswer2y ago

inb4: /tmp partition gets full. Causing things to brake. :-)

kubanczyk2y ago

No problem, just use rm ;)

msravi2y ago

Since this seems to be such a common occurrence, why haven't databases evolved to maintain some limited form of history that can be rolled back? Have no idea of the complexity, so forgive me if that sounds daft.

kayodelycaon2y ago

Long story short, you’re holding multiple transactions open. That’s a lot of bookkeeping overhead if there’s a lot of changes. A system with a few million rows and moderate load on a reasonably-sized server for that load will slam to a halt.

Then come the “site is down and the world is ending” tickets, messages, emails, and phone calls.

philderbeast2y ago· 3 in thread

"All of your colleagues have done something dumb. Don't be afraid to tell us when you make a mistake. We all remember our first screw up and will be happy to help."

Never have truer words been spoken.

As I tell all the new juniors at work doing sysadmin type tasks, everyone has deleted the production database at least once. Mistakes will always happen, it's how you deal with them that defines how good you are at the end of the day.

russfink2y ago

It’s also a measure of how well management does in response. We were transferring a software production repository from one machine to another, Lord knows why, when a junior admin I was supposed to supervise had arrived a half hour early and started the operation without me. He got the source and destination arguments reversed in the file transfer; we were using DD for this one part of it.

Management reacted pretty well. They assured both of us that while we made the mistake, it was not our fault that data was lost: the problem was that backups were not being checked, which caused us to lose the resulting three days (120 developer days) of work. The manager in charge of the folks doing the backup got taken to task - but nobody else did.

I_am_uncreative2y ago

I did something like this once my senior year of high school. I was using XCOPY to copy a copy of Counter Strike: Source that someone had placed on the U:/ drive, but reversed the order of arguments so that instead of copying it to my desktop, I copied everything on my desktop to the directory.

For whatever reason, we had write access to that directory, but not deletion. I had to get the teacher to delete it for me.

KronisLV2y ago

> Don't be afraid to tell us when you make a mistake. We all remember our first screw up and will be happy to help.

This is very much dependent on the circumstances - sometimes people won't be supportive or encouraging, but cold at best and toxic (rude, making fun of mistakes, letting their egos run wild) at worst. This is more likely to happen in some work cultures/companies than others, and there can also be individuals that are allowed to persist with their problematic conduct in otherwise okay environments.

If you are ever in such environments, acknowledge the fact, possibly push back against stuff like that and definitely be on the lookout for alternatives, where you'd be able to prosper.

Thankfully, my personal experiences have mostly been okay, but I've definitely seen both attitudes and choices that can make everyone's lives worse, to the point where I wrote the satirical article "The Unethical Developer's Guide to Personal Success": https://blog.kronis.dev/articles/the-unethical-developers-gu...

matsemann2y ago· 3 in thread

I wish a DELETE or UPDATE only affected a single row by default (and perhaps even wouldn't commit if it would hit multiple rows), unless a keyword for MANY or something similar was added.

Aka DELETE ALL where x == y or DELETE MANY where x == y or perhaps you need an explicit limit for it to not be 1, so DELETE where x == y LIMIT ALL

somat2y ago

The way a DELETE without a WHERE clause gets the whole table makes sense conceptually but it always gives me the creepy crawlies when typing the statement. While I normally dislike superfluous syntax I would welcome changing the sql grammar to enforce a where clause on delete, that is, to delete all rows in a table would require "delete from table where true;"

sgarland2y ago

For the MySQL CLI, you can start it with —-i-am-a-dummy to get this behavior. Or —-safe-updates if you’d rather, but the former is more fun.

sangriafria2y ago

There are some SQL GUIs that require confirmation before running an update/delete query without a where clause

user67232y ago· 3 in thread

These kinds of scenarios happen when money is "cheap", and highlight why the current recession, and coming 2nd great depression aren't really a bad thing.

dfcowellOP2y ago

This was 13 years ago in a small business with no significant investment. No “cheap money” was involved, just the realities of a small business with chronic NIH syndrome.

CoastalCoder2y ago

I think you'll need to flesh out your logic if you want to convince people that a depression is somehow a net positive.

rsynnott2y ago

... Okay, I've seen people blame monetary policy for a lot of things, but I'm really struggling to see how this one works.

quijoteuniv2y ago· 3 in thread

Use chatGPT to double check too.

throwbadubadu2y ago

"Yes, you can delete those items via that query."

"But now all has been deleted??!?! WTF HEEELP"

"I apologize for my previous answer, you are correct, you now deleted everything!"

quijoteuniv2y ago

Mmm… you can ask to explain what the query will do… yes you can mess thing up if you do not understand, therefore you try to understand. There is a lot to learn, i did not recommend running code in production frOm chatGPT, funny with downvoting

archerx2y ago

Thank you for the laugh, I can imagine someone running untested chatGPT code on prod and this actually happening.

stuaxo2y ago· 2 in thread

My manager had got me to look at backups.. but for cheap.

I decided on bacula - I had the clients installed on all the computers in the office, and it worked for some small tests.

My manager decided we would try this with a USB drive attached to one of the servers (somehow this didn't seem like a bad idea).

In the morning, very uncaffinated he sent me to the other site - an unmanned basement office with the servers.

Being uncaffinated I forgot the door password and set off the alarm.

I had to go into the office and phone him with the alarm going to get the code to turn the alarm off.

OK, that was stressful but sorted out at least.

I plugged in the hard drive to the selected server and headed back.

Once I got back it turned out all the websites on that server had gone down - trying to send all the backups to this poor USB harddrive had overwhelmed the IO on that-era Linux server and the poor thing just froze.

Fairly soon after I was let go, and joined my friends at a much more fun company making mobile games.

sacnoradhq2y ago

It's the responsibility of the technical person to uphold engineering ethics, especially in the face of potentially inadequate recovery and security solutions.

I was once let go from a big name university for refusing to weaken and rush changes to a payment processing network (PCI-DSS) when there was "no time" to review them in detail. That's a future FBI press conference sort of thing when it all comes crashing down. Not long after, all SS#s, DOBs, and deets for every employee was stolen from a "rogue" laptop taken by a consultant, likely to be sold on carder and identity theft forums because of an utter failure at data protection processes. That place was a shitshow because they didn't have the professional ethics or leadership backbone to do what was prudent and necessary.

kubanczyk2y ago

Yeah, for a surprisingly long time, linux had a big write cache problem. You were okay if you used all fast-writing devices. You were okay if you used all slow-writing devices. But if you mixed them, the slow writes could totally fill the write cache and starve everything else.

Not only an USB (1?) could cause it but also a cifs transfer over a 100 Mbps link.

mootzville2y ago· 2 in thread

I was taught to:

1. Write your WHERE clause first 2. Return to the beginning of the line to finish writing the statement 3. Check your statement 4. If it looks good, then -- and only then -- add your closing semicolon

Having said that, once during my second week at a new company, I plugged in an ethernet cable to an APC UPS, so I could set up networking on it. It shut down production. Why? APC makes (for that model at least) proprietary ethernet cables for networking, and if you plug in a regular cable it does an autoshutdown...an engineers attempt at marketing perhaps!? I did RTFM before, and after out of confusion, and there was no mention of this.

sokoloff2y ago

I will typically do something like this when writing SQL in an interactive tool:

  SELECT * from table
  WHERE id = 12345

and once that's giving me the selection I want, insert the update statement into the middle:

  SELECT * from table
  -- UPDATE table set c1 = v1, c2 = v2
  WHERE id = 12345

Then, accidentally running the entire buffer doesn't do anything destructive, but selecting the query from update to the end of the statement lets me do the update. (It's still imperfect, because selecting only the update line will still be destructive.)

(Most of the RDBMS tools that I've used would happily ignore the lack of a closing semi-colon and that will not save you for a single-statement case.)

mootzville2y ago

True, thought I didn't include the lookup first, it is how it happens in practice. Also, we use the mysql client, so semicolon is a must, and I would avoid anything that let you submit the statement without it...that's your safety net.

hahamaster2y ago· 2 in thread

Decades ago my ex colleague was supposed to enter a command handwritten on a piece of paper saying "rm -rf /var/log/blah/" which she typed in as "rm -rf /var / log/blah / ". Everyone knows it's awesome to insert white space to increase legibility. It was a production database server.

jorgesborges2y ago

That’s an insane command to pass to an untrained person on a piece of paper to type into the terminal of a production server.

hahamaster2y ago

Agreed. It was "only" supposed to truncate the log folder. ;-)

headline2y ago· 2 in thread

I've just started my career - I wonder what my first fuck-up will be :)

james-skemp2y ago

Own up to it when it happens. Can only get worse if you don't.

Hopefully you work with people that follow that as well.

I can still remember a more senior coworker copying a directory from a network share (Ctrl + C), deleting the directory, and then trying to paste it somewhere else.

I didn't speak up (he also rarely touched the mouse so flew through it), so we both got a chuckle when he realized his mistake and pinged network folks for a backup from tape.

It happens. The real world is messy. Can always learn something new. Can always do something you know not to do.

jdsalaro2y ago

Don't wonder! Plan it and execute it so you can be more chill !!

rrrrrrrrrrrryan2y ago· 1 in thread

A few months into my first job out of college, I brought down the main production server in the middle of the workday. It took us about an hour to recover. Afterward, I was very embarrassed and apologetic, but my boss just shrugged and said:

"You're not a real technology worker until you've brought the company down. Welcome."

Might not be the best words to live by, but it was exactly what I needed to hear at that time early in my career.

ahazred8ta2y ago

Frequently heard in batcaves: "Hey, the FNG børked Prod."

"One of us!! One of us!!" :-)

IronWolve2y ago· 1 in thread

My first sysadmin job at a call center, the call center reps use the same directory for all the users. And, I'm working tickets to delete old users accounts...

The old grey haired sysadmin backs up the directory so he can instantly restore it. Seems this happens all the time.

Whew.

limaoscarjuliet2y ago

When you really mess up a call center, it is called a gopher event - entire floor having lost their audio suddenly stands up in their cubicles and looks around. Been there, caused that.

Nextgrid2y ago· 1 in thread

In the top bar of the site:

> Sorry! Subscriptions were broken last week, but are now working. If you tried to subscribe and ran into issues, please try again!

I wonder if a similar incident involving a “UPDATE subscriptions” query happened recently.

dfcowellOP2y ago

Ha, no, that was a far more mundane issue. The CMS I'm using requires double opt-in to subscribe, meaning you need to enter your email address and click the confirmation link.

It also apparently requires double email configuration, meaning it has two places where you can configure your mailer. I had only set it up in one place, meaning the confirmations never got sent.

Bit of a facepalm moment.

m348e9122y ago· 1 in thread

We had this saying: "If you fix it before anyone realized it was broken, you didn't break it."

throwaway36182y ago

Depending on the situation it can be risky to start thinking like this, trying to fix stuff with lots of stress, in secret, maybe working around access restrictions, it's easy to make further mistakes

justsomehnguy2y ago· 1 in thread

pfft

   rmdir . /s /q

only to notice what I am in the wrong folder, way up in the hierarchy.

And repeat the same error years later when the batch file failed to cd to the destination folder. Added

   if %CD% == %DESTDIR%

to avoid that problem.

theduder992y ago

lol thanks.

drsim2y ago

I wrap all of my production manipulations in a transaction, and commit only if the results are expected. Yes, it may take locks that block customer-facing transactions, so I have selects ready to go in the transaction to minimise this.

15 years and counting since wiping out a large production table and taking a day to restore from backup.

lormayna2y ago

I did something worst many years ago: I was working for a regional ISP and during a major incident, I had to reroute traffic through a different path. Under big pressure, I did the infamous Cisco mistake "switchport trunk allowed vlan 50" instead that "switchport trunk allowed add vlan 50" and I locked out myself and all the customer from our broadband customers. We had to call a DC technician and ask him to share a console through a local console server.

Lesson learned: even if you are under big pressure take your time to plan and review the modification 15 minutes can save hours.

nonameiguess2y ago

Early on when I'd first started making the transition from pure developer role working only on product to a platform role running the development environment, I was encountering problems with build scripts on CI servers leaving behind a bunch of dead symlinks. Tired of tracking them down manually, I wrote a nice script that automatically found all dead symlinks and deleted them.

It turned out, for some arcane reason I still don't understand, our production instance of Artifactory was running on top of Docker Compose with host path volume mounts, and somehow, symlinks that were not valid from the perspective of the host actually were valid from inside the container, and doing this on all of our servers broke Artifactory. For some even stupider reason, we weren't doing full filesystem-level snapshots at any regular interval (which we started doing after this), so instead I needed to enlist the help of the classic wizard ninja guy who had been acting as a mostly unsupervised one-man team for the past six years who had hacked all of this mess together, documented none of it, and was the only person on the planet who knew how to reconstruct everything.

This was probably still only the second-stupidest full on-prem lab outage I remember, behind the time the whole network stopped working and the only person who had been around long enough remembered they had trialed a temporary demo hardware firewall years earlier, management abandoned the evaluation effort, and it somehow remained there as the production firewall for years without ever being updated before finally breaking.

alexmolas2y ago

The other learning I get from this story is "never hide your errors" or "own your errors as you own your victories".

If the author had decided to say nothing the problem would have been bigger - an unhappy boss and probably fired.

thombat2y ago

One day my colleague was wondering whether RegEdit used some private API for renaming keys or just copied + deleted them. "Try a rename with a big tree, see if it's still instantaneous" I helpfully proposed. But what's a really big tree? How about System\Windows? The rename completed instantaneously - "told you it's a private API" I said happily, just as the machine crashed in twenty ways at once.

_cenw2y ago

I did worse.

We had a redis with sessions. Early on, someone decided every write to redis should also cause a write to S3 as backup. My first task was to get rid of this 4-digits a month extra cost in PUT requests. I decided to instead write all changed session objects into a set keyed by half-hour timestamps and then write only those sessions every 30 minutes. Unfortunately initially I used a KEYS to find the set corresponding to my half-hour stamp, not having read up exactly on what it does. It's not exactly advisable to do on a redis with a million or so objects. A later version of the archiver wrote the last emptied set to a stable key instead and then checked the set keys between then and now instead...

nytesky2y ago

For some reason your title made me think of this classic IT Crowd

https://youtu.be/Vywf48Dhyns

makach2y ago

A charming story which almost everyone can relate to! Only one of your rules will ever save you. “Don't run updates directly in the database console”. Whole methodologies are crafted around this rule/principle to not do development in production environment.

alarge2y ago

Roughly 10 years ago, I was working for a startup that offered a live conversational video service where you could also have hundreds (or eventually, thousands) of near-live watchers - with recording and later playback. The founder pitched the service to news orgs and celebrities. Anderson Cooper had a regular "show" there for a while, and we had a number of interviews with mostly 2nd-tier celebrities.

When the service started, they made the decision to not actually delete any content (delete just set a flag which disabled the content but didn't actually remove it).

Fast forward a year or so, and it became clear that a real delete was needed. So they had a junior engineer write up a sort of delayed sweep - delete all the videos with the delete flag set. But then, for some reason, they decided put the implementation behind a delay. Something like "actually delete all soft-deleted videos, but don't start doing it until 30 days from now". However, unbeknownst to the team, there was a bug in the implementation that deleted everything, regardless of whether the 'delete' flag was set.

So one night, roughly a month later, all the content started disappearing from the site. One guy heroically tried to stop the process, but I think he was too late. The engineering director happened to be on a vacation down in South America somewhere and I think the founder fired him in a fit of pique. I managed to reclaim a small bit of content (basically the videos that were cached on the actual recording servers before they were uploaded to S3).

You can imagine the technical over-reaction:

  * Delete switched back to a soft delete
  * Turned on S3 object versioning
  * Started redundantly copying content onto a totally different hosting service

This was fine (hah!) until we had to start taking down the inevitable child porn that always shows up on services like this - I got stuck with writing the takedown code and it took me forever to track down all the various tendrils of stuff.

As you might expect, we lost a ton of users over mass content deletion and the service never really rebounded. The company held on for a couple more years, pivoting a couple of times, but eventually folded.

Animats2y ago

If you locked out everyone today, many would assume they had been laid off. Some of them would have other jobs before the login problem was fixed.

FreshStart2y ago

5. Deadlines do more damage than good, but the costs they produce get swiped under the rug by company internal accounting.

liampulles2y ago

I switched companies, and moving from a MySQL cli to pgadmin is a godsend. Would still like a confirmation dialogue, but having to click a button seems less error prone than pressing enter too quickly.

simonblack2y ago

Ah yes, the old lack of 'where' clause.

Did much the same, but only on my own system (thankfully!) and yes, I had an up-to-date backup MySQL dump on hand.

iJohnDoe2y ago

FTA - Make friends with your sysadmin, and stay in their good books at all costs.

I think this goes against every principle most on HN have.

Gordonjcp2y ago

I had a thing recently where someone was updating some entries on a system I look after, decided the changes hadn't applied properly, and clicked "Roll back" to put it back to its original state.

Whatever had gotten into it, it rolled back to 2009. It rolled everything back, including user accounts.

No-one who worked there in 2009 still worked there, so no-one had a valid password any more.

Fortunately it was easy enough to copy the last-but-one backup over the top and lose the day before's config updates, and cure its Flowers for Algernon state, but it was a pretty hairy afternoon.

j / k navigate · click thread line to collapse

125 comments

114 comments · 32 top-level

freetanga2y ago· 25 in thread

Always do a select with your criteria before doing a Delete or update.

Don’t ask me how I learned this.

justinclift2y ago

At a previous place I worked, if they were working on the cli (eg in psql or similar) they'd always use these two steps, either of which would provide adequate protection:

1. Start a transaction before even thinking of writing the delete/update/etc (BEGIN; ...)

2. Always write the WHERE query out first, THEN go back to the start of the line and fill out the DELETE/UPDATE/etc.

It worked well, and it's a habit I've since tried to keep on doing myself as well.

_nalply2y ago

Once I wanted to do `rm -fr *~` to delete backup files, but the `~` key didn't register...

Now I have learnt to instinctively stop before doing anything destructive and double-check and double-check again! This also applies to SQL `DELETE` and `UPDATE`!

I know that `-r` was not neccessary but hey that was a biiiig mistake of mine!

tetha2y ago

For entirely critical systems, I by now rather generate a reviewable script and run that. Something along the lines of:

    find /backups/ -mtime '-30' -printf 'rm -f %p\n' > very_scary_deletes.sh

This gives you a static script with a bunch of rm's. You can read that, check it, give it to people to validate and when you eventually run it, it deletes exactly those files.

_nalply2y ago

That's clever, only do really dangerous things if you have a way to carefully review the steps.

Another way is sometimes: `echo *.txt~` and when you like the result, replace `echo` with `rm`.

1 more reply

ht852y ago

If you ever type really dangerous commands, it is good practice to prefix them with a space (or whatever your favorite shell convention is) to make sure they not saved in your history.

One of my "oopsies" involved aggressive <up> and <enter> usage and a previous `rm -rf *` running in /home...

CoastalCoder2y ago

I didn't know about the space thing!

Though for me, I think the greater risk would be from not having a record of what I'd run.

_shantaram2y ago

xp842y ago

I’m really curious: without cheating by using the GUI, what would be the proper way to delete such an obnoxiously-named directory? Would “$PWD/~” work?

1 more reply

laurensr2y ago

You remind me of that time I wanted to `rm -rf ./*` but the dot hadn't registered... I now avoid that statement.

lostmsu2y ago

I click in Explorer. Have to confirm too. Never been confused.

jojobas2y ago

>Don’t ask me how I learned this.

Joke's on you, we all learned it this exact way.

seanthemon2y ago

This gives my heart pains I can't explain to my therapist..

maccard2y ago

Luckily I learned from someone else. I did figure out start with a transaction the hard way though

fernandotakai2y ago

i can still feel the pain in the pit of my stomach when i saw the amount of rows affected.

thankfully my boss at the time was a amazing db admin and he helped me fixing my mistake.

gumballindie2y ago

Came here to say exactly that! I do a select count and then a limit against the returned count. At least it may reduce the blast radius.

menacingly2y ago

In most of my scenarios, I'd actually rather cause a catastrophic global change than a silent subtle corruption of a handful of rows

gumballindie2y ago

1 more reply

sgarland2y ago

This can also bite you if your dataset is larger than the buffer pool (or whatever other RDBMS calls it), and the particular table you’re querying isn’t commonly accessed.

Turns out when you start loading millions of rows of useless data into memory, the useful data has to get kicked out, and that makes query latency skyrocket.

mjcohen2y ago

Same thing on a Unix/Linux level: When using find, I always do it first with -print until I see the files I want. Only then do I add the actual action I want.

tgv2y ago

Then start a transaction, and only commit when the row numbers match.

chrisandchris2y ago

That, and some DB tools (line JetBrains DataGrip) block UPDATEs and DELETEs without a WHERE condition.

iJesus2y ago

Gone are the days of hard deletes in my approach; it's exclusively soft deletes now!

okeuro492y ago

In the EU soft deletes can violate GDPR, so not always an available strategy.

vntok2y ago

Soft delete items immediately, purge the soft deleted rows automatically after 30 days or so.

lemper2y ago

how did you learn it?

fukawi22y ago· 14 in thread

It has a fun quirk though - if your SELECT matched no rows, the state would be empty. So SELECT foo WHERE 1=2 would select nothing.

UPDATE and DELETE are perfectly valid actions even without a state...

Working late one night, I ran a SELECT STKM WHERE something that matched nothing, then before I realised I realised my state had no rows matched, I followed up with DELETE STKM.

Yep, the entire Stock Movements table four the last 20+ years of business were gone.

The nightly backup had not run, and I didn't want to lose an entire day of processing to roll back to the previous night.

cvoss2y ago

> if your SELECT matched no rows, the state would be empty

> UPDATE and DELETE are perfectly valid actions even without a state

bombolo2y ago

At university I had to use a programming language to do some specific calculations.

It was obviously designed by someone who had not taken any courses in formal languages, compilers and algorithms.

It had a fun quirk, the final result would change depending on which names you used for your variables.

When the phd student guiding our lab session told me to not use underscores in my variable names, I thought i'd humor her just to show her it was a stupid suggestion. But she knew more than me.

knodi1232y ago

I learned a lot from that job. Including when to keep my mouth shut!

nikau2y ago

A lot of older systems were like that before they became more commodity items with guard rails.

Some of the early SAN disk arrays used to use an assembly like config interface with no sanity checks, type in the wrong command and you could wipe the whole array.

The company I worked for at the time had a blanket policy that any change had to be implemented by the vendor to shift liability for any mistakes.

fukawi22y ago

You're probably right, but back then it was assumed you knew what you were doing I guess. The old idiom of "Linux won't stop you shooting yourself in the foot" was even more true of this system.

dylan6042y ago

>It's a database, programming language and OS all in one with roots in tracking military helicopter parts in the 1960's. I was working on it in the mid-2000s.

albert_e2y ago

You recreated a production database of 20 years data by hand overnight?

You deserve an award !!!

fukawi22y ago

Yep, the entire table. I figured since I had to write it to do the day at a minimum, doing the whole table would help me find corner cases and errors since it was a much larger range of data

testemailfordg22y ago

Only that day's data was added on top of backup from last night...

unnouinceput2y ago

he only recreated the ones not saved by the nightly backup, meaning he used the previous nightly backup and recreated from invoices the last ~23 hours.

fukawi22y ago

No, I did the entire table. That helped me find corner cases and errors in my logic having such a bigger range of source data.

3 more replies

vivekv2y ago

EricBetts2y ago

PICK is giving be flashbacks of my tenure at ADP’s dealership services (I think it became CDK)

nivethan2y ago

I deal with D3 pick and the fear I have of doing select bp and catalog bp is insane. The entire system is a mess so wiping the md would be catastrophic. :)

croo2y ago· 10 in thread

My chooosen database explorer is Dbeaver. Horrible name but great app.

You can set colours for local/test/prod servers and a red colored tab will scream at you to be cautious. And with red color every edit will pop up an "are you sure?" question. And autocommit is off.

I sorta stopped making unrecoverable mistakes.

nucleardog2y ago

Autocommit being off took prod down for me.

Hop in the test database, run some SQL, update a bazillion records, run some selects to verify, open the test app and verify there. All good! Someone else checks… LGTM.

So I do the same thing in prod. Make changes, validate in the DB, looks good. Open the prod app… down. Requests are hanging.

And that’s the boring story of the day I learned that dbeaver used transactions in prod connections by default.

reaperducer2y ago

My chooosen database explorer is Dbeaver

I just checked it out, and it looks promising.

Con: There are too many licensing options. I'm not even sure which one I would need.

fbdab1032y ago

Start with the Community Edition and see where it takes you.

loa_in_2y ago

As if two guys in a basement couldn't send out an invoice

Lvl999Noob2y ago

The parent isn't saying that they couldn't. They are saying that they shouldn't advertise their two-guy-ness in the financial parts of their company.

fbdab1032y ago

My only complaint about DBeaver is how powerful it is -I feel like I have only ever scratched 1% of the functionality it offers. I need a tutorial class on all of the goodies the platform can do.

All of the tutorials I seem to find online are basic level and/or mixed with introducing SQL.

jareklupinski2y ago

i'm using it on Mac, and it seems to want to update every time I turn it on... i get that having timely updates is great, but the way to update the app on a mac is:

and you can get back to work! now if only I could remember all the details that were in my head when i opened the app in the first place...

dbeaver team: can you do what paint.net did please? that is, only bother me to update as i'm turning the app off, not on?

throitallaway2y ago

This is more of a Mac problem than anything, and why I'm such a big fan of package managers.

acomjean2y ago

DBeaver is great. I came from a different tool, and had to get used to persisting the changes. I’m totally use to it.

I run a local copy of the database I connect to remotely and I have to be careful about which database/table I’m connected too.

psnehanshu2y ago

I love DBeaver too.

cesaref2y ago· 8 in thread

jdsalaro2y ago

> can you rollback a drop table

The first thing I did after the recovery marathon was to alias rm so that it instead works by moving stuff to /tmp

eCa2y ago

In the late 90’s I wanted to try this Linux thing, so I followed a tutorial. First step:

    fdisk

Yes, that was my Windows partition going bye-bye.

CoastalCoder2y ago

So... success? ;)

2 more replies

euroderf2y ago

My boss was doing housecleaning in his PDP-11 (Unix v.6) account and did a "rm *" in the bin directory but OOPS it was the system bin.

noAnswer2y ago

inb4: /tmp partition gets full. Causing things to brake. :-)

kubanczyk2y ago

No problem, just use rm ;)

msravi2y ago

kayodelycaon2y ago

Then come the “site is down and the world is ending” tickets, messages, emails, and phone calls.

philderbeast2y ago· 3 in thread

"All of your colleagues have done something dumb. Don't be afraid to tell us when you make a mistake. We all remember our first screw up and will be happy to help."

Never have truer words been spoken.

russfink2y ago

I_am_uncreative2y ago

For whatever reason, we had write access to that directory, but not deletion. I had to get the teacher to delete it for me.

KronisLV2y ago

> Don't be afraid to tell us when you make a mistake. We all remember our first screw up and will be happy to help.

If you are ever in such environments, acknowledge the fact, possibly push back against stuff like that and definitely be on the lookout for alternatives, where you'd be able to prosper.

matsemann2y ago· 3 in thread

I wish a DELETE or UPDATE only affected a single row by default (and perhaps even wouldn't commit if it would hit multiple rows), unless a keyword for MANY or something similar was added.

Aka DELETE ALL where x == y or DELETE MANY where x == y or perhaps you need an explicit limit for it to not be 1, so DELETE where x == y LIMIT ALL

somat2y ago

sgarland2y ago

For the MySQL CLI, you can start it with —-i-am-a-dummy to get this behavior. Or —-safe-updates if you’d rather, but the former is more fun.

sangriafria2y ago

There are some SQL GUIs that require confirmation before running an update/delete query without a where clause

user67232y ago· 3 in thread

These kinds of scenarios happen when money is "cheap", and highlight why the current recession, and coming 2nd great depression aren't really a bad thing.

dfcowellOP2y ago

This was 13 years ago in a small business with no significant investment. No “cheap money” was involved, just the realities of a small business with chronic NIH syndrome.

CoastalCoder2y ago

I think you'll need to flesh out your logic if you want to convince people that a depression is somehow a net positive.

rsynnott2y ago

... Okay, I've seen people blame monetary policy for a lot of things, but I'm really struggling to see how this one works.

quijoteuniv2y ago· 3 in thread

Use chatGPT to double check too.

throwbadubadu2y ago

"Yes, you can delete those items via that query."

"But now all has been deleted??!?! WTF HEEELP"

"I apologize for my previous answer, you are correct, you now deleted everything!"

quijoteuniv2y ago

archerx2y ago

Thank you for the laugh, I can imagine someone running untested chatGPT code on prod and this actually happening.

stuaxo2y ago· 2 in thread

My manager had got me to look at backups.. but for cheap.

I decided on bacula - I had the clients installed on all the computers in the office, and it worked for some small tests.

My manager decided we would try this with a USB drive attached to one of the servers (somehow this didn't seem like a bad idea).

In the morning, very uncaffinated he sent me to the other site - an unmanned basement office with the servers.

Being uncaffinated I forgot the door password and set off the alarm.

I had to go into the office and phone him with the alarm going to get the code to turn the alarm off.

OK, that was stressful but sorted out at least.

I plugged in the hard drive to the selected server and headed back.

Fairly soon after I was let go, and joined my friends at a much more fun company making mobile games.

sacnoradhq2y ago

It's the responsibility of the technical person to uphold engineering ethics, especially in the face of potentially inadequate recovery and security solutions.

kubanczyk2y ago

Not only an USB (1?) could cause it but also a cifs transfer over a 100 Mbps link.

mootzville2y ago· 2 in thread

I was taught to:

sokoloff2y ago

I will typically do something like this when writing SQL in an interactive tool:

  SELECT * from table
  WHERE id = 12345

and once that's giving me the selection I want, insert the update statement into the middle:

  SELECT * from table
  -- UPDATE table set c1 = v1, c2 = v2
  WHERE id = 12345

(Most of the RDBMS tools that I've used would happily ignore the lack of a closing semi-colon and that will not save you for a single-statement case.)

mootzville2y ago

hahamaster2y ago· 2 in thread

jorgesborges2y ago

That’s an insane command to pass to an untrained person on a piece of paper to type into the terminal of a production server.

hahamaster2y ago

Agreed. It was "only" supposed to truncate the log folder. ;-)

headline2y ago· 2 in thread

I've just started my career - I wonder what my first fuck-up will be :)

james-skemp2y ago

Own up to it when it happens. Can only get worse if you don't.

Hopefully you work with people that follow that as well.

I can still remember a more senior coworker copying a directory from a network share (Ctrl + C), deleting the directory, and then trying to paste it somewhere else.

I didn't speak up (he also rarely touched the mouse so flew through it), so we both got a chuckle when he realized his mistake and pinged network folks for a backup from tape.

It happens. The real world is messy. Can always learn something new. Can always do something you know not to do.

jdsalaro2y ago

Don't wonder! Plan it and execute it so you can be more chill !!

rrrrrrrrrrrryan2y ago· 1 in thread

"You're not a real technology worker until you've brought the company down. Welcome."

Might not be the best words to live by, but it was exactly what I needed to hear at that time early in my career.

ahazred8ta2y ago

Frequently heard in batcaves: "Hey, the FNG børked Prod."

"One of us!! One of us!!" :-)

IronWolve2y ago· 1 in thread

My first sysadmin job at a call center, the call center reps use the same directory for all the users. And, I'm working tickets to delete old users accounts...

The old grey haired sysadmin backs up the directory so he can instantly restore it. Seems this happens all the time.

Whew.

limaoscarjuliet2y ago

When you really mess up a call center, it is called a gopher event - entire floor having lost their audio suddenly stands up in their cubicles and looks around. Been there, caused that.

Nextgrid2y ago· 1 in thread

In the top bar of the site:

> Sorry! Subscriptions were broken last week, but are now working. If you tried to subscribe and ran into issues, please try again!

I wonder if a similar incident involving a “UPDATE subscriptions” query happened recently.

dfcowellOP2y ago

Ha, no, that was a far more mundane issue. The CMS I'm using requires double opt-in to subscribe, meaning you need to enter your email address and click the confirmation link.

It also apparently requires double email configuration, meaning it has two places where you can configure your mailer. I had only set it up in one place, meaning the confirmations never got sent.

Bit of a facepalm moment.

m348e9122y ago· 1 in thread

We had this saying: "If you fix it before anyone realized it was broken, you didn't break it."

throwaway36182y ago

justsomehnguy2y ago· 1 in thread

pfft

   rmdir . /s /q

only to notice what I am in the wrong folder, way up in the hierarchy.

And repeat the same error years later when the batch file failed to cd to the destination folder. Added

   if %CD% == %DESTDIR%

to avoid that problem.

theduder992y ago

lol thanks.

drsim2y ago

15 years and counting since wiping out a large production table and taking a day to restore from backup.

lormayna2y ago

Lesson learned: even if you are under big pressure take your time to plan and review the modification 15 minutes can save hours.

nonameiguess2y ago

alexmolas2y ago

The other learning I get from this story is "never hide your errors" or "own your errors as you own your victories".

If the author had decided to say nothing the problem would have been bigger - an unhappy boss and probably fired.

thombat2y ago

_cenw2y ago

I did worse.

nytesky2y ago

For some reason your title made me think of this classic IT Crowd

https://youtu.be/Vywf48Dhyns

makach2y ago

alarge2y ago

When the service started, they made the decision to not actually delete any content (delete just set a flag which disabled the content but didn't actually remove it).

You can imagine the technical over-reaction:

  * Delete switched back to a soft delete
  * Turned on S3 object versioning
  * Started redundantly copying content onto a totally different hosting service

Animats2y ago

If you locked out everyone today, many would assume they had been laid off. Some of them would have other jobs before the login problem was fixed.

FreshStart2y ago

5. Deadlines do more damage than good, but the costs they produce get swiped under the rug by company internal accounting.

liampulles2y ago

simonblack2y ago

Ah yes, the old lack of 'where' clause.

Did much the same, but only on my own system (thankfully!) and yes, I had an up-to-date backup MySQL dump on hand.

iJohnDoe2y ago

FTA - Make friends with your sysadmin, and stay in their good books at all costs.

I think this goes against every principle most on HN have.

Gordonjcp2y ago

I had a thing recently where someone was updating some entries on a system I look after, decided the changes hadn't applied properly, and clicked "Roll back" to put it back to its original state.

Whatever had gotten into it, it rolled back to 2009. It rolled everything back, including user accounts.

No-one who worked there in 2009 still worked there, so no-one had a valid password any more.

Fortunately it was easy enough to copy the last-but-one backup over the top and lose the day before's config updates, and cure its Flowers for Algernon state, but it was a pretty hairy afternoon.

j / k navigate · click thread line to collapse