The exact reason it blows up isn't even necessarily all that important, other than in its effect on what you should be doing to reduce the probability of downtime. Well-engineered systems are routinely developed from less than completely reliable parts. Stuff fails, we design for it.
It's certainly not a reason not to use it, if it results in a net gain in your ability to get things done and to maintain control and transparency over your deployed systems.
But it's certainly a good reason (among a long list of good reasons) to make sure you have a solid backup routine in place, including regularly testing both the backups' integrity and your ability to quickly restore a working prod system from them.
Distributed scalable automation will accidentally your data slightly more often. The more stuff you have, the more edge cases and bugs you have.
Scale big, fail big, as I like to say.
docker volume prune says:
"Remove all unused local volumes. Unused local volumes are those which are not referenced by any containers"
If it removed a local volume that was being used by a container, that is kinda bad.
2. Why are you running docker on ad-hoc machines you need to prune?
3. Why do you even need root access on production machines to fiddle around with docker commands?
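On the prune semantics above: you can at least preview what `docker volume prune` would target before deleting anything. A minimal sketch, assuming Docker is installed and the daemon is running (the `disposable` label is a convention assumed here for illustration, not a Docker built-in; the script is a no-op if Docker is absent):

```shell
# Preview the volumes `docker volume prune` would remove, without
# deleting anything we haven't explicitly opted in to losing.
if command -v docker >/dev/null 2>&1; then
  # "Dangling" = not referenced by any container: the prune candidates
  docker volume ls --filter dangling=true
  # Safer habit: prune only volumes explicitly labeled as disposable
  # ("disposable" is an assumed label convention, not built into Docker)
  docker volume prune --force --filter label=disposable
else
  echo "docker not found; skipping"
fi
```

Filtering prune by label means a volume must be both unreferenced *and* explicitly marked throwaway before it can be deleted, which narrows the blast radius of the bug discussed here.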
While this is obviously a bad bug (and there are many with Docker), it seems more of an operational procedures failure than anything else. You could be saying:
“Beware of rm -rf /, it just deleted 20gb of production data”
Ok. Sure. But why are your tools and procedures putting you in a position to make that mistake?
We don't know his environment. We don't know his company's policies. We don't know his hardware, connectivity, or budget constraints. These kinds of passive-aggressive responses are almost never helpful.
That’s not a surprise, and maybe the issue at the core here is not really Docker. That’s all.
I've seen plenty of stuff in my career where I've gone on record to say "hey - we really shouldn't do this". Nothing got done about it. But hey, I did what I could.
Recently I learned about Rasmussen's dynamic safety model. I think this is a very handy mental model to have. It's the human factors that make what we do really hard. Often line level practitioners know better than they are allowed to do in practice and trying to fight organizational politics to Do The Right Thing can be an uphill battle.
They may have valid reasons to do that, even if not common.
export WORKDIR=Home/me/proj
...
rm -rf /$WORKDIR
If something unsets $WORKDIR, or never sets it at all, wave bye-bye to your everything. And before you say "who would do that?!" -- I believe I heard this happened to a build of Red Hat that also had some kind of force-push and auto-pull-and-build on their version control, so every connected person had their copy of the software nuked. If not for the non-connected individuals, the entire codebase would apparently have been gone. Or so the legend goes.

For batch processing, my usual pattern has always been to move the data from (slower) network storage to local storage, process it, then move the results back to network storage.
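A cheap guard against exactly this footgun is the POSIX `${VAR:?}` expansion, which aborts loudly instead of silently expanding an unset variable to the empty string. A minimal sketch (the empty `WORKDIR` simulates the variable never being set):

```shell
#!/bin/sh
WORKDIR=""   # simulate the variable being unset/empty

# ${WORKDIR:?msg} makes the expansion itself fail, so the rm below can
# never collapse into `rm -rf /`. It runs in a subshell so only the
# subshell aborts, letting us report what happened afterwards.
if ( rm -rf "/${WORKDIR:?WORKDIR is unset -- refusing to rm}" ) 2>/dev/null
then
  echo "rm ran"
else
  echo "guard tripped, nothing deleted"
fi
```

With `WORKDIR` empty, the subshell exits before `rm` is ever invoked and the script prints "guard tripped, nothing deleted"; `set -u` at the top of a script gives similar protection for every variable at once.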
So then Docker is designed to treat all of those as disposable.
I just searched "recette" and only came up with French cooking references.
I don't really know if it's designed like that, but I treat them as disposable and unreliable, so I need a way to resuscitate the thing when something bad happens.
Perhaps a phonetic spelling of "resets" by a French person :)
It’s surprisingly easy with docker, especially when dealing with .... legacy systems.
The joys of open-source users...
Flip side . . .
Computers are terrible. They can screw things up so badly that it would take a thousand people the same amount of time to do equivalent damage.