undefined | Better HN

0 pointsfabian2k5d ago0 comments

> This difference is particularly noticeable with multiple images sharing the same base layers. With legacy storage drivers, shared base layers were stored once locally, and reused images that depended on them. With containerd, each image stores its own compressed version of shared layers, even though the uncompressed layers are still de-duplicated through snapshotters.

This seems like a really weird decision. If base images are duplicated for every image you have, that will add up quickly.

0 comments

kodama-lens5d ago

I think there is an Issue/PR right now to change this. See: https://github.com/containerd/containerd/issues/13307

epistasis5d ago

Oh, very glad to see this, ML applications that were mentioned in it are exactly why I was thinking this was such a disastrous change.

However, the tedium of the reply chain reminds me why I tend to focus most energy on internal projects rather than external open source...

Docker may have been built for a specific type of use case that most developers are familiar with (e.g. web apps backed by a DB container) but containerization is useful across so much of computing that are very different. Something that seems trivial in the python/DB space, having one or two different small duplicates of OS layers, is very different once you have 30 containers for different models+code, and then ~100 more dev containers lying around as build artifacts from building and pushing, and pulling, each at ~10GB, that the inefficient new system is just painful.

The smallest PyTorch container I ever built was 1.8GB, and that was just for some CPU-only inference endpoints, and that took several hours of yak shaving to achieve, and after a month or two of development it had ballooned back to 8GB. Containers with CUDA, or using significant other AI/ML libraries, get really big. YAGNI is a great principle for your own code when writing from scratch, but YAGNI is a bit dangerous when there's been an entire ecosystem built on your product and things are getting rewritten from scratch, because the "you" is far larger than the developer making the change. Docker's core feature has always been reusable and composable layers, so seeing it abandoned seems that somebody took YAGNI far too extreme on their own corner of the computing world.

a_t485d ago

Docker(Hub) just isn't built for this use case. I've built https://clipper.dev to better handle ML/large images. It consists of a registry+pull client that breaks apart layers and does content addressing of individual chunks by _uncompressed_ hash, so that content can be better shared. My pull client has better parallelization and wastes much less bandwidth. It annoys the heck out of me when I change one file in a layer and have to redownload bytes my device already has. By sharing across layers I've seen 80-90% improvements in pull times for "patches".

I'm also in the process of building a BuildKit builder, I'm seeing large improvements on the speed of exporting images. The same image that takes Docker >3 minutes to export and push takes me under a minute. https://github.com/clipper-registry/benchmarks/actions/runs/...

pjmlp4d ago

> ...containerization is useful across so much of computing that are very different....

So much that containerization in general predates Linux, and UNIX, all the way back to System 360.

Also it got introduced into Tru64, HP-UX, BSD and Solaris, before landing into Linux.

epistasis5d ago

This is hell for a lot of ML containers, that have gigabytes of CUDA and PyTorch. Before at least you could keep your code contained to a layer. But if I understand this correctly every code revision duplicates gigabytes of the same damn bloated crap.

a_t485d ago

It's even worse when you end up installing PyTorch as a separate package in some other layer. It's not shared between layers at all with regular Docker.

spwa45d ago

If you have problems with 13 (I believe) GB of docker layers ... how do you deal with terabytes or petabytes of AI training data?

epistasis5d ago

Petabytes of training data is only one application of PyTorch, which is going to use tens of thousands of containers, but...

Inference, development cycles, any of the application domains of PyTorch that don't involve training frontier models... all of those are complicated by excessive container layers.

But mostly dev really sucks with writing out an extra 10GB for a small code change.

1 more reply

StableAlkyne5d ago

You don't even need MB of training data for some ML applications. AI is the sexy thing nowadays, but neural networks (Torch is a NN library) are generally useful for even small regression and clarification problems.

For some problems you might even be able to get away with single digit numbers of training points (classic example of this regime being Physics-Informed Neural Networks)

1 more reply

Normal_gaussian5d ago

the training data is on a separate drive; or the training data isn't that large for this use case; or they aren't training.

0cf8612b2e1e5d ago

You don’t train petabytes on your laptop.

IsTom5d ago

Docker is already hogging a lot of disk space and needs to be pruned regularly. I can't imagine what's it's going to be like now.

embedding-shape5d ago

"really weird decision" seems like an understatement, I thought the entire point of the specific storage design with the whole layering shebang was so things could be shared? If you remove that, just get rid of layers as a whole, what's the point otherwise?

tetha5d ago

It does. It's also very nice that this moves storage usage from /var/lib/docker over to /var/lib/containerd.

Due to that, a careless installation of a few new dev-systems under the new docker version immediately blew up storage usage on the root-disk, while happily ignoring hundreds of gigabytes on a volume on /var/lib/docker.. because that's where it needs the storage, right? A few older systems also were upgraded but didn't, which was quite confusing at first.

Sorry for being salty, but that was a pretty hectic afternoon with those new agents trashing builds, and now we have a pretty annoying migration plan to plan for the rest. And yes yes it's just a reinstallation, but we have other things to do as well.

j / k navigate · click thread line to collapse

0 comments

kodama-lens5d ago

I think there is an Issue/PR right now to change this. See: https://github.com/containerd/containerd/issues/13307

epistasis5d ago

Oh, very glad to see this, ML applications that were mentioned in it are exactly why I was thinking this was such a disastrous change.

However, the tedium of the reply chain reminds me why I tend to focus most energy on internal projects rather than external open source...

a_t485d ago

pjmlp4d ago

> ...containerization is useful across so much of computing that are very different....

So much that containerization in general predates Linux, and UNIX, all the way back to System 360.

Also it got introduced into Tru64, HP-UX, BSD and Solaris, before landing into Linux.

epistasis5d ago

a_t485d ago

It's even worse when you end up installing PyTorch as a separate package in some other layer. It's not shared between layers at all with regular Docker.

spwa45d ago

If you have problems with 13 (I believe) GB of docker layers ... how do you deal with terabytes or petabytes of AI training data?

epistasis5d ago

Petabytes of training data is only one application of PyTorch, which is going to use tens of thousands of containers, but...

Inference, development cycles, any of the application domains of PyTorch that don't involve training frontier models... all of those are complicated by excessive container layers.

But mostly dev really sucks with writing out an extra 10GB for a small code change.

1 more reply

StableAlkyne5d ago

For some problems you might even be able to get away with single digit numbers of training points (classic example of this regime being Physics-Informed Neural Networks)

1 more reply

Normal_gaussian5d ago

the training data is on a separate drive; or the training data isn't that large for this use case; or they aren't training.

0cf8612b2e1e5d ago

You don’t train petabytes on your laptop.

IsTom5d ago

Docker is already hogging a lot of disk space and needs to be pruned regularly. I can't imagine what's it's going to be like now.

embedding-shape5d ago

tetha5d ago

It does. It's also very nice that this moves storage usage from /var/lib/docker over to /var/lib/containerd.

j / k navigate · click thread line to collapse