I'm surprised nobody came up with this idea till now. It's brilliantly simple.
Teehee. The method is not new at all. For example, compressors like xzip do this out of the box, and what they're doing is almost exactly how ZIP files work.
The trouble with discarding state on every file is that it really hurts the compression ratio with small files, or when using anything like a modern codec, which gzip/deflate is not. Gzip maintains a 32 KB sliding window, which is quite easy to exceed with contemporary data, but with a modern compressor (like LZMA2) losing that window will absolutely devastate ratios.
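To make the small-file cost concrete, here is a rough Python sketch that compresses the same inputs two ways: one gzip member per file (context reset every time) versus one continuous stream. The sample data and the helper names `per_file_size` and `single_stream_size` are made up for illustration; real images will behave differently.

```python
import gzip

def per_file_size(files):
    """Total bytes when each file becomes its own gzip member (context is reset each time)."""
    return sum(len(gzip.compress(data)) for data in files)

def single_stream_size(files):
    """Bytes when all files are compressed as one continuous gzip stream."""
    return len(gzip.compress(b"".join(files)))

# Hypothetical sample data: 200 small, similar config-like files.
files = [
    f"name=service-{i}\nport={8000 + i}\nlog_level=info\nretries=3\n".encode()
    for i in range(200)
]

if __name__ == "__main__":
    print("one member per file:", per_file_size(files), "bytes")
    print("single stream:      ", single_stream_size(files), "bytes")
```

With many tiny, similar files the per-member headers and the lost context dominate; with a few large files the gap shrinks.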
The usual solution is so-called 'solid' compression, where the uncompressed input is partitioned into blocks spanning file boundaries. It configurably trades seek efficiency for reliably preserving compressor context -- including allowing seeking within files. Their format could be modified to support this while retaining backwards compatibility as good as the current method. What they have is already pretty much solid compression, except it only chunks large files. This is basically a weird special case of a simpler and more general design everyone else uses.
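A minimal sketch of the solid-block idea in Python, assuming a toy in-memory layout (this is not any real archive format, and `pack_solid`, `read_range`, and `BLOCK_SIZE` are names invented here): files are concatenated, the result is cut into fixed-size blocks that ignore file boundaries, and an index records where each file lives in the uncompressed stream.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # assumed tuning knob: bigger blocks, better ratio, slower seeks

def pack_solid(files, block_size=BLOCK_SIZE):
    """Concatenate files, cut the result into fixed-size blocks, compress each block."""
    index = {}            # name -> (start offset in the uncompressed stream, length)
    stream = bytearray()
    for name, data in files.items():
        index[name] = (len(stream), len(data))
        stream += data
    blocks = [
        zlib.compress(bytes(stream[i:i + block_size]))
        for i in range(0, len(stream), block_size)
    ]
    return blocks, index

def read_range(blocks, index, name, offset=0, length=None, block_size=BLOCK_SIZE):
    """Read part of one file, decompressing only the blocks that overlap the range."""
    start, size = index[name]
    if length is None:
        length = size - offset
    length = max(0, min(length, size - offset))
    if length == 0:
        return b""
    begin = start + offset
    end = begin + length
    first, last = begin // block_size, (end - 1) // block_size
    raw = b"".join(zlib.decompress(blocks[i]) for i in range(first, last + 1))
    skip = begin - first * block_size
    return raw[skip:skip + length]
```

Small files share a block (and its context), while a read in the middle of a large file only touches the blocks covering that range.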
Finally, on the compatibility angle, the end of stream is visible at an API level, so this isn't going to be 100% perfect. I'd expect one or more obscure implementations (maybe Windows apps? Java?) to potentially break.
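For what "visible at an API level" looks like in practice, here is a small Python illustration (the helper name `first_member_only` is mine). The high-level `gzip.decompress` reads through concatenated members, but a raw `zlib` decompressor stops at the end of the first member and leaves the rest as unused input, which is the kind of behaviour a naive consumer could trip over.

```python
import gzip
import zlib

def first_member_only(blob):
    """Decompress with a raw zlib object, which stops at the first end-of-stream."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    out = d.decompress(blob)
    return out, d.unused_data

first = gzip.compress(b"first file\n")
second = gzip.compress(b"second file\n")
blob = first + second  # two gzip members back to back

if __name__ == "__main__":
    print(gzip.decompress(blob))          # both members, concatenated
    out, rest = first_member_only(blob)
    print(out, len(rest), "bytes left unread")
```

A consumer written like `first_member_only` that ignores the leftover bytes would silently see only the first file.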
> This makes images a few percent larger (due to more gzip headers and loss of compression context between files), but it's plenty acceptable.
Compressing an entire image is generally great. Compressing all of the individual files in an image is generally not great.
About 7.6% bigger: https://github.com/golang/build/commit/8a5a4d227f08eb1d889fa...
If they just wanted a S3/GCS fuse filesystem there are plenty of open source options out there.
>Currently, however, starting a container in many environments requires doing a pull operation from a container registry to read the entire container image from the registry and write the entire container image to the local machine's disk. It's pretty silly (and wasteful) that a read operation becomes a write operation.
What's silly is to claim that this is the problem. Any read is going to be a write operation, at multiple levels, thanks to systems of transparent caching: to a nearby CDN, to local disk, to local memory, to your CPU cache, etc. These are optimizations; they aren't making your container startup any slower.
The real problem, which this tool indeed helps to solve, is that reading the entire image must complete before you're able to start further processes which read specific parts of the image. Not anything to do with "reads causing writes".
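As a toy illustration of reading one file without first reading the whole image, here is a Python sketch assuming a simplified layout: one gzip member per file plus an offset index. This is not the actual CRFS/stargz format, and `build_image` and `read_file` are hypothetical helpers.

```python
import gzip

def build_image(files):
    """Concatenate one gzip member per file and remember where each member starts."""
    blob = bytearray()
    index = {}  # name -> (byte offset of the member, compressed length)
    for name, data in files.items():
        member = gzip.compress(data)
        index[name] = (len(blob), len(member))
        blob += member
    return bytes(blob), index

def read_file(blob, index, name):
    """Decompress a single file by touching only its own byte range."""
    offset, size = index[name]
    return gzip.decompress(blob[offset:offset + size])

# Hypothetical image contents.
files = {"etc/hostname": b"builder\n", "bin/tool": b"\x7fELF" + b"\x00" * 4096}
blob, index = build_image(files)
```

Over a network, the slice in `read_file` would become a ranged read against the registry, so a process can start as soon as the files it actually needs have arrived. The whole blob is still a valid multi-member gzip stream for tools that read it front to back.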
The unnecessary writes I care about are to my cloud VM's small block device, which is I/O limited. The best way to not wait for those is to not do the writes in the first place.
I just moved this to https://github.com/google/crfs if people want to track that repo instead of Go's build system (which is relatively boring for most people).
Currently docker build compresses everything in the working directory on every build. This is fine for building images for deploy/upload but is annoying for a local dev situation where you're frequently rebuilding.
Seems like it wouldn't be too hard to write an alternate docker build that checks a previously built "Stargz" and just sends the additional files? (There would be some complexity here reassembling a valid tar within hyperkit).
I might be missing something here, and I might be misplacing the bottleneck during build, but every time I'm annoyed by this problem it seems part of the issue is the single fat tar that needs to be created every time.
edit: this strategy could also work with docker-machine building on remote machines
Edit: "For isolation and other reasons, we run all our containers in a single-use fresh VMs." So they had no caching for the base layers unless those were primed in the VM image?