I must say that LZ4 definitely stands out for both compression and decompression speed, while still being able to cut the data size in half, which probably makes it quite suitable for live filesystems and network protocols. Comparing Zstd and LZ4[3] was also particularly interesting: the former is substantially slower, but achieves a compression ratio somewhere between zlib and xz while beating both in time (in my benchmark at least).
[1] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...
[2] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...
[3] https://github.com/AgentD/squashfs-tools-ng/blob/master/doc/...
Actually, no. lz4 is less suitable than zstd for filesystems.
BTW, lz4 is present in many Mozilla tools like Thunderbird: it's represented by its bastard child lz4json, which diverges only in its headers, which don't work with regular lz4 tools.
> achieving a compression ratio somewhere between zlib and xz, while beating both in time (in my benchmark at least)
Your observation is correct: zstd is now standard and the default on openzfs 2.0, replacing lz4.
The 19 compression levels offer more flexibility than lz4 alone. Another strength is that the decode time is not a function of the compression level, which is good for coldish storage that's rarely updated.
Are you sure? The default compression level has always been "off", but when switched on - the default has been lz4 for about 5 years. Zstd support was added less than a year ago and there are still a lot of things that need to be fixed before one could even suggest that it might be a sane default. I like zstd, but I like my uncorrupted data more. I know that compatibility between compressor versions and pools is a concern, and there are also the compression performance problems with the way zstd handles zfs block sizes. Thankfully lz4 works great for zfs and has for many years now.
https://github.com/openzfs/zfs/blob/master/include/sys/zio.h...
Why's that? What benefit would I get from switching? Is it workload-dependent?
EDIT: To be clear, I'm not disagreeing; if zstd will work better, I want to know about it so that I can switch my pools to use it.
> Why's that? What benefit would I get from switching? Is it workload-dependent?
Presumably because Zstd has much better compression, while still being quite fast.
I don't see however how that invalidates any of my observations. Some filesystems like e.g. UBIFS support LZ4, but now also support Zstd, because both are suitable for the task (and LZ4 was around earlier).
In the end it is a classic space vs. time trade-off and there is AFAIK no generic right or wrong answer (except that some algorithms are too slow to even be considered).
Still, zstd offers way better compression and has a variable compression level: https://github.com/facebook/zstd Decompression is always fast, but you can trade compression speed for compression ratio.
In general, if you send data over a network, zstd is quite profitable. Even with network-attached storage like AWS EBS or AWS S3 it can be hugely profitable.
This is the case in ClickHouse: if data is compressed, we decompress it in blocks that fit in CPU cache and then perform data processing inside cache; if data is uncompressed, larger amount of data is read from memory.
Strictly speaking, LZ4 data decompression (typically 3 GB/sec) is slower than memcpy (typically 12 GB/sec). But when using e.g. 128 CPU cores, LZ4 decompression will scale up to memory bandwidth (typically 150 GB/sec) as well as memcpy. And memcpy is wasting more memory bandwidth by reading uncompressed data while LZ4 decompression reads compressed data.
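To put rough numbers on that, here is a back-of-the-envelope sketch. It uses the "typical" figures quoted above and assumes a 2:1 compression ratio, which is my assumption and not a measurement. It also assumes reads and writes share one memory bus and ignores caches, so treat it as a toy model only.

```python
# Toy model: every output byte costs one byte written plus whatever had to be read.
MEM_BANDWIDTH_GB_S = 150.0    # total memory bandwidth (figure quoted above)
LZ4_PER_CORE_GB_S = 3.0       # LZ4 decompression output per core (quoted above)
MEMCPY_PER_CORE_GB_S = 12.0   # memcpy output per core (quoted above)
RATIO = 2.0                   # ASSUMPTION: compressed size = uncompressed / 2


def output_ceiling(bytes_read_per_output_byte: float) -> float:
    """Max output rate in GB/s when reads and writes share the memory bus."""
    traffic_per_output_byte = bytes_read_per_output_byte + 1.0  # read + write
    return MEM_BANDWIDTH_GB_S / traffic_per_output_byte


memcpy_ceiling = output_ceiling(1.0)         # reads 1 byte per byte written
lz4_ceiling = output_ceiling(1.0 / RATIO)    # reads 0.5 bytes per byte written

memcpy_cores = memcpy_ceiling / MEMCPY_PER_CORE_GB_S
lz4_cores = lz4_ceiling / LZ4_PER_CORE_GB_S

print(f"memcpy: {memcpy_ceiling:.0f} GB/s ceiling, ~{memcpy_cores:.1f} cores to reach it")
print(f"LZ4:    {lz4_ceiling:.0f} GB/s ceiling, ~{lz4_cores:.1f} cores to reach it")
```

Under these assumptions LZ4 needs far more cores to hit the ceiling, but the ceiling itself is higher because less data has to be read.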
I haven't been into C/C++ for years though, and now I wouldn't grok the code, sadly.
LZ4 is one of the algorithms supported by Zram in Linux. It's fairly popular with people using things like a Raspberry Pi that have a smaller amount of RAM.
https://www.techrepublic.com/article/how-to-enable-the-zram-...
I configure my own kernel on Arch and Zswap is enabled by default there, too.
I can't quite see why - perhaps the logic to decide which pages to compress is different, or there is too much code in the swap subsystem that slows down the compression/decompression process...
[0] https://klarasystems.com/articles/openzfs1-understanding-tra...
This article refers to the padding on the final record as "slack space" and states that you need to enable compression to eliminate the slack space.
https://arstechnica.com/information-technology/2020/05/zfs-1...
See also:
https://old.reddit.com/r/zfs/comments/gzcysy/h264_h265_media...
For image, video and audio there are more efficient compressions taking advantage of those formats.
That said it can be quite tricky to rewrite something to efficiently work on compressed data.
I always assumed S3 storage was compressed on the fly by AWS regardless of how the client chooses to store their data.
Per the table at [0], zstd provides only a slight improvement in compression ratio, and in exchange is about half the speed of lz4.
They both have their place.
zstd, brotli, and snappy were seemingly all made with high-end x86 capabilities in mind.
To put it simplistically, if you have a file which is a (good) random mix of an equal number A and B characters, LZ4 won't be able to compress it significantly, while Zstd will compress it 8:1 converging to an encoding where a '1' bit is A, and a '0' bit is B.
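The 8:1 figure is the entropy bound for this input: two equally likely symbols carry 1 bit each but are stored in 8-bit bytes. Zstd has an entropy-coding stage that can approach that bound, while LZ4 emits byte-aligned tokens with no entropy coder. Here is a minimal sketch of the bound and of the "1 bit is A, 0 bit is B" encoding. The helper names are mine, and this is not how either compressor is implemented.

```python
import math
import random
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per input byte."""
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


def pack_ab(data: bytes) -> bytes:
    """Encode 'A' as a 1 bit and 'B' as a 0 bit, eight symbols per output byte."""
    assert len(data) % 8 == 0 and set(data) <= {ord("A"), ord("B")}
    out = bytearray()
    for i in range(0, len(data), 8):
        byte = 0
        for symbol in data[i:i + 8]:
            byte = (byte << 1) | (symbol == ord("A"))
        out.append(byte)
    return bytes(out)


def unpack_ab(packed: bytes) -> bytes:
    """Inverse of pack_ab: expand each bit back into an 'A' or 'B' byte."""
    return bytes(
        ord("A") if (byte >> shift) & 1 else ord("B")
        for byte in packed
        for shift in range(7, -1, -1)
    )


# A shuffled mix with exactly as many As as Bs (fixed seed for repeatability).
rng = random.Random(0)
symbols = list(b"A" * 4096 + b"B" * 4096)
rng.shuffle(symbols)
data = bytes(symbols)

packed = pack_ab(data)
print(entropy_bits_per_byte(data), len(data), len(packed))
```

A real compressor only approaches this bound, since it also has to store headers and code tables.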
I checked it. LZ4 still reduces the size by half, no idea why half. So a 10 MB file compresses to 5 MB.
Edit: checked with highest compression and it compresses 1MB file to 185KB. So what the parent wrote is false.
Yes, however there is usually no facility to train your compression algo with most tools using ZSTD.
The way this would probably work without this facility though, say, in a database, is that the dictionary is maintained internally and constructed on the fly from the field data and not exposed to users. Although, I don't know if you'd have to keep every version of the dictionary in order to successfully decompress old data? If so then perhaps this is a niche feature
And yes, totally, I know at least RocksDB supports exactly that behavior [0].
[0] https://github.com/facebook/rocksdb/blob/12f11373554af219c51...
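On the question of whether old dictionary versions have to be kept: with a preset dictionary, the decompressor needs the same dictionary the compressor used. A minimal sketch of that property follows. It uses zlib's preset dictionary (`zdict`) from the Python standard library as a stand-in, not zstd's trained dictionaries, and the field data and dictionaries are hypothetical.

```python
import zlib

# HYPOTHETICAL field data; a database would build the dictionary from real
# samples rather than have it written by hand.
dictionary_v1 = b'{"user_id": , "status": "active", "country": "US"}'
dictionary_v2 = b'{"order_id": , "state": "shipped", "currency": "EUR"}'
record = b'{"user_id": 1234, "status": "active", "country": "US"}'


def compress_with(dictionary: bytes, payload: bytes) -> bytes:
    """Deflate the payload using a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(payload) + comp.flush()


def decompress_with(dictionary: bytes, blob: bytes) -> bytes:
    """Inflate a blob that was compressed with the given preset dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


plain = zlib.compress(record, 9)
with_dict = compress_with(dictionary_v1, record)
print(len(record), len(plain), len(with_dict))

# The dictionary version used for compression round-trips the record.
roundtrip = decompress_with(dictionary_v1, with_dict)

# A different dictionary version is rejected by the decompressor.
try:
    decompress_with(dictionary_v2, with_dict)
    wrong_dictionary_rejected = False
except zlib.error:
    wrong_dictionary_rejected = True
```

So in this scheme every dictionary version that still has live data compressed against it has to be retained, or the old data has to be recompressed.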
lz4 >> snappy
zstd >> brotli
https://corecursive.com/frontiers-of-performance-with-daniel...
Some people are just really good at performance sensitive stuff.
It makes it very clear where LZ4 fits into comparisons with compression speed, decompression speed and compression ratio
(Linking to this more for the overview than the Windows tool in itself.)
So there's a bunch of forks with useful features that'll never be adopted because there's no collaboration.
At least that's what I could tell when I looked into it
The extra decompression on top of the data fetch latency can be quite noticeable. Sometimes that can be offset if the compression ratio improves a hit rate, thereby decreasing the latency. The problem, of course, is that even with 10M IOPS storage devices, it is frequently latency and an inability to keep 100k requests outstanding that limit perf to one's IO turnaround latency.
Put another way, compressed RAM and disk are really popular in systems which are RAM constrained or bandwidth limited, because fetching 1x the data and decompressing it is a win over fetching 2x the data (think phones with eMMC). The problem is that this doesn't really make sense on high-end NVMe (or for that matter desktops/servers with a lot of RAM), where the cost of fetching 8k vs 4k is very nearly identical because the entire cost is front loaded on the initial few bytes, and after that the transfer overhead is minimal. It's even hard to justify on reasonable HD/RAID systems for bandwidth tests, since any IO that requires a seek by all the disks will then tend to flood the interface. AKA it takes tens of ms for the first byte, but then the rest of it comes in at a few GB/sec, and decompressing at faster rates takes more than a single core.
edit: And to add another dimension, if the workload is already CPU bound, then the additional CPU overhead of compress/decompress in the IO path will likely cause a noticeable hit too. I guess what a lot of people don't understand is that a lot of modern storage systems are already compressed at the "hardware" layer by FTLs, etc.
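The 8k vs 4k argument can be sketched as a toy cost model: fetch time is a fixed latency plus transfer time, and the compressed path adds decompression on top. All device numbers below are hypothetical, and the 2:1 ratio and 3 GB/s decompression speed are assumptions chosen only to show the shape of the trade-off.

```python
def fetch_us(size_bytes, latency_us, bandwidth_mb_s):
    """Time to fetch one request: fixed latency plus transfer time.

    bytes / (MB/s) yields microseconds when 1 MB = 10**6 bytes.
    """
    return latency_us + size_bytes / bandwidth_mb_s


def compare(latency_us, bandwidth_mb_s, decompress_mb_s=3000.0):
    """Return (uncompressed_us, compressed_us) for reading 8 KiB of data."""
    uncompressed = fetch_us(8192, latency_us, bandwidth_mb_s)
    # ASSUMPTION: 2:1 ratio, so fetch 4 KiB and decompress it back to 8 KiB.
    compressed = fetch_us(4096, latency_us, bandwidth_mb_s) + 8192 / decompress_mb_s
    return uncompressed, compressed


# HYPOTHETICAL devices; illustrative numbers, not measurements.
fast = compare(latency_us=80.0, bandwidth_mb_s=3000.0)   # latency-dominated
slow = compare(latency_us=100.0, bandwidth_mb_s=50.0)    # bandwidth-limited

print(f"fast device: uncompressed {fast[0]:.1f} us, compressed {fast[1]:.1f} us")
print(f"slow device: uncompressed {slow[0]:.1f} us, compressed {slow[1]:.1f} us")
```

With these made-up numbers, compression loses slightly on the latency-dominated device and wins clearly on the bandwidth-limited one.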
What bothers me about ZFS is that it uses a different caching mechanism (ARC) than the Linux page cache. With ARC you actually see the memory used in tools like htop and GNOME System Monitor (it is not cool seeing half your memory being used when no programs are running). ARC is supposed to release memory when needed (never tested though), so it might not be an issue.
After about a year of playing with both filesystems on my Linux laptop, I decided the checksumming is not worth the performance loss and switched back to ext4, which is significantly faster than both filesystems. I still use ZFS on backup drives for checksumming data at rest and easy incremental replication with `zfs send`.
https://github.com/strigeus/ipzip - TCP/IP Packet Compressor with LZ4 support
https://github.com/centaurean/density - Extremely fast de/compression
I was thinking of using LZ4, but it doesn't really work that great on floating point, and images are already compressed (png, jpg, and even BCn can't be compressed much further). So idk. Good thing about lz4 is that it's very simple[0] and probably faster than memcpy().
S3TC is block compression, so if there is repeating data in images it will compress quite well.
Just re-ran it (with -hc, version 1.9.3(latest now)) and: "Compressed 5592544 bytes into 5413783 bytes ==> 96.80%".
7z with "7z a -mx=9 -t7z asd.7z albedo_PNG_BC7_1.KTX" does 5592544 bytes to 4844707 bytes (down to ~87%).
Original file is 10227047 bytes (PNG, RGBA); I can't remember if the ktx has mipmaps.
EDIT: Note that the image is fairly noisy (gravel). Could/should be better with more "artificial" textures.
I don't know if ktx does some extra compression, but, looking at it, I doubt it.
PS I think that BC could be massaged at compression time to be more compressible, and I think I read something about that. Don't remember.
You could further optimize by profiling access patterns during QA testing. There wouldn't be one global ideal ordering of assets if you had multiple scenes using varying subsets, but you could certainly group the most commonly used together using some implicit insert ordering during creation. This would help to minimize the total number of filesystem block accesses you require.
I think one other important IO trick is to make sure you vacuum the sqlite database before you publish it for use. Presumably, these should be read-only once authored in this context of usage. This will clear out empty pages and de-fragment the overall file.
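A minimal sketch of the vacuum step, using Python's built-in `sqlite3` module. The table layout and asset names are hypothetical; the only point is that deleted rows leave free pages in the file until `VACUUM` rebuilds it.

```python
import os
import sqlite3
import tempfile


def build_and_vacuum(path):
    """Create an asset database, delete most rows, VACUUM it, and return
    (size_before_vacuum, size_after_vacuum) in bytes."""
    conn = sqlite3.connect(path)
    # HYPOTHETICAL schema: one blob per named asset.
    conn.execute("CREATE TABLE assets (name TEXT PRIMARY KEY, data BLOB)")
    conn.executemany(
        "INSERT INTO assets VALUES (?, ?)",
        ((f"asset_{i}", bytes(4096)) for i in range(500)),
    )
    # Simulate authoring churn: most assets get removed again.
    conn.execute("DELETE FROM assets WHERE name NOT IN ('asset_0', 'asset_1')")
    conn.commit()
    size_before = os.path.getsize(path)

    # VACUUM cannot run inside a transaction, hence the commit above.
    conn.execute("VACUUM")
    conn.close()
    return size_before, os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    before, after = build_and_vacuum(os.path.join(tmp, "assets.db"))
    print(f"before VACUUM: {before} bytes, after: {after} bytes")
```

For the published file, opening it with `sqlite3.connect("file:assets.db?mode=ro", uri=True)` is one way to enforce read-only access.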
https://store.steampowered.com/news/app/991270/view/18064466...
I've yet to really understand what kind of footprint LZ4 uses, and if it's dependent on dictionary size used to compress. What if I have, say, 4KB that I could use to store in-place decompression. Is that related to the compression ratio?
[1] https://community.arm.com/developer/ip-products/processors/b...
Although the implementations are in Rust, I assume the provided benchmarks are representative of any optimised implementation...
Many compressor algorithms are compared on several data sets.
The results tables show compressed size, compression and decompression times for a number of normal and pathological cases.
You get a good feel for the strengths and weaknesses.
Some algs really go downhill in pathological cases, such as with random data.
Do consider encryption too, though you probably want to apply it to the already-compressed data where possible.
Sometimes external encryption means you will be stuck with something close to pathological...
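A quick way to see why the order matters is to compress repetitive text and then compress random bytes of the same length. The random bytes are a stand-in for ciphertext (an assumption: good encryption output is statistically close to random), and zlib stands in for whichever compressor you actually use.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
text = b"timestamp=0 level=INFO msg=request served\n" * 2000
# STAND-IN for ciphertext: random bytes of the same length as the text.
ciphertext_like = bytes(rng.getrandbits(8) for _ in range(len(text)))


def ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


text_ratio = ratio(text)
random_ratio = ratio(ciphertext_like)
print(f"text: {text_ratio:.3f}  random: {random_ratio:.3f}")
```

Compressing first and encrypting afterwards keeps the savings; encrypting first leaves the compressor with essentially the pathological random case.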