Some data, like plain text, JSON, or XML, compresses extremely well. You're almost certainly looking at compression rates of up to 90%. Dense binary formats, especially formats that are already compressed like most image formats won't compress barely at all (sometimes they can even get bigger), and attempting to compress them is an enormous waste of processing power.
ZSTD has its own estimator to avoid doing unnecessary work, but I haven't found a comparison of these estimators (which is cheaper).
compress results in max extent size of 128K for compressed extents, and 128M for uncompressed data. Whereas compress-force results in 512K max extent size for uncompressed data. While there's a cost to tracking more extents, it might be less than the cost of frequent modification of large extents, particularly if snapshots are used. Of course there's no one answer, it'd be workload and hardware dependant. Hard drives have significant latency impact from non-contiguous extents. NVMe much less impact.
Basically you could heuristically find interesting files, then train a compression dictionary for e.g. each file or each directory. Then you'd compress 32kB blocks of each file with the corresponding dictionary. Note that this would be pretty different from how existing file system compression works (by splitting everything into blocks of e.g. 128kB and just compressing those normally).
I think it would be hard to find heuristics that work well for this method. The result would be pretty different than what I'm doing here with different tradeoffs. sqlite-zstd is able to use application-level information of which data is similar and will compress well together. You couldn't do that within the filesystem. It also works "in cooperation" with the file format of the database instead of against it - e.g. you probably wouldn't want to compress the inner B-tree nodes.
On the other hand, it would then work for any file format not just SQLite databases. E.g. a huge folder of small json files
Databases do a lot of seeking into database files and typically only read a small part of them. That’s not a scenario that’s a good fit with whole-file compression.
Compressing individual pages is a bit better, but I think this can still be better.
Your file system probably would cache the decompressed database file, but that would use memory, lots of it if your database is large.
It's also worth noting that compression on the application layer works way better on column stores than row stores.
Sometimes? You've gotten a ton of replies digging into the "why" and "when", and I only wanted to add that your question brought to mind this quote from the official SQLite website:
> "Think of SQLite not as a replacement for Oracle but as a replacement for fopen()"
I use a similar approach for compressing data that go in a couple of database/store. Having control on when and how to compress data can significantly improve performance, also, it sometimes makes sense to keep compress data even in memory since it can allow you to make better use of it.
Also keep in mind that often you can't control the filesystem.
One use case for which I am working right now is that I have is storing highly repetitive data on a very slow flash storage, (that technically support compression, but it is slow and with not great compression ratio), I can significantly improve performance.
edit: grammar
That said, it looks like you could get the same results by doing client side compression and decompression of data yourself, and storing compressed blobs in sqlite. It only compresses text column data, not entire pages.
The details are described here: https://phiresky.github.io/blog/2022/sqlite-zstd/
https://www.sqlite.org/vfs.html
Briefly search for SQLite compressed vfs and you will find many extensions. Some are quite mature.
I get the impression that the author doesn't know about the vfs, as it's not even mentioned in the readme. The clickbait title similarly seems amateurish. Caveat emptor.
https://phiresky.github.io/blog/2021/hosting-sqlite-database...
You're right that a comparison to compression VFS is completely missing. I knew about their existence but I have to admit I didn't look too closely.
Note that the main "novelty" here is training a shared / partitioned compression dictionary specifically to take advantage of redundancy that wouldn't appear within only one row or even within a database page / block . The compression happens at the row level and can additionally use application knowledge - you couldn't do that in the VFS level. For example, you can have separate compression dictionaries per different columns and per groups of rows with some commonality.
I'll have to compare to a compression vfs (do you have a favorite?) and see if maybe these two methods can even be combined.
Edit: I see that https://github.com/mlin/sqlite_zstd_vfs does actually also train dictionaries. It's still at the database-page level so can't take application knowledge into account or compress only parts of the data, but that's still pretty smart then.
There's no need to be this dismissive.
I found 3 with a brief search - CEVFS, sqlite_zstd_vfs, and ZIPVFS (not free, therefore discounting.)
Neither CEVFS nor sqlite_zstd_vfs support WAL mode (which means, e.g., no litestream replication).
Can you recommend a mature one that does support WAL mode?
I also like how simple and powerful the custom partitioning is.
storing raw JSON in sqlite means the entire blob would be compressed and live in a single column, right?
It would look something like this:
CREATE TABLE t1(
json TEXT,
data INT GENERATED ALWAYS AS (json_extract(body, '$.d')) STORED
);
If all you need is fast querying you could make `data` a `VIRTUAL` column and set up an index on it.You can read more about this kind of technique here: https://dgl.cx/2020/06/sqlite-json-support
So I experiment with btrfs on another device and both are working well after some daily usage.
archivemount mydbimage.tgz /mnt/mydbimage -o big_writes
// -o readonly disable write support
// -o nosave do not save changes upon unmount.
//open/cache /mnt/mydbimage/mydbimage.db
//flush
fusermount -u /mnt/mydbimage
//No LGPL/Rust/cargo dependency issues...
//as SQLite's best use-case is embedding ;)
FUSE is cool and all but given it's "userspace" and we're just reading a single file here, I don't find this compelling versus reading from a block device in-process.
TFA is compelling because it's compression over specific data sets without being in the hot path for every operation. Putting a sqlite.db in a tarball is... problematic. There is no random access, so if your database doesn't fit in memory/cache, performance will be abysmal.
I think we can agree it depends on the use-case. =)
My point was it doesn't contaminate the parent projects license.
That is, while saving rows to the database, pick out every (say) 10th row, and compress it by itself but keep the compression state in memory. Then for the next 9 rows, compress that row based on that dictionary, and a reference to the row it's based on. (For example, in Python's zlib module, you'd use the zdict parameter [1] of compress and decompress.) A bit like keyframes in video compression.
You can potentially get better results this way than a generic solution like the article, because you might have some application knowledge about what rows are likely to be similar to others. But it obviously requires application level code (e.g. fetching a logical row requires fetching two physical rows so you have enough information to decode), and it would be a nightmare if you need to remove or update the rows.
[1] https://docs.python.org/3/library/zlib.html#zlib.compressobj
The linked project actually does exactly this - you can basically specify an SQL expression (such as strftime(date, '%Y-%m')) that is then used to group rows together in order to decide what to train dictionaries on. What you're describing as picking every 10th row would be the expression (id/10). So you can use application knowledge, you just don't have to.
The difference is that in my project the dictionary is stored separately so not affected by row deletions, and the trained dictionary is based on all samples not just one of them. Since the dictionary is stored separately it's only worth it if you combine at least maybe 100-1000 rows under one dictionary.
> Python's zlib module, you'd use the zdict parameter
Yes, the training feature of zstd is basically a better version of keeping another sample of data around and using it as a prefix. In fact you can use a data sample instead of a trained dictionary as the `-D` parameter of the zstd command line, zstd just has the additional feature of reducing a set of samples down to the most essential (data-saving) parts.
I did mention that this is what you're doing, I just thought it wasn't clear from the blog post. But I did miss that you can use an SQL expression to choose the grouping. This is ideal for the application I'm thinking of, which a "message type" field that corresponds pretty closely with which rows are similar.
Plugin/extension/module development is severely underused in our field. Extensibility, if possible, is often the best way to handle edge cases or increments in a dependency, without forking it.
See "Benefits of Being an Extension to PostgreSQL": https://www.citusdata.com/blog/2017/10/25/what-it-means-to-b...
Some great software that is extensible in brilliant ways through plugins, that comes to mind, is:
postgres: https://www.postgresql.org/docs/current/external-extensions....
nginx: https://www.nginx.com/resources/wiki/modules/
sqlite: https://www.sqlite.org/loadext.html
Would this work well with compression here?
ie. from 50 bytes on average for these columns... to a few bytes only?
From your last link:
> The result of this training is stored in a file called "dictionary", which must be loaded before compression and decompression.
Also, what happens if a column has been compressed based on 100 typical values, and then later we insert thousands of new rows with 500 new frequent values. Does the compression dictionary get automatically updated? Then do old rows need to be rewritten?
PS what is compact mode?
This would be in contrast to e.g. compressing the data in your code before inserting it into the database, then decompressing after you load it.
>Depending on the data, this can reduce the size of the database by 80% while keeping performance mostly the same (or even improving it, since the data to be read from disk is smaller).
"Even improving it" is a bold claim!
Modern CPU's are often hungry for data, aren't they? While being incredibly fast and mostly idle?
Is reading a smaller compressed file and decompressing it in the CPU's hyperfast cache before use faster than reading it uncompressed?
Maybe you can speed up an entire server, even one with lots of free space and no need for compression, by adding full disk compression anyway!
Unless the CPU's are at 100% utilization this could always be used. Since the CPU will then use the data, it might be fast enough to uncompress it at home faster than the bus can keep dropping it off at file read speeds.
This chart I found that goes up to 2000 suggests cache access had an increasing performance gap with even RAM bandwidth, to say nothing of SSD's[1]
It is from here[2]
(However, compressing a full disk can only help if there are extra CPU cycles.
For activity spikes when the CPU doesn't have extra cycles or the cache for it, both compressed and uncompressed versions could be available, at most doubling disk usage, so that the uncompressed file could be read instead.
This can ensure the CPU isn't burdened with a decompression task, since this only helps if the fastest caches are used for it and they might have better things to do than uncompress files.
However, whenever the CPU has idle cores and lots of cache, which is most of the time, it could request the compressed version. If it is busy it could ask for the original version which saves the decompression work.)
Can this help solve the problem of waiting for the disk?
[1] http://www.extremetech.com/wp-content/uploads/2014/08/CPU-DR...
[2] https://www.extremetech.com/extreme/188776-how-l1-and-l2-cpu...
Yes. Although not all filesystems support compression. One of the filesystems that does support and this is regularly benchmarked, is ZFS. It is also interesting that while you are definitely using more CPU to read a compressed stream, if that causes you to finish your task earlier, then it can be a net gain even from a CPU usage standpoint.
eg.
https://www.servethehome.com/the-case-for-using-zfs-compress...
It's pleasantly surprising how much data you can fit within a ZStandard-compressed SQLite database. The two technologies fit very well together.
[1]: https://github.com/andrewmcwattersandco/github-statistics
Hopefully this will encourage SQLite to support compression natively in some fashion.