Do you think the results would have been similar if you were to use no-compression-ZFS instead of ext3 on a proper database hardware?
Basically I'm trying to figure out whether the low performance of the uncompressed dataset is specific to AWS/ext3. Thanks.
Using a 2-disk striped volume for PostgreSQL 9.2, I get an average of 2.5x compression (as reported by ZFS), and a 1.5 to 2x reduction in database restore times (single-threaded or 8 jobs in parallel).
Given this development box has relatively slow 7200 RPM disks, the tradeoff of more CPU time for less disk transfer makes sense.
Edit: My use case is an OLAP server. I can't state how the tradeoffs affect OLTP performance.
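For anyone wanting to try the same thing, the knobs involved are just standard ZFS dataset properties (the dataset name below is a placeholder, not one from my setup):

```shell
# Enable transparent compression on the dataset holding the Postgres data
# directory ("tank/pgdata" is an illustrative name).
zfs set compression=lzjb tank/pgdata

# After a restore, ask ZFS what it actually achieved; the 2.5x figure I
# quoted above is exactly what this property reports.
zfs get compressratio tank/pgdata
```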
That aside, I thought this was a wonderful article with non-intuitive findings. Very interesting, CirtusDB [edit: er, CitusDB]. :-)
As far as AWS goes, we have noticed ephemeral disks connected to the same instance can exhibit fairly large performance differences, and attempted to control for that in our tests by reusing the same disk for each test run.
You have no idea how busy the real server is (noisy neighbors, etc.), so it's impossible to have comparable results from benchmark to benchmark.
Simply rattling off a set of features and stating that Btrfs is "better" is dubious at best, and perhaps misleading. As the OP stated in his blog post, ZFS has a rich feature set -- which we find invaluable in our own Postgres stack -- features such as incremental snapshots, a real copy-on-write filesystem, etc.
I found 2 recent Phoronix benchmarks which compare Btrfs with Ext4 and Ext4 with ZFS respectively. You can't really combine them since the hardware used is different, but if you use Ext4 as a rough translation key, it seems ZFS on Linux (which is what the OP used) is slower than both Ext4 and Btrfs. Transparent compression speed would depend on the CPU and is comparable.
April 18, 2013 Ext4 vs ZFS http://www.phoronix.com/scan.php?page=news_item&px=MTM1N...
February 18, 2013 Btrfs (and others) vs Ext4 http://www.phoronix.com/scan.php?page=article&item=linux...
Unreliable mashup which gives some indication:
* fs-walk, 1000 files, 1 MB: ZFS 46.20, Ext4 72.50 vs 78.67, Btrfs 66.37
* fs-walk, 5000 files, 1 MB, 4 threads: ZFS 25.63 files/s, Ext4 79.73 vs 99.60, Btrfs 94.63
* fs-mark, 4000 files, 32 subdirs, 1 MB: ZFS 7.78, Ext4 74.07 vs 78.80, Btrfs 65.17
* dbench, 1 client: ZFS 27.29 MB/s, Ext4 167.29 MB/s vs 195.24, Btrfs 165.37
I'm also interested in a Btrfs benchmark vs ZFS on Illumos; that way you could determine which is the best or fastest system for this specific scenario (even though the OP used Linux).
Incremental snapshots are a nice feature for a PostgreSQL stack. What is the significant or, as you put it, 'real' difference between the CoW and snapshot functionality of Btrfs compared to ZFS? Are there things you cannot do with Btrfs in a PostgreSQL stack compared to ZFS?
ZFS is a fantastic file system, but I can't help wondering if part of the issue is the fact that the benchmarking was conducted on a virtual machine. ZFS is better suited for raw disks than virtual devices (again, just my anecdotal evidence; I've never run benchmarks myself).
...and so is Hans. :-) (Sorry, I couldn't resist :-))
No FUD intended, but I don't consider ZFS on Linux production ready. Wanting to use ZFS, I recently started regularly reading their GitHub issues.
There are deadlocks and un-importable pools in certain situations (hard-links being one: think rsync). I would not want production boxes in the same predicaments experienced by several bug reporters. Moreover, applying debug and hot-fix (hopefully) kernel patches and the associated downtime in production is a no-go for me.
Mind you, the project leads are very responsive and it's making great strides.
In addition, I believe the Linux implementation currently lacks the L2ARC (which can make ZFS really fly, caching to SSDs).
However, I would absolutely run ZFS on Illumos or Solaris, for the stability and the compression benefits mentioned in the article.
Hard to say if it's better than some sort of Linux-with-ZFS Frankenstein system. I'd love for Oracle to make ZFS more Linux-friendly though; it seems like a win for everybody, and there are tons of users who would love for it to happen.
I don't know if I'd call Reiserfs "maintained" and I couldn't recommend it to anyone. If it is maintained seriously, my recommendation would be to rename it.
/shiver/
It shouldn't surprise most people that enabling transparent compression gives these benefits. Why, you ask? Well, what is the largest bottleneck in a system? Disk I/O, by far. So all ZFS is doing is transferring workload from the subsystem you have the least of (disk I/O/latency) to one you likely have plenty of (CPU).
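Back-of-the-envelope, with made-up but plausible numbers (not figures from the article): once you're disk-bound, a sequential scan shrinks roughly in proportion to the compression ratio.

```shell
# Illustrative numbers only: a 32 GB table, 100 MB/s sequential disk
# throughput, and 2.5x compression. The compressed scan reads 2.5x
# fewer bytes off disk, trading that for CPU time spent decompressing.
awk 'BEGIN {
  size_mb = 32 * 1024; rate = 100; ratio = 2.5
  printf "uncompressed scan: %d s\n", size_mb / rate
  printf "compressed scan:   %d s\n", size_mb / (rate * ratio)
}'
```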
But it's downright misleading to start the vertical axis at anything other than 0.0 when comparing ratios; they start it at 0.2. In reality, LZJB is saving 50% of the space whereas gzip saves 70%, but a naive glance at the graph makes gzip look roughly 3 times smaller/better than LZJB.
Classic "How to Lie with Statistics" stuff.* I would have expected better from an "analytics" database.
* Not saying they intend to lie here but it's representative of the classic text https://en.wikipedia.org/wiki/How_to_Lie_with_Statistics
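To put numbers on it (the sizes here are illustrative, chosen only to match the 50%/70% savings above):

```shell
# Illustrative sizes consistent with the comment: 32 GB raw,
# ~16 GB after LZJB (50% saved), ~9.6 GB after gzip (70% saved).
awk 'BEGIN {
  raw = 32; lzjb = 16; gz = 9.6
  printf "lzjb saves %.0f%%\n", (1 - lzjb/raw) * 100
  printf "gzip saves %.0f%%\n", (1 - gz/raw)  * 100
  # The honest size ratio between the two compressed sets:
  printf "gzip/lzjb size ratio: %.2f\n", gz/lzjb
}'
```

The true gzip-to-LZJB ratio is 0.60, nowhere near the ~3x the truncated axis suggests.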
I agree that it's kind of counterintuitive.
The licensing problems only apply to distributing CDDL and GPL code that have been compiled into the same binary, not running a CDDL-licensed module in a GPL kernel - I think. My experience with ZFS (which is awesome, btw) comes from FreeBSD.
In practice, at least part of your working set gets served from memory, and compression doesn't help with the pages that are already in memory.
I agree that it sounds too good to be true though.
Consider the analogous (if simplified) case of logfile parsing, from my production syslog environment, with full query logging enabled:
# ls -lrt
...
-rw------- 1 root root 828096521 Apr 22 04:07 postgresql-query.log-20130421.gz
-rw------- 1 root root 8817070769 Apr 22 04:09 postgresql-query.log-20130422
# time zgrep -c duration postgresql-query.log-20130421.gz
19130676
real 0m43.818s
user 0m44.060s
sys 0m6.874s
# time grep -c duration postgresql-query.log-20130422
18634420
real 4m7.008s
user 0m9.826s
sys 0m3.843s
EDIT: I'm not sure why time(1) is reporting more "user" time than "real" time in the compressed case.

It actually ran faster double-spaced (Stacker) and had nearly 12 MB of available space... didn't have any problems with programs loading, surprisingly enough, which became more of an issue when moving onto a 486.
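For what it's worth, zgrep is essentially a two-process pipeline, which would explain the accounting: the decompressor and grep run concurrently, and time(1) sums CPU time across both. A minimal reproduction (the temp file path is made up):

```shell
# zgrep FILE is roughly: gzip -dc FILE | grep PATTERN
# Both processes run at once, so their "user" CPU times add up and can
# exceed wall-clock "real" time on a multi-core machine.
printf 'duration: 1 ms\nok\nduration: 2 ms\n' | gzip > /tmp/demo.log.gz
gzip -dc /tmp/demo.log.gz | grep -c duration   # counts the 2 matching lines
```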
Yeah, when your storage is that slow relative to how fast the CPU can run compression, you can get impressive gains in space and performance.
However, we're not trying to show how bad ext3 is; the point is that the end result is still stellar performance, compression or not.
Each of the seven queries we used in our benchmark required a sequential scan of the 32GB dataset. It's unlikely that the ARC had any impact on the results since the EC2 instance had only 7GiB of memory.