Replacing a core system component with an out-of-repo version is always going to hurt, yes.
> I switched to btrfs; it just working is worth the few extra warts over ZFS.
I'm not sure I'd call "catastrophic failure and data loss" a "wart". In all my years of distro hopping, I've had 3 root filesystems become unbootable: 1 F2FS system early on, which I actually did manage to fsck my way out of, and 2 on an openSUSE Tumbleweed system using Btrfs as root.
How long ago was that? And have you been using other fully checksummed filesystems (like ZFS) on that hardware since then? I'm asking because if you've been running btrfs without any raid features (or with simple RAID modes like 1/0) for the past several years and it breaks, digging deep enough into the problem often reveals that the hardware was at fault.
And ext4 or xfs either don't find corruption at all (if it's data corruption), or have better error recovery if the FS's own metadata got trashed (which is a strong argument in favor of them, I agree, but I wouldn't trust such a filesystem anyway and would restore from backups right away).
Edit: it's a strong argument for storing data on them that is checksummed by some higher component in your software stack, like the database. Otherwise, you're just asking for silent bitrot.
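To make that concrete, here's a minimal sketch of application-level checksumming with a `sha256sum` manifest (directory and file names are illustrative; a database's internal page checksums serve the same purpose):

```shell
# Hypothetical layout: "data/" stands in for whatever your application writes.
mkdir -p data
echo "important record" > data/record.txt

# After a batch of writes, record checksums in a manifest...
( cd data && sha256sum *.txt > MANIFEST.sha256 )

# ...and verify before trusting reads (e.g. from a cron job or on startup);
# --check exits nonzero and names the file if anything has rotted.
( cd data && sha256sum --check MANIFEST.sha256 )
```

On ext4 or xfs this is the only layer that will ever notice flipped bits in file contents; on ZFS or btrfs it's redundant with the filesystem's own checksums.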
That's not really good enough though. Next-gen filesystems are supposed to be resilient even when hardware fails; that's the whole point of RAID and checksumming. ZFS was very much designed to stay consistent in the face of bad hardware. Heck, even in the 90s this was a known problem, hence chkdsk on DOS marking bad sectors to somewhat mitigate data corruption on FAT filesystems. If Btrfs only works when the hardware is behaving, then that is absolutely a problem with Btrfs.
As for my experience with ZFS, it's kept consistency when disks have died. It's worked flawlessly when SATA controllers have died (one motherboard would randomly drop HDDs when the controllers experienced high IOPS -- which would be enough to trash any normal file system but ZFS survived it with literally no data loss). Not to mention frequent unscheduled power cuts, kernel panics (unrelated to ZFS), and so on and so forth. I'm sure it's possible to trash a ZFS volume but it's stood strong on some pretty dubious hardware configurations for me and where most other file systems would have failed.
Also, I'm going to somewhat mirror sibling comments: Even if the hardware is faulty, that should produce a filesystem with explicit checksum errors, not an unreadable filesystem. There is certainly an upper limit to what it could catch, but you'll have to forgive my skepticism that only one of the 2 filesystems on the system was affected and only after months of use, and then the corruption was so complete that it couldn't even tell me what was wrong and try to fix it.
Well, with ZFS I've had hardware break and still not experienced any data loss. I've had cables come loose multiple times, I've had several disks die[1], I've had unstable SATA controllers (hello JMicron), and plenty of unexpected power losses and hard resets.
Yet ZFS has sailed through it all with my data intact. Sure, ZFS ain't bulletproof. It can get messed up. But for the most part it takes a lot of beating without a dent.
[1]: As a matter of fact, I just finished resilvering a RAID-Z1 pool in my NAS after a WD Red 3TB died after almost 7 years of 24/7 operation (barring a few accidental power outages).
https://gist.github.com/xenophonf/76fd44ae24772e457cb63d00c0...
`apt-get update && apt-get dist-upgrade -y` works as expected. I plan to switch to a similar config on my Lenovo laptop when I upgrade it to the next Ubuntu LTS release.
As someone who runs new kernel versions as they are released, I'm not willing to use a filesystem that may break with a kernel update. It also seems OpenZFS only supports up to kernel 5.6, according to the GitHub release. I'm on 5.9, so it's not even an option.
https://wiki.archlinux.org/index.php/ZFS
I would need a package that depends on zfs and provides linux-kernel at an appropriate version. Can't have something so critical break because of an upgrade, and I don't want to pin it and forget to upgrade it (also fairly anti-arch).
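For illustration, the kind of metapackage described above could be sketched as an Arch PKGBUILD; the package name and version bound here are hypothetical, not real repo packages, and the constraint would have to be bumped by hand whenever zfs gains support for a newer kernel:

```shell
# Hypothetical PKGBUILD: a metapackage that ties zfs to a compatible kernel,
# so pacman refuses a kernel upgrade that would break the zfs module.
pkgname=zfs-kernel-lock
pkgver=1.0
pkgrel=1
pkgdesc="Metapackage pinning the kernel to a ZFS-compatible version (sketch)"
arch=('any')
# zfs-dkms is from the AUR; the '<5.7' bound mirrors the 5.6 support ceiling
depends=('zfs-dkms' 'linux<5.7')
```

This avoids `IgnorePkg` pinning, but someone still has to maintain the version bound, which is the real problem with out-of-tree modules on a rolling distro.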
There have been a couple of cases where I had to wait a week or two for compatibility fixes to get merged into zfs git, but otherwise staying up to date has not been a problem.