"This is allowed; its always safe to say there’s data where there’s a hole, because reading a hole area will always find “zeroes”, which is valid data."
But I recall reading elsewhere a discussion about some userspace program which did depend on holes being present in the filesystem as actual holes (visible to SEEK_HOLE and so on) and not as runs of zeros.
Combined with the holes being restricted to specific alignments and sizes, this means the underlying "sequence of fixed-size blocks" implementation is leaking through the abstract "stream of bytes" representation we're more used to. Perhaps it's time to rethink our filesystem abstractions?
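For the curious, the hole/data layout a program actually sees is queryable with lseek(2). A minimal Python sketch (the filename is hypothetical); note that a filesystem is free to report a written run of zeros either as data or as a hole — both are valid answers, which is exactly the ambiguity being discussed:

    import os

    def map_layout(path):
        # Walk the file with SEEK_DATA/SEEK_HOLE and print each segment.
        fd = os.open(path, os.O_RDONLY)
        try:
            size = os.fstat(fd).st_size
            pos = 0
            while pos < size:
                try:
                    data = os.lseek(fd, pos, os.SEEK_DATA)
                except OSError:  # ENXIO: nothing but hole until EOF
                    print(f"hole {pos}..{size}")
                    break
                if data > pos:
                    print(f"hole {pos}..{data}")
                hole = os.lseek(fd, data, os.SEEK_HOLE)
                print(f"data {data}..{hole}")
                pos = hole
        finally:
            os.close(fd)

    map_layout("sparse-file.img")  # hypothetical sparse file

A backup tool that trusts this map copies only the "data" segments, so any tool that further assumes written zeros stay "data" (or that holes stay holes) is depending on behavior the interface doesn't promise.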
"treatment of on-disk segments as "what was written by programs" can cause areas of 0 to not be written by bmaptool copy":
https://github.com/intel/bmap-tools/issues/75
IMO, the issue here isn't filesystem or ZFS behavior; it's that bmaptool wants an extra "don't care" bit per block, which filesystems (traditionally) don't track, and which programs interacting with the filesystem don't expect to exist.
Some of the comments I've made in this issue describe options to make things better.
(FWIW: the original HN link discusses a different issue around SEEK_HOLE/SEEK_DATA, and the bmaptool issue is backwards from the one the parent posits: bmaptool relies on explicitly written runs of zeros not being turned into holes, and on particular behavior from the programs writing the data.)
> Incidentally, that’s why this isn’t “corruption” in the traditional sense (and why a scrub doesn’t find it): no data was lost. cp didn’t read data that was there, and it wrote some zeroes which OpenZFS safely stored.
[0] https://bugs.gentoo.org/917224
[1] https://github.com/openzfs/zfs/issues/15526 (referenced in the article)
> zpool get all tank | grep bclone
> kc3000  bcloneused   442M
> kc3000  bclonesaved  1.42G
> kc3000  bcloneratio  4.30x
> My understanding is this: If the result is 0 for both bcloneused and bclonesaved then it's safe to say that you don't have silent corruption.
It's also not worth checking for, because this bug has existed for many years. Your data probably wasn't affected. None of the massive ZFS storage companies out there have run into it by now either.
Your data is fine. Sleep easy.
OpenZFS is a mighty filesystem hobbled by an absolutely detestable license (the CDDL). Its greatest single contribution was in all likelihood to BSD, although it didn't seem to make the OS more popular as a whole.
The latest and greatest from the OpenZFS crowd seems to be bullying Torvalds semi-annually into considering OpenZFS for Linux... which will never happen thanks to the CDDL, and so the forums devolve into armchair legal discussions of the true implications of the CDDL. You'll see a stable BTRFS and a continued effort to polish XFS/LVM/MDRAID before OpenZFS ever makes a dent.
One could argue OpenZFS is a radioactive byproduct of one of the most lethal forces in open source of the past twenty-some years: Oracle. They gobbled up OpenOffice and MySQL, and went clawing after Red Hat shortly after mindlessly sending Sun to the gallows. They're an unmitigated carbuncle on some of the largest corporations in the entire world, surviving solely on perpetual licensing and the real-world threat of litigation. That they have a physical product at all in 2023 is a pretty amazing testament to the shambling money-corpse empire of Ellison.
Ultimately the FOSS community under Torvalds is on the right track. Just because Shuttleworth thinks he can't be sued by Oracle for including ZFS in Ubuntu with some hastily reasoned shim doesn't mean Oracle won't nonchalantly send his entire company to the graveyard just for trying. Oracle is a balrog. Stay as far away as you can.
OpenZFS devs have openly declared that no, they are not pushing to include OpenZFS in the Linux kernel, and that the separate arrangement is just fine, especially since it allows a different release cadence and keeps the code portable.
Mainly there's an issue with certain Linux kernel big name(s) who like to wield GPL-only exports (something with uncertain legal status) in a rather blunt way, and sometimes the reasoning is iffy.
Canonical has been shipping the kernel with ZFS for more than 7 years and so far they have not been sued by Oracle.
Right now I would put my money on bcachefs[1] rather than BTRFS. bcachefs is currently in the process of being merged and will be in the next kernel release. It doesn't yet offer quite everything ZFS does, but it's very close and already appears more reliable than BTRFS, and once stuff like erasure coding is stable, it'll be more flexible than ZFS.
My fix: spin up a .onion to host my distribution of the Linux kernel containing ZFS integrated and BtrFS excised, do not answer abuse/legal emails, don't even have email to receive aforementioned emails. What's the pencil-necked shrimpy IP lawyer at Kernel Foundation going to do? Shut down Tor?
Don't have the grizzled old storage admin / DBA test the backup. They know a million and one weird necessary workarounds and just execute them. However, if you need a restore and they are currently exploring caves or something, things turn dire. Have a chipper junior restore something based off of the documentation (and prepare to spend a few days updating documentation...)
And make sure to test the backups you don't regularly touch. And very much test those backups you really don't want to test.
If you don’t delete files as you go, now you have an unknown partial backup state that basically doubles your needed space.
If you delete as you go, sometimes something happens and the process stops or corrupts, so your data is now split and you may have lost something.
Even trying to log all the failures during the process is amazingly difficult, and the solutions that work around that specific problem somehow introduce new types of silent failure themselves, in some kind of irony.
I do think that file systems that support snapshots - like ZFS, though LVM can provide this for ext4, and Apple's APFS does too - are the way to go. Not sure how well NTFS's Shadow Copies/Volume Shadow Copy Service works; I've heard horror stories, but I'm not sure whether those are one-off freak accidents. Probably worth considering ReFS these days on a Windows Server anyway. With a snapshot, you're at least mostly insulating yourself from changes to the data you're backing up. At the expense of managing snapshots, that is: getting rid of old ones after a while because they keep taking up space.
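If you go the ZFS route, the backup-from-snapshot flow is only a few commands. A minimal sketch (dataset name, retention count, and the rsync destination are all made up; assumes the dataset uses its default mountpoint):

    import datetime
    import subprocess

    DATASET = "tank/data"   # hypothetical dataset
    KEEP = 7                # backup snapshots to retain

    def zfs(*args):
        return subprocess.run(["zfs", *args], check=True,
                              capture_output=True, text=True).stdout

    # 1. Snapshot first, so the backup reads a frozen, consistent view.
    name = datetime.datetime.now().strftime("backup-%Y%m%d-%H%M%S")
    zfs("snapshot", f"{DATASET}@{name}")

    # 2. Copy from the snapshot's hidden .zfs directory, not the live files.
    src = f"/{DATASET}/.zfs/snapshot/{name}/"
    subprocess.run(["rsync", "-a", src, "/mnt/backup/data/"], check=True)

    # 3. Prune old backup snapshots beyond the retention count.
    snaps = [s for s in zfs("list", "-t", "snapshot", "-H",
                            "-o", "name", "-s", "creation").splitlines()
             if s.startswith(f"{DATASET}@backup-")]
    for old in snaps[:-KEEP]:
        zfs("destroy", old)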
I like to keep a separate database of what files I expect to have along with their hashes. The off-the-shelf tooling for this is weak, to say the least. Even S3’s integrity checking support is desultory at best, and a bunch of S3 clones don’t implement it at all (cough minio cough).
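Rolling your own isn't much code, though. A minimal sketch of building such a manifest (the manifest filename is made up); the output format matches what sha256sum -c can verify later:

    import hashlib
    import os
    import sys

    def sha256_of(path, bufsize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(bufsize):
                h.update(chunk)
        return h.hexdigest()

    # One "<hash>  <relative path>" line per file.
    root = sys.argv[1]
    with open("manifest.sha256", "w") as out:
        for dirpath, _, files in os.walk(root):
            for name in sorted(files):
                full = os.path.join(dirpath, name)
                out.write(f"{sha256_of(full)}  {os.path.relpath(full, root)}\n")

The point of keeping the manifest separate from the backup is that it also catches files that vanish entirely: it records what should exist, not just what happens to be there.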
Do you have any advice as to HOW the average home NAS user can affordably backup modern NAS devices?
The last time I looked it could easily cost hundreds of dollars per month to back up as little as 40TB to the cloud.
Back up what you value the most, ignore what you don't, and apply tiers: separate what you need back immediately in case of a failure from what needs to be kept but can be transferred back home slowly.
My rules of thumb are:
- always invest 3x the price of your hot live NAS storage in backups. If you can't afford to buy 40TB of storage, you can't afford to have 10TB of live storage (10TB live plus 30TB of backup). Period. The goal is to have at least one copy locally and one externally, and to have extra space on the backup storage to account for retention, changes, and migrations.
- if you can't afford three redundant (RAID) storage systems, favor having three copies on non-redundant storage (no RAID) over having fewer copies on redundant storage.
An additional tip to reduce cost and avoid expensive cloud offerings is to find a reliable and trustworthy relative or friend who can host the external copy of your backup. Nebula or Tailscale now make this very easy without having to configure routers and such. In exchange, you can offer to host that person's backup storage.
Also, digitizing physical stuff is nice, but printing digital photos is a great way to preserve copies too. I'd rather save the photos I cherish the most than keep 3 backup copies of 10TB of blurry or unremarkable photos. After years of having them all digitally, I am investing in printing photos and making albums again. You can also print a photo book multiple times and have copies stored at a relative's place.
My NAS has something like 5 TB used. It's all synced to an old server that can hold about 8 TB and that's off most of the time (no fun living next to a jet engine). This cold server lives at my parents' house.
My "really important stuff" on the NAS, which is a few hundred GB of pictures and such, is regularly backed up to a bucket with object locking.
My "super important stuff", which is my company's accounting and other such documents, and lives on my laptop, is backed up to the live NAS and handled there as the really important stuff. I also back up my laptop to two normally offline external drives, one of which lives in my apartment and the other at my parents' house.
Everything non-cloud is ZFS, so after each backup to an external drive or "cold NAS", I run a scrub to make sure it is still operational. The live NAS runs a scrub every Monday morning.
Granted, this is not a "modern NAS" environment, since it made no sense to me to forego the free servers that my employer was going to send to the trash and buy some expensive off-the-shelf solution without the guarantees of ZFS (despite the issue TFA talks about). I know about power usage, but my live NAS eats less than 50W at idle (which is 99% of the time), so breaking even with the electricity prices in France would take forever.
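For the object-locked bucket mentioned above, a minimal boto3 sketch (bucket name, key, and retention window are made up; assumes the bucket was created with Object Lock enabled):

    import boto3
    from datetime import datetime, timedelta, timezone

    s3 = boto3.client("s3")

    # COMPLIANCE mode: the object cannot be deleted or overwritten by
    # anyone until the retain-until date passes, ransomware included.
    with open("photos.tar.zst", "rb") as f:
        s3.put_object(
            Bucket="my-backup-bucket",           # hypothetical bucket
            Key="backups/photos.tar.zst",
            Body=f,
            ObjectLockMode="COMPLIANCE",
            ObjectLockRetainUntilDate=(
                datetime.now(timezone.utc) + timedelta(days=180)),
        )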
Which is how it should be. Yes bugs can happen but there's only so many 9s most of us can chase on our budgets. And "always test backups" in particular adds cost. Testing means restoring onto hardware that you can then use live, separate from your actual primary hardware or at a minimum on primary hardware with >2x the set size and enough performance to squeeze it in during downtime or around work. So yet another big increase in cost. "Testing backups" isn't trivial.
I know you can use ZFS snapshots, but I'm not experienced enough to trust that I could make a 20-40 TB snapshot without screwing something up. Plus it's all video files, so I can roughly keep track of what's what and I can ignore the stupid LTO compression.
It takes days, it's noisy, and very tedious. But that's #hoarderLyfe lol
The downside to tape backup is that you need throughput, or the ability to do disk-to-disk backups.
If you can figure out how to split the data into categories, you could save money as well. E.g., which of this data is truly irreplaceable - stuff like personal photos, source code, whatever it is that can never be re-created. If you're running a business, then stuff that needs to be available immediately in order to keep the lights on. Those things need to be on storage that also gets backed up daily, preferably in full, and preferably to multiple clouds.
Stuff that can be re-created from sources (e.g., rendered outputs) is less critical, because in the worst case you can just spend some days/weeks re-creating it.
Also consider regular offline backups - put it on a tape drive or on some hard disks/SSDs or even optical media (yes, it would take something like 400 BDXL disks to back up 40 TB, but I assume the data doesn't rapidly change) and put it in some offsite storage facility in case your place burns down.
For sensitive data I would run something else that can be a Restic target so backup data is encrypted, I currently use a cloud drive that supports WebDAV for that.
Also, you can probably tier your data. Maybe you don't need the same level of backup for all your 40TB.
You only have to backup the data that is important to you and you don't want to lose in case your house gets robbed, floods, burns down, etc.
If you don't mind losing 40T of data, you don't have to back it up at all.
Otherwise get another NAS, install it at a family member's or friend's house, and set up a VPN between the two; then use rsync/zfs-send/whatever.
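For the ZFS-to-ZFS case, incremental send/receive over SSH keeps the traffic on the slow VPN link down to just the changed blocks. A minimal sketch (hostname, dataset, and snapshot names are all made up):

    import subprocess

    SRC = "tank/data"              # hypothetical local dataset
    DST = "backuppool/data"        # hypothetical remote dataset
    REMOTE = "nas.family.lan"      # reachable over the VPN

    # Send only the delta between the previous and current snapshots.
    send = subprocess.Popen(
        ["zfs", "send", "-i", f"{SRC}@prev", f"{SRC}@today"],
        stdout=subprocess.PIPE,
    )
    subprocess.run(
        ["ssh", REMOTE, "zfs", "recv", "-F", DST],
        stdin=send.stdout,
        check=True,
    )
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("zfs send failed")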
Glacier vaults in S3 are quite affordable these days.
Also afaik tapes need a stable storage environment: how do you manage that?
I did upgrade to an LTO-5 drive last year or so, after finding a new-in-box one from a liquidation sale for something like $450. The nice thing about LTO is that it's 2 generations R/W and 3 generations read - so the LTO-5 drive will read/write LTO-5 (1.5 TB) and LTO-4 tapes, and read LTO-3 tapes. I think with one of the newer standards (LTO-8?) it's a bit more muddy, so check compatibility.
I think LTO-4 and LTO-5 are the sweet spot for hobbyists: you still need to spend some money on a drive or two and buy brand new tapes, but it's reasonably affordable. That said, for a business, I'd just bite the bullet and buy a new drive. Dell sells an external SAS LTO-7 drive brand new for $3700 list price, but I think there might be cheaper options. Together with some tapes and a SAS controller, I'd say that for $5000 you can get a decent, brand new setup.
I put the tapes in Turtle LTO Cases (https://turtlecase.com/products/lto-20-black), and they sit in a closet. It's not climate controlled or anything, but the place is roughly at a similar temperature year round. The tapes aren't THAT sensitive, but I'd definitely not store them in the garage where I might get a 50+ degree temperature difference throughout the year. That said, there are companies that offer off-site storage options with climate controlled environments. I haven't looked into their pricing since I didn't need it, but the nice thing about tapes is that you can just backup to two tapes and send the second tape off-site. LTO has built-in encryption support, so that's an option.
Twice a year or so, I run a restore of the tape and compare it to the SHA256 that I took while backing up the file (I did build myself some rudimentary cataloging system to SHA256 hash every file, then back it up to tape with tar, and make a record of what file with what SHA256 got backed up when on what tape). I've yet to encounter any bit rot/defective tape issues, but YMMV.
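A minimal sketch of that kind of restore check, assuming a catalog of "<sha256>  <path>" lines recorded at backup time (the paths and two-space separator are assumptions about the catalog format):

    import hashlib
    import os
    import sys

    restore_root, manifest = sys.argv[1], sys.argv[2]
    problems = 0
    with open(manifest) as m:
        for line in m:
            want, path = line.rstrip("\n").split("  ", 1)
            full = os.path.join(restore_root, path)
            try:
                h = hashlib.sha256()
                with open(full, "rb") as f:
                    while chunk := f.read(1 << 20):
                        h.update(chunk)
                if h.hexdigest() != want:
                    print(f"CORRUPT {path}")
                    problems += 1
            except FileNotFoundError:
                print(f"MISSING {path}")
                problems += 1
    print(f"{problems} problem(s)" if problems else "restore matches manifest")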
I do use Backblaze's B2 service as well for cold-ish storage. Though I only back up truly irreplaceable or inconvenient to recreate data into B2. That way, I have multiple copies of truly important stuff, I have stuff readily available where I am, and I have terabytes of stuff that isn't worth the expense for the cloud since I can re-create it, but nice to have a copy of.
Tape drives may be overkill, and external hard drives (plural!) might be a better option for many. What I like about tape drives is that the media isn't "hot". If I have ransomware running wild, connecting an external hard drive puts everything on it at risk (hence the need for multiple drives), whereas with a tape, it would have to specifically try to rewind the tape and start overwriting, and I would notice that. But YMMV; I've never had a ransomware problem myself, but I do have stuff I really don't want to lose, so multiple backups of it in multiple ways (daily .tar archive on a hard drive, backed up to tape, and backed up to the cloud) should hopefully give defense in depth and the ability to at least recover some older state.
A backup that isn't checked isn't done.
One would expect more seriousness from filesystem maintainers, and serious regression testing before a release.