I have a local Time Machine backup that runs continuously. This lets me recover quickly from most problems. However, if my house ever blew up it wouldn't be sufficient, so in addition...
I have a Backblaze account. $5/month for everything on one computer, which is fine because Dropbox syncs my laptop to that computer. That backup runs continuously and keeps 30 days of history (so if I delete a file and need to get it back, I can). Since I only care about disaster recovery here, if my house does blow up, they'll send me a hard drive with everything for around $100.
The combination means I feel pretty comfortable. I just hope my house doesn't blow up!
Once a month, do a full image of both your Windows and OS X systems. Save the images to another backup drive and take that drive to a safety deposit box at your bank.
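If it helps to see what a "full image" is concretely: both platforms have dedicated imaging tools, but the generic version, assuming a Linux live USB with GNU dd so the source disk isn't mounted, is just a raw copy of the whole device (device and paths here are examples):

    # raw image of the entire disk onto the rotation drive
    dd if=/dev/sda of=/mnt/backupdrive/monthly-$(date +%Y%m).img bs=4M status=progress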
If you want to be really paranoid, find a provider that offers fireproof storage for media. If you can't, go to your local storage facility and rent a small unit (maybe $30 to $75 per month, depending on where you live). Buy a fireproof safe and stick it in there. Once a month (or whatever) rotate drives through the safe.
You can also avoid the storage unit rental expense and locate a fireproof safe at a trusted friend's or family member's home or garage. Be creative.
In other words, creating a system that will virtually ensure the security of your data isn't hard or expensive, particularly when compared to the value of the data you are trying to protect.
You need to have context before you say something like that. Just ONE of our engineering machines probably takes two or three weeks to set up from scratch due to the amount of software and configuration it requires. It has some 600GB of valuable data, not counting the OS and installed applications.
So, in that context, taking the time to make multiple redundant disk images is invaluable and, yes, very efficient. If the system drive dies and no other hardware failure is present, you can be back up and running within an hour or two. That's worth money.
> overkill for most people
Absolutely true. Every situation is different.
On Windows you can get the same effect using VSS - http://en.wikipedia.org/wiki/Shadow_Copy
Also, since you seem to be knowledgeable about Linux, why do you use Dropbox at all, instead of just git or rsync or scp or whatever?
Works fine for me. I use it on every system and have it on a desktop, a laptop, a server and an HTPC. It is used on SSD, HDD (SATA and USB), RAID 0, RAID 1 and over dm-crypt. The only place I do not use it is one filesystem for MongoDB, and it would probably work fine there too if I disabled COW.
The built-in checksum (and scrub) functionality is crucial to me. I had a drive start to fail with ext4, and the only reasonable way to do a scrub would have been to take it offline, which I calculated would take 23 hours (plus a huge amount of extra work to map bad sectors back to the relevant files). At some point a failed sector had also led to ext4 giving back zeroes when the containing file was copied. btrfs keeps two copies of metadata by default, so an unfortunately placed bad sector is less likely to wipe out knowledge of entire trees or files (that happened to me with ext4 as well, when a directory disappeared).
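For anyone who hasn't used it: a scrub is a single online command, no unmounting needed (the mount point is an example):

    # verify every block against its checksum while the fs stays in use
    btrfs scrub start /mnt/data
    btrfs scrub status /mnt/data   # shows progress and any csum errors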
The volume management is great too. It is really easy to add and remove devices/partitions without having to take systems down, change between RAID levels, etc. You can do the same thing with LVM (which I was using before), but it is a lot of fiddly work to get the right commands and options, plus deal with physical versus logical volumes. It is one command to add or remove a partition with btrfs, and it is trivial to work out what the command is from 'btrfs --help'. (LVM requires several different commands to be run, which I always had to look up and often had a lot of difficulty with, such as running RAID 0 with partitions of different sizes.) btrfs also only operates on the actually used portions: e.g. if I change from RAID 0 to RAID 1 on a filesystem where I am using 1GB of 1TB of space, it will only worry about that 1GB, not the whole 1TB. LVM can't see into the filesystem to know what is actually used.
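To illustrate, the whole add-a-device-and-convert-RAID-levels dance is just (device names and mount point are examples):

    btrfs device add /dev/sdc /mnt/data
    btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data
    btrfs device delete /dev/sdb /mnt/data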
I have compression (LZO) turned on everywhere. You can't (currently) find out how effective it has been, but I do have a lot of data files that are highly compressible (e.g. CSV files, SQL dumps). It is more convenient to store them as plain files and let the filesystem compress them transparently than to teach every single program that accesses them how to decompress on the fly.
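Turning it on is just a mount option (the fstab line below is an example); note that existing files are only compressed as they get rewritten, or via a defragment pass with -clzo:

    # /etc/fstab: /dev/sda2  /data  btrfs  defaults,compress=lzo  0  0
    mount -o remount,compress=lzo /data
    btrfs filesystem defragment -r -clzo /data   # recompress existing files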
I used to use rsync and hard links to make snapshots. That completely hammered the machine when run, causing large amounts of I/O, since all metadata has to be scanned plus all changed/new files copied. Consequently I only made daily snapshots. With btrfs, making snapshots is virtually instantaneous and is unaffected by how much data has changed, the size of the volume, I/O speeds, etc. I now make snapshots hourly.
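The difference in mechanics, with example paths:

    # old way: rsync + hard links, a full metadata scan on every run
    rsync -a --link-dest=/backup/daily.1 /home/ /backup/daily.0/

    # btrfs: a read-only snapshot, near-instant regardless of volume size
    btrfs subvolume snapshot -r /home /home/.snapshots/$(date +%Y%m%d-%H%M)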
Ultimately I have things such that I will proactively find out about bad sectors and similar low level corruption, have snapshots to deal with issues over time, and have data replicated over machines (mostly via Dropbox and git/hg), both local and remote. I am not relying on the filesystem of any one machine to always be perfect.
> Also, since you seem to be knowledgeable about Linux, why do you use Dropbox at all, instead of just git or rsync or scp or whatever?
Because Dropbox just works. Many of the alternatives haven't figured that out yet, assuming they even support Linux. The importance of actually working can't be overstated. It also requires no administration or maintenance: sync happens automatically whenever machines are on and there is appropriate network connectivity, without requiring any babysitting from me.
I use git/hg for source, which is their sweet spot. Using something like rsync is a pain once you have more than two machines, and it requires a full-blown system accessible offsite for an offsite copy, which is yet more administration and maintenance.
git/rsync/scp aren't usable from mobile devices. I do things like put documents, ebooks, photos, and music into Dropbox, which makes them present on everything, no messing around needed. Dropbox behaves very much like a DVCS for that kind of content, doing an N-way sync and keeping history (the last 30 days by default, more if you pay more or use a team account).
Finally Dropbox allows collaboration. You can easily share files and directories. I can do a software build for Android, put the APK in a shared folder and a colleague can install the app on their device without hassle. Various tools run periodic reports and put their output in shared folders which makes it available to everyone even if they then happen to jump on a plane.
There is no other simple option that "just works". So my choice is Dropbox + encfs (to encrypt everything). But I also keep another local copy of the entire Dropbox folder in case the sync fails and decides to delete everything.
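The setup is minimal (directory names are examples): the ciphertext lives inside the synced Dropbox folder, and the plaintext is mounted locally and never leaves the machine:

    encfs ~/Dropbox/.encrypted ~/Private   # first run walks through creation
    # ...work in ~/Private as normal...
    fusermount -u ~/Private                # unmount when done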
I used to use Mozy, but when I tried a trial restore it took forever and was missing some files. It also inserts .PART files into your system that can almost double the size of the folders you're trying to back up.
I looked at Backblaze, but it doesn't support servers.
So I'm now trying CrashPlan. I'm really interested in its feature that allows automated backups to other computers. The only problem is that I've not been able to get this to work so far. Backing up to the cloud is working well, though, and it's very competitively priced. I've yet to try a restore.
For servers, I've heard good things about Tarsnap, but never used it myself.
No, that's a tough one, especially with a backup the size you're doing.
Every time you do a restore, that's confirmation of whether a particular backup was or was not sound.
Wouldn't it be neat if the backup program had functionality that let you restore a random sample of your files (and maybe then compare these to the originals) and gave you a likelihood that your entire backup is sound? A sort of sampling quality-control process.
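You can approximate this by hand today if your backup can be restored or mounted somewhere. A rough sketch, where the paths, the sample size, and the availability of GNU shuf are all assumptions:

    #!/bin/sh
    SRC=/home/me            # live data (example path)
    RESTORE=/mnt/restore    # files restored from the backup (example path)
    find "$SRC" -type f | shuf -n 50 | while read -r f; do
        rel=${f#"$SRC"/}
        cmp -s "$f" "$RESTORE/$rel" && echo "OK   $rel" || echo "FAIL $rel"
    done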
[1] Though I'm switching away from them; not sure where to yet. There's some sort of issue in their client software that murders my home network, rendering the entire thing unusable while a backup is running. Their support was... less than helpful in trying to remedy the situation. For $5 a month, and with the amount of stuff I was backing up, I can't blame them.
I don't have any QoS set up on my home network, but I found two fixes: running the backup manually before going to sleep (maybe I can create an AppleScript and just cron it), or turning the Automatic Throttle off and giving it the minimum available (for me) 20kbps.
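For the manual-run idea, the cron half is easy; whether the Backblaze client can actually be driven from AppleScript is an assumption to verify, so the script here is hypothetical:

    # crontab entry: run a small AppleScript wrapper at 1am
    0 1 * * * /usr/bin/osascript "$HOME/start-backup.scpt"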