How to Automatically Backup a Linux VPS to a Separate Cloud Storage Service

(jarv.is)

164 points · jakejarvis · 7y ago · 86 comments

77 comments · 23 top-level

joshstrange · 7y ago · 8 in thread

Forgive me if I'm missing something, but this appears to just back up files, so it would be fine for source code (which should be in version control and safe already) and static assets (like user uploads). It doesn't appear to address things like DB backups, which I feel are the number 1 thing lost if you lose access to your host (followed by user uploads). The problem with DB backups is that you can't just back up the data directory (like /var/lib/mysql) unless you've shut down the DB. You can do a dump (mysqldump) instead, but backing that up hourly is not a good solution IMHO. I guess you could have a replica that you shut down at the top of the hour, back up the data directory, then start back up, but all of this is to say this post is not a silver bullet to "Automatically backup a Linux VPS".

This is NOT a knock against the author, I just wanted to point out that "backups" are much more complicated than "copy files elsewhere". For DBs I'd probably consider running a replica on 1 or more other clouds. IDK the logistics of replication over the internet, but I know for work we do replication from our datacenter down to our local servers, and that's over a relatively slow connection, so I assume it's possible to do it from cloud to cloud.

jakejarvis · OP · 7y ago

Absolutely. Maybe I should have noted that this is more of a guide to make your existing backup procedures more redundant, which implies that you already have local "backups" being made of whatever you want to redundantly store in S3 or B2 or anywhere externally.

In that case, it does become as simple as just copying files elsewhere. (For example, using the Restic steps in my post to backup a folder of hourly database dumps, like you mentioned.) Replicating databases (and other methods made specifically for DBs) is certainly a much, much better route for mission-critical and/or enterprise data.

Covering every permutation of different types of data to backup would have made a long post much longer, but I'll add writing a part two to my to-do list covering rudimentary database backups since that has been brought up here a few times.

Thanks for the feedback! :)

joshstrange · 7y ago

Awesome! I really hope I didn't come across as attacking you/your post, I found it really useful. I just wanted to remind people that the DB wouldn't really be covered by this (except in the case you mentioned where you are dumping the data).

I am definitely looking at this through the lens of where I work, where a mysqldump (or equivalent) could take days to complete in full (the DB is nearing 2TB in size now). For a number of projects a mysqldump might only take seconds or minutes and would be a perfect candidate for this backup scheme.

anderspitman · 7y ago

FWIW if your DB supports dumping to stdout restic can import that: https://restic.readthedocs.io/en/latest/040_backup.html#read...
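
A minimal sketch of that stdin approach in Python, assuming `mysqldump` and `restic` are installed and that the repository location and password come from the `RESTIC_REPOSITORY` and `RESTIC_PASSWORD` environment variables. The helper names and the `all-databases.sql` filename are made up for illustration; `--stdin` and `--stdin-filename` are the restic flags for reading a backup from standard input.

```python
import subprocess


def pipe_dump_to_backup(dump_cmd, backup_cmd):
    """Stream dump_cmd's stdout into backup_cmd's stdin; return backup_cmd's stdout."""
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    backup = subprocess.Popen(backup_cmd, stdin=dump.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the dump gets SIGPIPE if the backup exits early.
    dump.stdout.close()
    output, _ = backup.communicate()
    dump_status = dump.wait()
    if dump_status != 0 or backup.returncode != 0:
        raise RuntimeError(
            f"dump exited {dump_status}, backup exited {backup.returncode}"
        )
    return output


def backup_mysql_via_stdin():
    """Hypothetical wiring; expects RESTIC_REPOSITORY and RESTIC_PASSWORD to be set."""
    return pipe_dump_to_backup(
        ["mysqldump", "--single-transaction", "--all-databases"],
        ["restic", "backup", "--stdin", "--stdin-filename", "all-databases.sql"],
    )
```

Streaming this way avoids writing the dump to local disk first, though the backup can only finish as fast as the dump itself does.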

tomcam · 7y ago

Damn, son! Good catch! I have been trying to figure out how to deal with just this problem. Thank you.

tracker1 · 7y ago

For most DBs the simplest path is a db dump, assuming you don't have more data than can be backed up in a reasonable time frame. From there, there's a number of file-systems and other integrations for S3, sftp or similar. You can simply copy out from there. The backup utility in question could be used to target your dump file(s) directory.

Replicas aren't actually a backup, though it's probably a good idea to have them, and in the case of too much data to reasonably back up, they're as close as you are going to come.

Also, depending on your data structures (for example, whether your data models fit into something like MongoDB), it's easy enough to trigger a dump for each record. I've set up systems where primary records are json+gz files in S3 or Azure blobs. In practice this has been part of the process that updates ElasticSearch (or Mongo) from an RDBMS authority.

It was pretty easy to do <base>/collection/{ID}.json.gz .. from there, worst case, I'd have to create a re-population script from the hard files, and might lose a little less important data, but would always have a hard recovery path. YMMV of course.

I've got 3 side projects that I've been muddling on getting started and some of the things that have happened to the original author frankly scare me. I'm also working on legal structure (LLCs etc) before any real launch. There are so many edge cases to consider, and sometimes it's hard just getting started and accepting that you will make mistakes along the way.

fabian2k · 7y ago

A replica isn't necessarily a backup: if the problem isn't a dead server but a missing WHERE clause in your DELETE query, a replica probably won't save you.

For Postgres you can use something like WAL-E to continuously back up to S3 or other cloud storage providers. The underlying mechanism is explained in the Postgres documentation as "Continuous Archiving and Point-in-Time Recovery (PITR)" (https://www.postgresql.org/docs/11/continuous-archiving.html). Using this method you lose a minimal amount of live data when your main server goes down. And if you need to restore, you can also restore to an arbitrary point in time, e.g. just before you accidentally deleted everything in a table.

joshstrange · 7y ago

100% agree, I was talking specifically about protecting against your hosting provider shutting down your account. And a replica coupled with hourly (or whenever) shutdowns, backup data directory, and restart will protect you against a missing WHERE clause on a DELETE.

SkyLinx · 7y ago

For MySQL you can use Xtrabackup to take online, non-blocking, incremental backups to the filesystem before backing up everything off site with Restic or another tool.

fabian2k · 7y ago · 6 in thread

I've used rclone for a very similar purpose. Restic, which is used in this post, looks very interesting as well.

It's not the topic of the post, but database backups deserve a special mention. In most cases you can't just naively copy the database folder this way; you have to make sure to back up a consistent snapshot of the database. This is still not hard to do at smaller scales, when you can just add an exported dump of the database to your regular backup. But it is a point that needs some attention if you host the database yourself.

tluyben2 · 7y ago

I have many servers with different Linux distros (and versions) on them, and I found Duplicity | Restic very annoying to install. Vague (for me as a non-Python expert) error messages and options randomly not working as a result. Rclone was absolutely painless to install everywhere.

KAMSPioneer · 7y ago

Were you getting Python errors from Restic? Not terribly familiar with Duplicity, but Restic is written in Go (GitHub is https://github.com/restic/restic).

anderspitman · 7y ago

I use restic (for dumping an encrypted deduped backup to a usb drive) and rclone (for pushing to backblaze B2). Both fantastic tools.

ngcc_hk · 7y ago

Very good point. (For source code sync with git is a must.)

And do rehearsal as well. Backup may not work.

fabian2k · 7y ago

The tricky thing with the naive copy as a database backup is that it actually could work if you test it while the database isn't writing at the moment. For example when you only tested this outside production on a test server without load.

But yes, you do have to test and verify that your backup works. It might be configured entirely wrong, the cron job might not be running for some reason, or you might have set it up with encryption years ago and since lost the passphrase. There are plenty of ways this can potentially fail.
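
One way to rehearse this is to restore into a scratch directory and compare it against the source. A rough Python sketch, assuming the source files have not changed since the snapshot was taken; `tree_digests` and `compare_restore` are illustrative names, not part of any backup tool.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads whole files into memory; fine for a sketch, stream for big files.
            content_hash = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = content_hash
    return digests


def compare_restore(source_root, restored_root):
    """Return (missing, changed): files absent from the restore, and files that differ."""
    source = tree_digests(source_root)
    restored = tree_digests(restored_root)
    missing = sorted(set(source) - set(restored))
    changed = sorted(
        path for path in source if path in restored and source[path] != restored[path]
    )
    return missing, changed
```

Two empty lists mean the restore matched; anything else is worth finding out about before you actually need the backup.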

marceloneil · 7y ago

restic actually supports rclone!

a2tech · 7y ago · 6 in thread

Does anyone have a recommendation for a backup client that handles millions of tiny files? I'm using rsnapshot right now, which works but backing up to an NFS share is incredibly slow (most of the time is spent in iterating over the filesystem to get a list of changed files, then running the hardlink process from the previous snapshot).

rsync · 7y ago

You're going to have to walk all those inodes no matter what you do. rsync is as good as anything at that task.

A better way would be to unmount and send the filesystem with 'dd' or something like that, or, to use 'zfs send' but I have a suspicion that neither of those options are available to you ...

I will say that splitting the rsync job (rsnapshot runs rsync underneath) into multiple, smaller jobs, could save you some time if you're running into any resource limits while you walk that big set of inodes... so if you're lucky and you have 4 or 5 or 8 top level dirs that are all roughly the same size, you could do a handful of smaller jobs, one after the other, instead of one huge one ...
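
A sketch of that splitting idea in Python: build one plain `rsync -a` command per top-level directory so they can be run one after the other. This is only an illustration under stated assumptions (it does not reproduce rsnapshot's hardlink rotation, and files sitting directly in the source root are skipped); `rsync_jobs` and the destination string are hypothetical.

```python
from pathlib import Path


def rsync_jobs(source_root, destination):
    """Build one rsync command per top-level directory under source_root."""
    jobs = []
    for entry in sorted(Path(source_root).iterdir()):
        if entry.is_dir():
            # Trailing slashes: copy the directory's contents into the matching target.
            jobs.append(
                ["rsync", "-a", f"{entry}/", f"{destination}/{entry.name}/"]
            )
    return jobs
```

Each command can then be handed to `subprocess.run(job, check=True)` in sequence, so a failure in one directory is visible without having to restart the whole walk.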

module0000 · 7y ago

>> A better way would be to unmount and send the filesystem with 'dd' or something like that

To add to that: to avoid having to unmount your filesystem, use LVM. Then you can call `sync`, snapshot your main volume, and `dd` the clean snapshot. Once you're done, remove the snapshot. This strategy avoids downtime while backing up your volume.
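
Roughly, that sequence could be scripted like this. A Python sketch that only builds the command list, assuming a volume group and logical volume named by the caller; the snapshot name, the 5G copy-on-write size and the image path are placeholders, and actually running the commands needs root.

```python
import subprocess


def lvm_snapshot_plan(volume_group, logical_volume, image_path, snapshot_size="5G"):
    """Return the commands for: flush, snapshot, copy the snapshot, drop the snapshot."""
    snapshot = f"{logical_volume}-backup-snap"
    origin = f"/dev/{volume_group}/{logical_volume}"
    snapshot_device = f"/dev/{volume_group}/{snapshot}"
    return [
        ["sync"],
        ["lvcreate", "--snapshot", "--size", snapshot_size, "--name", snapshot, origin],
        ["dd", f"if={snapshot_device}", f"of={image_path}", "bs=4M"],
        ["lvremove", "--yes", snapshot_device],
    ]


def run_plan(plan):
    """Run each command in order, stopping at the first failure."""
    for command in plan:
        subprocess.run(command, check=True)
```

Note that `run_plan` stops at the first error, so if the `dd` step fails the snapshot is left behind and has to be removed by hand.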

dsign · 7y ago

Over the years, I've been burned so many times by Linux network filesystems (including SMB/CIFS, which is not all Linux but still) that I would start by recommending not to touch a network filesystem on the same host where anything important happens.

The issue is with how network failures (which always eventually happen) interact with the "uninterruptible" Linux process state. Hell breaks loose, and the failure is anything but obvious.

kijin · 7y ago

Run backups to and from the hosts where the files actually reside. Running rsync over ssh is probably many times faster than running it over NFS.

realusername · 7y ago

I use duplicity for that personally. I back up my whole home with it (and there are so many small files like this), and it works very well.

rebelpixel · 7y ago

I'm currently using restic and used duplicity before for a few years, sending backups to backblaze b2.

What I don't like about duplicity is how it spews weird error messages that I gathered were related to Python versions. It was easier to just start over with restic, and I haven't had any problems since.

krn · 7y ago · 5 in thread

Are there any reasons to prefer Restic over BorgBackup[1]?

A conclusion from one comparison (2017)[2]:

"Restic’s memory requirements makes it unsuitable for backing up a small VPS with limited RAM, and the slow backup verification process makes it impractical on larger servers. But if you are backing up desktop or laptop computers then this may not matter so much, and using Restic means that you don’t have to setup your own storage server."

Is this still true?

[1] https://www.borgbackup.org/

[2] https://stickleback.dk/borg-or-restic/

raimue · 7y ago

For remote backups, BorgBackup always needs to run a server process (usually over SSH). Restic works with a "dumb" storage that only provides get/put/list/delete operations. Therefore restic is way easier to set up with built in support for S3, B2, GCS, and similar services that only offer an API but not shell access.

witten · 7y ago

That's true, although there are now a handful of BorgBackup remote storage vendors (rsync.net, BorgBase, etc.) that you can pay to run the server-side hosting for you. Probably not nearly as cheap as, say, S3.. but it does get closer to "just point your client here and hit go". And they offer additional sauce on top that you'd have to roll yourself with S3.. Backup activity monitoring, etc.

Improvotter · 7y ago

If you can use something like samba or any other way of attaching a remote folder, Borg will work without SSH access. So you can also use Borg if you for example mount a Google Drive folder and use that as your repository. Correct me if I'm wrong.

bloopernova · 7y ago

Borgbackup is a fantastic piece of software. I've used it to backup so many different things over SSH, and it's always worked perfectly.

I'm still convinced that its dedupe is magical. I don't know if there's a backup app that is more frugal with disk space, but Borgbackup has served me well in a non-growing 1.5TB backup area for 3+ years now.

m3nu · 7y ago

They split larger files into segments and only back up new segments. This avoids a) uploading files it has seen before and b) re-uploading large files if only part of it has changed.
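
A toy illustration of that idea in Python, assuming fixed-size chunks for simplicity. As far as I know, Borg and restic use content-defined chunking, so that an insertion in the middle of a file doesn't shift every later chunk boundary; `ChunkStore` and the tiny `CHUNK_SIZE` here are invented for the example.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


class ChunkStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by SHA-256."""

    def __init__(self):
        self.chunks = {}

    def backup(self, data):
        """Store data; return (manifest, new_chunks), manifest being chunk hashes in order."""
        manifest = []
        new_chunks = 0
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                # Only chunks the store has never seen cost any space (or upload).
                self.chunks[key] = chunk
                new_chunks += 1
            manifest.append(key)
        return manifest, new_chunks

    def restore(self, manifest):
        """Rebuild the original bytes from a manifest."""
        return b"".join(self.chunks[key] for key in manifest)
```

Backing up `b"aaaabbbbccccdddd"` and then `b"aaaabbbbXXXXdddd"` stores four chunks the first time and only one new chunk the second time, while both versions stay restorable from their manifests.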

Neil44 · 7y ago · 5 in thread

I use Duplicity in a similar way to back my Linode stuff up to Backblaze. It does versioning really well and it's been very reliable. I'd still have to configure up a new server somewhere etc but at least I have the data. http://duplicity.nongnu.org/

raimue · 7y ago

I used duplicity in the past, but the main problem with its incremental backups is that in order to be able to prune the backup history, you need to do full backups regularly to start a new backup chain. That means transferring a full copy of the data.

I switched to restic now, which allows taking incremental backups but can also remove any snapshot to prune the history. Although it does not support compression, due to its deduplication and removing the need to store multiple full backups, the restic repository takes less space now than duplicity before.

OJFord · 7y ago

That's a really good point I hadn't considered at all; I'm glad you mentioned it! I was looking at a benchmark (that I linked in another comment) that makes duplicity look slow, but so much more economical on storage space - i.e. cheaper.

But as you point out, if you don't need a long history, incremental eventually gets more expensive. Unless you could squash older than X, I suppose, but presumably that's so expensive to run (encryption & compression) that it's not supported.

petre · 7y ago

We still use Duplicity because of the ability to rsync the files to other hosts. A repo is sort of weird as it's not a date-tagged single file or volume directory.

narag · 7y ago

Couldn't you backup the backup instead? More space, but transfer in a local network would be faster.

SkyLinx · 7y ago

I have never liked duplicity, too slow and requires full backups regularly. And did I mention that it is slow?

smnrchrds · 7y ago · 4 in thread

In a dockerized single-VPS environment, where should cronjobs live? Should they be part of the main Docker container that has the app code, a separate container that only has the cronjobs, or simply on the host?

jakejarvis · OP · 7y ago

Good question. I have the same setup on one server hosting GitLab, Pi-Hole, Plex, etc., and I have Restic (and its cronjob) installed on the host and only backup the files that I mount to each Docker container, which are all stored in /srv/docker.

In theory, you need to be ready to literally delete every container at any time and pull them from scratch and be 100% fine, since all of your actual data should be stored on the host and mounted as Docker volumes [0]. It's a good Doomsday test if you're looking for one. ;)

[0] https://docs.docker.com/storage/

tracker1 · 7y ago

As mentioned in another post... you could manage the backups from another server (not in the network) with a cron job that grabs a snapshot from the docker server's shared volume directory and forwards it to its final destination. This could be done on a really small instance, and this way your backup information and account details aren't on your production server itself.

OJFord · 7y ago

How are you orchestrating them?

With kubernetes (and no more specifics than you've mentioned) you should use a Job.

With docker-compose I think I'd be tempted to have a different service that isn't long-running, and a cron job on the host that runs it.

With swarm, unless it supports something like k8s Jobs, (I don't know if it does or not, only used it once briefly and in anger) I'd probably have a 'cronjob' service which was responsible for launching the short-lived services per compose suggestion above.

witten · 7y ago

I don't know about "should", but one way to do it is to put both the backup script and the cron job to run it into a single, separate, backup-only container. Then tell that container (via volume mounts, etc.) what volumes to backup from other containers. Example container (non-Restic) that does this: https://hub.docker.com/r/b3vis/borgmatic/

PStamatiou · 7y ago · 3 in thread

Related - I've been thinking about how to best back up my S3 buckets (some with 50k+ files) off of Amazon. Sure, I can set up another bucket with that cross-region replication feature, and I have versioning, but I would really prefer a backup off of Amazon (i.e. not sending manually created zips in a lightsail/ec2 or something to glacier) in case it ever gets hacked or I accidentally nuke the buckets or something like that.

Currently just doing a combination of s3cmd for a local archive (takes forever to download, and it doesn't seem like incremental syncs are any faster), as well as having Google Console clone my bucket there (but I'm not sure if it's versioned, or as easy as downloading the whole archive).

Never used duplicity -- would it be fast for something like this? Guessing I should just cron it on a remote server instead of running off a local machine frequently.

padelt · 7y ago

Have you had a look at rclone? Pretty sure you can copy or even sync files from one remote storage to another. E.g. copy from S3 to B2. https://rclone.org/commands/rclone_sync/

anderspitman · 7y ago

+1 for rclone here. It can indeed copy between remote backends. Just keep in mind that that data all has to flow through the rclone process. You could probably get much better performance by running rclone itself on an ec2 instance. Just keep an eye on your throughput usage.

PStamatiou · 7y ago

Thanks, I haven't. Will take a look

kijin · 7y ago · 3 in thread

Meh, just another backup solution that requires AWS keys, ssh keys, etc. to be kept on the same server where your data is. What if that server is compromised? The attacker now has all the keys he needs to delete or modify your backups, too.

For maximum peace of mind, always pull backups from a separate server that is not exposed to the world. Don't let your primary server push arbitrary data to the backup store.

This rule is trickier to follow when your backup store can't run scripts, which is why so many tools designed to work with S3 tell you to keep the keys exposed. But if you really want to, you can use an intermediate host to pull backups before pushing them again to S3.

longwave · 7y ago

Borg has an append-only mode [1] that prevents clients from overwriting data.

[1] https://borgbackup.readthedocs.io/en/stable/usage/notes.html...

slig · 7y ago

Can't you set up keys that are only allowed to do GET/PUT?

I know that tarsnap [1] can work with `list`, `write` and `delete` keys.

[1]: http://www.tarsnap.com/man-tarsnap-keymgmt.1.html

akerl_ · 7y ago

As the other commenter noted, this is why you’d give the backup keys the ability to upload data but not delete data.

Combine that w/ S3 Object Versioning and you’ve got a pretty solid approach.
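
As a sketch of what such a key's permissions might look like, here is a Python helper that builds an IAM policy document allowing list, read and upload on one bucket and granting no delete actions. The bucket name and function names are placeholders, and the exact set of actions a given backup tool needs is an assumption to verify.

```python
import json


def upload_only_policy(bucket):
    """Build an IAM policy that allows list/read/upload but grants no delete actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def policy_json(bucket):
    """Render the policy as JSON, ready to paste into the IAM console or a template."""
    return json.dumps(upload_only_policy(bucket), indent=2)
```

Because `s3:PutObject` can still overwrite an existing key, versioning is what keeps the older copy around; permanently removing old versions needs `s3:DeleteObjectVersion`, which this policy does not grant. Some backup tools expect to delete lock or temporary files, so check whether yours works with a key like this.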

monkeydust · 7y ago · 3 in thread

I am looking for something that can backup Dropbox, Google Drive and Amazon Cloud to a 3rd party service. What do people recommend?

tracker1 · 7y ago

In addition to rclone... you could run a VPS that has the various services as fuse mounts and otherwise sync between them in CRON ... however, probably best to only actually use one of them in practical terms.

It depends on what you want to do, but can probably be accomplished in a <= $5 VPS.

ac29 · 7y ago

rclone

monkeydust · 7y ago

Thanks but appears Amazon stopped issuing api keys for Amazon Drive so stuck, at least for a fully automated solution.

tickthokk · 7y ago · 2 in thread

Thanks for sharing! While the victims were being scorned by the internet for not having proper backups, nobody was sharing how to achieve that.

dymk · 7y ago

Really? A blog post about how to use a glorified `rsync` was needed to instruct people building services for Fortune 500 companies to back their user data up?

anderspitman · 7y ago

To be fair there is a dizzying array[0] of OSS backup solutions, and it's very much not apparent which features are most important. A simple post like this that outlines a single good enough solution with a modern tool is valuable IMO.

EDIT: Oh and restic has much more functionality than rsync, including deduplication and encryption. rclone is more of a "glorified rsync", but even then its array of backends makes it truly glorious.

[0] https://wiki.archlinux.org/index.php/Synchronization_and_bac...

cure · 7y ago · 2 in thread

+1 for restic. I use it, and it's awesome.

rsync · 7y ago

(I hope) You'll be happy to learn that restic works perfectly with rsync.net:

https://www.rsync.net/products/restic.html

One of the modes of restic is SFTP target and as we run stock, standard OpenSSH, it works perfectly.

EDIT: A sibling comment to yours mentioned 'rclone' and I am happy to informally announce that over the past few months we have rolled out the 'rclone' binary to all of our production fileservers (it requires a server-side binary exe to be in place) and it is being used by rsync.net customers to broker file transfers cloud to cloud to cloud (as rclone is apt to be used for). 'rclone serve' and 'rclone mount' are disallowed for (I think) obvious reasons, but otherwise everything works ...

nickcw · 7y ago

Nice one - I'd love to hear more about this (rclone author!).

heinrichhartman · 7y ago · 2 in thread

Does anyone here have experience with backing up ZFS pools in cloud storage like S3, B2, ...?

I have a bunch of snapshots (https://github.com/jakelee8/zfs-auto-snapshot) that I want to backup along with the active tree. But don't want to keep extra copies of the data.

- Do these services offer snapshotting? ...that can be automated?

- Is there zfs integration, e.g. `zfs send | b2 receive`?

conception · 7y ago

https://www.rsync.net/ is the only one I know of.

rys · 7y ago

rsync.net will natively accept ZFS sends

ausjke · 7y ago · 2 in thread

I'm trying to back up a Linode image using dd, which works but isn't easy. I hope VPS vendors can provide a way for customers to migrate when the time comes.

zerkten · 7y ago

Doesn't Linode have its own backup which lets you restore onto another Linode VPS? It'd be nice to use provider-agnostic tools, but this seems like the most pragmatic option. I'm guessing other VPS providers offer something similar.

ausjke · 7y ago

It can't deal with the case where Linode locks your account, similar to what DO did in the original post. You can't put all your eggs in the same basket if you want to be safe.

turrini · 7y ago · 1 in thread

Vultr AND Linode.

1) Upload a custom ISO with ZFS (https://github.com/beren12/zfs-iso/)

2) Create a new VPS without OS and boot to your uploaded ISO.

3) Create a ZFS root pool and bootstrap your Debian or another distribution.

4) Enable all cool features: compression, encryption, etc.

5) rsync your zfs snapshots from Vultr to Linode and vice-versa.

This is how I do it. You can even use them as templates for new VPSes.

And for backups, Backblaze B2 and Wasabi with a zfs-snapshot-upload script.

SkyLinx · 7y ago

Isn't a backup of the whole os via snapshot overkill? I can bring up one or more completely configured new servers in a few minutes with Ansible (plus another few with Rancher for Kubernetes). I don't see the point in backing up anything other than the actual data.

z3t4 · 7y ago · 1 in thread

Don't forget about practicing restoration (catastrophe scenarios), so that you will know how long it will take to restore, and whether something is missing. Last time I did it, I did not remember the password for the encryption key. Sure, I had it written down on a piece of paper, but the scenario was that the building had burnt down.

SkyLinx · 7y ago

Good point on testing the backups.

OJFord · 7y ago · 1 in thread

Restic looks neat. I've been looking at using duplicity [0] for similar purposes recently, which does a similar job.

Just found a good comparison/benchmark of the two at [1] - tl;dr seems to be that restic is fast, and duplicity is small.

[0] - http://duplicity.nongnu.org

[1] - https://github.com/gilbertchen/benchmarking

SkyLinx · 7y ago

Duplicity is not particularly good imo. Like I said in another comment, it is much slower than other options and requires full backups regularly, which is a problem with lots of data.

LinuxBender · 7y ago

My own personal preference is to simply make VMs on each VPS that has some storage space, then enable chroot sftp and rsnapshot. Then on the client side, I used LFTP (sftp mirror sub-system), which is compatible with chroot sftp and behaves like rsync.

Each VPS backs up to the other. RSnapshot makes daily diffs that use hardlinks to avoid taking up space. This also mitigates tampering, as only root has access to the snapshots.

Demo site using anon login for testing: [1]

[1] - https://tinyvpn.org/sftp/#lftp

Blackstone4 · 7y ago

One idea I had was to create a service with preconfigured images set up for personal use with VPN, email server and file sync/backup. It could be sold to privacy-conscious individuals and could compete with ProtonMail.

The technical side could be hidden from less technical users and it sold as isolated servers so the data would be protected.

I don’t have the skills or the time to work on this so happy for others to use the idea

pnutjam · 7y ago

Good directions, everyone should do this.

themodelplumber · 7y ago

I actually like using CPanel's built-in backup settings on servers where I have CPanel installed. Amazingly simple to set up, really intuitive, and supports a variety of services. I have used Amazon and SFTP backups so far and they both work really well.

electriclove · 7y ago

Or use a paid service that handles files and databases for ~$30/year like https://www.dropmysite.com/

(I'm just a customer that has been generally pleased over the past many years)

SkyLinx · 7y ago

I'm surprised that no one has mentioned Duplicacy yet. It's another very solid, reliable and fast alternative. At the moment I use Restic on servers but use Duplicacy on the desktop. It can also be used on servers of course.

apitman · 7y ago

TLDR for comments: use rclone and/or restic

j / k navigate · click thread line to collapse

86 comments

77 comments · 23 top-level

joshstrange7y ago· 8 in thread

jakejarvisOP7y ago

Thanks for the feedback! :)

joshstrange7y ago

2 more replies

anderspitman7y ago

FWIW if your DB supports dumping to stdout restic can import that: https://restic.readthedocs.io/en/latest/040_backup.html#read...

tomcam7y ago

Damn, son! Good catch! I have been trying to figure out how to deal with just this problem. Thank you.

tracker17y ago

Replica's aren't actually a backup, though it's probably a good idea to have them, and in the case of too much data to reasonably backup, as close as you are going to come.

fabian2k7y ago

A replica isn't necessarily a backup, if the problem isn't a dead server but a missing WHERE clause in your DELETE query, a replica probably won't save you.

joshstrange7y ago

SkyLinx7y ago

For MySQL you can use Xtrabackup to take online - non blocking - incremental backups to the filesystem before backing up everything off site with Restic or other.

fabian2k7y ago· 6 in thread

I've used rclone for a very similar purpose. Restic, which is used in this post looks very interesting as well.

tluyben27y ago

KAMSPioneer7y ago

Were you getting Python errors from Restic? Not terribly familiar with Duplicity, but Restic is written in Go (Github is https://github.com/restic/restic).

2 more replies

anderspitman7y ago

I use restic (for dumping an encrypted deduped backup to a usb drive) and rclone (for pushing to backblaze B2). Both fantastic tools.

ngcc_hk7y ago

Very good point. (For source code sync with git is a must.)

And do rehearsal as well. Backup may not work.

fabian2k7y ago

marceloneil7y ago

restic actually supports rclone!

a2tech7y ago· 6 in thread

rsync7y ago

You're going to have to walk all those inodes no matter what you do. rsync is as good as anything at that task.

A better way would be to unmount and send the filesystem with 'dd' or something like that, or, to use 'zfs send' but I have a suspicion that neither of those options are available to you ...

module00007y ago

>> A better way would be to unmount and send the filesystem with 'dd' or something like that

dsign7y ago

The issue is with how network failures (which always eventually happen) interact with the "uninterruptible" Linux process state. Hell breaks loose, and the failure is anything but obvious.

kijin7y ago

Run backups to and from the hosts where the files actually reside. Running rsync over ssh is probably many times faster than running it over NFS.

realusername7y ago

I use duplicity for that personally, I backup my whole home with it (and there's so much small files like this) it works very well.

rebelpixel7y ago

I'm currently using restic and used duplicity before for a few years, sending backups to backblaze b2.

krn7y ago· 5 in thread

Are there any reasons to prefer Restic over BorgBackup[1]?

A conclusion from one comparison (2017)[2]:

Is this still true?

[1] https://www.borgbackup.org/

[2] https://stickleback.dk/borg-or-restic/

raimue7y ago

witten7y ago

1 more reply

Improvotter7y ago

bloopernova7y ago

Borgbackup is a fantastic piece of software. I've used it to backup so many different things over SSH, and it's always worked perfectly.

m3nu7y ago

They split larger files into segments and only back up new segments. This avoids a) uploading files it has seen before and b) re-uploading large files if only part of it has changed.

Neil447y ago· 5 in thread

raimue7y ago

OJFord7y ago

petre7y ago

We still use Duplicity because of the ability to rsync the files to other hosts. A repo is sort of weird as it's not a date tagged single file or volune directory.

narag7y ago

Couldn't you backup the backup instead? More space, but transfer in a local network would be faster.

SkyLinx7y ago

I have never liked duplicity, too slow and requires full backups regularly. And did I mention that it is slow?

smnrchrds7y ago· 4 in thread

jakejarvisOP7y ago

[0] https://docs.docker.com/storage/

tracker17y ago

OJFord7y ago

How are you orchestrating them?

With kubernetes (and no more specifics than you've mentioned) you should use a Job.

With docker-compose I think I'd be tempted to have a different service that isn't long-running, and a cron job on the host that runs it.

witten7y ago

PStamatiou7y ago· 3 in thread

Never used duplicity -- would it be fast for something like this? Guessing I should just cron it on a remote server instead of running off a local machine frequently.

padelt7y ago

Have you had a look at rclone? Pretty sure you can copy or even sync files from one remote storage to another. E.g. copy from S3 to B2. https://rclone.org/commands/rclone_sync/

anderspitman7y ago

PStamatiou7y ago

Thanks, I haven't. Will take a look

kijin7y ago· 3 in thread

For maximum peace of mind, always pull backups from a separate server that is not exposed to the world. Don't let your primary server push arbitrary data to the backup store.

longwave7y ago

Borg has an append-only mode [1] that prevents clients from overwriting data.

[1] https://borgbackup.readthedocs.io/en/stable/usage/notes.html...

slig7y ago

Can't you set up keys that are only allowed to do GET/PUT?

I know that tarsnap [1] can work with `list`, `write` and `delete` keys.

[1]: http://www.tarsnap.com/man-tarsnap-keymgmt.1.html

akerl_7y ago

As the other commenter noted, this is why you’d give the backup keys the ability to upload data but not delete data.

Combine that w/ S3 Object Versioning and you’ve got a pretty solid approach.

monkeydust7y ago· 3 in thread

I am looking for something that can backup Dropbox, Google Drive and Amazon Cloud to a 3rd party service. What do people recommend?

tracker17y ago

It depends on what you want to do, but can probably be accomplished in a <= $5 VPS.

ac297y ago

rclone

monkeydust7y ago

Thanks, but it appears Amazon stopped issuing API keys for Amazon Drive, so I'm stuck, at least for a fully automated solution.

tickthokk7y ago· 2 in thread

Thanks for sharing! While the victims were being scorned by the internet for not having proper backups, nobody was sharing how to achieve that.

dymk7y ago

Really? A blog post about how to use a glorified `rsync` was needed to instruct people building services for Fortune 500 companies to back their user data up?

anderspitman7y ago

EDIT: Oh and restic has much more functionality than rsync, including deduplication and encryption. rclone is more of a "glorified rsync", but even then its array of backends makes it truly glorious.

[0] https://wiki.archlinux.org/index.php/Synchronization_and_bac...

cure7y ago· 2 in thread

+1 for restic. I use it, and it's awesome.

rsync7y ago

(I hope) You'll be happy to learn that restic works perfectly with rsync.net:

https://www.rsync.net/products/restic.html

One of the modes of restic is SFTP target and as we run stock, standard OpenSSH, it works perfectly.

nickcw7y ago

Nice one - I'd love to hear more about this (rclone author!).

heinrichhartman7y ago· 2 in thread

Does anyone here have experience with backing up ZFS pools in cloud storage like S3, B2, ...?

I have a bunch of snapshots (https://github.com/jakelee8/zfs-auto-snapshot) that I want to back up along with the active tree. But I don't want to keep extra copies of the data.

- Do these services offer snapshotting? ...that can be automated?

- Is there ZFS integration, e.g. `zfs send | b2 receive`?

conception7y ago

https://www.rsync.net/ is the only one I know of.

rys7y ago

rsync.net will natively accept ZFS sends

ausjke7y ago· 2 in thread

I'm trying to back up a Linode image using dd, which works but isn't easy. I hope VPS vendors can provide a way for customers to migrate when the time comes.

zerkten7y ago

ausjke7y ago

It can't deal with the case where Linode locks your account, similar to what DO did in the original post. To be safe, you can't put all your eggs in the same basket.

turrini7y ago· 1 in thread

Vultr AND Linode.

1) Upload a custom ISO with ZFS (https://github.com/beren12/zfs-iso/)

2) Create a new VPS without OS and boot to your uploaded ISO.

3) Create a ZFS root pool and bootstrap your Debian or another distribution.

4) Enable all cool features: compression, encryption, etc.

5) rsync your zfs snapshots from Vultr to Linode and vice-versa.

This is how I do it. You can even use them as templates for new VPSes.

And for backups, BackBlaze B2 and WASABI with a zfs-snapshot-upload script.

SkyLinx7y ago

z3t47y ago· 1 in thread

SkyLinx7y ago

Good point on testing the backups.

OJFord7y ago· 1 in thread

Restic looks neat. I've been looking at using duplicity [0] for similar purposes recently, which does a similar job.

Just found a good comparison/benchmark of the two at [1] - tl;dr seems to be that restic is fast, and duplicity is small.

[0] - http://duplicity.nongnu.org

[1] - https://github.com/gilbertchen/benchmarking

SkyLinx7y ago

Duplicity is not particularly good imo. Like I said in another comment, it is much slower than other options and requires full backups regularly, which is a problem with lots of data.

LinuxBender7y ago

Each VPS backs up to the other. RSnapshot makes daily diffs that use hardlinks to avoid taking up space. This also mitigates tampering, as only root has access to the snapshots.

Demo site using anon login for testing: [1]

[1] - https://tinyvpn.org/sftp/#lftp
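The hardlink trick mentioned above can be illustrated with a short sketch. This is not how rsnapshot itself is implemented; it is a simplified illustration that assumes only regular files and directories, with the function name `snapshot` and the `daily.N` directory names chosen for the example. Files unchanged since the previous snapshot are hardlinked instead of copied, so they take no extra space.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy source into dest, hardlinking files unchanged since previous."""
    dest.mkdir(parents=True, exist_ok=True)
    for src_file in source.rglob("*"):
        rel = src_file.relative_to(source)
        target = dest / rel
        if src_file.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            os.link(old, target)  # unchanged: share the same data on disk
        else:
            shutil.copy2(src_file, target)  # new or changed: store a fresh copy


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data = root / "data"
    data.mkdir()
    (data / "static.txt").write_text("never changes")
    (data / "log.txt").write_text("day 1")
    snapshot(data, root / "daily.1")
    (data / "log.txt").write_text("day 2")
    snapshot(data, root / "daily.0", previous=root / "daily.1")
    # A link count above 1 means the file is shared between snapshots.
    print(os.stat(root / "daily.0" / "static.txt").st_nlink)
```

Each snapshot directory looks like a full copy, but only changed files consume new space, which is why keeping many daily snapshots stays cheap.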

Blackstone47y ago

The technical side could be hidden from less technical users, and it could be sold as isolated servers so the data would be protected.

I don’t have the skills or the time to work on this so happy for others to use the idea

pnutjam7y ago

Good directions, everyone should do this.

themodelplumber7y ago

electriclove7y ago

Or use a paid service that handles files and databases for ~$30/year like https://www.dropmysite.com/

(I'm just a customer that has been generally pleased over the past many years)

SkyLinx7y ago

apitman7y ago

TLDR for comments: use rclone and/or restic
