I'm not an infrastructure engineer, nor do I work in web development, but I'm pretty comfortable with Linux. I realised I need to spin up a couple of home servers and VPSs to simplify and localise my digital life. I have an RPi and an x86 NAS on my home network, plus a VPS in the cloud. They run different hardware and distros, so I have to set each one up a bit differently, which is a pain in itself. What makes matters worse is when I mess something up really badly, or something else essentially forces me to reinstall.
I tried Ansible and found it hard to use. For example, at one point I decided to redeploy my server to a different VPS type in the same cloud, and I had to patch my Ansible scripts to do so, even though it was the same Rocky Linux distro (IIRC it failed on some random Docker Compose networking config). I guess Ansible scripts aren't reproducible and need constant work to keep them working. Still, I much prefer them to just SSH-ing into servers.
That leads to my question: is there anything I can do to write config once and deploy it more or less reliably? NixOS looks interesting, but learning another programming language just for this feels like a bit too much. Or is there another way to do this that I'm overlooking because I'm in a different industry?
Waxing poetic about NixOS on HN is a horse well-beaten. Just try it, if you've got an extra machine lying around and a few hours to spare. I think it's a great halfway option for people who want complex server composition software without Kubernetes built in.
The hard part about NixOS is when you need to package something yourself. That can have a bit of a learning curve but since nixpkgs is the largest package repository you rarely need to do it.
If you are running custom stuff, you can always start by just using NixOS to run a Docker container. At least the OS will be reproducible, and if you pin a specific image, the whole thing will be fully reproducible. Then, when you want to, you can dip your toes into native Nix packages. (It really isn't that bad, just an extra thing to learn that you can defer at first.)
My biggest issue with Nix (as is everyone's, I hear) is the mediocre documentation. Even with PKGBUILD or Makefile knowledge, it takes a lot of wheel-spinning to get up to speed. A lot of that learning curve could be remedied with better, flake-focused packaging tutorials. Surely, though, that too will be Coming Soon™.
Note: there are two wikis now.
All my images are built, stored, and served over TFTP and a remote Nix store, from which the servers boot. It's very easy to build the system and then do atomic upgrades. Best part: rollbacks after a bad config are easy and foolproof.
I would never use mutable distributions in production. Too scary.
My nix boxes need very little minding.
https://github.com/eh8/chenglab
I was a complete Nix beginner three months ago and thought Nix was terribly complicated and unnecessary. Glad to say I was wholly wrong and the transition was not that bad.
NixOS lets me provision my servers from scratch to a functional file/media/home-automation server in about 15 minutes, using an entirely automated Nix installation process. It’s a beautiful OS for servers.
Edit: But NixOS looks really good, I have to agree. I guess 'immutability' will let me just install and forget about it.
There is! It's a fucking nightmare: https://xiaoyehua.dev/posts/nixos-on-oracle-arm-machine
It works, though, and thanks to the genius who wrote this script I've got a 4-core, 24 GB Oracle Always Free instance running NixOS around the clock. I feel spoiled.
Also, why run different OSes on your machines? And why the need to reinstall stuff?
I just run Arch on everything, and I haven't had to reinstall a machine in many years.
And whilst I've used most provisioning tools (quite liking Packer and Terraform), for my own stuff I have a file with notes and snippets. Follow that file and in under 10 minutes I've got a working server with Postgres, Nginx, and Let's Encrypt, ready for Git push-to-deploy.
Simplicity is what you need, for as long as you can get away with it. Simplicity and backups.
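For the "Git push-to-deploy" part, one common pattern is a bare repository on the server whose `post-receive` hook checks the pushed branch out into a deploy directory. This is a minimal sketch, not the commenter's actual notes: it assumes you push a branch named `main`, the function name and paths are hypothetical, and it only copies files (a real app would also need a build or restart step in the hook).

```shell
#!/usr/bin/env bash
# Sketch: "git push to deploy" with a bare repo and a post-receive hook.
# Assumes the pushed branch is "main"; setup_push_to_deploy is a hypothetical name.
set -euo pipefail

setup_push_to_deploy() {
  local repo="$1" deploy_dir="$2"
  git init --bare --quiet "$repo"
  mkdir -p "$deploy_dir"
  # Use absolute paths: the hook runs with the bare repo as its working dir.
  repo=$(cd "$repo" && pwd)
  deploy_dir=$(cd "$deploy_dir" && pwd)
  # The hook runs after every push and force-checks-out "main" into the
  # deploy directory. Paths are expanded now, when the hook file is written.
  cat > "$repo/hooks/post-receive" <<EOF
#!/bin/sh
git --work-tree="$deploy_dir" --git-dir="$repo" checkout -f main
EOF
  chmod +x "$repo/hooks/post-receive"
}
```

On the server you might run `setup_push_to_deploy /srv/git/site.git /var/www/site`; on your laptop, `git remote add prod user@server:/srv/git/site.git` and then `git push prod main` updates the live files.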
All from Ansible, all in a private GitHub repo; I just SSH into the machine long enough to run ansible-pull.
That way, when I run into errors, it's just rinse and repeat: iterate on my Ansible playbook until it fits this exact situation perfectly. Is my code maintainable and enterprisey like my work Ansible? NOPE. But that's okay, it's my private code that nobody but me uses. I even commit directly to master because I can; it's my tiny, naughty guilty pleasure.
It's an orchestration tool that's common in the real world, and also notoriously hard to learn and "get right". Downtime due to obvious, important mistakes is common, and it leaves both engineers and lower management wondering if it was a good idea to adopt.
The thing is, in your home environment you have no (or hopefully much lower) uptime requirements. If you break the entire cluster for a few days because you ran into a network problem or upgraded it wrong, who cares? At a large organization that could be a hundred-thousand-dollar learning opportunity; at home it costs you just the electricity.
For what it's worth, I run Kubernetes both in my day job and in my home lab. I've learned more about networking from running my own cluster on bare metal (HP DL360 boxes) than from ten years of managing infrastructure for big corporations, and it also gives me a safe place to play with functionality I might want to adopt at work.
Do you have any pointers towards how to get going?
If you want your Services to behave a little bit more like physical boxes or VMs, where each service gets its own IP address (instead of using an ingress controller or service mesh, which are different beasts entirely), have a look at MetalLB. MetalLB allows associating an IP address on your home network with a Service, which is more or less exactly what you'd do with a VM or a Raspberry Pi.
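As a rough sketch of that MetalLB setup: assuming MetalLB v0.13 or later (configured through CRDs) is already installed in the `metallb-system` namespace, Layer 2 mode needs an `IPAddressPool` and an `L2Advertisement`. The helper below only prints those two manifests for an address range; the function name, the resource names `home-pool`/`home-l2`, and the example range are assumptions, and the range should be a slice of your LAN that your router's DHCP server won't hand out.

```shell
#!/usr/bin/env bash
# Sketch: print a MetalLB Layer 2 config for a range on the home LAN.
# Assumes MetalLB >= 0.13 in metallb-system; function and resource names
# are hypothetical.
set -euo pipefail

metallb_l2_config() {
  local range="$1"   # e.g. 192.168.1.240-192.168.1.250
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: home-pool
  namespace: metallb-system
spec:
  addresses:
    - $range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: home-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - home-pool
EOF
}
```

Usage might look like `metallb_l2_config 192.168.1.240-192.168.1.250 | kubectl apply -f -`, after which a Service of `type: LoadBalancer` should get an address from that range, much like a VM would.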
It's definitely gone down a few times, but I've learned a TON tinkering with it. It's super easy to spin up a new hobby project, and there's a nice web UI for seeing what the heck is going on.
I've completely borked it a couple of times and survived one micro PC migration. Can't recommend it enough.
You can use Portainer if you need a GUI, but the command line is not that complicated if you are comfortable with CLIs.
I did a disaster recovery test of my services (all on docker) from scratch and without documentation.
I started by downloading the Debian ISO and went from there.
It took me about 60 minutes to be up and running, including setting up DHCP via Pi-hole and restoring everything from Borg backups.
I took notes to be faster/better next time but I lost them somehow :-|
Docker everything and you will be fine.
I used Docker Compose for years and recently moved to Portainer, because it makes it easier to keep track of the Compose files and has a lot of things readily available.
Every time I bootstrap it, I go through the instructions I wrote and improve/update if necessary.
Much simpler than Ansible
containers are just a packaging/isolation technique. you can keep using an obsolete stack in a container, regardless of what changes outside it. rebuilding containers from scratch is certainly not easier than rebuilding an install via ansible.
Container host.
> rebuilding containers from scratch is certainly not easier than rebuilding an install via ansible.
How so? The OP is giving an example of ansible scripts breaking because of OS version change, and having to fix them. With containers, the container OS is very slim, so fewer things to break with upgrades, and you can upgrade the host OS easily since docker is quite stable across OS versions.
Trying to replicate "the cloud" at home is a nice way to tie up your own genitals, hang some weights on them, and then start jumping.
That said: don't use a Raspberry Pi or a NAS; assemble a small desktop instead. It can be a NAS, a router, and a server for any kind of service, and it's common commodity hardware: the best supported in the FLOSS world, the quickest to replace, and the cheapest for spare parts. Desktop hardware today doesn't draw that much power and has enough capacity for most common needs. And with NixOS or Guix System you don't need to run a gazillion things just to show a damn hello world, so you can get the most out of your hardware.
Find a hosting provider that offers a shell login and manages nearly all the services you need, including backups, security updates, and so on.
That should massively simplify your setup.
The points to consider:
- architecture. You have three boxes. One of them has lots of storage. One of them is cheap. One costs you monthly. I don't know that this is what you actually want. You probably need a main box that can do anything, a backup facility for that, and a proxy to expose services to the outside world.
- a common operating system on all nodes. I like Debian stable. Not everybody does. Being happy with it is more important than being the "best". But you should only have one.
- automatic backup of config and data. Snapshots are nice.
- if you can't have perfect snapshots, you can at least check your config into git. Use etckeeper.
- set up a common approach to running things. Make everything grab TLS certs from Let's Encrypt through nginx. Make a new user for each service. Make a new database user for each service that needs one, a new PHP worker pool, whatever. Be consistent.
- document your policy and your exceptions. This can be a text file or a wiki or something weird.
- know where you are getting things, how to upgrade them, and how to get announcements of available updates.
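One way to make the "TLS certs from Let's Encrypt through nginx" rule consistent is to generate every reverse-proxy site from the same template. This sketch assumes the certificate was already obtained with certbot (which keeps certs under `/etc/letsencrypt/live/<domain>/` by default) and that each service listens on localhost; the function name, domain, and port are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: one template for every service's nginx reverse-proxy site.
# Assumes certbot's default cert paths and a service on 127.0.0.1:<port>.
# nginx_site is a hypothetical helper name.
set -euo pipefail

nginx_site() {
  local domain="$1" port="$2"
  # The \$ escapes keep nginx's own variables out of shell expansion.
  cat <<EOF
server {
    listen 443 ssl;
    server_name $domain;

    ssl_certificate     /etc/letsencrypt/live/$domain/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/$domain/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:$port;
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
    }
}
EOF
}
```

On Debian you could write `nginx_site wiki.example.com 8080 > /etc/nginx/sites-available/wiki`, symlink it into `sites-enabled`, and run `nginx -t && systemctl reload nginx`. Because every site comes from one template, a change to TLS or proxy headers is made once.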
Thanks for etckeeper, it looks really interesting, I didn't know about it.
I haven't had to touch the system more than once a year or so, when I got an alert that unattended-upgrades couldn't install something.
The only thing I don't upgrade automatically is Home Assistant's major version (minor versions are upgraded automatically). I had one failed update, and it created tension at home when the lights stopped working, people got stuck in fridges, aliens landed, and whatnot. It didn't help that I was 1500 km away.
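On Debian, the automatic part is usually the `unattended-upgrades` package plus two APT settings. The helper below writes them; the function name is hypothetical, and the target path is a parameter so it can be tried outside `/etc` (on a real system the conventional file is `/etc/apt/apt.conf.d/20auto-upgrades`, which `dpkg-reconfigure unattended-upgrades` also creates).

```shell
#!/usr/bin/env bash
# Sketch: enable daily package-list refreshes and unattended upgrades.
# Requires the unattended-upgrades package; write_auto_upgrades is a
# hypothetical helper name.
set -euo pipefail

write_auto_upgrades() {
  local target="${1:-/etc/apt/apt.conf.d/20auto-upgrades}"
  mkdir -p "$(dirname "$target")"
  # "1" means run every day.
  cat > "$target" <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
EOF
}
```

After running it as root, `unattended-upgrade --dry-run --debug` shows what would be upgraded without changing anything.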
I just use NixOS flakes with a flake repo synced via Syncthing across 5 hosts (desktop, laptop, a media device (NUC7), a home server, and a VPS). It has its problems, but I'll iron them out eventually.
As always, start small...
If you're running a server in the cloud, it's already available.
It takes no effort to set up yourself, and it's just a basic script that runs and sets up a server exactly how you want it.
When you create backups of the state of the machine (even if the backups are just tarballs sent over ssh to the other machines), include a copy of that file.
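Including the setup script in every backup can be as simple as archiving it next to the data. A minimal sketch, assuming one setup script and one data directory (the function name and paths are hypothetical); the archive goes to stdout, so it can land in a local file or go straight over SSH.

```shell
#!/usr/bin/env bash
# Sketch: stream a tarball that always contains the server's setup script
# alongside its data directory. backup_with_setup is a hypothetical name.
set -euo pipefail

backup_with_setup() {
  local setup="$1" data="$2" setup_dir data_dir
  # Resolve both parent dirs to absolute paths so the second -C does not
  # depend on where the first -C left tar.
  setup_dir=$(cd "$(dirname "$setup")" && pwd)
  data_dir=$(cd "$(dirname "$data")" && pwd)
  # Members are stored relative to their parents, e.g. "setup.sh", "data/...".
  tar -czf - -C "$setup_dir" "$(basename "$setup")" \
             -C "$data_dir" "$(basename "$data")"
}
```

For example, `backup_with_setup /root/setup.sh /srv/data | ssh nas "cat > backups/vps-$(date +%F).tar.gz"`, where `nas` is whichever other machine holds the backups. Restoring then starts by extracting the archive and reading `setup.sh`.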
Learning another DSL or desired state config system is going to be a pain because they lack a lot of things programmers like, like breakpoints, good LSPs and crucially: reproducible environments.
Worse still, the DSLs shift around. I know CFEngine, Puppet, Chef, Salt, and Ansible, because they keep getting replaced with cleaner abstractions over time or get a kinder eye from the community.
Do the simple thing: document what you do to your machines. It's not sexy, but you don't have to unlearn patterns or spend time fixing your environment just to make your docs (which are now code) work automatically.
My personal wiki has very short notes on how to rebuild each from scratch. (Pretty much: push Linode Web site buttons to make a new Debian Stable instance, get a shell and do this `apt install` command line, and edit config file like so). Data gets pushed/pulled via simple shell scripts run on laptop (usually using SSH and rsync).
Separately, my GPU server is its own box at home and changes frequently at a low level, so blasting it entirely a few times has made pragmatic sense, and I'm glad it's not sharing config complexity with any other resources. Setting up the large ML stacks, down to proprietary drivers, is sometimes very experimental at first, so I need to do it manually anyway; I'm not yet ready to write a Dockerfile or set up passthrough for containers, and once the experiment works, there's no reason to. Were I making a production setup, or something others could reproduce, I'd do more after the initial experimental setup.
Wrangling much more complex layers on top (e.g., K8s, Docker, Terraform, Ansible, NixOS) sometimes means more things that can go wrong, and sometimes more time spent learning someone else's bureaucracy. Most tech work now is learning piles of other people's bureaucracy. That makes sense for businesses that actually need that complexity, for people who just want to copy and paste cargo-cult command lines and hope they work, and for people who want to homelab for the experience (which is perfectly valid). But the way I run my important services and my experimental box seemed to be easier overall.
Of course, for curiosity/resume/masochism purposes, I do have a separate K8s cluster at home, which runs nothing important, and which I can obliterate and change and experiment with at will, without being encumbered by it running services I actually need.
But for a non-programmer, it's understandable you don't want to be bothered with the inner workings of your OS and how to maintain Ansible script idempotency.
And for every piece of software you want to run on your server, the idempotency task grows more difficult.
My honest opinion? Tolerate the learning curve for docker-compose. Each application you need can be managed and tweaked in isolation. Troubleshooting "works on my machine" problems will cost you more time in the long run. You can't anticipate all the weird interactions between your programs and the OS. Being able to nuke the setup and rebuild from scratch is your most valuable tool.
- thin base OS (install just enough to run docker-compose)
- maintain images for each app you need.
- mount the essential volumes of each container to a well-known location on your hard drive to make manual backups easy.
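With every app's volumes bind-mounted under one directory (a hypothetical layout such as `/srv/<app>`), a manual backup can be a single loop. This sketch writes one dated tarball per app; the function name is hypothetical, and for databases you would want to stop the stack (e.g. `docker compose stop`) or dump the database first so the files on disk are consistent.

```shell
#!/usr/bin/env bash
# Sketch: back up every app's data directory under a common root, one
# tarball per app. Assumes a layout like /srv/<app>/...; names are hypothetical.
set -euo pipefail

backup_app_volumes() {
  local root="$1" dest="$2" dir app
  mkdir -p "$dest"
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue          # the glob matched nothing
    app=$(basename "$dir")
    tar -czf "$dest/$app-$(date +%F).tar.gz" -C "$root" "$app"
  done
}
```

Running `backup_app_volumes /srv /mnt/backup` would produce files like `/mnt/backup/nextcloud-<date>.tar.gz`, each restorable with a plain `tar -xzf` into `/srv`.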
My home network is just OpenWrt, and I use make plus a few scripts and the OpenWrt Image Builder to create images, including configs, that I flash.
For the RPi I actually liked cloud-init, but it is too flaky for complicated stuff. Nowadays I would rather dockerize it and use systemd + Podman, or a kubelet in standalone mode. Keep secrets on a mount point. Make the RPi's boot partition the main config folder; that way you can flash a golden image locally.
Anything that mutates a server is brittle as the starting point is a moving target. Building images (or fancy tarballs like docker) makes it way more likely that you get consistent results.
Might I suggest a different route that I took - use the base image from whatever vps and modify as little as possible of it. Then run everything else in docker.
That's how I migrated my placeholder website and my gogs install across to a new provider: I copied my data across and ran the original commands to launch docker containers that I used on the first server. These are now happily running on the new server.
Presumably you're trying to replace some paid services with local self-hosting? Consider that paying for a service _is_ the simpler option.
But you can easily fall into the trap of having a bazillion underspecified informal languages if you try cobbling together bash scripts, dockerfiles, and whatever other thing you need ad-hoc.
Nix is probably a good investment in that light. My personal concern is that it moves rather fast, and some things should run themselves and stay secure without being touched more than once a year.
For cloud/vps stuff I use a bunch of docker-compose files + configs that do pretty much everything. The underlying os is usually Debian because it’s what I’m used to and it doesn’t break stuff by going too fast.
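cd /   # keep the repo at / so absolute paths are inside its work tree
dpkg --get-selections > installed_packages
git init
git add installed_packages /etc /root /home/*/.[!.]* /whateverneeded
git commit -m "system init"
on a new system, just copy over the /.git folder, then:
cd /
git checkout HEAD -- installed_packages
apt-cache dumpavail | dpkg --merge-avail   # so dpkg knows the package names
dpkg --set-selections < installed_packages
apt-get dselect-upgrade
git checkout -f   # put your tracked config files back over the defaults
reboot
that's all :D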
Or there's Chef or Puppet.