Old school Linux administration – my next homelab generation (opens in new tab)

(scholz.ruhr)

178 points_fnqu3y ago110 comments

110 comments

76 comments · 26 top-level

bayindirh3y ago· 11 in thread

As a person who manages a big fleet of servers containing both pets and cattle, the upkeep of the pets is nowhere near the cloud-lovers drum-up.

A server installed with half-decent care can run uninterrupted for a long long time, given minimal maintenance and usual care (update, and reboot if you change the kernel).

Also, not installing a n+3 Kubernetes cluster with an external storage backend reduces overheads and number of running cogs drastically.

VMs, containers, K8S and other things are nice, but pulling the trigger so blindly assuming every new technology is a silver bullet to all problems is just not right on many levels.

As for home hardware, I'm running a single OrangePi Zero with DNS and SyncThing. That fits the bill, for now. Fitting into smallest hardware possible is also pretty fun.

GordonS3y ago

> A server installed with half-decent care can run uninterrupted for a long long time, given minimal maintenance and usual care (update, and reboot if you change the kernel).

For my earlier home setups, this was actually part of the problem! My servers and apps were so zero-touch, that by the time I needed to do anything, I'd forgotten everything about them!

Now, I could have meticulously documented everything, but... I find that pretty boring. The thing with Docker is that, to some extent, Dockerfiles are a kind of documentation. They also mean I can run my workloads on any server - I don't need a special snowflake server that I'm scared to touch.

bayindirh3y ago

We have found out that, while some applications are installed much easier with Docker, operating them becomes much more harder on the long run.

NextCloud is a prime example. Adding some extensions (apps) on NextCloud becomes almost impossible when installed via Docker.

We have two teams with their own NextCloud installations. One is installed on bare metal, and other one is a Docker setup. The bare metal one is much easier to update, add apps, diagnose and operate in general. Docker installation needed three days of tinkering and headbanging to get what other team has enabled in 25 seconds flat.

To prevent such problems, JS Wiki runs a special container just for update duties for example.

I'd rather live document an installation and have an easier time in the long run, rather than bang my head during some routine update or config change, to be honest.

I document my home installations the same way, too. It creates a great knowledge base in the long run.

Not all applications fit into the scenario I told above, but Docker is not a panacea or a valid reason to not to document something, in my experience and perspective.

2 more replies

3np3y ago

If you make a habit to perform all changes via CI (ansible/chef/salt, maybe terraform if applicable) you get this for free too. See your playbooks as "dockerfiles".

irusensei3y ago

Where I work maintaining server is a big pain in the ass because of the ever growing security and regulatory compliance requirements. The rules makes patching things an exercise in red tape frustration.

Last year when we’ve been asked to redeploy our servers because the OS version was being added to a nope list we decided to migrate to Kubernetes so we don’t have to manage servers anymore (the nodes are a control plane thing so not our problem). Now we just build our stuff using wathever curated image is there that can be used and ship it without worrying about OS patches.

bauruine3y ago

>Now we just build our stuff using wathever curated image is there that can be used and ship it without worrying about OS patches.

So you basically replaced the "we regularly have to update the OS" with "we regularly have to pull the newest image". It's possible because I have a Linux admin background but I don't see that big of a difference here. Oh and just using whatever curated image is there doesn't necessarily provide you with a secure environment. [0]

[0] https://snyk.io/blog/top-ten-most-popular-docker-images-each...

1 more reply

yjftsjthsd-h3y ago

The thing that helped me was realizing that for a homelab set up, running without the extra redundancy is fine. Now for me that meant running k8s on a single box because I was specifically trying to get experience with it, and putting the control plane and actual workloads on a single machine simplified the whole thing to the point that it was easy to get up and running; I had gotten bogged down in setting up a full production grade cluster, but that wasn't even remotely needed for what I was doing.

turtledragonfly3y ago

Amen (:

Checking uptime on my FreeBSD home server ... 388 days. Probably 15-year-old hardware. That thing's a rock.

I can't say I'm very proud of my security posture, though (:

One day I'll do boot-to-ZFS and transubstantiate into the realm of never worrying about failed upgrades again.

nix233y ago

UFS Boot environments?

https://forums.freebsd.org/threads/ufs-boot-environments.796...

1 more reply

xani_3y ago

Our biggest project (with a bunch of different infrastructure stuff) is also one with the least time spent per instance). We have months where no ops even logged on their VMs.

Our highest maintenance stuff is entirely "the dev fucked up"/"the dev is clueless". Stuff like server deciding to dump gigabytes of logs per minute or run out of disk space coz of some code error. Only difference compared to k8s is that the dead app would signal dev first, not actual monitoring we had.

And honorable mention for WordPress "developers" that make the first place in "most issues per server" every single year. And we have very few WP compared to everything else. Stuff like "dev uploaded some plugins, got instantly hacked (partially, we force using outgoing proxy and that stopped full compromise), we reverted it, he uploaded same stuff and got hacked again". So, nothing to do with actual servers.

Stuff like setting up automation to deploy ceph cluster took some time... once. Now it takes nothing. We even managed to plug it to k8s cluster.

> A server installed with half-decent care can run uninterrupted for a long long time, given minimal maintenance and usual care (update, and reboot if you change the kernel).

could even not be touched if you set up automated updates and reboots, the red tape is usually a problem, like dumbly written support deals needing notification for every restart even if service is in HA. Or some dumbo selling a service with SLA but only on single machine...

ksbrooksjr3y ago

Even changing the kernel can often be done without rebooting the system depending on which distro you're using. Quite a few distros now include some sort of kernel livepatching service including Ubuntu, Amazon Linux, and RHEL.

bayindirh3y ago

Yeah, We're aware of that, but none of our pets is mission critical enough to prevent 5 minutes of downtime, at most, in most cases.

1 more reply

woopwoop243y ago· 7 in thread

HA is overrated, i'd much rather go for a low mean time to repair. Backups, reinstall, ansible playbook is my way to go if hardware fails, which is quite rare to be honest. HA goes beyond hardware in terms of "electricity, internet connection, storage, location etc.., IMO people quite often underestimate what it means to have real HA --> second location with identical setup to shift workload or even have active-active instead of active-passive. i have an intel nuc as plain debian server with containers on it. 2 raspberry pi's(act as loadbalancers with traefik and authelia on top) and a hetzner vm all connected through wireguard. All is configured via ansible and i rely on LE certificates to connect via https or directly via wireguard vpn to http if i want it only exposed via vpn. encrypted backups are copied via wireguard to an offsite storage box.

full down to back online if hardware is not damaged is less than 15 min.

it is very easy to do and rock solid, unattend-upgrades does the trick. i tried almost every combination and even though i am a k8s user from version 1.2 onwoards the complexity for k8s at home or even vsphere is too much for me vs. this super simple and stable configuration i now have.

xani_3y ago

> HA is overrated, i'd much rather go for a low mean time to repair.

Those are almost entirely independent domains

the only point where they connect is that

> full down to back online if hardware is not damaged is less than 15 min.

automation makes both easier to deal with.

"Mean time to repair" is all fine if you don't have to drive 30 minutes to datacenter to swap out stuff.

Even if its cloudy cloud and you have backups, restores can take a lot of time.

Also you will want HA when it's your internet gateway to die

There are also levels of implementation. Making fully redundant set of loadbalancers is relatively easy. But any stateful app is much, much harder. In case of applications active-passive type of setups are also much easier than full on active active redundancy, especially if app wasn't written for it.

R0b0t13y ago

Quick to repair is also a lot more versatile. Unfortunately, I had to work in a lot of environments with proprietary, vendor-locked software and hardware. Usually all you can do is make sure you design the system so that you can chuck entire parts of it (possibly for rework, but sometimes not) if it breaks or gets compromised.

Definitely relevant for, say, SCADA controls with terrible security.

bravetraveler3y ago

I completely agree, but there's an important balance to consider. HA is a good investment for things that are 'stateless' -- eg: VIPs/LBs

Rebuilding existing systems (or adding new ones) is excellent, but sometimes life really gets simpler when you have one well-known/HA endpoint to use in a given context

21433y ago

I'm not from this universe.

What does HA stand for?

Thank you.

ranic3y ago

High Availability, I believe.

j453y ago

HA is built into platforms like proxmox, what is the stress?

woopwoop243y ago

tl:dr if you need it, go for it :) my 2 cents is not needed at home and even for some of my clients the complexicty vs a simple setup and a few min downtime still speaks for MTTR instead of HA which noone can debug.

not about stress but to have HA proxmox cluster you will need at least 3 machines or fake one with a quorum machine without vms on it. Sure your vms will run if one machines goes down, but do you have HA Storage underneath? ganesha would work but more complexity or another network storage with more machines. Don't get me wrong it is fun to play with, but i doubt any homelab needs HA and cannot have a few minutes downtime. Or what do you do in case your internet provider goes down or when you have a power outage? i don't want to provoke, i have fiber switches and 10G at home and have 2 locations to switch in case one location goes down but i can live with multiple days of downtime if i have to, or if not i take my backups and fire some VM's on some cloudprovider and be back online in a few min and pay for it because backups are in both locations.

i would much rather develop a simple LB + autoscaling group with a deployment pipeline (lambda or some other control loop) and containers on it than a k8s cluster the client is not prepared for. if they outgrow this, most likely the following solution is better than going 100% "cloud" the first time. Most clients go from java 8 Jboss monolith to spring containers in k8s and then wonder why it is such a shitshow. but yeah pays the bills so i am not complaining that often ^^

1 more reply

blinkingled3y ago· 6 in thread

It's absolutely fine and I'd argue essential to know Linux system administration even for people who mostly work in the cloud. From my recent experience sys admin skills seem to be a lost art - people no longer have to care about the underpinnings of a Linux system and when problems hit it shows.

It's interesting from a job market standpoint how little incentive there is to learn the basics when more and more we are seeing developers being made responsible for ops and admin tasks.

jbotdev3y ago

I think nowadays requiring that all application developers should also do ops (DevOps?) is a bad idea. Sure they should have basic shell skills, but when you’re on Kubernetes or similar, understanding what’s underneath is not vital. Instead, rely on specialized teams that actually want to know this stuff, and become the experts you escalate to only when things really go sideways and the abstractions fail (which is rare if you do it well). If your budget is too small for this, there are always support contracts.

As someone who’s been hiring for both sides, I see this reflected in candidates more and more. The good devs rarely know ops, and the good ops rarely code well. For our “platform” teams, we end up just hiring good devs and teaching them ops. I think the people that are really good at both often end up working at the actual cloud providers or DevOps startups.

blinkingled3y ago

But then there's the problem that now you have two teams - ops team doesn't understand the app and app team doesn't understand the ops/infra side. I of course agree with your point that there should be two teams but you need a few guys who understand both app dev and ops/infra/OSes/k8s internals etc. And finding these people has been nearly impossible.

2 more replies

wjnc3y ago

I see this in my firm. Non-IT management loves to resort to the lowest denominator of 'systems', that is - systems built by people who are not trained as administrators nor experienced, but mainly follow online guides /and deliver/ (at p50). When I mention the resilience of the systems we maintain to senior directors the most common reply is that 1. the activities are not production and 2. the cost of letting our proper IT-departments handle things go way beyond the willingness to pay. When I reply that our non-production activities are still unmissable for anything more than a few days (which defines it as production for me, but not in our IT-risk landscape) I usually get greeted with something along the lines that in all things cloud all is different. For non-techies I think it's just hard to phantom that most of the effort in maintaining systems is in resilience, not in the just scripting it to work repeatedly.

Fiahil3y ago

They most likely resorted to skipping IT involvement because the department is difficult to work with.

"The cloud" is both a technical solution for flexible workload requirements, and a political solution to allow others to continue delivering when IT is quoting them 3 months of "prep work" for setting up a dev environment and 2 to 4 weeks for a single SSL certificate.

As a consultant, I am confronted to hostile IT requirements daily. Oftentimes, departments hiring us end up setting and training their own Ops teams outside of IT. despite leading to incredible redundancy, that's often credited as a huge success for that department, because of the gained flexibility.

amtamt3y ago

So many people somehow believe a single VM with no disk snapshot/ back-up, having a floating IP, running ad hoc scripts to bring up production workload is "production quality", only because it is running in cloud.

bamboozled3y ago

I agree, I'm lucky that in my career I've been exposed so many different disciplines of Network and Linux administration that I know (mostly) what "the cloud" is made of.

So when problems do occur, I can make some pretty good guesses on what's wrong before I actually know for sure.

jerrac3y ago· 4 in thread

I can see the appeal. I've been working on containerizing apps at work. There's a lot of complication there. But I also really really want to be able to run updates whenever I want. Not just during a maintenance window when it's ok for services to be offline. And containers are the most likely route to get me there.

At home, well, I enjoy my work, so I containerize all my home stuff as well. And I have a plan to switch my main linux box to ProxMox and then host both a Nomad cluster and a Rancher cluster.

If I weren't interested in the learning experience, I'd just stick with docker-compose based app deployment and Ansible for configuring the docker host vm.

j453y ago

Docker especially with Portainer is pretty serviceable for an at home setup.

Once the line has been crossed with "I should be backing this up", "I should have a dev copy vs the one i'm using", the existing setups can port quite well/into something like ProxMox to keep everything in a homelab setup running relatively like an appliance with a minimum of manual system administration, maintenance and upgrading.

jerrac3y ago

If Docker Swarm was a bit better, I'd have all my test instances at work running a Docker Swarm, Portainer, and Traefik stack. Unfortunately Swarm has some quirks that make running stateful apps a bit difficult.

For home use, I've been experimenting with Portainer. It seems to work well for apps I'm not developing and am just running.

1 more reply

woopwoop243y ago

if you have more than one pod running at the same time, otherwise will be downtime if the container spawns somewhere else or am i missing something?

jerrac3y ago

I assume you mean at work?

Yes, we'd have more than one instance of the app running. Ideally 3 or 5 instances. And they'd be load balanced. So I could update the app one instance at a time with no downtime. At least that is the goal.

davidwritesbugs3y ago· 4 in thread

Can someome explain "Pets" and "Cattle"? Not seen this useage before, think I get it but want to be sure. Pets are servers for love and fun, Cattle are servers exploited for loveless meat & milk?

frellus3y ago

"pets" - servers which require unique, individual care, love, naming and feeding and are irreplaceable in their function. You know it's a "pet" server when you think long and hard when you name it

"cattle" - servers which have no distinct personality traits, are fed, loved and cared for as a group or heard. You know it's a "cattle" server when the name isn't something you can easily remember

Neither pets nor cattle are "exploited", they all have their jobs to do, however they have an ROI and also a consequence of being unique or not. Cattle tend to be born, graze, die and it's part of the cycle. Pets die, and you are emotionally upset and do something stupid -- and you find yourself doing unholy things to resurrect them like using backup tapes or something

davidwritesbugs3y ago

"Exploited" was my failed attempt at humour :)

pixelmonkey3y ago

This post has the slide that seared into my memory when I first heard the analogy in 2012-2013 or so:

https://www.engineyard.com/blog/pets-vs-cattle/

Direct link to slide screencap here:

https://www.engineyard.com/wp-content/uploads/2022/02/vCloud...

hereyougo23y ago

https://cloudscaling.com/blog/cloud-computing/the-history-of...

dade_3y ago· 3 in thread

I love LXC/LXD for my home server. Far easier to use and maintain than VMs, fast, and use far less resources than VMs. And understanding containers is great foundational knowledge for working with Docker and K8s. They also work great with NextCloud, Plex, PostgreSQL, and Zabbix, and SAMBA. But each are separate, no risk of a library or an OS upgrade taking out an app (and my weekend along with it). Snapshots are the ultimate ctrl-z, and backups are a breeze once you get past the learning curve.

Ansible with ‘pet’ containers is the way to go. Use it to automate the backups and patching. Ansible cookbooks are surprisingly easy to work with. Again, a learning curve, but it pays for itself within months.

Running a single machine with all the apps in a single environment is a recipe for tears as you are always one OS upgrade patch, library requirement change, hard drive failure away from disaster and hours of rebuilding.

willjp3y ago

This sounds like my setup exactly, except instead of LXC/ansible, I’m using FreeBSD jails/saltstack.

I completely agree about the value of isolation. You can update or up-rebuild for a new os version on one service at a time - which I find helpful when pulling in updated packages/libraries. You can also cheaply experiment with a variation on your environment.

whitepoplar3y ago

Would you use LXD in production?

yokem553y ago

If you build it yourself from source or use distro packages. Running it under cannonical's snap packages is a bit of a nightmare because of issues around the forced auto updates.

1 more reply

mt_3y ago· 3 in thread

Automation is useful, even in homelab environments, since manual configuration introduces human errors, and automation can speed up recovery from disasters. This article is just for the sake of it. The author calls it old administration, but he better call it outdated and inefficient Linux administration.

marginalia_nu3y ago

Automation is still subject to human errors, and can propagate them much farther.

sophacles3y ago

So do like you would in old-school, have a staging/testing environment.

Nice thing about automation, is you can just rebuild that easily to a clean state, rather than having to remember how staging is different from prod. And as you write your automation and find those human errors, you put it in code so that you don't forget a step. Nothing more frustrating than thinking you've solved the problem in staging only to run a command in prod that breaks things more, and in a different way, because it turns out the fix was a combination of earlier experimentation plus the command you just ran.

eternityforest3y ago

Automation makes things repeatable. If errors happen, in the "cattle not pets" mindset, you nuke it from orbit and let Ansible make you a new one from a clean Debian/RHEL image. As long as your backup of user data is good(Better double check that script, if you have a script for that) it doesn't matter.

As long as you avoid technology that takes more than a few minutes to set up, rebuilding is trivial.

nixcraft3y ago· 2 in thread

Once you work in enterprise IDC, setting up home labs is nothing but a liability. Unwanted stuff like UPS, cooling and more bills. I have one Intel NUC running FreeBSD 13 (dual SSD with 32 GB ram and Intel i7 CPU 8th GEN) with one jail and VM. It acts as a backup server for my Ubuntu laptop and MacBook pro. Nightly I dump data to cloud providers for offsite reasons. Finally, I set LXD, Docker and KVM on my Ubuntu dev laptop for testing. That is all. No more FreeNAS or 3 more servers including 2 RPis. I made this change during the first lockdown. They (I mean fun and excitements) went away once I handled all those expensive IT equipment/servers/firewalls/routers/WAFs funded employer ;)

ianai3y ago

You’re still doing a lot of home serving/IT there.

I take your point and generally agree. Leave the major hardware to the actual data centers. Stuff at home should be either resume driven or very efficient.

neoromantique3y ago

Agreed, I used to run enterprise hardware at home, and it was certainly fun to tinker with (Not to mention all of the blinking lights!).

Last year I ran some numbers and realized that it just wasn't worth it, as electricity costs alone make it fairly close to what Hetzner/OVH would charge for similar hardware. I also had power go down in my area, for probably first time in 5 years or so that I lived there, which I took as a sign to just migrate away.

I peer it into my internal network using WireGuard, so I barely notice a difference in my use and now that electricity costs are skyrocketing in Europe I'm certainly very happy I went this route.

1 more reply

greggyb3y ago· 2 in thread

Are there any good resources for what to mind as a sysadmin? It seems that most resources I find when searching, or that come across my radar (like this article) are focused on mechanism and how-to.

I often feel a bit out of my depth when dealing with sysadmin tasks. I can achieve pretty much any task I set out to, but it is difficult to ascertain if I have done so well, or with obvious gaps, or if I have reinvented some wheels.

Does anyone have a recommendation for in-depth "So you want to be a sysadmin..." content? Whether books, articles, presentations, or any other format.

megous3y ago

Depends on what system you want to administrate. The topic is very broad.

But it never hurts to understand the fundamentals:

- networking (IPv4/IPv6/TCP/UDP/ICMP/various link layer technologies (ethernet, wifi, ppp, USB, ...)... and concepts like VPNs/VLAN/bridging/switching/routing/...)

- protocols (DHCP, DNS, NTP, SMTP, HTTP, TLS, ...)

- chosen OS fundamentals (so on Linux things like processes, memory, storage, process isolation, file access control, sockets, VFS, ...)

- OS tooling (process managers, logging tools, network configuration tools, firewall, NAT, ...)

- how to setup typical services (mail servers, file servers, http servers/proxies, DNS servers, DNS resolvers, ...)

- how to use tools to debug issues (tracing, packet capture, logging, probing, traffic generators, ...)

- how to use tools to monitor performance of the network and hosts

These are just some basics to run a home network or maybe a small business network.

greggyb3y ago

This is a helpful review of the fundamentals to cover. I will definitely be doing my own searches for content on the topics.

Do you have any good references for resources in the vein of teaching the fundamentals. Not like "DNS: here is how to configure a BIND server" but like "DNS: Here are the core concepts, what you will typically need to set up, common issues / confusions, and further reading."

I have tried going through RFCs for a lot of these, but they tend to be focused on implementation rather than "why". Similarly, software-specific content is "how to achieve this outcome with this implementation", but lacks context on why one would want that outcome.

1 more reply

404mm3y ago· 2 in thread

I’ve been there. Many technologies OP listed .. it all becomes a nightmare without proper HA (but who want’s to have _multiple_ servers at home (servers as enterprise HW, not a computer designation). Then you hit an issue with licensing, old HW not being supported by recent releases (looking at you, ESXi), loud and heat producing boxes in your closet... I am grateful for having this awful experience but I moved on as well.

For me, the middle ground seems to be a very plain Debian server, configured with Ansible and all services running in Docker. I consume a few official images and have a (free) pipeline set up to create and maintain some of mine own. I appreciate the flexibility of moving a container over to different HW, if needed, and also the isolation (both security and dependencies).

Going back pure, OS level configuration is always fun - and often necessary to understand how things work. It’s very true that the new cattle methodology takes away any deeper level of troubleshooting and pushes “SRE” towards not only focusing on their app code but also away from understanding the OS mechanics. Instance has died? Replace it. How did it happen? Disk filled up because of wrong configuration? Bad code caused excessive logging? Server was compromised? _They_ will never know.

jmillikin3y ago

  > cattle methodology takes away any deeper level of troubleshooting and
  > pushes “SRE” towards not only focusing on their app code but also away
  > from understanding the OS mechanics

This is a minor point, but as a former SRE: We are the people writing the layer between the OS and the product developers (owners of the "app code"). Understanding the OS mechanics is a core part of the job.

I know some places are trying to redefine "SRE" to mean "sysadmin", but that seems silly given that we'd need to invent a new term to mean the people who write and operate low-level infrastructure software that the rest of a modern distributed stack runs on.

404mm3y ago

SRE seems to mean something else at every company. As a current SRE, I agree that this is what _I_ do most of the time. But while saying that, I also don’t see myself as “true SRE”, not the way Google defined it. SRE should become the code owner once it becomes stable. This alone adds heavy emphasis on having a background in software development.

Most of the time, I see two kinds of SRE around me. 1. The “rebranded sysadmin” SRE. Typically heavy on Ops skills and deep OS level knowledge but no formal education or experience in Software development.

2. The “SDE in Cloud” SRE. Person coming from software development side, having very little Ops experience, relying on benefits of quick and disposable computational power from your favorite Cloud provider.

I don’t mean to “shame” either of those cases, I am just pointing out that person with years (or decades) of experience on one side cannot suddenly become well balanced SRE the way Google defined it.

neoromantique3y ago· 2 in thread

Honestly, it sounds more as if author is just not experienced enough to reap the rewards of containers/IaaC.

Having some servers as pets is fine for time being, but after being used to the cattle little pesky issues & upkeep starts to annoy more and more and you either give up on it and leave neglected or slowly build up most of the infra you ran away from in the first place.

blueflow3y ago

It depends on your software selection. I run machines than i do updates on twice a year and thats all the maintenance they need. Less than a person-day per year!

neoromantique3y ago

You're right, I might've extrapolated my usage/experience onto the author accidentally, maybe his use case is indeed different.

I still don't know why you wouldn't use containers though, even if you dislike Docker, something like LXC makes everything oh-so neater.

1 more reply

don-code3y ago· 1 in thread

My home lab sounds pretty similar to the author's - three compute nodes running Debian, and a single storage node (single point of failure, yes!) running TrueNAS Core.

I was initially pretty apprehensive about running Kubernetes on the compute nodes, my workloads all being special snowflakes and all. I looked at off-the-shelf apps like mailu for hosting my mail system, for instance, but I have some really bizarre postfix rules that it wouldn't support. So I was worried that I'd have to maintain Dockerfiles, and a registry, and lots of config files in Git, and all that.

And guess what? I do maintain Dockerfiles, and a registry, and lots of config files in Git, but the world didn't end. Once I got over the "this is different" hump, I actually found that the ability to pull an entire node out of service (or have fan failures do it for me), more than makes up for the difference. I no longer have awkward downtime when I need to reboot (or have to worry that the machines will reboot), or little bits of storage spread across lots of machines.

Sebb7673y ago

I fully agree. If you come from "old-school" administration, Docker and Kubernetes seem like massive black boxes that replace all your known configuration screws with fancy cloud terms. But once you get to know them, it just makes sense. Backups get a lot simpler, restoring state is easy and keeping things separated just becomes a lot easier.

That being said, I can only encourage the author with this plan. All those abstractions are great, but at least for me it was massively valuable to know what you are replacing and what an old-school setup is actually capable of.

dual_dingo3y ago· 1 in thread

I've done that in the past. It was fun and nostalgic. Then I had to upgrade the OS to a newer version and things were not fun anymore. At all. And I remembered why VMs and containers are so extremely useful.

nix233y ago

You are right, but he is using RHEL 9 so you have >10 years without upgrade's ;)

netfortius3y ago· 1 in thread

As much as people on HN frown upon anything reddit related, the /r/SelfHosted is an amazing source for what the article topic is about. In fact I came across large organizations running the likes of Proxmox (VE), Ceph, Minio, k3s, Graphana, etc., etc., with a lot of info available in this subreddit. Highly recommended.

bitlax3y ago

In my experience, Hacker News has been one of the places that's unusually sympathetic to Reddit, probably because it originated with YCombinator. If you get the impression that people here generally dislike Reddit you should take that as a a strong indicator that Reddit has actually become a cesspool.

dapozza3y ago· 1 in thread

I think this is to oldscool. You can still automate via Ansible, keep your playbooks and roles in Git and run VM's.

b14763y ago

Exactly this. I have a big nostalgia for old school sysadmin ways of working, but just use Ansible. It’s beyond easy and can be as simple or complex as you like. I use containers on my home server (with podman) alongside a bunch of other services and it’s all just so clean and easy to maintain as a collection of roles and playbooks.

snickersnee113y ago

Yes, there's really something about old school system administration. I do really appreciate how Fedora and Debian SA team does it, you can see here: - https://pagure.io/fedora-infrastructure/ - https://docs.fedoraproject.org/en-US/infra/ - https://dsa.debian.org/

joshka3y ago

Sounds like the author realized they have a pet that they've been trying to treat like some weird franken-cattle for some time.

The note about dns is fairly orthogonal to this though. Sensible naming should be the default regardless of the physical setup. My photos are at photos.home.mydomain.com, my nas is nas.home.mydomain.com. They're the same hardware - perhaps different containers.

There are a million ways to make your life harder in this world, and a bunch of ways to make it easier. I'd put docker in the simplify category, and a bunch of the other tooling the author mentions in the complexify side. YMMV.

lakomen3y ago

He should make 1 big VM and make snapshots before trying new experiments.

That's how I'll go about the next server I rent. And that's how I did with my main project.

You can have nested VMs if you need them.

But do whatever you want ofc.

zrail3y ago

I really like this article just for the straightforwardness of the setup. Pets not cattle should be the home server mantra.

My setup is not quite as simple. I have one homeprod server running Proxmox with a number of single task VMs and LXCs. A task, for my purposes, is a set of one or more related services. So I have an internal proxy VM that also runs my dashboard. I have a media VM that runs the *arrs. I have an LXC that runs Jellyfin (GPU pass through is easier with LXC). A VM running Home Assistant OS. Etcetera.

Most of these VMs are running Docker on top of Alpine and a silly container management scheme I've cooked up[1]. I've found this setup really easy to wrap my head around, vs docker swarm or k8s or what have you. I'm even in the process of stripping dokku out of my stack in favor of this setup.

Edit: the homeprod environment also consists of a SFF desktop running VyOS and a number of Dell Wyse 3040 thin clients running the same basic stack as the VMs, most running zwave and zigbee gateways and one more running DHCP and DNS.

[1]: https://github.com/peterkeen/docker-compose-stack

UncleEntity3y ago

> For my own personal tasks I plan on avoiding sudo at all. I've often found myself using it as shortcut to achieve things.

That’s all I use (for administration stuff), found out not too long ago I don’t even know the su password or it isn’t set up to allow users to su — dunno, never missed the ability until a couple weeks ago when I got tired of typing sudo for multiple commands.

Also found out a long time ago installing software to /opt turns into a big enough PITA that I’ll go to all the trouble to make an rpm and install stuff using the package manager. Usually I can find an old rpm script thingie to update so isn’t too much hassle and there’s only a couple libraries I use without (un)official packages. Why, one might ask, don’t I try to upstream these rpm so everyone will benefit? Well, I’ve tried a couple times and they make it nearly impossible so I just don’t care anymore.

irusensei3y ago

I think homeserver and home lab are different things.

For homeserver I use some rock chip arm computers. It runs stuff that is (arguably) useful like a file server and a local DNS with ad and malware filtering rules. These are two fanless ARM machines that use less than 15W idle and they have a single OS and funny names. You can even say 2 machines is a bit too much already.

For home lab I’ve set a k3s environment on Oracle free tier cloud. No personal data or anything that would cause me problems if Oracle decides supporting my freeloading ass is not worth the hassle. I can test things there for exemple recently dabbling into Prometheus since workplace will soon introduce it as a part of a managed offering for Azure Kubernetes services.

graphviz3y ago

I can't even expand half the acronyms in this article. When do people have time to work on anything besides keeping their "system" up to date. python-is-python2? Maybe use sed to write all your code, too.

l0rn3y ago

And the cycle begins once again :)

jmclnx3y ago

I agree with him, simpler is better and I still have not jumped on the container bandwagon. I did use Jails on FreeBSD and from my limited experience, I still think Jails is much better than anything Linux offers.

nix233y ago

Nice, did the same but with FreeBSD and Jail's (for organizational reasons), having a pet server makes you learn that system better and solve problems instead of destroy and re-create...and it's fun.

frellus3y ago

In your list of things to test, let me save you some time/effort:

"Hyper-V" -- Don't. Just... don't. Don't waste your time.

Source: ran Hyper-V in production for 3 years. Still have nightmares.

j / k navigate · click thread line to collapse

110 comments

76 comments · 26 top-level

bayindirh3y ago· 11 in thread

As a person who manages a big fleet of servers containing both pets and cattle, the upkeep of the pets is nowhere near the cloud-lovers drum-up.

A server installed with half-decent care can run uninterrupted for a long long time, given minimal maintenance and usual care (update, and reboot if you change the kernel).

Also, not installing a n+3 Kubernetes cluster with an external storage backend reduces overheads and number of running cogs drastically.

VMs, containers, K8S and other things are nice, but pulling the trigger so blindly assuming every new technology is a silver bullet to all problems is just not right on many levels.

As for home hardware, I'm running a single OrangePi Zero with DNS and SyncThing. That fits the bill, for now. Fitting into smallest hardware possible is also pretty fun.

GordonS3y ago

> A server installed with half-decent care can run uninterrupted for a long long time, given minimal maintenance and usual care (update, and reboot if you change the kernel).

For my earlier home setups, this was actually part of the problem! My servers and apps were so zero-touch, that by the time I needed to do anything, I'd forgotten everything about them!

bayindirh3y ago

We have found out that, while some applications are installed much easier with Docker, operating them becomes much more harder on the long run.

NextCloud is a prime example. Adding some extensions (apps) on NextCloud becomes almost impossible when installed via Docker.

To prevent such problems, JS Wiki runs a special container just for update duties for example.

I'd rather live document an installation and have an easier time in the long run, rather than bang my head during some routine update or config change, to be honest.

I document my home installations the same way, too. It creates a great knowledge base in the long run.

Not all applications fit into the scenario I told above, but Docker is not a panacea or a valid reason to not to document something, in my experience and perspective.

2 more replies

3np3y ago

If you make a habit to perform all changes via CI (ansible/chef/salt, maybe terraform if applicable) you get this for free too. See your playbooks as "dockerfiles".

irusensei3y ago

bauruine3y ago

>Now we just build our stuff using wathever curated image is there that can be used and ship it without worrying about OS patches.

[0] https://snyk.io/blog/top-ten-most-popular-docker-images-each...

1 more reply

yjftsjthsd-h3y ago

turtledragonfly3y ago

Amen (:

Checking uptime on my FreeBSD home server ... 388 days. Probably 15-year-old hardware. That thing's a rock.

I can't say I'm very proud of my security posture, though (:

One day I'll do boot-to-ZFS and transubstantiate into the realm of never worrying about failed upgrades again.

nix233y ago

UFS Boot environments?

https://forums.freebsd.org/threads/ufs-boot-environments.796...

1 more reply

xani_3y ago

Our biggest project (with a bunch of different infrastructure stuff) is also one with the least time spent per instance). We have months where no ops even logged on their VMs.

Stuff like setting up automation to deploy ceph cluster took some time... once. Now it takes nothing. We even managed to plug it to k8s cluster.

> A server installed with half-decent care can run uninterrupted for a long long time, given minimal maintenance and usual care (update, and reboot if you change the kernel).

ksbrooksjr3y ago

bayindirh3y ago

Yeah, We're aware of that, but none of our pets is mission critical enough to prevent 5 minutes of downtime, at most, in most cases.

1 more reply

woopwoop243y ago· 7 in thread

full down to back online if hardware is not damaged is less than 15 min.

xani_3y ago

> HA is overrated, i'd much rather go for a low mean time to repair.

Those are almost entirely independent domains

the only point where they connect is that

> full down to back online if hardware is not damaged is less than 15 min.

automation makes both easier to deal with.

"Mean time to repair" is all fine if you don't have to drive 30 minutes to datacenter to swap out stuff.

Even if its cloudy cloud and you have backups, restores can take a lot of time.

Also you will want HA when it's your internet gateway to die

R0b0t13y ago

Definitely relevant for, say, SCADA controls with terrible security.

bravetraveler3y ago

I completely agree, but there's an important balance to consider. HA is a good investment for things that are 'stateless' -- eg: VIPs/LBs

Rebuilding existing systems (or adding new ones) is excellent, but sometimes life really gets simpler when you have one well-known/HA endpoint to use in a given context

21433y ago

I'm not from this universe.

What does HA stand for?

Thank you.

ranic3y ago

High Availability, I believe.

j453y ago

HA is built into platforms like proxmox, what is the stress?

woopwoop243y ago

1 more reply

blinkingled3y ago· 6 in thread

It's interesting from a job market standpoint how little incentive there is to learn the basics when more and more we are seeing developers being made responsible for ops and admin tasks.

jbotdev3y ago

blinkingled3y ago

2 more replies

wjnc3y ago

Fiahil3y ago

They most likely resorted to skipping IT involvement because the department is difficult to work with.

amtamt3y ago

bamboozled3y ago

I agree, I'm lucky that in my career I've been exposed so many different disciplines of Network and Linux administration that I know (mostly) what "the cloud" is made of.

So when problems do occur, I can make some pretty good guesses on what's wrong before I actually know for sure.

jerrac3y ago· 4 in thread

At home, well, I enjoy my work, so I containerize all my home stuff as well. And I have a plan to switch my main linux box to ProxMox and then host both a Nomad cluster and a Rancher cluster.

If I weren't interested in the learning experience, I'd just stick with docker-compose based app deployment and Ansible for configuring the docker host vm.

j453y ago

Docker especially with Portainer is pretty serviceable for an at home setup.

jerrac3y ago

For home use, I've been experimenting with Portainer. It seems to work well for apps I'm not developing and am just running.

1 more reply

woopwoop243y ago

if you have more than one pod running at the same time, otherwise will be downtime if the container spawns somewhere else or am i missing something?

jerrac3y ago

I assume you mean at work?

davidwritesbugs3y ago· 4 in thread

Can someome explain "Pets" and "Cattle"? Not seen this useage before, think I get it but want to be sure. Pets are servers for love and fun, Cattle are servers exploited for loveless meat & milk?

frellus3y ago

"pets" - servers which require unique, individual care, love, naming and feeding and are irreplaceable in their function. You know it's a "pet" server when you think long and hard when you name it

"cattle" - servers which have no distinct personality traits, are fed, loved and cared for as a group or heard. You know it's a "cattle" server when the name isn't something you can easily remember

davidwritesbugs3y ago

"Exploited" was my failed attempt at humour :)

pixelmonkey3y ago

This post has the slide that seared into my memory when I first heard the analogy in 2012-2013 or so:

https://www.engineyard.com/blog/pets-vs-cattle/

Direct link to slide screencap here:

https://www.engineyard.com/wp-content/uploads/2022/02/vCloud...

hereyougo23y ago

https://cloudscaling.com/blog/cloud-computing/the-history-of...

dade_3y ago· 3 in thread

willjp3y ago

This sounds like my setup exactly, except instead of LXC/ansible, I’m using FreeBSD jails/saltstack.

whitepoplar3y ago

Would you use LXD in production?

yokem553y ago

If you build it yourself from source or use distro packages. Running it under cannonical's snap packages is a bit of a nightmare because of issues around the forced auto updates.

1 more reply

mt_3y ago· 3 in thread

marginalia_nu3y ago

Automation is still subject to human errors, and can propagate them much farther.

sophacles3y ago

So do like you would in old-school, have a staging/testing environment.

eternityforest3y ago

As long as you avoid technology that takes more than a few minutes to set up, rebuilding is trivial.

nixcraft3y ago· 2 in thread

ianai3y ago

You’re still doing a lot of home serving/IT there.

I take your point and generally agree. Leave the major hardware to the actual data centers. Stuff at home should be either resume driven or very efficient.

neoromantique3y ago

Agreed, I used to run enterprise hardware at home, and it was certainly fun to tinker with (Not to mention all of the blinking lights!).

I peer it into my internal network using WireGuard, so I barely notice a difference in my use and now that electricity costs are skyrocketing in Europe I'm certainly very happy I went this route.

1 more reply

greggyb3y ago· 2 in thread

Does anyone have a recommendation for in-depth "So you want to be a sysadmin..." content? Whether books, articles, presentations, or any other format.

megous3y ago

Depends on what system you want to administrate. The topic is very broad.

But it never hurts to understand the fundamentals:

- networking (IPv4/IPv6/TCP/UDP/ICMP/various link layer technologies (ethernet, wifi, ppp, USB, ...)... and concepts like VPNs/VLAN/bridging/switching/routing/...)

- protocols (DHCP, DNS, NTP, SMTP, HTTP, TLS, ...)

- chosen OS fundamentals (so on Linux things like processes, memory, storage, process isolation, file access control, sockets, VFS, ...)

- OS tooling (process managers, logging tools, network configuration tools, firewall, NAT, ...)

- how to setup typical services (mail servers, file servers, http servers/proxies, DNS servers, DNS resolvers, ...)

- how to use tools to debug issues (tracing, packet capture, logging, probing, traffic generators, ...)

- how to use tools to monitor performance of the network and hosts

These are just some basics to run a home network or maybe a small business network.

greggyb3y ago

This is a helpful review of the fundamentals to cover. I will definitely be doing my own searches for content on the topics.

1 more reply

404mm3y ago· 2 in thread

jmillikin3y ago

  > cattle methodology takes away any deeper level of troubleshooting and
  > pushes “SRE” towards not only focusing on their app code but also away
  > from understanding the OS mechanics

404mm3y ago

neoromantique3y ago· 2 in thread

Honestly, it sounds more as if author is just not experienced enough to reap the rewards of containers/IaaC.

blueflow3y ago

It depends on your software selection. I run machines than i do updates on twice a year and thats all the maintenance they need. Less than a person-day per year!

neoromantique3y ago

You're right, I might've extrapolated my usage/experience onto the author accidentally, maybe his use case is indeed different.

I still don't know why you wouldn't use containers though, even if you dislike Docker, something like LXC makes everything oh-so neater.

1 more reply

don-code3y ago· 1 in thread

My home lab sounds pretty similar to the author's - three compute nodes running Debian, and a single storage node (single point of failure, yes!) running TrueNAS Core.

Sebb7673y ago

dual_dingo3y ago· 1 in thread

nix233y ago

You are right, but he is using RHEL 9 so you have >10 years without upgrade's ;)

netfortius3y ago· 1 in thread

bitlax3y ago

dapozza3y ago· 1 in thread

I think this is to oldscool. You can still automate via Ansible, keep your playbooks and roles in Git and run VM's.

b14763y ago

snickersnee113y ago

joshka3y ago

Sounds like the author realized they have a pet that they've been trying to treat like some weird franken-cattle for some time.

lakomen3y ago

He should make 1 big VM and make snapshots before trying new experiments.

That's how I'll go about the next server I rent. And that's how I did with my main project.

You can have nested VMs if you need them.

But do whatever you want ofc.

zrail3y ago

I really like this article just for the straightforwardness of the setup. Pets not cattle should be the home server mantra.

[1]: https://github.com/peterkeen/docker-compose-stack

UncleEntity3y ago

> For my own personal tasks I plan on avoiding sudo at all. I've often found myself using it as shortcut to achieve things.

irusensei3y ago

I think homeserver and home lab are different things.

graphviz3y ago

l0rn3y ago

And the cycle begins once again :)

jmclnx3y ago

nix233y ago

Nice, did the same but with FreeBSD and Jail's (for organizational reasons), having a pet server makes you learn that system better and solve problems instead of destroy and re-create...and it's fun.

frellus3y ago

In your list of things to test, let me save you some time/effort:

"Hyper-V" -- Don't. Just... don't. Don't waste your time.

Source: ran Hyper-V in production for 3 years. Still have nightmares.

j / k navigate · click thread line to collapse