Bare-Metal Kubernetes, Part I: Talos on Hetzner

(datavirke.dk)

214 points · MathiasPius · 2y ago · 77 comments

wg0 · 2y ago · 20 in thread

I've come to the conclusion (after trying kops, kubespray, kubeadm, kubeone, GKE, EKS) that if you're looking for < 100 node cluster, docker swarm should suffice. Easier to setup, maintain and upgrade.

Docker swarm is to Kubernetes what SQLite is to PostgreSQL. To some extent.

husarcik · 2y ago

The docker swarm ecosystem is very poor as far as tooling goes. You're better off using docker-compose (? maybe docker swarm) and then migrating to k3s if you need a cluster.

My docker swarm config files are nearly the same craziness as my k3s config files so I figured I might as well benefit from the tooling in Kubernetes.

Edit for more random thoughts: being able to use helm to deploy services helped me switch to k3s from swarm.

amazingman · 2y ago

This is almost exactly my experience with Docker Compose, which is lionized by commenters in nearly every Kubernetes thread I read on HN. It's great and super simple and easy ... until you want to wire multiple applications together, you want to preserve state across workload lifecycles for stateful applications, and/or you need to stand up multiple configurations of the same application. The more you want to run applications that are part of a distributed system, the uglier your compose files get. Indeed, the original elegant Docker Compose syntax just couldn't do a bunch of things and had to be extended.

IMO a sufficiently advanced Docker Compose stack is not appreciably simpler than the Kubernetes manifests would be, and you don't get the benefits of Kubernetes' objects and their controllers because Docker Compose is basically just stringing low-level concepts together with light automation.

doctorpangloss · 2y ago

Any sufficiently complicated Docker Swarm, Heroku, Elastic Beanstalk, Nomad or other program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of vanilla Kubernetes.

wg0 · 2y ago

Most smaller teams do not need full-fledged Kubernetes anyway.

There's no one size fits all approach. There are trade offs. The Kubernetes tractor needs lots of oiling and what not for all the bells and whistles.

Trade offs is the keyword here.

ori_b · 2y ago

Unfortunately, the above statement also applies to kubernetes.

blowski · 2y ago

I agree in part - the features and simplicity of Docker Swarm are very appealing over k8s, but it also feels so neglected that I'd be waiting every day for the EOL announcement.

wg0 · 2y ago

It's built from a separate project called swarm-kit. So if it comes to the point where it is abandoned, forks would be out in the wild soon enough.

I see more risk of Docker Engine as a whole pulling some Terraform/Elasticsearch-style licensing change someday as investors get desperate to cash out.

KronisLV · 2y ago

> I've come to the conclusion (after trying kops, kubespray, kubeadm, kubeone, GKE, EKS) that if you're looking for < 100 node cluster, docker swarm should suffice. Easier to setup, maintain and upgrade.

Personally, I'd also consider throwing Portainer in there, which gives you both a nice way to interact with the cluster, as well as things like webhooks: https://www.portainer.io/

With something like Apache, Nginx, Caddy or something else acting as your "ingress" (taking care of TLS, reverse proxy, headers, rate limits, sometimes mTLS etc.) it's a surprisingly simple setup, at least for simple architectures.

If/when you need to look past that, K3s is probably worth a look, as some other comments pointed out. Maybe some other of Rancher's offerings as well, depending on how you like to interact with clusters (the K9s tool is nice too).

bionsystem · 2y ago

When I was deploying swarm clusters I would have a default stack.yml file with portainer for admin, traefik for reverse-proxying, and prometheus, grafana, alertmanager, unsee, cadvisor, for monitoring and metrics gathering. All were running on their own docker network completely separated from the app and were only accessible by ops (and dev if requested, but not end users). It was quite easy to deploy with HEAT+ansible or terraform+ansible and the hard part was the ci/cd for every app each in its tenant, but it worked really really well.

hn_user82179 · 2y ago

I’ve been at a company running swarm in prod for a few years. There have been several nasty bugs that are fun to debug but we’ve accumulated several layers of slapped bandaids trying to handle swarm’s deficiencies. I can’t say I’d pick it again, nor would I recommend it for anyone else.

linuxdude314 · 2y ago

Node count driven infrastructure decisions make little sense.

A better approach is to translate business requirements to systems capabilities and evaluate which tool best satisfies those requirements given the other constraints within your organization.

Managed Kubernetes solutions like GKE require pretty minimal operational overhead at this point.

nyljasdfw342 · 2y ago

Number of nodes is a poor criterion to decide on... it should be the features and requirements you need for the cluster.

If Docker Swarm satisfies, then yes.

riku_iki · 2y ago

> Docker swarm is to Kubernetes what SQLite is to PostgreSQL. To some extent.

Curious, what do you mean? To me PostgreSQL doesn't have disadvantages over SQLite, everything is just better.

mulmen · 2y ago

PostgreSQL is more complex to use and operate and requires more setup than SQLite. If you don’t need the capabilities of PostgreSQL then you can avoid paying the setup and maintenance costs by using the simpler SQLite.

Patrickmi · 2y ago

I was using Docker Swarm because of the simplicity and easy setup, but the one feature I really, really needed was the ability to specify which runtime to use. Either I use runsc (and Docker plugins don't work with runsc) or runc as the default, and it was too inefficient to have groups of nodes with a certain runtime. I really do like Swarm, but it is missing too many features that are important.

MathiasPius (OP) · 2y ago

I haven't had much opportunity to work with Docker Swarm, but the one time I did, we hit certificate expiration and other issues constantly, and it was not always obvious what was going on. It soured my perception of it a bit, but like I said I hadn't had much prior experience with it, so it might have been on me.

vbezhenar · 2y ago

I didn’t try anything but kubeadm and it worked just fine for me for my 1 node cluster.

wg0 · 2y ago

Besides my local VirtualBox cluster, I have tried Kubernetes on three clouds with at least a dozen different installers/distributions, and my gut feeling has always been that operational pain would be a factor going forward.

That's where the author also has the following to say:

>My conclusion at this point is that if you can afford it, both in terms of privacy/GDPR and dollarinos then managed is the way to go.

And I agree. Managed Kubernetes is also really hard for those offering it, who have to manage it for you behind the scenes.[0]

[0]. https://blog.dave.tf/post/new-kubernetes/

stavros · 2y ago

I was of the same opinion, so I rolled my own thin layer over Compose:

https://harbormaster.readthedocs.io/

GordonS · 2y ago

This looks really nice, but the main feature of Docker Swarm, as opposed to Docker Compose, is the ability to run on a cluster of servers, not just a single node.

MathiasPius (OP) · 2y ago · 11 in thread

I recently rebuilt my Kubernetes cluster running across three dedicated servers hosted by Hetzner and decided to document the process. It turned into a (so far) 8-part series covering everything from bootstrapping and firewalls to setting up persistent storage with Ceph.

Part I: Talos on Hetzner https://datavirke.dk/posts/bare-metal-kubernetes-part-1-talo...

Part II: Cilium CNI & Firewalls https://datavirke.dk/posts/bare-metal-kubernetes-part-2-cili...

Part III: Encrypted GitOps with FluxCD https://datavirke.dk/posts/bare-metal-kubernetes-part-3-encr...

Part IV: Ingress, DNS and Certificates https://datavirke.dk/posts/bare-metal-kubernetes-part-4-ingr...

Part V: Scaling Out https://datavirke.dk/posts/bare-metal-kubernetes-part-5-scal...

Part VI: Persistent Storage with Rook Ceph https://datavirke.dk/posts/bare-metal-kubernetes-part-6-pers...

Part VII: Private Registry with Harbor https://datavirke.dk/posts/bare-metal-kubernetes-part-7-priv...

Part VIII: Containerizing our Work Environment https://datavirke.dk/posts/bare-metal-kubernetes-part-8-cont...

And of course, when it all falls apart: Bare-metal Kubernetes: First Incident https://datavirke.dk/posts/bare-metal-kubernetes-first-incid...

Source code repository (set up in Part III) for node configuration and deployed services is available at https://github.com/MathiasPius/kronform

While the documentation was initially intended more as a future reference for myself as well as a log of decisions made, and why I made them, I've received some really good feedback and ideas already, and figured it might be interesting to the hacker community :)

cjr · 2y ago

Great write up and what I especially enjoyed was how you kept the bits where you ran into the classic sort of issues, diagnosed them and fixed them. The flow felt very familiar to whenever I do anything dev-opsy.

I’d be interested to read about how you might configure cluster autoscaling with bare metal machines. I noticed that the IP addresses of the nodes are kinda hard-coded into firewall and network policy rules, so that would have to be automated somehow. Similarly with automatically spawning a load balancer from declaring a k8s Service. I realise these things are very cloud provider specific but would be interested to see if any folks are doing this with bare metal. For me, the ease of autoscaling is one of the primary benefits of k8s for my specific workload.

I also just read about Sidero Omni [1] from the makers of Talos, which looks like a SaaS to install Talos/Kubernetes across any kind of hardware sourced from pretty much any provider: cloud VM, bare metal etc. Perhaps it could make the initial bootstrap phase and future upgrades to these parts a little easier?

[1]: https://www.siderolabs.com/platform/saas-for-kubernetes/

MathiasPius (OP) · 2y ago

When it comes to load balancing, I think the hcloud-cloud-controller-manager[1] is probably your best bet, and although I haven't tested it, I'm sure it can be coerced into some kind of working configuration with the vSwitch/Cloud Network coupling, even if none of the cluster nodes are actually Cloud-based.

I haven't used Sidero Omni yet, but if it's as well architected as Talos is, I'm sure it's an excellent solution. It still leaves open the question of ordering and provisioning the servers themselves. For simpler use-cases it wouldn't be too difficult to hack together a script to interact with the Hetzner Robot API to achieve this goal, but if I wanted any level of robustness, and if you'll excuse the shameless plug, I think I'd write a custom operator in Rust using my hrobot-rs[2] library :)

As far as the hard-coded IP addresses go, I think I would simply move that one rule into a separate ClusterWideNetworkPolicy which is created per-node during onboarding and deleted again after. The hard-coded IP addresses are only used before the node is joined to the cluster, so technically the rule becomes obsoleted by the generic "remote-node" one immediately after joining the cluster.[3]

[1] https://github.com/hetznercloud/hcloud-cloud-controller-mana...

[2] https://github.com/MathiasPius/hrobot-rs

[3] https://github.com/MathiasPius/kronform/blob/main/manifests/...

smartbit · 2y ago

Have you tried KubeOne? Also with the benefits of machine-deployments. Works like a charm, didn’t go through your blogs, but KubeOne on Hetzner [0] seems easier than your deployment. And yes, also Open Source and German support available.

[0] https://docs.kubermatic.com/kubeone/main/architecture/suppor...

MathiasPius (OP) · 2y ago

Hetzner Cloud is officially supported, but that means setting up VPSs in Hetzner's Cloud offering, whereas this project was intended as a more or less independent pure bare-metal cluster. I see they offer Bare Metal support as well, but I haven't dived too deep into it.

I haven't used KubeOne, but I have previously used Syself's https://github.com/syself/cluster-api-provider-hetzner which I believe works in a similar fashion. I think the approach is very interesting and plays right into the Kubernetes Operator playbook and its self-healing ambitions.

That being said, the complexity of the approach, probably in trying to span and resolve inconsistencies across such a wide landscape of providers, caused me quite a bit of grief. I eventually abandoned this approach after having some operator somewhere consistently attempt and fail to spin up a secondary control plane VPS against my wishes. After poring over loads of documentation and half a dozen CRDs in an attempt to resolve it, I threw in my hat.

Of course, Kubermatic is not Syself, and this was about a year ago, so it is entirely possible that both projects are absolutely superb solutions to the problem at this point.

baz00 · 2y ago

Ah man just looking at that list makes me glad for EKS. But thanks for the effort, I will read to learn more.

msm_ · 2y ago

If you ever want to have fun with setting up your own k8s, I recommend starting small. The author is already knowledgeable, so they probably knew from the start what they wanted, but a lot of this complexity is not essential.

When I deployed my first Kubernetes "cluster", I just spun up a single-node "cluster" using kubeadm (today k3s is an option too) and started deploying services (with no distributed storage - everything stored using hostPath). You only need to know Kubernetes basics to do this. Then you probably want to configure a CNI (I recommend flannel when starting, later cilium), spin up an ingress controller (I recommend nginx or traefik), deploy cert-manager (this was hard for me when I started) and you can go a long way. With time I scaled up, decided to use GitOps, and deployed many more services (including my own registry - I started with Docker's own, then migrated to Gitea. Harbor is too heavy for me). And of course over time you add monitoring, alerting etc - the fun never ends (but it's all optional, you should decide when the time is right).

MathiasPius (OP) · 2y ago

Absolutely! If at all possible, go managed, preferably with a cloud provider that handles all the hard things for you like load balancing and so on.

*Sometimes* however, you want or need full control, either for compliance or economic reasons, and that's what I set out to explore :)

js4ever · 2y ago

Agreed, this is probably the best ad for managed k8s, this and horror stories about self-managed k8s clusters falling apart.

ralala · 2y ago

Interesting read. I have just set up a very similar cluster this week: a 3-node bare metal cluster in a 10G mesh network. Decided on Debian, RKE2, Calico and Longhorn. Encryption is done using LUKS FDE. For load balancing I am using the HCloud Load Balancer (in TCP mode). At first I had some problems with the mesh network as the CNI would only bind to a single interface. Finally solved it using a bridge, veth and isolated ports.

fireflash38 · 2y ago

Using containerd I assume? I've been trying to get RKE2 or k3s to play nicely with CRI-O and it's been a long exercise in frustration.

AndrewKemendo · 2y ago

Thank you for the amazing write up!

sureglymop · 2y ago · 4 in thread

Here's what I don't really get.. So, let's say you have three hosts and create your cluster. But now, you still need a reverse proxy or load balancer in front right? I mean not inside the cluster but to route requests to nodes of the cluster that are not currently down. So you could set up something like HAProxy on another host. But now you once again have a single point of failure. So do you replicate that part also and use DNS to make sure one of the reverse proxies is used? Maybe I'm just misunderstanding how it works but multiple nodes in a cluster still need some sort of central entry point right? So what is the correct way to do this.

MathiasPius (OP) · 2y ago

My solution for this setup is having ingress controllers on all three nodes, and then specifying all three IPs in all DNS records. That way the end user will "load balance" based on the DNS randomization.

Of course, if a node goes down, a third of the traffic will be lost, but with low TTLs and some planning, you can minimize the impact of this.
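
A rough sketch of the arithmetic behind this, assuming clients pick one of the A records uniformly at random and do not retry against another address, and that the dead IP is pulled from DNS at the moment of failure. The function names and the example request rate are hypothetical, not taken from the post:

```python
from fractions import Fraction


def lost_fraction(total_nodes: int, down_nodes: int) -> Fraction:
    """Share of requests that land on a dead node under uniform DNS randomization."""
    if total_nodes <= 0 or not 0 <= down_nodes <= total_nodes:
        raise ValueError("need 0 <= down_nodes <= total_nodes and total_nodes > 0")
    return Fraction(down_nodes, total_nodes)


def failed_requests(total_nodes: int, down_nodes: int,
                    requests_per_second: float, ttl_seconds: float) -> float:
    """Rough upper bound on failed requests until cached DNS records expire."""
    return float(lost_fraction(total_nodes, down_nodes)
                 * Fraction(requests_per_second)
                 * Fraction(ttl_seconds))


if __name__ == "__main__":
    print(lost_fraction(3, 1))  # 1/3
    # One of three nodes down, 30 req/s, 60 s TTL
    print(failed_requests(3, 1, requests_per_second=30, ttl_seconds=60))  # 600.0
```

Under these assumptions the damage scales linearly with the TTL, which is why low TTLs help; real resolvers and browsers may cache longer or retry other addresses, so treat the numbers as an estimate only.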

sureglymop · 2y ago

It's an interesting approach. I did it a bit differently. I set up three Proxmox nodes on three Hetzner servers. Then I deployed virtual routers. I then set up HAProxy and k3s nodes as LXC containers. What's nice about the whole setup is that a Proxmox node can go down and it all still works. I will now set up keepalived as mentioned in the other reply so the HAProxies will also be fully HA. Proxmox also works well with ZFS and backups. I set up the Proxmox nodes manually and did the rest with terraform + ansible. One `terraform destroy` cleans up everything nicely. I wonder what the performance difference is between bare metal and a k8s node in LXC.

ralgozino · 2y ago

You almost answered your own question. One common solution is to have 2 nodes with haproxy (or similar) sharing a virtual IP with keepalived that load balance the traffic to the control plane nodes and to the nodes where your ingress controller runs.

There are other options, like running the haproxy in the control plane nodes.

sureglymop · 2y ago

Thank you, this was very helpful! I've now read up on keepalived and the protocols it uses!

dhess · 2y ago · 3 in thread

What performance numbers are you seeing on pods with Ceph PVs? e.g., what does `rados bench` give?

MathiasPius (OP) · 2y ago

I ran rados benchmarks and it seems writes are about 74MB/s, whereas both random and sequential reads are running at about 130MB/s, which is about wire speed given the 1Gbit/s NICs.

Complete results are here: https://gist.github.com/MathiasPius/cda8ae32ebab031deb054054...
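
For reference, the line-rate ceiling mentioned here can be worked out directly. A minimal sketch, assuming decimal megabytes and ignoring Ethernet/TCP framing overhead (so achievable throughput sits somewhat below this figure); the helper name is made up for illustration:

```python
def line_rate_mb_per_s(link_gbit_per_s: float) -> float:
    """Convert a link speed in Gbit/s to decimal MB/s, with no protocol overhead."""
    bits_per_second = link_gbit_per_s * 1e9
    return bits_per_second / 8 / 1e6


if __name__ == "__main__":
    ceiling = line_rate_mb_per_s(1.0)
    print(ceiling)                 # 125.0
    # Reported write throughput as a share of the ceiling
    print(round(74 / ceiling, 3))  # 0.592
```

The reported read figure of about 130MB/s sits at or slightly above this ceiling. The thread does not explain why; one possible reason is that some reads are served by OSDs on the same node and never cross the network.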

dhess · 2y ago

Thanks!

MathiasPius (OP) · 2y ago

I haven't had an excuse to test it yet, but since it's only 6 OSDs across 3 nodes and all of them are spinning rust, I'd be surprised if performance was amazing.

I'm definitely curious to find out though, so I'll run some tests and get back to you!

mythz · 2y ago · 2 in thread

Thankfully we've never had the need for such complexity and are happy with our current GitHub Actions > Docker Compose > GCR > SSH solution [1] we're using to deploy 50+ Docker Containers.

It requires no infrastructure dependencies, and the stateless deployment scripts are checked into the same repo as the project. After the GitHub Organization is set up (4 secrets) and the deployment server has Docker Compose + nginx-proxy installed, deploying an app only requires 1 GitHub Action secret. As such it doesn't get any simpler for us, and we'll look to continue to use this approach for as long as we can.

[1] https://servicestack.net/posts/kubernetes_not_required

seabrookmx · 2y ago

I used to do something similar at a previous company and this works well if you don't have to worry about scaling. YAGNI principle and all that. When you run hundreds of containers for different workloads, k8s bin packing and autoscaling (both on the pod and node level) tips the balance in my experience.

mythz · 2y ago

Yeah if we ever need to autoscale then I can see Kubernetes being useful, but I'd be surprised if this is a problem most companies face.

Even when I was working at StackOverflow (serving 1B+ pages, 55TB /mo [1]) we never needed any autoscaling solution; it ran great on a handful of fixed servers. They were fairly beefy bare metal servers though, and I'd suspect it would require significantly more VMs if it were to run in the cloud.

[1] https://stackexchange.com/performance

xelxebar · 2y ago · 2 in thread

Speaking of k8s, anyone here know of ready-made solutions for getting Xcode (i.e. xcodebuild) running in pods? As far as I'm aware, there are no good solutions for getting Xcode running on Linux, so at the moment I'm just futzing about with a virtual-kubelet[0] implementation that spawns macOS VMs. This works just fine, but the problem seems like such an obvious one that I expect there to be some existing solution(s) I just missed.

[0]:https://github.com/virtual-kubelet/virtual-kubelet/

yjftsjthsd-h · 2y ago

https://blog.darlinghq.org/2023/08/21/progress-report-q2-202... talks about running darling in flatpak, so it's not too much of a stretch to imagine it in a pod someday, but I don't think it's there today.

doctorpangloss · 2y ago

There are no good ready made solutions.

Someone has submitted patches to containerd and authored “rund” (d for darwin) to run HostProcess containers on macOS.

The underlying problem is poor familiarity with Kubernetes on Windows among Kubernetes maintainers and users. Windows is where all similar problems have been solved, but the journey is long.

wiktor-k · 2y ago · 2 in thread

Very nice write-up!

I wonder if it's possible to combine the custom ISO with cloud init [0] to automate the initial node installation?

[0]: https://github.com/tech-otaku/hetzner-cloud-init

MathiasPius (OP) · 2y ago

I believe the recommended[1] way to deploy Talos to Hetzner Cloud (not bare metal) is to use the rescue system and Hashicorp Packer to upload the Talos ISO, deploying your VPS using this image, and then configuring Talos using the standard bootstrapping procedure.

This post series is specifically aimed at deploying a pure-metal cluster.

[1] https://www.talos.dev/v1.5/talos-guides/install/cloud-platfo...

wiktor-k · 2y ago

Ah, I see. Thanks for the explanation!

InvaderFizz · 2y ago · 1 in thread

I'm going through your series now. Very well done.

I thought I would mention that age is now built in to SOPS, thus needs no external dependencies and is faster and easier than gpg.

MathiasPius (OP) · 2y ago

Have seen age pop up here and there, but haven't spent the cycles to see where it fits in yet, so I just went with what I knew.

Will definitely take a look though, thanks!

lemper · 2y ago · 1 in thread

I thought it was about Talos, the POWER9 system. Intrigued by Kubernetes on them.

zkirill · 2y ago

Me too. That would be very cool and I'm surprised nobody is offering this as a service.

CoolCold · 2y ago

> Ceph is designed to host truly massive amounts of data, and generally becomes safer and more performant the more nodes and disks you have to spread your data across.

I'm very pessimistic about Ceph usage in the scenario you have. Maybe I've missed it, but I've seen nothing about upgrading networking, as by default you're going to have 1Gbit on a single interface used for both the public network and the internal vSwitch.

Even by your benchmarks, the write test is 19 IOPS (block size is huge though):

    Max bandwidth (MB/sec): 92
    Min bandwidth (MB/sec): 40
    Average IOPS:           19
    Stddev IOPS:            2.62722
    Max IOPS:               23
    Min IOPS:               10

while a single HDD would give ~120 IOPS. A single 3-year-old datacenter-edition NVMe gives ~33000 IOPS with 4k blocks + fdatasync=1.

Ceph would be a very limiting factor on 1Gbit networking, I believe. I'd put a clear disclaimer on that for fellow sysadmins.

P.S. The amount of work you've done is huge and appreciated.
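
The caveat about block size matters because IOPS figures are only comparable at the same operation size. A small sketch of the conversion, assuming the write benchmark used 4 MiB operations (believed to be the `rados bench` default, but not stated in the thread); the function name is hypothetical:

```python
MIB = 1024 * 1024


def bandwidth_mib_per_s(iops: float, block_size_bytes: int) -> float:
    """Throughput implied by an IOPS figure at a given operation size."""
    return iops * block_size_bytes / MIB


if __name__ == "__main__":
    # Ceph write test: 19 IOPS at an assumed 4 MiB per operation
    print(bandwidth_mib_per_s(19, 4 * MIB))   # 76.0
    # NVMe figure from the comment: ~33000 IOPS at 4 KiB per operation
    print(bandwidth_mib_per_s(33000, 4096))   # 128.90625
```

Under that assumption, 19 IOPS works out to roughly the 74MB/s write figure reported earlier in the thread, so the low IOPS number mostly reflects large operations on a 1Gbit link rather than small-block latency, which the benchmark quoted here does not measure.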

dave-at-koor · 2y ago

Great post. We (Koor) have been going through something similar to create a demo environment for Rook-Ceph. In our case, we want to show different types of data storage (block, object, file) in a production-like system, albeit at the smaller end of scale.

Our system is hosted at Hetzner on Ubuntu. KubeOne does the provisioning, backed by Terraform. We are using Calico for networking, and we have our own Rook operator.

What would have made the Rook-Ceph experience better for you?

mulmen · 2y ago

Just finished reading part one and wow, what an excellently written and presented post. This is exactly the series I needed to get started with Kubernetes in earnest. It’s like it was written for me personally. Thanks for the submission MathiasPius!

mkagenius · 2y ago

If people get the idea from this that they should get a bare-metal server on Hetzner and try it: don't. They will probably reject you; they are very picky.

And if you are from a developing country like India, don't even think about it.
