Docker Swarm is to Kubernetes what SQLite is to PostgreSQL. To some extent.
My Docker Swarm config files are nearly the same craziness as my k3s config files, so I figured I might as well benefit from the tooling in Kubernetes.
Edit for more random thoughts: being able to use helm to deploy services helped me switch to k3s from swarm.
IMO a sufficiently advanced Docker Compose stack is not appreciably simpler than the Kubernetes manifests would be, and you don't get the benefits of Kubernetes' objects and their controllers because Docker Compose is basically just stringing low-level concepts together with light automation.
There's no one-size-fits-all approach. There are trade-offs. The Kubernetes tractor needs lots of oiling and whatnot for all the bells and whistles.
Trade-offs is the keyword here.
I see more risk of Docker Engine as a whole pulling a Terraform/Elasticsearch-style licensing change someday as investors get desperate to cash out.
Personally, I'd also consider throwing Portainer in there, which gives you both a nice way to interact with the cluster, as well as things like webhooks: https://www.portainer.io/
With something like Apache, Nginx, Caddy or something else acting as your "ingress" (taking care of TLS, reverse proxy, headers, rate limits, sometimes mTLS etc.) it's a surprisingly simple setup, at least for simple architectures.
If/when you need to look past that, K3s is probably worth a look, as some other comments pointed out. Maybe some other of Rancher's offerings as well, depending on how you like to interact with clusters (the K9s tool is nice too).
A better approach is to translate business requirements to systems capabilities and evaluate which tool best satisfies those requirements given the other constraints within your organization.
Managed Kubernetes solutions like GKE require pretty minimal operational overhead at this point.
If Docker Swarm satisfies them, then yes.
Curious, what do you mean? To me PostgreSQL doesn't have disadvantages over SQLite; everything is just better.
That's where the author also has the following to say:
>My conclusion at this point is that if you can afford it, both in terms of privacy/GDPR and dollarinos then managed is the way to go.
And I agree. Managed Kubernetes is also really hard for those offering it, who have to manage it for you behind the scenes.[0]
Part I: Talos on Hetzner https://datavirke.dk/posts/bare-metal-kubernetes-part-1-talo...
Part II: Cilium CNI & Firewalls https://datavirke.dk/posts/bare-metal-kubernetes-part-2-cili...
Part III: Encrypted GitOps with FluxCD https://datavirke.dk/posts/bare-metal-kubernetes-part-3-encr...
Part IV: Ingress, DNS and Certificates https://datavirke.dk/posts/bare-metal-kubernetes-part-4-ingr...
Part V: Scaling Out https://datavirke.dk/posts/bare-metal-kubernetes-part-5-scal...
Part VI: Persistent Storage with Rook Ceph https://datavirke.dk/posts/bare-metal-kubernetes-part-6-pers...
Part VII: Private Registry with Harbor https://datavirke.dk/posts/bare-metal-kubernetes-part-7-priv...
Part VIII: Containerizing our Work Environment https://datavirke.dk/posts/bare-metal-kubernetes-part-8-cont...
And of course, when it all falls apart: Bare-metal Kubernetes: First Incident https://datavirke.dk/posts/bare-metal-kubernetes-first-incid...
Source code repository (set up in Part III) for node configuration and deployed services is available at https://github.com/MathiasPius/kronform
While the documentation was initially intended more as a future reference for myself as well as a log of decisions made, and why I made them, I've received some really good feedback and ideas already, and figured it might be interesting to the hacker community :)
I’d be interested to read about how you might configure cluster autoscaling with bare metal machines. I noticed that the IP addresses of the nodes are kinda hard-coded into firewall and network policy rules, so that would have to be automated somehow. Similarly with automatically spawning a load balancer from declaring a k8s Service. I realise these things are very cloud-provider specific but would be interested to see if any folks are doing this with bare metal. For me, the ease of autoscaling is one of the primary benefits of k8s for my specific workload.
I also just read about Sidero Omni [1] from the makers of Talos, which looks like a SaaS to install Talos/Kubernetes across any kind of hardware sourced from pretty much any provider (cloud VM, bare metal etc.). Perhaps it could make the initial bootstrap phase and future upgrades to these parts a little easier?
[1]: https://www.siderolabs.com/platform/saas-for-kubernetes/
I haven't used Sidero Omni yet, but if it's as well architected as Talos is, I'm sure it's an excellent solution. It still leaves open the question of ordering and provisioning the servers themselves. For simpler use-cases it wouldn't be too difficult to hack together a script to interact with the Hetzner Robot API to achieve this goal, but if I wanted any level of robustness, and if you'll excuse the shameless plug, I think I'd write a custom operator in Rust using my hrobot-rs[2] library :)
As far as the hard-coded IP addresses go, I think I would simply move that one rule into a separate ClusterWideNetworkPolicy which is created per-node during onboarding and deleted again afterwards. The hard-coded IP addresses are only used before the node is joined to the cluster, so technically the rule is made obsolete by the generic "remote-node" one immediately after joining the cluster.[3]
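A rough sketch of what generating such a per-node rule could look like. This is not taken from the kronform repo: it assumes the policy in question is Cilium's CiliumClusterwideNetworkPolicy (the series uses Cilium), and the policy name, the control-plane node selector and the port are all hypothetical placeholders.

```python
import ipaddress
import json


def onboarding_policy(node_name: str, node_ip: str, port: int = 6443) -> dict:
    """Build a temporary per-node policy manifest as a plain dict.

    Assumptions (not from the original setup): the rule admits the new
    node's IP to one TCP port on the control-plane nodes.
    """
    ip = ipaddress.ip_address(node_ip)  # raises ValueError on a malformed address
    prefix = 32 if ip.version == 4 else 128
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumClusterwideNetworkPolicy",
        "metadata": {"name": f"onboard-{node_name}"},  # hypothetical naming scheme
        "spec": {
            # Hypothetical selector: apply the rule to control-plane hosts.
            "nodeSelector": {
                "matchLabels": {"node-role.kubernetes.io/control-plane": ""}
            },
            "ingress": [
                {
                    "fromCIDR": [f"{ip}/{prefix}"],
                    "toPorts": [
                        {"ports": [{"port": str(port), "protocol": "TCP"}]}
                    ],
                }
            ],
        },
    }


# 203.0.113.10 is a documentation-range address, used purely as an example.
manifest = onboarding_policy("n4", "203.0.113.10")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be applied when onboarding starts and deleted once the node has joined, at which point the generic "remote-node" rule takes over.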
[1] https://github.com/hetznercloud/hcloud-cloud-controller-mana...
[2] https://github.com/MathiasPius/hrobot-rs
[3] https://github.com/MathiasPius/kronform/blob/main/manifests/...
[0] https://docs.kubermatic.com/kubeone/main/architecture/suppor...
I haven't used KubeOne, but I have previously used Syself's https://github.com/syself/cluster-api-provider-hetzner which I believe works in a similar fashion. I think the approach is very interesting and plays right into the Kubernetes Operator playbook and its self-healing ambitions.
That being said, the complexity of the approach, probably in trying to span and resolve inconsistencies across such a wide landscape of providers, caused me quite a bit of grief. I eventually abandoned this approach after having some operator somewhere consistently attempt and fail to spin up a secondary control plane VPS against my wishes. After poring over loads of documentation and half a dozen CRDs in an attempt to resolve it, I threw in my hat.
Of course, Kubermatic is not Syself, and this was about a year ago, so it is entirely possible that both projects are absolutely superb solutions to the problem at this point.
When I deployed my first Kubernetes "cluster", I just spun up a single-node "cluster" using kubeadm (today k3s is an option too) and started deploying services (with no distributed storage; everything was stored using hostPath). You only need to know Kubernetes basics to do this. Then you probably want to configure a CNI (I recommend flannel when starting, later cilium), spin up an ingress controller (I recommend nginx or traefik), deploy cert-manager (this was hard for me when I started) and you can go a long way. With time I scaled up, decided to use GitOps, and deployed many more services (including my own registry: I started with Docker's own, then migrated to Gitea. Harbor is too heavy for me). And of course over time you add monitoring, alerting etc. The fun never ends (but it's all optional, you should decide when the time is right).
*Sometimes* however, you want or need full control, either for compliance or economic reasons, and that's what I set out to explore :)
Of course, if a node goes down, a third of the traffic will be lost, but with low TTLs and some planning, you can minimize the impact of this.
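To put rough numbers on that, here is a minimal sketch assuming plain DNS round-robin across equally weighted nodes, and clients that honour the TTL. The function name and the 60-second TTL are illustrative assumptions, not values from the actual setup.

```python
def round_robin_outage(total_nodes: int, failed_nodes: int, ttl_seconds: int) -> tuple[float, int]:
    """Estimate the impact of node failures behind DNS round-robin.

    Returns a tuple of:
      - the fraction of new connections that land on a dead node, and
      - the worst-case number of seconds a TTL-respecting client keeps
        using a stale record after it has been removed from DNS.
    """
    if total_nodes <= 0 or not 0 <= failed_nodes <= total_nodes:
        raise ValueError("need 0 <= failed_nodes <= total_nodes and total_nodes > 0")
    fraction_lost = failed_nodes / total_nodes
    return fraction_lost, ttl_seconds


fraction, stale_for = round_robin_outage(total_nodes=3, failed_nodes=1, ttl_seconds=60)
print(f"{fraction:.0%} of traffic affected, for up to {stale_for}s after the record is pulled")
```

The time it takes to notice the failure and remove the record comes on top of the TTL, which is where the "some planning" part comes in.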
There are other options, like running the haproxy in the control plane nodes.
Complete results are here: https://gist.github.com/MathiasPius/cda8ae32ebab031deb054054...
I'm definitely curious to find out though, so I'll run some tests and get back to you!
It requires no infrastructure dependencies, and the stateless deployment scripts are checked into the same repo as the project. Once the GitHub Organization is set up (4 secrets) and the deployment server has Docker Compose + nginx-proxy installed, deploying an app only requires 1 GitHub Action secret. It doesn't get any simpler for us, and we'll look to continue to use this approach for as long as we can.
Not even when working at StackOverflow (serving 1B+ pages, 55TB /mo [1]) did we need any autoscaling solution; it ran great on a handful of fixed servers. They were fairly beefy bare metal servers though, and I suspect it would require significantly more VMs if it were to run in the cloud.
Someone has submitted patches to containerd and authored “rund” (d for darwin) to run HostProcess containers on macOS.
The underlying problem is poor familiarity with Kubernetes on Windows among Kubernetes maintainers and users. Windows is where all similar problems have been solved, but the journey is long.
I wonder if it's possible to combine the custom ISO with cloud init [0] to automate the initial node installation?
This post series is specifically aimed at deploying a pure-metal cluster.
[1] https://www.talos.dev/v1.5/talos-guides/install/cloud-platfo...
I thought I would mention that age is now built in to SOPS, thus needs no external dependencies and is faster and easier than gpg.
Will definitely take a look though, thanks!
I'm very pessimistic about Ceph usage in the scenario you have. Maybe I've missed it, but I've seen nothing about upgrading the networking: by default you're going to have 1Gbit on a single interface used for both the public network and the internal vSwitch.
Even by your benchmarks, the write test is 19 IOPS (block size is huge though):
Max bandwidth (MB/sec): 92
Min bandwidth (MB/sec): 40
Average IOPS: 19
Stddev IOPS: 2.62722
Max IOPS: 23
Min IOPS: 10
while a single HDD would give ~120 IOPS. A single 3-year-old datacenter-edition NVMe drive gives ~33000 IOPS with 4k blocks + fdatasync=1. Ceph would be a very limiting factor on 1Gbit networking, I believe; I'd put a clear disclaimer on that for fellow sysadmins.
P.S. The amount of work you've done is huge and appreciated.
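Since the block sizes differ so much, it may help to convert the IOPS figures into bandwidth. A quick back-of-the-envelope sketch: the 4 MB block size for the Ceph test is an assumption inferred from the output above (23 x 4 = 92 and 10 x 4 = 40 match the reported max and min bandwidth), and protocol overhead is ignored.

```python
MB = 1_000_000  # decimal megabyte, as used for network link speeds


def throughput_mb_per_s(iops: float, block_size_bytes: int) -> float:
    """Convert an IOPS figure at a given block size into MB/s."""
    return iops * block_size_bytes / MB


# Assumed 4 MB per write, inferred from the benchmark's IOPS/bandwidth ratio.
ceph_block = 4 * MB
ceph_avg = throughput_mb_per_s(19, ceph_block)  # average IOPS from the benchmark
ceph_max = throughput_mb_per_s(23, ceph_block)  # max IOPS from the benchmark
ceph_min = throughput_mb_per_s(10, ceph_block)  # min IOPS from the benchmark

# Local NVMe figure quoted above: ~33000 IOPS at 4k blocks.
nvme_4k = throughput_mb_per_s(33_000, 4096)

# Theoretical ceiling of a 1 Gbit/s link, before any protocol overhead.
link_limit = 1_000_000_000 / 8 / MB

print(f"Ceph write: avg {ceph_avg:.0f}, max {ceph_max:.0f}, min {ceph_min:.0f} MB/s")
print(f"NVMe 4k sync write: {nvme_4k:.0f} MB/s")
print(f"1 Gbit/s link ceiling: {link_limit:.0f} MB/s")
```

Under these assumptions the benchmark's peak is already within sight of what a single 1Gbit link can carry, and any replication traffic sharing that same interface eats into it further.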
Our system is hosted at Hetzner on Ubuntu. KubeOne does the provisioning, backed by Terraform. We are using Calico for networking, and we have our own Rook operator.
What would have made the Rook-Ceph experience better for you?
And if you are from a developing country like India, don't even think about it.