Talos: OS for Kubernetes

(talos-systems.com)

108 points · stevetodd · 6y ago · 78 comments

andrewrynhard · 6y ago

Hey folks, Talos creator here. Happy to answer any questions you guys may have. It sounds like there is some confusion about exactly what Talos is. There is a lot of good feedback here that we will use to improve our documentation.

Talos is a Linux distribution built specifically for Kubernetes. The short version is that we have stripped out absolutely everything that is not required to make a machine a Kubernetes node, including SSH and console access (I will explain why).

Here is the long version. We have done a number of things to improve security. The filesystem is read-only, except for the paths the kubelet needs (/var/lib/kubelet, /etc/cni, etc.). Talos runs entirely in RAM from a SquashFS image, and only Kubernetes makes use of a disk. We have removed SSH/console access and added a gRPC API that gives engineers the ability to debug and remediate issues.

We didn't stop there. We are writing everything, including the init system, in Go, which allows us to integrate deeply with Kubernetes. Everything about Talos is API-driven.

Some of the highlights include:

- SSH/console access replaced with a gRPC API that is secured via mutual TLS.

- Immutable. Immutability prevents drift, making the cluster consistent across the board.

- Automated upgrades that can be orchestrated in an intelligent way. By using Kubernetes events and our API, we can roll out upgrades from an operator (currently a WIP, planned for release in Talos 0.3) and do so in a safe manner.

- Cluster API (CAPI) integration that allows rapid creation of Kubernetes clusters using Kubernetes style declarative YAML.

- Support for AWS, GCP, Azure, Packet, vSphere, Bare Metal, and Docker. The experience for each is consistent, making it easy to reason about Talos regardless of where you run it.

- Enforcement of CIS benchmark and KSPP (Kernel Self Protection Project) security configurations.

- Keeping current by supporting the latest and greatest version of Kubernetes, while writing upgrade paths into the system.

- Support for local Docker-based clusters, easily created using our CLI. This is super useful for creating CI pipelines where you might want to run integration tests against the same Talos/Kubernetes versions running in production.

- Installs and upgrades are performed via containers.

We feel that by removing SSH/console access, making the core of Talos read-only, and treating nodes as ephemeral machines, we are creating a much more secure way to run Kubernetes. There was a really good talk on these ideas at Black Hat this year: https://swagitda.com/speaking/us-19-Shortridge-Forsgren-Cont.... We feel we align with the recommendations made there.

In addition to security, we envision a system that will be self-healing and intelligent. By having an API and integrating with Kubernetes, the sky is really the limit on the tooling we can build to create this self-healing system.

Our goal with Talos is to allow engineers to more or less forget about each individual node. Managing the OS alongside Kubernetes is a lot of work.

I will address the questions and comments as replies. Feel free to ask more as a reply to this comment.

Feel free to join our meetings every Monday and Thursday at 17:00 UTC on https://zoom.us/j/3595189922. Also, join our Slack and I'd be more than happy to talk some more about Talos! https://slack.dev.talos-systems.io

willglynn · 6y ago

One thing I would need before switching from CoreOS to Talos is GPU driver support. My current setup uses the NVIDIA driver containers:

https://hub.docker.com/r/nvidia/driver

I build slightly customized images using a process derived from the one in the NVIDIA repo:

https://gitlab.com/nvidia/container-images/driver/blob/maste...
https://gitlab.com/nvidia/container-images/driver/blob/maste...

The automation here is predicated on CoreOS distributing matching { kernel, headers, toolchain } artifacts for each release, and in particular how specific OS releases get promoted from the alpha -> beta -> stable channels without modification. This lets me build new drivers automatically for each alpha release, validate the drivers on the beta channel, and have no surprises on the stable channel. Does Talos intend to do something similar?
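The property this workflow relies on can be sketched as a cache of driver images keyed by exact kernel version: a driver built for an alpha release is reusable on beta and stable only because promotion leaves the release, and therefore its kernel, unmodified. This is a hypothetical model of the process described above, not CoreOS, NVIDIA, or Talos tooling; the `Release` type, `driverCache`, and the example version strings are all illustrative assumptions.

```go
package main

import "fmt"

// Release is a hypothetical record of one OS release as it moves between channels.
type Release struct {
	Version string // OS release version (illustrative)
	Kernel  string // exact kernel version shipped in that release
}

// driverCache maps an exact kernel version to a driver image built against it.
type driverCache map[string]string

// buildForAlpha records a driver image built for an alpha release's kernel.
func (c driverCache) buildForAlpha(r Release, image string) {
	c[r.Kernel] = image
}

// driverFor returns the prebuilt driver for a release, but only if one was
// built against exactly the same kernel version.
func (c driverCache) driverFor(r Release) (string, bool) {
	img, ok := c[r.Kernel]
	return img, ok
}

func main() {
	cache := driverCache{}
	alpha := Release{Version: "example-release", Kernel: "example-kernel"}
	cache.buildForAlpha(alpha, "example/driver:example-kernel")

	// Promotion alpha -> beta -> stable copies the release unmodified,
	// so the stable lookup hits the driver that was built for alpha.
	stable := alpha
	img, ok := cache.driverFor(stable)
	fmt.Println(img, ok)
}
```

If a distribution rebuilt or patched the kernel during promotion, the lookup would miss and the driver would have to be rebuilt for that channel, which is why matching { kernel, headers, toolchain } artifacts per release matter here.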