Ask HN: How do you organize a platform team?
I am a new manager of a large team (12 SREs) that takes care of the Kubernetes platform at my company. The team is responsible for the provisioning pipelines (for both bare metal and AWS - no EKS is used), the Kubernetes controllers that integrate with other custom services, the observability stack, etc. The total fleet is around 6000 bare-metal nodes and 1000 VMs in AWS, spread over various DCs and regions. Over 1500 developers actively use the Kubernetes clusters every day, for a total of 2500 applications running in production.
The team spends a lot of time on operations, as well as on compliance issues, vulnerability patching and customer support. My struggle is "how to drive focus" and avoid drowning in operations. The team is large, which makes the Scrum process ineffective. Every time I try to split people into teams, I realise that everything on the platform is so interconnected that the moment I created 2 or 3 separate teams, they would start stepping on each other's toes.
What would you recommend?