The way we did it: services registered with a centralised “name” service, which was an in-memory database.
If registration with a name service failed, the library registered with the next name service in the list (we ran three, so there were two spares). The library then kept retrying registration with the first name service; this was our reconciliation loop.
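A minimal sketch of that failover-plus-reconciliation idea, assuming a hypothetical `NameService` stub (an in-memory object with an `up` flag standing in for a network client) and a hypothetical `Registrar` class. It is not the original library; liveness checks on the current instance are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one name-service instance. A real one would be a
// network client; this stub just accepts or rejects based on the "up" flag.
struct NameService {
    bool up = true;
    std::vector<std::string> registered;  // services currently registered here

    bool try_register(const std::string& service) {
        if (!up) return false;
        registered.push_back(service);
        return true;
    }
    void deregister(const std::string& service) {
        registered.erase(std::remove(registered.begin(), registered.end(), service),
                         registered.end());
    }
};

// Hypothetical registrar: registers with the first reachable name service in
// priority order, and on each reconcile() tick retries the first (preferred) one.
class Registrar {
public:
    Registrar(std::vector<NameService*> services, std::string name)
        : services_(std::move(services)), name_(std::move(name)) {}

    // Walk the list in order; the first instance that accepts us wins.
    bool register_once() {
        for (std::size_t i = 0; i < services_.size(); ++i) {
            if (services_[i]->try_register(name_)) {
                current_ = static_cast<int>(i);
                return true;
            }
        }
        current_ = -1;
        return false;
    }

    // One iteration of the reconciliation loop: while on a spare (or on
    // nothing), keep trying to get back onto the first name service.
    void reconcile() {
        if (services_.empty() || current_ == 0) return;
        if (services_[0]->try_register(name_)) {
            if (current_ > 0) services_[current_]->deregister(name_);
            current_ = 0;
        } else if (current_ < 0) {
            register_once();  // registered nowhere: fall back to the spares
        }
    }

    int current() const { return current_; }

private:
    std::vector<NameService*> services_;
    std::string name_;
    int current_ = -1;  // index of the name service we are registered with
};
```

In practice `reconcile()` would be called from a timer; moving back to the primary as soon as it accepts keeps most registrations in one place while the spares only absorb outages.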
When you request a service, you do it by requesting “traits” through the name service.
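Roughly, a trait lookup could work like the sketch below. It assumes traits are plain strings and that a request matches any endpoint advertising all of the requested traits; the `TraitRegistry` name and the trait strings are hypothetical, not how the original system encoded them.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory registry: each endpoint is stored with the set of
// traits it advertises (e.g. "matchmaking", "region:eu").
class TraitRegistry {
public:
    void add(const std::string& endpoint, std::set<std::string> traits) {
        entries_[endpoint] = std::move(traits);
    }

    // Return every endpoint whose advertised traits include all requested ones.
    // std::includes works here because std::set keeps both ranges sorted.
    std::vector<std::string> find(const std::set<std::string>& wanted) const {
        std::vector<std::string> out;
        for (const auto& entry : entries_) {
            if (std::includes(entry.second.begin(), entry.second.end(),
                              wanted.begin(), wanted.end())) {
                out.push_back(entry.first);
            }
        }
        return out;  // sorted by endpoint, since entries_ is a std::map
    }

private:
    std::map<std::string, std::set<std::string>> entries_;
};
```

Matching on traits rather than on a fixed service name means a client asks for “something that does matchmaking in the EU” and the name service decides which registered instances qualify.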
It's not “trivial”, but even though it was implemented in C++ on Windows, it was much less complicated than the Kubernetes discovery systems. Those can be integrated in many ways (environment variables, DNS, the API), some of which should or must be disabled for security reasons (e.g. the environment variables that expose every service), and their behaviour can change depending on RBAC and the service account.
We deployed 300,000 or so services on this infrastructure, and it has been running in production since 2015.
(The original product release was planned for 2013, but it was delayed by client changes, not by the backend. This is video games, unfortunately.)
I’m not, like, “hard” on these opinions, but people talk about Kubernetes as if it’s the only way to solve these problems. Sometimes you have to learn a lot about Kubernetes before you can actually do what you need to do; if you already understand distributed systems, understanding Kubernetes can be a bigger effort than implementing, from scratch, a distributed system that solves your needs.