Interestingly, that's my experience on Azure. Seems like a lot of offerings were smashed together in order to try to form a coherent product.
Seems like that would apply to many, many web services and products in the last decade.
Azure's AKS is way better as a whole, but until recently it wasn't integrated with their own scale sets. How does that even happen?
To be fair to Azure, AKS is an example of how to roll out features. All the knobs are there (even more so than GKE; you can even set cluster IP ranges), it just works, and there are no silly surprises. At least, I haven't found surprising behavior yet.
To be more specific. Disclaimer: I've built distributed infra on AWS, Azure and GCP as well as on premise. There's a reason I'm using GCP, and I'm not stating that reason below. Instead, these are a few of the reasons I don't like it. In summary, I believe they focus on too many things with too little depth. This is not unique to GCP, but it is rather new to GCP IMO.
Take Endpoints, for example. You get a nice feature that documents your API based on the code-level docs. To update it you also get an API, but it only works if you do one manual sync first from the console (https://cloud.google.com/endpoints/docs/grpc/dev-portal-sync...)
Then you discover that your documentation starts disappearing and debug for a while just to figure out that the last refactoring got the total length of your service name, RPC name and parameters over 80 characters and your doc doesn't show up if that's the case.
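For what it's worth, a check like this in CI would have caught it before the docs vanished. It is only a sketch: the 80-character limit is what I observed, not something I found documented, and I'm assuming the length is the simple sum of the service name, RPC name and parameter names. The function names are made up for illustration.

```python
from typing import Dict, List, Sequence

# Limit observed in practice (see above); not taken from official documentation.
MAX_DOC_NAME_LENGTH = 80


def combined_length(service: str, rpc: str, params: Sequence[str]) -> int:
    """Sum of the lengths of the service name, RPC name and parameter names.

    Assumption: the limit applies to this plain sum; the real rule may differ.
    """
    return len(service) + len(rpc) + sum(len(p) for p in params)


def rpcs_over_limit(
    service: str,
    rpcs: Dict[str, Sequence[str]],
    limit: int = MAX_DOC_NAME_LENGTH,
) -> List[str]:
    """Return the names of RPCs whose combined length exceeds the limit.

    `rpcs` maps each RPC name to its list of parameter names.
    """
    return [
        rpc
        for rpc, params in rpcs.items()
        if combined_length(service, rpc, params) > limit
    ]
```

Run it over your service definitions after each refactoring and fail the build if the returned list is non-empty.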
Then you use the tracing capabilities, only to discover that the traces don't propagate across services. There's something that ESP (their nginx proxy) doesn't do. You take a look at it, try to build it, but discover it uses a Bazel version that is two years old.
Then you look at quotas / throttling. First you'll notice that the examples don't work. You just get errors, and apparently they also don't get fixed after sending feedback (https://cloud.google.com/endpoints/docs/grpc/quotas-configur...). You look at the example and notice it's copy-pasted-modified from some JSON and the field names are incorrect.
Then you see that throttling only works with API keys, but they (and everyone else) don't recommend you use API keys and instead recommend IAM service accounts. Except that a bunch of features won't work unless you use API keys. So you use API keys and then discover you can't provision them, because there's no API. You have to, again, do it manually through the console. You talk to support and they'll recommend you use IAM service accounts because they are much better, although you won't be able to use the API-key-specific features (https://cloud.google.com/endpoints/docs/grpc/when-why-api-ke...).
If you take a look at service accounts and IAM you discover new things. I won't go into details here, but let’s just say figuring out whether a policy should work in a particular case is more art than science.
Then in GKE you want to enable TLS on your gRPC service. It should be easy (https://cloud.google.com/endpoints/docs/grpc/enabling-ssl). It's just that it doesn't work, or rather, different documents say both that it works and that it doesn't work (see https://github.com/kubernetes/ingress-gce/issues/18). From certificate provisioning to functional service, it took roughly 5 days involving tcpdump, looking at ESP source code, all the logs, raising bugs on GitHub, etc.
Then see Service Catalog (https://cloud.google.com/kubernetes-engine/docs/how-to/add-o...). You try to download from the link they provide and it throws an error. You debug, maybe talk to support, and they'll tell you that there is a patch that fixes it. It's just that there hasn't been a release since April 2018...
Then one day your GKE pod won't deploy and you see no errors. It takes a while to realize that if the port name is over a certain number of characters it won't work.
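If you want to guard against this one, here is a rough pre-deploy check. I'm assuming the limit I hit is the Kubernetes port name rule (IANA_SVC_NAME: at most 15 characters, lowercase alphanumerics and '-', at least one letter, no leading or trailing '-', no '--'). The function name is made up, and this is a sketch, not the validator Kubernetes itself runs.

```python
import re
from typing import List

# Assumed to be the limit in question: Kubernetes port names follow IANA_SVC_NAME.
MAX_PORT_NAME_LENGTH = 15


def port_name_problems(name: str) -> List[str]:
    """Return a list of human-readable problems with a port name (empty if OK)."""
    problems = []
    if len(name) > MAX_PORT_NAME_LENGTH:
        problems.append(
            "longer than %d characters (%d)" % (MAX_PORT_NAME_LENGTH, len(name))
        )
    if not re.fullmatch(r"[a-z0-9-]+", name):
        problems.append("may only contain lowercase letters, digits and '-'")
    if not re.search(r"[a-z]", name):
        problems.append("must contain at least one letter")
    if name.startswith("-") or name.endswith("-"):
        problems.append("must not start or end with '-'")
    if "--" in name:
        problems.append("must not contain consecutive hyphens")
    return problems


if __name__ == "__main__":
    # Example names; the second is too long and would be rejected.
    for candidate in ["grpc", "grpc-service-endpoint"]:
        print(candidate, port_name_problems(candidate) or "ok")
```

Running something like this over every named port in your manifests before applying them gives you the error message the deploy didn't.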
There are many of these and I could keep going :)