Plenty has also been written about bugs that only become visible once you break a metric down by the region/node/cgroup the problem is coming from [0]. My use case wasn't exactly `pod=...`, but it was very similar: more like `device=...`. Also, for a huge application, it's not uncommon to have hundreds or even thousands of metrics that matter to application health/performance. Constantly asking "do you really need X? It will cost us Y" will lead to an extremely under-monitored application.
[0] - https://cloud.google.com/blog/products/management-tools/sre-...