It's happened to me a few times.
This service is using 80% CPU, that seems a bit high... but is it always this high? Looks like it spiked within the last hour. But wait, it does that every Monday at 9 am, so probably a red herring.
This cache has a hit ratio of 60%... is that good? A bit low? Actually it's suspiciously high compared to last week - looks like a lot of people aren't getting a personalised feed.
Metrics are incredibly cheap to keep around relative to the value you get from a good operational dashboard, despite what Datadog, Amazon, or Grafana Cloud would have you believe. Hosted metrics are just the most egregiously overpriced data you can buy since 20-cent text messages.
A good start is to set up VictoriaMetrics with some collectors and set retention to 14 days.
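As a rough sketch of how little setup this takes (assuming Docker; the `-retentionPeriod` flag is from the VictoriaMetrics docs, the volume and port are the usual defaults):

```shell
# Single-node VictoriaMetrics with 14-day retention.
# Point your collectors (vmagent, node_exporter scrapes, etc.) at port 8428.
docker run -d --name victoria-metrics \
  -v vm-data:/victoria-metrics-data \
  -p 8428:8428 \
  victoriametrics/victoria-metrics \
  -retentionPeriod=14d
```

One binary, one volume, one flag for retention; that's the whole cost of keeping two weeks of metrics on your own box.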
Also, as your storage hits 97%+, you'll probably start seeing effects in your business metrics, and then you can look into it.
Real-time, high-precision metrics aren't necessary. But when you say that you don't need metrics and then say that you can poll metrics periodically, you're contradicting yourself.
Unless you want trends over time, either for capacity planning (needing to order more storage in the bare-metal case, or forecasting costs ahead) or to correlate with other events (storage consumption has been growing twice as fast since deployment X; did we change something there?).
You don't need to have 1s granularity metrics on storage consumption, but having none is just stupid levels of fake "optimisation" that will cost you more in the long run.
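To make the capacity-planning point concrete, here's a minimal sketch of what even coarse, once-a-day storage samples buy you: a linear fit that projects when the disk fills up. The function name and sample data are made up for illustration.

```python
def days_until_full(samples, capacity):
    """Project days until capacity is reached.

    samples: list of (day_index, bytes_used), assumed roughly linear.
    Returns None if usage is flat or shrinking.
    """
    n = len(samples)
    xs = [day for day, _ in samples]
    ys = [used for _, used in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope: growth per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None  # no projected fill date
    return (capacity - ys[-1]) / slope

# One sample per day is plenty; 1s granularity buys nothing here.
usage = [(day, 500 + 10 * day) for day in range(14)]  # GB, +10 GB/day
print(days_until_full(usage, capacity=1000))  # → 37.0 days until 1 TB is full
```

With no metrics at all, the first signal you get is the 97%-full incident; with fourteen cheap data points, you get a month of warning.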