In some environments I've seen monitoring alerts arrive only after the system has already started to degrade.
Examples:
- disk usage spikes faster than expected
- network latency gradually increases
- services degrade slowly before failing
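For concreteness, this is the kind of predictive check I mean for the disk case: fit a straight line to recent usage samples and estimate how long until the volume fills, so an alert can fire hours ahead instead of when a fixed percentage is crossed. It's a minimal sketch that assumes usage grows roughly linearly over the sampling window; the function name and sample values are hypothetical. Prometheus's `predict_linear()` applies the same simple linear regression idea to a range vector.

```python
from typing import Optional, Sequence


def seconds_until_full(times: Sequence[float], used: Sequence[float],
                       capacity: float) -> Optional[float]:
    """Hypothetical helper: fit used = a + b*t by ordinary least squares and
    return the seconds from the last sample until the fitted line reaches
    `capacity`, or None if usage is flat or shrinking."""
    n = len(times)
    if n < 2 or n != len(used):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_u = sum(used) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, used))
    slope = sxy / sxx  # growth per second (units of `used`)
    if slope <= 0:
        return None  # not filling up, nothing to predict
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - times[-1])


# Samples every 10 minutes (seconds, GB) on an assumed 100 GB volume.
eta = seconds_until_full([0, 600, 1200, 1800], [50, 52, 54, 56], 100)
print(round(eta))  # 13200, i.e. roughly 3.7 hours of headroom
```

An alert would then fire when `eta` drops below some lead time (say, a few hours), rather than on the current usage level.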
Tools like Datadog, Zabbix, and Prometheus are great for alerting, but in practice they still feel mostly reactive: the alert fires once a threshold has already been crossed.
How do you deal with this in your infrastructure?
Do you rely more on:
- anomaly detection
- predictive monitoring
- custom scripts
- or just good incident response?
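On the anomaly detection side, the simplest version I know of is a trailing-window z-score: flag a sample that sits more than a few standard deviations from the mean of the previous W samples. This is only a sketch; the function name, window size, and threshold are arbitrary assumptions. One caveat for the "latency gradually increases" case: a slow creep gets absorbed into the trailing baseline, so it may never trip this check unless the baseline is longer or fixed (e.g. the same hour last week).

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def zscore_anomalies(values: Iterable[float], window: int = 30,
                     threshold: float = 3.0) -> Iterator[Tuple[int, float, float]]:
    """Hypothetical helper: yield (index, value, z) for samples more than
    `threshold` standard deviations from the mean of the preceding
    `window` samples."""
    history: Deque[float] = deque(maxlen=window)
    for i, v in enumerate(values):
        if len(history) == window:  # only score once the baseline is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:  # a perfectly flat baseline is skipped here
                z = (v - mu) / sigma
                if abs(z) > threshold:
                    yield i, v, z
        history.append(v)  # the oldest sample drops out automatically


# Assumed latency samples in ms: a stable baseline, then a sudden jump.
latencies = [100, 102, 98, 101, 99] * 6 + [160]
print([i for i, _, _ in zscore_anomalies(latencies)])  # [30]
```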
I'm trying to understand what actually works in real-world environments.