Servers/services to monitor: Amazon EC2 and RDS, a Ruby on Rails web app, Redis + Resque workers, and our own Mac servers.
Alert stages: Minor, where an unobtrusive message is sent to operations; Severe, where someone gets a phone call.
Current solution: New Relic to monitor Ruby on Rails, Amazon EC2 instances, Amazon RDS, Redis, and Resque. Collectd + Riemann + Librato Metrics to monitor our own servers. HipChat for minor alerting. PagerDuty for severe alerting.
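To make the two alert stages concrete, here is a rough sketch of the routing I have in mind: a metric value is classified as minor or severe and mapped to HipChat or PagerDuty. The metric names, the threshold numbers, and the `classify`/`route` helpers are made up for illustration; nothing here calls the real HipChat or PagerDuty APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    minor: float   # at or above this value: unobtrusive message to operations
    severe: float  # at or above this value: someone gets a phone call


# Hypothetical metric names and limits, chosen only for illustration.
THRESHOLDS = {
    "resque.queue_depth": Threshold(minor=100, severe=1000),
    "rds.cpu_percent": Threshold(minor=75, severe=95),
}

# Destination per alert stage, matching the current setup described above.
DESTINATIONS = {"minor": "hipchat", "severe": "pagerduty"}


def classify(metric: str, value: float) -> Optional[str]:
    """Return "minor", "severe", or None when no alert is warranted."""
    threshold = THRESHOLDS.get(metric)
    if threshold is None:
        return None  # unknown metric: no rule, so no alert
    if value >= threshold.severe:
        return "severe"
    if value >= threshold.minor:
        return "minor"
    return None


def route(metric: str, value: float) -> Optional[str]:
    """Return the destination name for this reading, or None for no alert."""
    stage = classify(metric, value)
    if stage is None:
        return None
    return DESTINATIONS[stage]
```

In practice the return value of `route` would decide which notifier gets called; the sketch stops short of that on purpose.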
The biggest issue I'm having now is that alerting in New Relic for anything other than Rails and Servers (RDS, Redis, and Resque) is bad. So I'm asking HN: how do you monitor your cloud servers and services, and how do you alert staff when something goes wrong?