What monitors the heartbeat server to make sure it is up and running as expected?
Currently we simply write each heartbeat to our Redis database, along with the UTC time it occurred. Some heartbeats are global (string key); for others we keep one per server (hash key, server name => last heartbeat). There is an API endpoint set up that some of our systems use to trigger heartbeats.
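A rough sketch of that storage scheme in Python, for illustration only. The "heartbeat:" key prefix, the record_heartbeat function and the FakeRedis class are all made up for this example; FakeRedis is an in-memory stand-in so the snippet runs without a server. A real redis-py client exposes set/get/hset/hgetall calls with the same positional arguments, though it returns bytes unless configured to decode responses.

```python
from datetime import datetime, timezone


class FakeRedis:
    """In-memory stand-in exposing the few Redis-style methods used below."""

    def __init__(self):
        self.strings = {}
        self.hashes = {}

    def set(self, key, value):
        self.strings[key] = value

    def get(self, key):
        return self.strings.get(key)

    def hset(self, key, field, value):
        self.hashes.setdefault(key, {})[field] = value

    def hgetall(self, key):
        return dict(self.hashes.get(key, {}))


def record_heartbeat(store, name, server=None, now=None):
    """Store the UTC time of a heartbeat and return the stored timestamp.

    Global heartbeats go into a string key; per-server heartbeats go into
    a hash whose fields are server names.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()
    key = f"heartbeat:{name}"  # hypothetical key naming scheme
    if server is None:
        store.set(key, stamp)
    else:
        store.hset(key, server, stamp)
    return stamp


store = FakeRedis()
record_heartbeat(store, "nightly-backup")        # global heartbeat
record_heartbeat(store, "web", server="web-01")  # per-server heartbeat
```

The API endpoint would then just be a thin wrapper that calls something like record_heartbeat with the name and server taken from the request.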
We then have a public page at a secret URL that renders out the full status check. Pingdom is set up to check this page for "warning-status: OK" and, if it's not there, send us an email. It also checks for "final-status: OK"; if that is not there, it triggers PagerDuty, which will wake us up if needed.
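To make the two-marker idea concrete, here is a hedged sketch of how such a page body could be built. The two thresholds (warn_after, fail_after), the LATE/FAILED labels and the render_status function are assumptions for illustration; the only parts taken from the setup above are the "warning-status: OK" and "final-status: OK" strings that the external checker looks for.

```python
from datetime import datetime, timedelta, timezone


def render_status(last_seen, warn_after, fail_after, now=None):
    """Render a plain-text status page from {heartbeat name: last UTC time}.

    Assumed policy: a heartbeat older than warn_after drops the warning
    marker; one older than fail_after drops both markers.
    """
    now = now or datetime.now(timezone.utc)
    lines = []
    warning_ok = True
    final_ok = True
    for name, seen in sorted(last_seen.items()):
        age = now - seen
        if age > fail_after:
            state = "FAILED"
            warning_ok = final_ok = False
        elif age > warn_after:
            state = "LATE"
            warning_ok = False
        else:
            state = "OK"
        lines.append(f"{name}: {state} (last seen {int(age.total_seconds())}s ago)")
    # The external checker only needs to search for these two exact strings.
    lines.append("warning-status: " + ("OK" if warning_ok else "FAILED"))
    lines.append("final-status: " + ("OK" if final_ok else "FAILED"))
    return "\n".join(lines)


now = datetime.now(timezone.utc)
page = render_status(
    {"nightly-backup": now - timedelta(minutes=10), "web/web-01": now},
    warn_after=timedelta(minutes=5),
    fail_after=timedelta(minutes=30),
    now=now,
)
print(page)
```

With this policy a late heartbeat only removes the warning marker (email), while a badly overdue one removes both (email plus page).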
I would prefer it if PagerDuty could check the page directly; then they would be our only point of failure.
-- Edit -- I added a screenshot to the post which should make it a bit clearer.
I'm working with Microsoft's Semantic Logging Application Block to standardize error logging and auditing across all my .NET apps. It uses the built-in Event Tracing for Windows (ETW) to capture events written by any application and writes them to a destination of your choice (database, file, etc.). .NET 4.5 includes a new EventSource class; I inherit from it and simply call WriteEvent(), and the message goes off to ETW. Reference: http://blogs.msdn.com/b/agile/archive/2013/02/07/embracing-s...
I would like to add heartbeats to my design, in addition to just error logging, tracing and http auditing.