
0 points · dsfyu404ed · 7y ago · 0 comments

I work for a smaller but comparably large platform. "If everything is down check the DB" is at the top of one of our internal monitoring websites in red.

Screw-ups related to data loss are rare (I've been here for years and haven't seen one with the DBs used by the stuff I work with), but failures at this scale tend to cascade a little ways, and it takes time to dig out of the hole. They probably have the problem solved, but they have to spend a bunch of time synchronizing things and verifying the fix before they press the big red "go live" button.

2 comments · 1 top-level

pferde · 7y ago · 1 in thread

Shouldn't the monitoring websites be able to check the DB status for you before you even look at that red text? :)

dsfyu404ed (OP) · 7y ago

We have a different dedicated page that gives an overview of what's going on with the DB. The page in question is supposed to be a single stop that lets you visually get an overview of the state of the application servers, see whether things are "normal", and, if not, quickly identify what is not normal.
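
For illustration only, here is a rough sketch of how a single overview page could fold in the automatic DB check suggested above. It is not how the platform in this thread works: the function name `overview`, the check callables, and the status strings are all hypothetical, and the real checks would be whatever probes a site already has.

```python
from typing import Callable, Dict, List


def overview(
    db_check: Callable[[], bool],
    server_checks: Dict[str, Callable[[], bool]],
) -> List[str]:
    """Build status lines for a hypothetical overview page, DB first."""

    def run(check: Callable[[], bool]) -> bool:
        # A check that raises is treated the same as a failing check.
        try:
            return bool(check())
        except Exception:
            return False

    lines: List[str] = []
    # Report the DB first, since a dead DB tends to take everything else down.
    if run(db_check):
        lines.append("DB: ok")
    else:
        lines.append("DB: DOWN (check this first)")

    # Then one line per application server, in a stable order.
    for name in sorted(server_checks):
        status = "ok" if run(server_checks[name]) else "NOT NORMAL"
        lines.append(f"{name}: {status}")
    return lines


if __name__ == "__main__":
    # Stand-in checks; real ones would probe actual services.
    for line in overview(lambda: False, {"app1": lambda: True, "app2": lambda: False}):
        print(line)
```

Putting the DB line at the top is the same idea as the red text, just computed instead of hard-coded.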
