Hey HN, I'm Oskar. For the past few months I've been building StatusDude - an uptime monitoring tool with private agents that auto-detects your Kubernetes resources.
I run a bunch of stuff across multiple orgs, different clusters, internal networks, self-hosted, GKE, EKS, etc. Monitoring all of it without Datadog money was getting tough, and most tools don't even support internal networks. So, here we are.
A tiny async agent sits inside your network and phones home over HTTPS. No inbound ports, no VPN, no firewall rules. One container, one Helm install, done. A single instance handles 10k+ monitors comfortably.
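A minimal sketch of that outbound-only loop, under stated assumptions: a plain TCP check type, and injected `fetch_checks`/`upload_results` coroutines that stand in for the agent's HTTPS calls home. `CheckDefinition`, `RawResult`, `run_cycle` and the default concurrency limit are hypothetical names for illustration, not StatusDude's actual agent code or API. The point it shows is that every connection is opened by the agent, and a single async process can keep many checks in flight with a semaphore.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


# Hypothetical wire shapes; the real agent's check/result format isn't documented here.
@dataclass
class CheckDefinition:
    monitor_id: str
    host: str
    port: int
    timeout_s: float = 5.0


@dataclass
class RawResult:
    monitor_id: str
    ok: bool
    latency_ms: float
    error: Optional[str] = None


async def tcp_probe(check: CheckDefinition) -> RawResult:
    """Try a TCP connect and record what happened; no up/down judgement here."""
    start = time.perf_counter()
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(check.host, check.port), timeout=check.timeout_s
        )
        writer.close()
        await writer.wait_closed()
        return RawResult(check.monitor_id, True, (time.perf_counter() - start) * 1000)
    except (OSError, asyncio.TimeoutError) as exc:
        elapsed = (time.perf_counter() - start) * 1000
        return RawResult(check.monitor_id, False, elapsed, repr(exc))


Probe = Callable[[CheckDefinition], Awaitable[RawResult]]


async def run_cycle(
    fetch_checks: Callable[[], Awaitable[List[CheckDefinition]]],
    upload_results: Callable[[List[RawResult]], Awaitable[None]],
    probe: Probe = tcp_probe,
    max_concurrency: int = 500,
) -> List[RawResult]:
    """One pull -> run -> upload cycle. Every connection is opened by the agent."""
    checks = await fetch_checks()          # outbound: fetch check definitions
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(check: CheckDefinition) -> RawResult:
        async with sem:                    # cap concurrent probes at max_concurrency
            return await probe(check)

    results = list(await asyncio.gather(*(bounded(c) for c in checks)))
    await upload_results(results)          # outbound: push raw results
    return results
```

A long-running agent would call something like `run_cycle` in a loop with a sleep between iterations; because nothing listens for incoming connections, no inbound port or firewall rule is involved.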
The agent pulls check definitions from the cloud, runs them locally, uploads raw results. All evaluation is server-side - the agent stays dead simple, and the cloud decides what's actually down vs. a blip.
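Deciding "down vs. a blip" on the server can be as simple as debouncing the raw results. The sketch below assumes a consecutive-failure policy with separate failure and recovery thresholds; `MonitorEvaluator`, `State` and the default thresholds of 3 and 2 are hypothetical, not StatusDude's actual evaluation rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    UP = "up"
    DOWN = "down"


@dataclass
class MonitorEvaluator:
    """Debounces raw agent results into UP/DOWN transitions (hypothetical policy)."""
    fail_threshold: int = 3       # consecutive failures before declaring DOWN
    recover_threshold: int = 2    # consecutive successes before declaring UP again
    state: State = State.UP
    streak: int = 0               # length of the current run that disagrees with `state`

    def ingest(self, ok: bool) -> Optional[State]:
        """Feed one raw result; return the new state on a transition, else None."""
        disagrees = (not ok) if self.state is State.UP else ok
        if not disagrees:
            self.streak = 0       # a result matching the current state resets the counter
            return None
        self.streak += 1
        needed = self.fail_threshold if self.state is State.UP else self.recover_threshold
        if self.streak < needed:
            return None           # still looks like a blip
        self.state = State.DOWN if self.state is State.UP else State.UP
        self.streak = 0
        return self.state
```

With these defaults, a single failed check between successes produces no transition, while three failures in a row flip the monitor to DOWN. Keeping this logic server-side means thresholds could be tuned without touching the agents.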
For Kubernetes, it auto-discovers Ingresses, Services, and HTTPRoutes. Deploy something new, it just gets picked up. Monitors and status pages spin up automatically.
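Auto-discovery mostly comes down to turning each discovered object into something probe-able. A minimal sketch, assuming the objects arrive as dicts shaped like the Kubernetes API returns them (e.g. the `items` of `kubectl get ingress -A -o json`). `discover`, the `tcp://` target format, the HTTPS default for HTTPRoutes and the `cluster.local` cluster domain are illustrative assumptions, not how StatusDude actually builds its monitors.

```python
from collections.abc import Iterable, Iterator
from typing import Any, Dict, List

K8sObject = Dict[str, Any]


def ingress_targets(ing: K8sObject) -> Iterator[str]:
    """networking.k8s.io/v1 Ingress: one URL per concrete host + path."""
    spec = ing.get("spec", {})
    tls_hosts = {h for t in spec.get("tls", []) for h in t.get("hosts", [])}
    for rule in spec.get("rules", []):
        host = rule.get("host")
        if not host or host.startswith("*"):
            continue  # host-less or wildcard rules have no single URL to probe
        scheme = "https" if host in tls_hosts else "http"
        paths = rule.get("http", {}).get("paths", []) or [{"path": "/"}]
        for p in paths:
            yield f"{scheme}://{host}{p.get('path', '/')}"


def httproute_targets(route: K8sObject, scheme: str = "https") -> Iterator[str]:
    """Gateway API HTTPRoute: hostnames x path matches.
    The real scheme depends on the parent Gateway listener; https is assumed here."""
    spec = route.get("spec", {})
    paths = set()
    for rule in spec.get("rules", []) or [{}]:
        for match in rule.get("matches") or [{}]:
            # A missing match/path defaults to a PathPrefix match on "/".
            paths.add(match.get("path", {}).get("value", "/"))
    for host in spec.get("hostnames", []):
        if host.startswith("*"):
            continue
        for path in sorted(paths):
            yield f"{scheme}://{host}{path}"


def service_targets(svc: K8sObject, cluster_domain: str = "cluster.local") -> Iterator[str]:
    """v1 Service: one in-cluster TCP target per port (reachable only from inside)."""
    meta = svc["metadata"]
    fqdn = f"{meta['name']}.{meta.get('namespace', 'default')}.svc.{cluster_domain}"
    for port in svc.get("spec", {}).get("ports", []):
        yield f"tcp://{fqdn}:{port['port']}"


HANDLERS = {"Ingress": ingress_targets, "HTTPRoute": httproute_targets, "Service": service_targets}


def discover(objects: Iterable[K8sObject]) -> List[str]:
    """Turn a batch of listed objects into monitor targets, skipping other kinds."""
    targets: List[str] = []
    for obj in objects:
        handler = HANDLERS.get(obj.get("kind", ""))
        if handler is not None:
            targets.extend(handler(obj))
    return targets
```

Picking up new deployments without manual steps would additionally mean watching the API (or re-listing periodically) and diffing the resulting target set against the existing monitors.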
During development I found out I didn't know how to use Celery properly, so I went with ARQ instead: 50k+ jobs/min, no drama. After I modified it a bit, that is ;-)
Not a full observability platform - no incident management, no on-call. Just monitoring, status pages, and notifications. If you want straightforward uptime monitoring that works behind firewalls, give it a go and please leave feedback in the comments!
New signups currently get the Team plan unlocked for free, since I want people to test the full thing. Happy to answer any questions about the architecture.
https://statusdude.com
https://artifacthub.io/packages/helm/statusdude-agent/status...