kennethops on Hacker News

Ask HN: How do you keep system context from rotting over time?

Former SRE here, looking for advice.

I know there are a lot of tools focused on root cause analysis after things break. Cool, but that’s not what’s wearing me down. What actually hurts is the constant context switching while trying to understand how a system fits together, what depends on what, and what changed recently.

As systems grow, this feels like it gets exponentially harder. Add logs and now you’ve created a million new events to reason about. Add another database and suddenly you’re dealing with subnet constraints or a DB choice that’s expensive as hell, and no one noticed until later. Everyone knows their slice, but the full picture lives nowhere, so bit rot just keeps creeping in.

This feels even worse now that AI agents are pushing large amounts of code and config changes quickly. Things move faster, but shared understanding falls behind even faster.

I’m honestly stuck on how people handle this well in practice. For folks dealing with real production systems, what’s actually helped? Diagrams, docs, tribal knowledge, tooling, something else? Where does it break down?

35 points | kennethops | 5mo ago | 28 comments

Show HN: Former Cloudflare SRE building OpsCompanion, a live map of what's running

Hey HN, I’m Kenneth. I spent several years as a Senior SRE at Cloudflare.

One thing that became painfully clear over time is that most outages, security issues, and compliance fire drills don’t come from a lack of tools. They come from missing context. People don’t know what’s running, how things connect, or what changed recently, especially once systems sprawl across clouds, repos, and teams.

That’s why I’m building OpsCompanion.

The goal is simple: keep a live, shared picture of what’s actually running and how it fits together.

OpsCompanion helps engineers:

See a live, visual map of services, infrastructure, and dependencies

Answer “what changed?” without digging through five tools, Slack threads, or outdated docs

Preserve operational context so the next person on call isn’t starting from zero

This isn’t about adding more logs or alerts, or slapping AI on top of existing dashboards. It’s about capturing the mental model experienced operators carry in their heads and keeping it shared and up to date.

It’s still early, and there are rough edges. I’ve opened it up to a small group of engineers who work close to production so I can get honest feedback. If it’s useful, great. If not, I genuinely want to understand why and what would make it better.

You can try it here: https://opscompanion.ai/?utm_source=hn&utm_medium=show_hn&ut...

I’ll be around in the comments. Happy to answer technical questions, hear skepticism, get a bit roasted, or talk about what actually breaks in real systems.

4 points | kennethops | 5mo ago | 3 comments

kennethops

Recent submissions

I am getting sick and tired of our AI oligarchs

Token economics don't make sense, and we need some new ideas

What if we start to draw inspiration from nature's greatest machine?

AI Agents Are Selfish and Biology Solved It

We automated everything except knowing what's going on

Show HN: OpsCompanion – A shared system model for humans and AI agents

Show HN: OpsCompanion – Live production context for teams and agents

Ask HN: How do you keep system context from rotting over time?

Show HN: Former Cloudflare SRE building OpsCompanion, a live map of what's running
