- I 100% believe this is happening, and it's probably going to be the case within the next 6 months. I've seen Claude and Grok debug issues when they only had half of the relevant evidence (e.g. "given A and B, it's most likely X"). They can even debug complex issues between systems using logs, metrics, etc. In other words, everything a human would do (and sometimes better).
- The situation described is actually not that different from being an SRE manager. As you get more senior, you aren't doing the investigations yourself. It's usually your direct reports who are actually looking at the logs, etc. You may occasionally get involved in more complex issues or big outages, but the direct reports are doing a lot of the heavy lifting.
- All of the above being said, I can imagine errors so weird or complex that the LLMs can't figure them out, cases where they don't have the MCP or skill to resolve the problem, or some giant technology issue that breaks a lot of stuff. Facebook engineers using angle grinders to get into the data center due to DNS issues comes to mind for the last one.
Which probably means we are all going to start to be more like airline pilots:
- highly trained in debugging AND managing fleets of LLMs
- managing autonomous systems
- around "just in case" the LLMs fall over
P.S. I've been very well paid over the years, and being an SRE is how I feed my family. I do worry, like many, about how all of this is going to affect that. Sobering stuff.