It would be nice if there were better tooling for getting from an observed problem to the responsible team.
But you need to know they exist. :)
I would be very surprised if any enterprise of significant size and IT complexity didn't have an IT incident response team. I'm biased, but I think they are a necessity in complex environments where on-call engineers can't possibly keep track of all their integrators, their integrators' integrators, etc. It also helps to have incident commanders who do that job multiple times a week instead of a few times a decade.
I can go to a website, type in search terms or URLs, and pull up exactly who to contact. Even our generic "help, something is broken" group relies on this. There are many names listed, so even if the on-call person is "making dinner", you have their backup, their manager, etc.
I can tag my system as dependent on another and if they have issues I get alerted.
If you didn't know what it was, you could page the SRE team and we'd diagnose with you.
Sometimes, as SREs, we would shortcut the process because we just knew who had the answer, but at least this way that tribal knowledge was somewhat encoded.
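For anyone who hasn't seen one of these, here is a minimal sketch of the kind of ownership registry described above: search by name or URL, an escalation chain per service (on-call, backup, manager), and dependency tags that fan an alert out to downstream owners. Everything here is hypothetical (the `Service` and `Registry` names, the substring search, and the "notify the first contact of every transitive dependent" rule); it is an in-memory illustration, not the commenter's actual tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    """One entry in a hypothetical ownership registry."""
    name: str
    urls: list[str]
    # Escalation chain in order: on-call first, then backup, manager, ...
    contacts: list[str]
    # Names of services this one depends on.
    depends_on: set[str] = field(default_factory=set)


class Registry:
    def __init__(self) -> None:
        self.services: dict[str, Service] = {}

    def add(self, svc: Service) -> None:
        self.services[svc.name] = svc

    def lookup(self, term: str) -> list[Service]:
        """Case-insensitive substring match against service names and URLs."""
        t = term.lower()
        return [s for s in self.services.values()
                if t in s.name.lower() or any(t in u.lower() for u in s.urls)]

    def tag_dependency(self, service: str, dependency: str) -> None:
        """Record that `service` depends on `dependency`."""
        self.services[service].depends_on.add(dependency)

    def dependents(self, failing: str) -> set[str]:
        """Every service that depends on `failing`, directly or transitively."""
        result: set[str] = set()
        frontier = [failing]
        while frontier:
            current = frontier.pop()
            for s in self.services.values():
                if current in s.depends_on and s.name != failing and s.name not in result:
                    result.add(s.name)
                    frontier.append(s.name)
        return result

    def alert(self, failing: str) -> dict[str, str]:
        """Map each affected service to the first contact in its chain.
        The remaining contacts are the fallbacks if that person doesn't answer."""
        return {name: self.services[name].contacts[0]
                for name in sorted(self.dependents(failing))}


reg = Registry()
reg.add(Service("auth", ["https://auth.example.com"], ["alice", "bob", "carol"]))
reg.add(Service("checkout", ["https://shop.example.com/checkout"], ["dave", "erin", "frank"]))
reg.add(Service("reports", ["https://reports.example.com"], ["grace", "heidi", "ivan"]))
reg.tag_dependency("checkout", "auth")
reg.tag_dependency("reports", "checkout")

print([s.name for s in reg.lookup("shop.example")])  # ['checkout']
print(reg.alert("auth"))  # {'checkout': 'dave', 'reports': 'grace'}
```

The transitive walk is the point: "reports" never tagged "auth" directly, but it still hears about an auth outage because it depends on "checkout", which does.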
Edit: Lol at myself, I thought this was a blog post from Meta and I was pointing out that there is a YC company that does this for everyone.
Now I realize that this was an ad for a different YC company that also does this (although WM is a year older).
Agree that most people hate being woken at 2am, but disagree that humans aren't great at incident response. Speaking generally, I think we're about as good as it gets when it comes to adaptability and the kind of reasoning that's necessary to investigate complex issues.
That said, I also think AI can play a massive role in aiding humans, especially in undifferentiated tasks like checking deployments, code changes, and past incidents, and in spotting patterns.
IMO the sweet spot is going to come from highly ergonomic AI products that enable collaborative incident response, rather than "AI incident management" or any other marketing BS.
This clearly telegraphs a few things for those that are observant enough to notice.
Anyone with a brain who is interested in going into that field would note that they are using this to remove the entry-level jobs, since those are the simpler ones.
The less demand there is for said labor, the more cutthroat the competition to get said jobs will be. If you double the competition, wages must naturally drop, and with them so do competency and skill. When proficiency isn't rewarded, those with options go elsewhere.
When any other job with 1/10th the responsibility pays the same, it's a no-brainer.
When an industry player with large market share telegraphs this, it is a strong indicator that IT operations (in this case) will soon be a dead profession ("soon" on the scale of a generation), and anyone investing in education for it will get a negative ROI.
Up front, the companies may add some profit to the bottom line, but long-term they won't be able to find or keep talent. Said talent will have left, and you get Atlas Shrugged dynamics.
Aside from that, this type of strategy also suggests other cascading dynamics that lead to grid/societal collapse following a burn-the-bridges approach. Those in their ivory towers may not see the consequences of their choices until it is too late for anything except coffin nails.
The fundamental issue with market sector concentration, where a few parties cooperate, is that counterparty systemic risk becomes unmanageable, and chaotic, destructive decisions cannot be softened.
In chaotic feedback systems, the general rule of thumb is chaos increases until the underlying imbalance is equalized.
For some, recognizing the inevitable conclusion of a series of events may dictate a radically altered approach towards life planning/survival.
On the general topic: I'm interested in people sharing their experiences using LLMs in live-site (production incident) scenarios.
If you're looking for something open source: https://github.com/robusta-dev/holmesgpt/
Reading between the lines/Translation: We need ~50% less IT operations staff, we can have LLMs do this instead.
The LLMs are coming for your white-collar work, and this is likely driving the white-collar recession.
Here's a thought: incident response is one of the areas where entry-level SAs cut their teeth and build the skills needed to move into intermediate and senior roles.
What happens when all the low-hanging fruit, the entry-level jobs, is replaced by LLMs and there is no short-term business need to hire such people?
No jobs means the labor pool finds something else outside the profession, or sits around homeless on food stamps, agitating with the homies, until critical mass is reached.
Your intermediate and senior people naturally age and die, so how do you find replacements for them? ...
When there is no economic advantage to developing a skill set, no one goes into the field; it acts like a sieve, and eventually the skills involved become lost knowledge. Those who have the skills but can't find work seek it elsewhere and rarely return. They were burned badly enough, often enough, that trying again in that field becomes a bad bet.
These things aren't rocket science, and yet people seem so slothful or greedy that they are unable to see, or act to prevent, what naturally happens next.
When people can't find jobs to feed themselves or their loved ones, and the only future imposed on them is slavery or death, they will organize and, once desperate enough, do the only thing they can: violence. According to the historical record, these same dynamics were present in the decades leading up to 1776.
It is so extremely short-sighted.