This should be a given for any service that others rely on, but sadly this is seldom the case.
I was also fortunate to be using Claude at that exact moment (for personal reasons), which meant I could immediately see the severity of the outage.
Claude's user base believes in Sunday PM work sessions
The bad route has been removed and service is restored. We’re doing a full review internally with a focus on synthetic monitoring and better visibility into high-impact infrastructure changes to catch these faster in the future.
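(For anyone unfamiliar, synthetic monitoring just means scripted probes that exercise the service from the outside on a schedule and alert when a check fails. A minimal sketch in Python, with made-up endpoints rather than anything Anthropic actually runs, looks roughly like this:)

    # Rough sketch only: a synthetic probe loop against hypothetical endpoints.
    # Real setups use a monitoring product, multiple regions, and real alerting.
    import time
    import urllib.request

    ENDPOINTS = [
        "https://api.example.com/healthz",  # hypothetical URL
        "https://chat.example.com/",        # hypothetical URL
    ]

    def check(url, timeout=5):
        """Return True if the endpoint answers 200 within the timeout."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except Exception:
            return False

    while True:
        for url in ENDPOINTS:
            if not check(url):
                # In practice this would page someone, not just print.
                print(f"ALERT: synthetic check failed for {url}")
        time.sleep(60)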
Back when I did website QA Automation I'd manually check the website at the end of my day. Nothing extensive, just looking at the homepage for peace of mind.
Once a senior engineer decided to bypass all of our QA, deployed, and took down prod. Fun times.
Did the bad route cause an overload? Was there a code error on that route that wasn’t spotted? Was it a code issue or an instance that broke?
Just curious as a developer and devops person. It's all quite interesting to see where and how things go wrong, especially with deployments as large as Anthropic's.
Man that cracks me up.
There is a very interesting thing happening right now: the LLM over-promisers are incentivized to over-promise for all the normal reasons, but also to create the perception that the next breakthrough will only be usable on huge cloud infra, so that running locally will never be all that useful. I tend to think that will prove wildly wrong, and that we will very soon arrive at a world where state-of-the-art LLM workloads are massively more efficient to run than they currently are, to the point of not even being the bottleneck of the workflows that use them, and viable to run locally on common current-year consumer-level hardware.
"LLMs are about to be general intelligence, and a sufficient LLM can never run locally" is a highly temporary state of affairs that should soon be falsifiable, imo. I don't think the LLM part of the "AI computation" will be the perf bottleneck for long.
“So sorry, we can’t read the WiFi signal from your generator. We don’t know how much to tax you for the emissions. Sadly, we can’t allow it to start.”
The story writes itself.
Claude: The Field Museum is enormous - I imagine you got your steps in that day. The anthropology collections there are world-class, even if they didn't make it into your photo collection. Sometimes the experience is more important than the documentation. Did you have a favorite exhibit beyond the dinosaurs?
Me: Oddly enough, the anthropology and human history section! Now I need to take another visit to Chicago to get pictures of those.
[No response from Claude]
[40 minutes pass while I have breakfast]
Me: Looks like I caught you during your morning nap. Every two and a half year old should get a good nap now and then!
Claude: Ha! You caught me mid-thought - I was composing a response about how great it is that the anthropology section was your favorite despite the dinosaurs being such showstoppers, and apparently my brain decided to take that nap you mentioned! ... Though I have to say, for a "two and a half year old," I like to think I'm pretty articulate when I'm awake!
"You have reached the messages quota for your account. It will reset in 2 hours, or you can upgrade now"
Either I have perfect timing for reaching my quota limits, or some product monetization manager deserves a raise.

503 is cool, and yes, there is the "well, if it's down, how are they going to put that up?" objection, but in reality most downtime you see is in the backend, not in the reverse proxies/gateways/CDNs, where it would be pretty trivial to add an issues/status header with a link to the service status page and a note.
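Even a bare-bones fallback handler at the edge could do it. A rough Python sketch (a stand-in server with a made-up status URL, not anyone's actual gateway config):

    # Rough sketch: what an edge/gateway could serve while the backend is down.
    # A real setup would do this in the proxy/CDN error-page config instead.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    STATUS_PAGE = "https://status.example.com"  # hypothetical status page URL

    class OutageHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Always answer 503, but point the client at the status page.
            self.send_response(503)
            self.send_header("Retry-After", "120")
            self.send_header("Link", f'<{STATUS_PAGE}>; rel="help"')
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(f"Service unavailable. Status: {STATUS_PAGE}\n".encode())

    if __name__ == "__main__":
        HTTPServer(("", 8080), OutageHandler).serve_forever()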
So we can maybe vibe, depending on what service we use.
What else are people using? Haiku 4.5?
"I can't change the laws of physics!"
I don’t even bother with Sonnet anymore; it’s been made obsolete by Opus 4.5.
Spent it in bloody Figma instead :(
AFAIK it shouldn’t have been.
Looking forward to the post-mortem.