This meeting happens literally every week, and has for years. Feels like the media is making a mountain out of a molehill here.
>He asked staff to attend the meeting, which is normally optional.
Is that false? It also discusses a new policy:
>Junior and mid-level engineers will now require more senior engineers to sign off any AI-assisted changes, Treadwell added.
Is that inaccurate? It is good context that this is a regularly scheduled meeting. But, regularly scheduled meetings can have newsworthy things happen at them.
My SVP asks me to do things all the time, indirectly. I do probably 5% of them.
Judging from the comment above, no, the meeting happens every week, and this week they were asked to attend.
Note that the article doesn’t say that he told staff they have to attend the meeting. It says he “asked” staff to attend the meeting. Which again, it’s really really normal for there to be an encouragement of “hey, since we just had an operational event, it would be good to prioritize attending this meeting where we discuss how to avoid operational events”.
As for the second quote: senior engineers have always been required to sign off on changes from junior engineers. There’s nothing new there. And there is nothing specific to AI that was announced.
This entire meeting and message is basically just saying “hey we’ve been getting a little sloppy at following our operational best practices, this is a reminder to be less sloppy”. It’s a massive nothingburger.
Items weren't displaying prices and it was impossible to add anything to your cart. It lasted from about 2pm to 5pm.
It's especially strange because if a computer glitch brought down a large retail competitor like Walmart, I probably would have seen something about it, even though their sales volume is lower.
That's been their job ever since cable news was invented.
https://en.wikipedia.org/wiki/Yellow_journalism
It probably goes back as far as people shouting news in the town square in Rome, or even before that.
Are you completely missing the point of the submission? It's not about "Amazon has a mandatory weekly meeting" but about the contents of that specific meeting, about AI-assisted tooling leading to "trends of incidents", having a "large blast radius" and "best practices and safeguards are not yet fully established".
No one cares how often the meeting in general is held, or if it's mandatory or not.
No, and that's what people are noting: the headline deliberately tries to blow this up into a big deal. When did you last see an HN post about Amazon's mandatory meeting to discuss a human-caused outage, or a post-mortem? It's not because they don't happen...
Must have as the comments are hours older than OP.
https://www.theguardian.com/us-news/ng-interactive/2026/jan/...
What is worth being pointed out is how quickly people blame "The Media" for how people use, consume and spread information on social networks.
Review by a senior is one of the biggest "silver bullet" illusions managers suffer from. For a person (senior or otherwise) to examine code or configuration with the granularity required to verify that it even approximates the result of their own level of experience, even only in terms of security/stability/correctness, requires an amount of time approaching the time spent if they had just done it themselves.
I.e. senior review is valuable, but it does not make bad code good.
This is one major facet of probably the single biggest problem of the last couple decades in system management: The misunderstanding by management that making something idiot proof means you can now hire idiots (not intended as an insult, just using the terminology of the phrase "idiot proof").
The more expensive and less sexy option is to actually make testing easier (both programmatically and manually), write more tests and more levels of tests, and spend time reducing code complexity. The problem, I think, is people don't get promoted for preventing issues.
The key to making this scalable is to make as few parts as possible critical, and to make the potential bad outcomes as benign as possible. (This lets you go to a lower rating in whatever safety standard applies to your industry.) You still need tests for the less critical parts though: while downtime is better than injury, if you want to sell future machines to your customers you need a good track record. At least if you don't want to compete on cost.
They do, but only after a company has been burned hard. They can also be promoted for their area being enough better that everyone notices.
Still, the best way to a promotion is to write a major bug that you can come in at the last moment and be the hero for fixing.
If you told someone "I don't trust you, run all code by me first" it wouldn't go well. If you tell them "everyone's code gets reviewed" they're ok with it.
Cleaning up structural issues across a couple of orgs is a senior => principal promo I've seen a couple of times.
This BS is what I say to my juniors when I want them to fuck off with their reviews and focus on my actual work.
Sounds very insightful though.
Unchecked, AI models output code that is as buggy as it is inefficient. In smaller greenfield contexts, it's not so bad, but in a large code base it performs much worse, as it will not have access to the bigger picture.
In my experience, you should be spending something like 5-15X the time the model takes to implement a feature on reviewing it and making it fix its errors and inefficiencies. If you do that (with an expert's eye), the changes will usually be high quality, correct, and good.
If you do not do that due diligence, the model will produce a staggering amount of low-quality code, at a rate that is probably something like 100x what a human could output in a similar timespan. Unchecked, it's like having a small army of the most eager junior devs you can find going completely fucking ape in the codebase.
I disagree, in the sense that an engineer who knows how to work with LLMs can produce code which only needs light review.
* Work in small increments
* Explicitly instruct the LLM to make minimal changes
* Think through possible failure modes
* Build in error-checking and validation for those failure modes
* Write tests which exercise all paths
This is a means to produce "viable" code using an LLM without close review. However, to your point, engineers able to execute this plan are likely to be pretty experienced, so it may not be economically viable.
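If it helps, here's a toy illustration of the last two bullets (the function and its failure modes are invented for the example): explicit validation for each failure mode we thought through, and a test per path.

```python
# Hypothetical example: validate the failure modes we anticipated,
# then exercise every path (happy and unhappy) with tests.
import pytest

def parse_price_cents(raw: str) -> int:
    """Parse a price like '12.99' into integer cents, failing loudly."""
    if not raw or not raw.strip():
        raise ValueError("empty price string")        # failure mode: blank input
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"not a number: {raw!r}")    # failure mode: garbage input
    if value < 0:
        raise ValueError(f"negative price: {value}")  # failure mode: bad upstream data
    return round(value * 100)

def test_happy_path():
    assert parse_price_cents("12.99") == 1299

def test_blank_input():
    with pytest.raises(ValueError):
        parse_price_cents("   ")

def test_garbage_input():
    with pytest.raises(ValueError):
        parse_price_cents("12,99€")

def test_negative_price():
    with pytest.raises(ValueError):
        parse_price_cents("-1")
```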
"Seniors will do expert review" will slowly collapse.
No one cares about handcrafted artisanal code as long as it meets both functional and non-functional requirements. The sooner geeks get over thinking they are some type of artists, the happier they will be.
I’ve had a job that requires coding for 30 years, and before that I was a hobbyist. I’ve worked at everything from 60-person startups to BigTech.
For my last two projects (consulting) and my current project, I led the project, gathered the requirements, designed the architecture from an empty AWS account (yes, using IaC), and delivered it. I didn’t look at a line of code. I verified the functional and non-functional requirements, wrote the hand-off documentation, etc.
The customer is happy, my company is happy, and I bet you not a single person will ever look at a line of code I wrote. If they do get a developer to take it over, the developer will be grateful for my detailed AGENTS.md file.
It's actually often harder to fix something sloppy than to write it from scratch. To fix it, you need to hold in your head both the original and the new solution, and work out the difference between them, which can be very confusing. The original solution can also anchor your thinking to some approach to the problem, which you wouldn't have if you solved it from scratch.
Hell, often it feels slower/worse. Foreign code is easily confusing at first, which slows you down - and bad code quickly gets bewildering and sends you down paths of clarifications that waste time.
Then often it blows up in production. It makes me almost want to blanket-reject PRs for being too difficult to understand. Hand-written code almost has an aversion to complexity: you'd search around for existing examples, libraries, reusable components, or just a simpler idea before building something crazy complex. With AI, you can spit out your first idea quickly, no matter how complex or flawed the original concept was.
If AI is a productivity boost and juniors are going to generate 10x the PRs, do you need 10x the seniors (expensive) or 1/10th the juniors (cost savings)?
A reminder that in many situations, pure code velocity was never the limiting factor.
Re: idiot proofing, I think this is a natural evolution: as companies get larger, they try to limit their downside and manage for the median, rather than having a growth mindset in hiring/firing/performance.
Maybe I don't have the correct mental model for how the typical junior engineer thinks though. I never wanted to bug senior people and make demands on their time if I could help it.
I suspect that isn't the goal.
Review by more senior people shifts accountability from the Junior to a Senior, and reframes the problem from "Oh dear, the junior broke everything because they didn't know any better" to "Ah, that Senior is underperforming because they approved code that broke everything."
Whether or not these productivity gains are realized is another question, but spreadsheet based decision makers are going to try.
Especially in a big co like Amazon, most senior engineers are box drawers, meeting goers, gatekeepers, vision setters, org lubricants, VP's trustees, glorified product managers, etc. They don't necessarily know more context than the more junior engineers, and they will most likely review slowly while uncovering fewer issues.
I’m probably not going to review a random website built by someone except for usability, requirements and security.
I also said senior review is valuable, but I'm not 100% sure if you're implying I didn't.
1. They can assess whether the use of AI is appropriate without looking in detail. E.g. if the AI changed 1000 lines of code to fix a minor bug, or changed code that is essential for security.
2. To discourage AI use, because of the added friction.
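Point 1 is even automatable to a degree. A sketch of such a pre-review heuristic, with invented paths and thresholds:

```python
# Flag diffs that are suspiciously large for the stated change, or that
# touch sensitive code, without reading every line. The sensitive-path
# prefixes and the size threshold are made up for illustration.
import subprocess

SENSITIVE_PREFIXES = ("auth/", "billing/", "crypto/")  # hypothetical critical dirs
MAX_LINES_FOR_MINOR_FIX = 200                          # hypothetical threshold

def diff_stats(base: str = "origin/main") -> list[tuple[int, int, str]]:
    """Return (added, deleted, path) for each file changed vs. base."""
    out = subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout
    stats = []
    for line in out.splitlines():
        added, deleted, path = line.split("\t")
        if added != "-":  # "-" means a binary file
            stats.append((int(added), int(deleted), path))
    return stats

def flags(stats):
    total = sum(a + d for a, d, _ in stats)
    if total > MAX_LINES_FOR_MINOR_FIX:
        yield f"{total} changed lines is too large for a minor fix; needs real review"
    for _, _, path in stats:
        if path.startswith(SENSITIVE_PREFIXES):
            yield f"touches sensitive path {path}; require detailed human review"
```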
My manager has been urging us to truly vibe code, just yesterday saying that "language is irrelevant because we've reached the point where it works - so you don't need to see it." This article is a godsend; I'll take this flawed silver bullet any day of the week.
The other problem is that the types of errors LLMs make are different from the ones juniors make. There are huge sections of genuinely good code, so the senior gets "review fatigue": because so much looks good, they just start rubber-stamping.
I use an automated pipeline to generate code (including terraform, risking infrastructure nukes), and I am the senior reviewer. But I have gates that do a whole range of checks, both deterministic and stochastic, before it ever gets to me. Easy things are pushed back to the LLM for it to autofix. I only see things where my eyes can actually make a difference.
Amazon's instinct is right (add a gate), but the implementation is wrong (make it human). Automated checks first, humans for what's left.
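For the curious, the shape of that pipeline is roughly this. ruff/mypy/pytest stand in for whatever deterministic gates you run (infra checks like terraform plan slot in the same way), and llm_autofix() is a placeholder for however you hook up your agent:

```python
# Gate-then-human pipeline sketch: run deterministic checks, push failures
# back to the LLM for an autofix attempt, and only escalate clean changes
# to human eyes.
import subprocess

GATES = [
    ["ruff", "check", "."],  # deterministic: lint
    ["mypy", "."],           # deterministic: types
    ["pytest", "-q"],        # deterministic: tests
]

def run_gate(cmd: list[str]) -> tuple[bool, str]:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def llm_autofix(failure_log: str) -> None:
    """Hypothetical: hand the failure log back to the LLM for another attempt."""
    raise NotImplementedError

def gate_changes(max_autofix_rounds: int = 3) -> bool:
    """Return True only when every gate passes; only then does a human look."""
    for _ in range(max_autofix_rounds):
        failures = []
        for cmd in GATES:
            ok, log = run_gate(cmd)
            if not ok:
                failures.append(log)
        if not failures:
            return True  # clean: escalate to human review
        for log in failures:
            llm_autofix(log)  # push the easy stuff back to the model
    return False  # still failing after N rounds: reject, don't waste human eyes
```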
I hear “x tool doesn’t really work well” and then I immediately ask: “does someone know how to use it well?” The answer “yes” is infrequent. Even a yes is often a maybe.
The problem is pervasive in my world (insurance). Number-producing features need to work in a UX and product sense but also produce the right numbers, and within range of expectations. Just checking the UX does what it’s supposed to do is one job, and checking the numbers an entirely separate task.
I don’t know many folks who do both well.
I would actually say having at least 2 people on any given work item should probably be the norm at Amazon's size if you also want to churn through people as Amazon does and also want quality.
Doing code reviews is not as highly valued in terms of incentives, and it blocks employees from working on things they would get more compensation for.
When reviewing, you need to go through every step of implementing it yourself (understand the problem, solve the problem, etc.), but you additionally need to 1) understand someone else’s solution and 2) diff your solution against theirs to provide meaningful feedback.
Review could take roughly equivalent time, but only if I am allowed to reject/approve in a binary way (“my solution would not be the same, therefore denied”) which is not considered appropriate in most places.
This is why I am not a fan of being the reviewer.
Yes, but with the caveat that the junior learns and eventually can become the senior.
We need smart people at every layer. If leadership isn't in that category, it spreads to all layers.
I don't know how we defeat capitalism to incentivize smart leadership. It's fundamentally opposed to market forces.
So you're saying that peer reviews are a waste of time and only idiots would use/propose them?
To partially clarify: "Idiot proof" is a broad concept that here refers specifically to abstraction layers, more or less (e.g. a UI framework is a little "idiot proof"; a WYSIWYG builder is more "idiot proof"). With AI, it's complicated, but bad leadership is over-interpreting the "idiot proof" aspects of it. It's a phrase, not an insult to users of these tools.
Also, while this is happening, most developers are getting constantly hammered by operational issues and critical security tasks, because 1) the legacy toolchain imports 6 different language package ecosystems, and 2) no one ever pays down tech debt in legacy code until it's a high-severity ticket count in a KPI dashboard visible to senior management.
But now, with AI, they are getting disrupted. Most AWS services might become obsolete: why does an AI need the janky higher-level abstractions AWS piles on?
So now they need innovation, but the company isn’t set up for it. They are forcing short deadlines for product launches that don’t matter.
The marginal technological direction is determined by middle managers whose primary motivation is “what new customer-facing feature can I launch at this year's re:invent and build a little empire” (of course this is a shrinking offering as tech debt and complexity pile up).
Junior engineers are burned and churned on execution, seniors are project managers, and principals just do high-level reviews and high-level firefighting (note: not actually leading the tech).
Directors and above just spend their time on “what to kill” or “who to fire” as priorities change every 6 months.
1. Shipping: deliver tickets or be PIPed.
2. Having fewer comments on their PRs: for some drastically dumb reason, having a PR thoroughly reviewed is a sign of bad quality. L7 and above use this metric to PIP folks.
3. Docs: write docs, get them reviewed to show you're high level.
Without AI, an employee is worse off in all of the above compared to folks who will cheat to get ahead.
I can't see how "requesting" folks to forgo their own self-preservation will work, especially when you've spent years pitting people against each other.
I'm very far away from liking Amazon's engineering culture and general work culture, but having PRs with countless discussions and feedback on them does signal that you did a lot of work without collaborating with others beforehand. Generally, in teams that work well together and build great software, the PRs tend to have very little on them, as most of the issues were resolved while designing together with others.
(And/but yes/no, I have never worked at NAGFAM...)
I missed my FAANG chance during the good years. No retirement for me!
There’s also this implicit imbalance engineers typically don’t like: it takes me 10 minutes to submit a complete feature thanks to Claude… but it will take the human reviewing my PR manually 10-20 times that.
Edit: in the end, real engineers know that what takes effort is a) knowing what to build and why, and b) verifying that what was built is correct. Currently AI doesn’t help much with either of these two points.
The inbetweens are needed but they are a byproduct. Senior leadership doesn’t know this, though.
I'd prefer people wrote good quality code and checked it as they went along... whilst allowing room for other stuff they didn't think of to come to the front. The production process of using LLMs is entirely different, in its current state I don't see the net benefit.
E.g. if you have a very crystalised vision of what you want, why would I want an engineer to use an LLM to write it, when the LLM can't do both raw production and review? Could this change? Sure. But there's no benefit for me personally to shift toward working that way now - I'd rather it came into existence first before I expose myself to incremental risk that affects business operations. I want a comprehensive solution.
It sounds like a piss poor deal for seniors unless senior engineer now means professional code reviewer.
Well, you'd think senior leadership should know how their business and their people work.
News from the inside makes it sound like things are getting pretty bad.
You mean senior programmers that have been there for ages don't want to spend their time reviewing AI slop? Who'd a thunk it!
People push AI-generated code like they wrote it. In the past, "wrote it" implied "reviewed it." With AI, that's no longer true.
I advocate for GitHub and other code review systems to add a "Require self-review" option, where people must attest that they reviewed and approved their own code. This change might seem symbolic, but it clearly sets workflows and expectations.
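As far as I know GitHub has nothing like this built in, but you can approximate it today with a CI check that fails unless the PR description carries a checked attestation box. A sketch (the attestation wording is whatever you standardize on):

```python
# Runs inside GitHub Actions on pull_request events, where GITHUB_EVENT_PATH
# points at the event payload JSON. Fails the build if the author hasn't
# checked the self-review attestation in the PR description.
import json, os, sys

ATTESTATION = "- [x] I have read and reviewed every line of this change myself"

def main() -> int:
    with open(os.environ["GITHUB_EVENT_PATH"]) as f:
        event = json.load(f)
    body = (event.get("pull_request") or {}).get("body") or ""
    if ATTESTATION.lower() not in body.lower():
        print("Missing self-review attestation in the PR description.")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```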
It also makes me more comfortable figuring out what a project's pull-acceptance process is like (maybe due to how fast a local UI is compared to web-based git). On the other hand, I can only run some basic git CLI commands and can't quickly comprehend raw text-based diffs, especially when encountering Linux patches from time to time.
Working at Amazon, when I wanted to review code myself through the CR tool, I'd still end up publishing it to the whole team and have to add some title shenanigans saying it was a self-review or WIP and for others not to look at it yet.
They're torn between "we want to fire 80% of you" and "... but if we don't give up quality/reliability, LLMs only save a little time, not a ton, so we can only fire like 5% of you max".
(It's the same in writing: these things are only a huge speed-up if it's OK for the output to be low-quality, but good output using LLMs only saves a little time versus writing entirely by hand. So far, anyway; of course these systems are changing by the day, but this specific limitation has remained true for about four years now, without much improvement.)
That has always been my feeling. Once I really understand what I need to implement, the code is the easy part. Sure it takes some time, but it's not the majority. And for me, actually writing the code will often trigger some additional insight or awareness of edge cases that I hadn't considered.
Essentially something big has to happen that affects the revenue/trust of a large provider of goods, stemming from LLM-use.
They won't go away entirely. But this idea that they can displace engineers at a high rate will.
Mentoring juniors is an important part of the job and a crucial service to the industry, but juniors equipped with LLMs make the deal a bit more sour. Anecdotally, they don't really remember the feedback as well, because they weren't involved in writing the code. It's burnout-inducing to see your hard work and feedback go in one ear and out the other.
I personally know people looking to jump ship because they waste too much time at their current employer on this.
Not really.
Code review should not be (primarily) about catching serious errors. If there are always a lot of errors, you can’t catch most of them with review. If there are few it’s not the best use of time.
The goal is to ensure the team is in sync on design, standards, etc. To train and educate Jr engineers, to spread understanding of the system. To bring more points of view to complex and important decisions.
These goals help you reduce the number of errors going into the review process; that should be the actual goal.
The fact that software is "soft" makes it seem like this doesn't apply, but it does, not least because of the fact that once you have gone down the wrong path with software design, it is very difficult to pull back and realize you need to go down an entirely different one.
We love this for Amazon, they're a very strong company making bold decisions.
First thing that comes to mind: this reminds me of those movies where some dictatorship starts to crumble and the dictator starts being tougher and tougher on the generals, not realizing the whole endeavor is doomed, not just the current implementation.
Then again, as a former amazon (aws) engineer: this is just not going to work. Depending how you define "senior engineer" (L5? L6? L7?) this is less and less feasible.
L5 engineers are already supposed to work pretty much autonomously, maybe with L6 sign-off when changes are a bit large in scope.
L6 engineers already have their own load of work, and a fairly large number of engineers "under" them (anywhere from 5 to 8). Properly reviewing changes from all of them, and taking responsibility for that, is going to be very taxing on such people.
L7 engineers work across teams and they might have anywhere from 12 to 30 engineers (L4/5/6) "under" them (or more). They are already scarce in number and they already pretty much mostly do reviews (which is proving not sufficient, it seems). Mandating sign-off and mandating assumption of responsibility for breaking changes means these people basically only do reviews and will be stricter and stricter[1] with engineers under them.
L8 engineers, they barely do any engineering at all, from what I remember. They mostly review design documents, in my experience not always expressing sound opinions or having proper understanding of the issues being handled.
In all this, considering the low morale (layoffs), the reduced headcount (layoffs) and the rise in expectations (engineers trying harder to stay afloat[2] due to... layoffs)... It's a dire situation.
I'm going to tell you, this stinks A LOT like a rotting Day 2 mindset.
----
1. keep in mind you can't, in general, determine the absence of bugs
2. Also cranking out WAY more code due to having gen-AI tools at their fingertips...
So basically, kill the productivity of senior engineers, kill the ability for junior engineers to learn anything, and ensure those senior engineers hate their jobs.
Bold move, we'll see how that goes.
I wonder if it's an early step towards an apprenticeship system.
How else would they train the LLM PR reviewers to their standards?
I've never personally been in the position, because my entire career has been in startups, but I've had many friends be in the unenviable position of training their replacements. Here's the thing though, at least they knew they were training their replacements. We could be looking at a potential future where an employee or contractor doesn't realize s/he is actually just hired to generate training data for an LLM to replace them, and then be cut.
Feels inevitable that code for aviation will slowly rot from the same forces at play but with lethal results.
Just because nearly all software is going to be written by AI, does not mean critical infrastructure will be.
LOL, it's the age-old "responsibility without authority". The pressure to use AI will increase, and basically you'll be fired for not using it. Simultaneously there's the pressure to take the blame when AI fucks up and you can't keep up with the bullshit, leading you to get fired. Either way, get some training on how to stack shelves at the supermarket, because that's how our future looks.
The presumably human mid-level or junior engineer has their own issues with this, but the point of the LLM is that you don't need that engineer. For productivity purposes, the dev org only needs the seniors to wrangle all the LLMs they can. That isn't sustainable, though, so you keep a couple of more-junior engineers doing similar work so they can mature.
The impression I get from SWEs I’ve met throughout my life is that most of them don’t actually care about their job. They got in because it paid well and demand was plentiful.
I find myself context-switching all the time and it's pretty exhausting, while also finding that I'm not retaining as much deep application domain knowledge as I used to.
On the surface, it's nice that I can give my LLM a well-written bug ticket and let it loose since it does a good job most of the time. But when it doesn't do a good job or it's making a change in an area of the codebase I'm not familiar with, auditing the change gets tiring really fast.
The way I am working with AI agents (Codex) these days is to have the AI generate a spec as a series of MD documents, where the implementation of each document is a bite-sized chunk that can be tested and evaluated by the human before moving to the next step, and roughly matches a commit in version control. The version control history reflects the logical progression of the code. In this manner, I have a decent knowledge of the code, and one that I am more comfortable with than one-shotting.
Prior to each step, I prompt the AI to review the step and ask clarifying questions to fill any missing details. Then implement. Then prompt the AI after to review the changes for any fixes before moving on to the next step. Rinse, repeat.
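For concreteness, the loop looks roughly like this as a driver script. run_agent() is a stand-in for however you invoke your CLI agent non-interactively (the `codex exec` call is an assumption; swap in your own tool):

```python
# Spec-per-step driver: review -> clarify -> implement -> self-review -> commit,
# one spec document per logical commit.
import pathlib, subprocess

def run_agent(prompt: str) -> None:
    """Hypothetical non-interactive agent invocation; adjust to your CLI."""
    subprocess.run(["codex", "exec", prompt], check=True)

def git_commit(message: str) -> None:
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

for step in sorted(pathlib.Path("specs").glob("step-*.md")):
    run_agent(f"Read {step}. Ask clarifying questions about anything "
              f"underspecified before writing code; update the doc with answers.")
    input(f"Resolve open questions in {step}, then press Enter to implement...")
    run_agent(f"Implement exactly what {step} specifies. Make minimal changes.")
    run_agent(f"Review the uncommitted diff against {step} and fix any issues.")
    input(f"Human check of the {step} implementation; Enter to commit...")
    git_commit(f"Implement {step.stem} per spec")  # one logical step per commit
```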
The specs and plans are actually better for sharing context with the rest of the team than a traditional review process.
I find the code generated by this process to be better in general than the code I've generated over my previous 35+ years of coding. More robust, more complete, better tested. I used to "rush" through this process before, with less upfront planning, and more of a focus on getting a working scaffold up and running as fast as possible, with each step along the way implemented a bit quicker and less robustly, with the assumption I'd return to fix up the corner cases later.
In the pre-gen-AI days, if an engineer put up a PR, it implied (somewhat) that they wrote their code, reviewed it implicitly as they wrote it, and made choices (i.e., why is this the best approach).
If Claude is just the new high-level programming language, in terms of prompting in natural language, the challenge is that we're not reviewing the natural language; we're reviewing the machine code without knowing what the inputs were. I'm not sure of a solution to this, but something along the lines of knowing the history of the prompting that ultimately led to the PR, the time/tokens involved, etc., may inform the "quality" or "effort" spent in producing the PR. A one-shotted feature vs. a multi-iteration feature may produce the same lines of code and general shape, but one is likely to be higher "quality" in terms of minimal defects.
Along the same lines, when I review some gen-AI produced PR, it feels like I'm reading assembly and having to reverse how we got here. It may be code that runs and is perfectly fine, but I can't tell what the higher level inputs were that produced it, and if they were sufficient.
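One low-tech shape for this: have the agent harness stamp provenance onto each commit as git trailers (the trailer names here are invented), so a reviewer can at least see how much iteration went into a change:

```python
# Stamp prompt/iteration metadata onto a commit via git's --trailer flag,
# so review tooling can surface how the change was produced.
import subprocess

def commit_with_provenance(message: str, prompts: int, tokens: int, model: str) -> None:
    subprocess.run([
        "git", "commit", "-m", message,
        "--trailer", f"AI-Model: {model}",
        "--trailer", f"AI-Prompt-Iterations: {prompts}",  # 1 = one-shotted
        "--trailer", f"AI-Tokens: {tokens}",
    ], check=True)

# Reviewers can then surface it with, e.g.:
#   git log --format='%h %s%n%(trailers:only,unfold)'
```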
So you have two tiers of engineers: Sr- and Sr+.
1. Both should write code to justify their work and impact
2. Sr- code must be reviewed by Sr+
What happens:
a. Sr+ output drops because review takes more and more of their time
b. Sr+ just blindly accepts because the volume is too high, and they also have their own work to do
c. Sr+ asks Sr- to slow down; then Sr- can get bad reviews for their output, because on average Sr+ will produce more code
I think (b) will happen
Take a perfectly productive senior developer and instead make him responsible for the output of a bunch of AI juniors, with the expectation of 10x output.
still within the engineering IC role, but on a different track
And from their sagely reviews, we shall train a large language model to ultimately replace them because the most fungible thing at Amazon is the leadership.
And what are they going to do when they've fired all the senior engineers because they make too much money, leaving just juniors and AI?
When they fire everyone, juniors will fix it with AI.
This is in general. I wouldn’t recommend this at critical services like AWS.
Imagine having to debug code that caused an outage when 80% is written by an LLM and you now have to start actually figuring out the codebase at 2am.. :)
I think the team I was on was a bit of an outlier in terms of owning 40 dumpster fires at once, where the first time reading any one of them was at 2AM because it was down.
Having an LLM give early passes on reading the godawful C++ code, where you can tell at a glance that it's not going to work as expected but you can't tell why (or what "expected" actually is), would have been phenomenal, and would have gotten me back to sleep at 3 on those codebases rather than 5.
Obviously it's probably cost-prohibitive to do an all-to-all analysis for every PR, but I imagine that with some intelligent optimizations around likelihood and similarity analysis, something along those lines would be possible and practical.
COEs and Operation Readiness Reviews are already the documents that you mention, but they are largely useless in preventing incidents.
Amazon does have those things, and has fine tuning on models based on those postmortems.
Noisy reviews are also a problem: the PR reviewer doesn't know what scale a chunk of code is running at without having access to 20 more packages and other details.
/s
So now, you can speed up using Claude Code and use Code Review to keep it in check.
https://www.aboutamazon.com/news/aws/aws-service-outage-ai-b...
Force agents to not touch mission-critical things; fail in CI otherwise.
Let them work on frontends and things at the frontier of the dependency tree, where it's worth the risk.
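The first point is enforceable with a pretty dumb CI step. A sketch, where both the protected paths and the agent-detection convention (a commit trailer) are made up for illustration:

```python
# Fail CI when an agent-authored change touches protected paths.
import subprocess, sys

PROTECTED = ("payments/", "replication/", "infra/core/")  # hypothetical critical dirs

def changed_paths(base: str = "origin/main") -> list[str]:
    out = subprocess.run(["git", "diff", "--name-only", base],
                         capture_output=True, text=True, check=True).stdout
    return out.splitlines()

def is_agent_authored(base: str = "origin/main") -> bool:
    # Convention (hypothetical): agent commits carry an "Agent: <tool>" trailer.
    log = subprocess.run(
        ["git", "log", f"{base}..HEAD", "--format=%(trailers:key=Agent)"],
        capture_output=True, text=True, check=True).stdout
    return bool(log.strip())

if is_agent_authored():
    bad = [p for p in changed_paths() if p.startswith(PROTECTED)]
    if bad:
        print("Agent-authored change touches protected paths:", *bad, sep="\n  ")
        sys.exit(1)
```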
Imagine if the #1 problem of your woodworking shop is staff injuries, and the solution that management foists on you is higher RPM lathes.
I am seeing this mindset still, with AI Agents. I imagine they will slowly realize they need to use this stuff to be competitive, but being slow to adopt AI seems like it could have been the source of this.
If you know CS you know two things:
1. AI cannot judge code; noise or signal, AI cannot tell.
2. CS-wise, we use static analysis to judge good code from bad.
How much time does it take to take AI output and run the basic static analysis tools that exist for most computer languages?
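To put a number on that: for most languages the answer is seconds. A trivial harness (ruff and mypy are real tools; the target directory stands in for whatever the AI just produced):

```python
# Time a couple of common Python static analyzers over generated code.
import subprocess, time

def timed(cmd: list[str]) -> float:
    start = time.perf_counter()
    subprocess.run(cmd, capture_output=True)
    return time.perf_counter() - start

for cmd in (["ruff", "check", "generated_pkg/"],
            ["mypy", "generated_pkg/"]):
    print(f"{cmd[0]}: {timed(cmd):.1f}s")
```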
Some juniors need firing outright
Maybe as software engineering topics, but that's a different discipline.
In the meantime they will be quite a bit slower, I'd imagine.
Also wonder if those seniors will ever get to actually do any engineering themselves now that they’re the bottleneck. :)
I do think AI adoption exacerbates said falloff.
"No, not like that though!"
There was never before a technology capable of convincing leadership of its usefulness despite its constant blunders and despite the low quality of its output. This feels like a corporate mass delusion of unprecedented scale.
Has Seattle now become the code-slop capital? Or is SFO still on top?
Over the next few days my account history came back, except purchases made Q1 2026. Those are still missing. There are a few substantial purchases I made that are nowhere to be found anymore.
I attributed this to Iranian missiles hitting some of their infrastructure in the EU, as had been reported.
Now I am not sure if it was blast radius from missiles or AI mishaps. Lmao - couldn’t happen to a worse company…
Thought this blurb most interesting. What's the between-lines subtext here? Are they deliberately serving something they know to be faulty to the Chinese? Or is it the case that the Chinese use it with little to no issue/complaint? Or...?
The seniors will now be directly responsible for all the AI slop that goes in. But how can they possibly properly review reams of code to a sufficient degree they can personally vouch for it?
So what incentive is there for juniors to look at the code at all? Seniors are now just another CI stage for their slop to pass.
Haven't tried Kiro CLI.
In my experience, it's high-quality for creating and iterating on specs. Tools like Cursor are optimized for human-driven vibing -- they have great autocomplete, etc. Kiro, by contrast, is optimized around spec, which ironically has been the most effective approach I've found for driving agents.
I'd argue that Cursor, Antigravity, and similar tools are optimized for human steering, which explains their popularity, while Kiro is optimized for agent harnesses. That's also why it’s underused: it's quite opinionated, but very effective. Vibe-coding culture isn't sold on spec driven development (they think it's waterfall and summarily dismiss it -- even Yegge has this bias), so people tend to underrate it.
Kiro writes specs using structured formats like EARS and INCOSE. It performs automated reasoning to check for consistency, then generates a design document and task list from the spec -- similar to what Beads does. I usually spend a significant amount of time pressure-testing the spec before implementing (often hours to days), and it pays off. Writing a good, consistent spec is essentially the computer equivalent of "writing as a tool of thought" in practice.
Once the spec is tight, implementation tends to follow it closely. Kiro also generates property-based tests (PBTs) using Hypothesis in Python, inspired by Haskell's QuickCheck. These tests sweep the input domain and, when combined with traditional scenario-based unit tests, tend to produce code that adheres closely to the spec. I also add a small instruction "do red/green TDD" (I learned this from Simon Willison) and that one line alone improved the quality of all my tests.
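For anyone who hasn't seen one, here's a minimal example of the kind of property-based test described. Hypothesis is real; the run-length encoder under test is invented and has nothing to do with Kiro:

```python
# Hypothesis sweeps the input domain instead of relying on hand-picked cases.
from hypothesis import given, strategies as st

def rle_encode(s: str) -> list[tuple[str, int]]:
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in runs)

@given(st.text())
def test_roundtrip(s):
    # Property: decoding an encoding returns the original, for any string.
    assert rle_decode(rle_encode(s)) == s

@given(st.text(min_size=1))
def test_runs_are_maximal(s):
    # Property: adjacent runs never share a character.
    runs = rle_encode(s)
    assert all(a[0] != b[0] for a, b in zip(runs, runs[1:]))
```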
Kiro can technically implement the task list itself, but this is where agents come in. With the spec in hand, I use multiple headless CLI agents in tmux (e.g., Kiro CLI, Claude Code) for implementation. The results have been very good. With a solid Kiro spec and task list, agents usually implement everything end-to-end without stopping; I haven’t found a need for Ralph loops. (Agents sometimes stop midway on Claude plans, but I've never had that happen with Kiro. Not sure why; maybe it's the checklist, which includes PBT tests as gates.)
Kiro didn't have the strongest start, but the Kiro IDE is one of the best spec generators I've used, and it integrates extremely well with agent-driven workflows.
I do consulting and use AI a lot. You just have to take responsibility for the code. We are delivering like never before, but we have a lot of experience in how to do it as safely as possible. And we are learning along the way. They say you need a year to build up experience, FYI.
I feel bad for those engineers who will have to sign off on things they will most likely not have enough time to review. Kiro is nice and all.
no! not that way!
Seems to me too low level in everyone’s stack to not have humans doing the work, especially at this stage. But what do I know, I certainly am not at the helm of a multibillion dollar operation.
(Before injecting it into global infra...)
The environment breathed a little.