I'm not (just) being glib. That earlier article displays some introspection and thoughtful consideration of an old debate. The writing style is clearly personal, human.
Today's post is not so much. It has LLM fingerprints on it. It's longer and wordier, but it doesn't strike me as having the same thoughtful consideration in it. I would venture to guess that the author tried to come up with some new angles on the news of the Claude Code leak, because it's a hot topic, jotted some notes, and then let an LLM flesh it out.
Writing styles of course change over time, but looking at these two posts side by side, the difference is stark.
I made a commitment to write more this year and put my thoughts out quicker than I used to, so that’s likely the primary reason it’s not as deep of a piece of writing as the post you’re referencing. But I do want to note that this wasn’t written using AI, it just wasn’t intended to be as rich of a post.
The reason it came out longer is that I’ve honestly been thinking about these ideas for a while, and there is so much to say about this subject. I didn’t have any particular intention of hopping on a news cycle, but once I started writing the juices were flowing and I found myself coming up with five separate but interrelated thoughts around this story that I thought were worth sharing.
The first known use in English comes from a 1658 translation of a line Blaise Pascal wrote in 1657:
> Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte.
translated to
> I had not made this longer then the rest, but that I had not the leisure to make it shorter then it is.
(note the archaic then)
This was a popular piece of wit at the time.
Mark Twain wrote something similar two hundred years later:
> You'll have to excuse my lengthiness - the reason I dread writing letters is because I am so apt to get to slinging wisdom & forget to let up. Thus much precious time is lost.
But it's still quite different.
There is a great article about this one on quoteinvestigator! https://quoteinvestigator.com/2012/04/28/shorter-letter/
If you have a strategy for jotting down (or dictating) notes while walking about, I would be curious how you manage that. I spend plenty of time walking outside, and tend to get ideas that (at the time) I'd like to explore further, most of which have evaporated from my mind by the time I get back home. Or even before I can get my phone out to jot down the keywords that would help me recall the details later.
I cannot even imagine how someone would manage both walking and writing at the same time.
Some are just born with it.
- I've had an iPhone for half my life (I'm 36 and got one when I was 19), so I've gotten pretty acclimated to typing on the go. I try switching to dictation every couple of months but the iPhone's dictation trips up over enough words that I find it more frustrating than typing as I walk.
- I don't do this but if you're worried about the thoughts disappearing I would absolutely recommend recording a voice note. As I'll touch on in a moment — do not let those thoughts disappear! Even the act of codifying them into something tangible allows you to process them more deeply.
- I live in NYC but I start most mornings by taking a walk along a relatively quiet street, so I rarely end up having to worry about bumping into someone. That is definitely not universally applicable advice. (:
- I look up as I'm typing and let autocorrect take the wheel. That works at least 95% of the time, so if I make the occasional typo it doesn't really matter, I'll just fix it in post.
- It helps to have an app with a great text editing experience. I've found that there are very few out there that are fluid, many have incredibly subtle hitches that make it hard to quickly jot down thoughts onto a canvas. I really love Craft (https://craft.do) and have been using it for years, so at this point it feels more like an extension of me than an app.
- This is surely unique to everyone but my writing tends to start from a few keystone thoughts. Once I have one written down, I let myself almost free associate, writing down whatever comes to mind from that initial thought to make sure I do not forget. I can always edit after the fact, and often the editing process leads to more interesting insights as well. But the main thing I want to avoid is losing those sparks, in the same way that you mention your thoughts evaporating. Don't let those go, just get 'em on paper and sort through 'em afterwards.
- That's all a lot easier to do on my phone than if I approached the problem as "type an essay on my phone", so I'll almost always edit a post on my computer before publishing. Yesterday was more of an exception than the rule though because I was bouncing around between doctors all day, so I wrote all of this on my phone [not expecting it to blow up or get a ton of scrutiny].
Not sure if anything's missing but I'm happy to share anything that may be helpful! Clearly this post wasn't perfect, but I've been much happier since I started letting myself write out long-form thoughts on my phone and sharing them as blog posts rather than firing them off as pithy tweets that decay into the ether once the algorithm says it's time for them to go.
Apropos of nothing, this astonishes me to no end. The ergonomics of 1) using a phone keyboard for anything but a word or two and 2) doing so while walking pretty much guarantee that I'd need half a day to recover if I attempted the same.
On many (most?) posts, far more energy is spent arguing about whether a post is AI than discussing if there’s anything of value in the post.
With the exception of things that places like HN seem to consider worth reading, which is why I'm looking through the comments on this and other posts to find recommendations.
We're starting to become wary due to the abuse of AI and the proliferation of sloppy content, but also because we often have trouble distinguishing authentic content from slop.
Another feature of this AI era that I hate.
“This is AI” seems to just be an evolution of other thought-terminating clichés, where the negative conditioning associated with something is used in an abusive and manipulative way to evade challenge or the truth itself. It is a common tactic of abusive people, this “beyond the pale” moralizing.
But I do take extra care to avoid LLM-speak as much as I can.
What is interesting, and has possibly bled over from heavy LLM use by the author, is the style of simplistic bullet-point titles for the argument with filler in between. It does read like they wrote the 5 bullet points and then added the other text (by hand).
But who knows!
One way is that the law applies to everybody equally. That has been the way it works, imperfectly, in democratic countries for many years.
There is another way of working, where the law is not blind. Laws are applied based on who is affected. This is what big tech and the ultra-rich have been advocating for: the law applies differently to nobility and aristocrats than to the working class.
So, for all these big tech companies the law is clear: I can copy from you, you cannot copy from me.
(That is horrifying, in case anyone needs me to spell it out.)
Nobody, not even Anthropic, is arguing that they should be able to host other people's paid content for free. The crux of their fair-use defense is that models are transformative works, just like parodies or book reviews, and hence should be treated as fair use.
You can't just take a pile of books (no pun intended) and turn that into Claude in a day with 30 lines of Python, there's a lot of work and know-how on the Anthropic side that goes into making a good LLM.
Situation A: Anthropic pays for a book, then transforms the book into a new LLM (transformative use) -> OK
Situation B: I pay for the Anthropic API, then transform the API responses into a new model (transformative use) -> Not OK
The situations are clearly the same.
There is a lot of know-how going into a good DivX rip too, you know.
And it enables so many novel uses, such as Popcorn Time, with flourishing business opportunities.
You wouldn't download a car. They did.
That’s a cynical view, but unfortunately it seems true in many cases, especially for corporate law.
Did they actually? Someone can go to prison for 5 years for that.
Fact 1: AI generated code has no copyright, so the Digital Millennium Copyright Act does not apply.
Fact 2: Misrepresenting your copyright ownership under the DMCA is felony perjury.
Fact 3: The existence of undercover.ts in the leak is grounds to void any copyright claims on whatever human-written code might have existed in Claude Code. You have a DUTY TO DISCLOSE any AI-generated code in your copyrighted work. undercover.ts HIDES DISCLOSURE to FRAUDULENTLY claim all the code is human-written when it is not.
Given the current administration has a bone to pick with Anthropic, it was a VERY BAD IDEA for them to send false DMCA takedowns to GitHub. Someone at Anthropic may be the very first ever to go to prison under that section of the DMCA.
Good luck!
It is an affirmative defense; you still have to be able to argue the merits. If you publish their source code, they are allowed to come after you whether they have previously used fair use or not. It's fact-specific and determined case by case.
Anthropic won half of their fair-use argument in the billion-dollar settlement, but lost the other half.
You can say you're just using their code to train your own models, just like they did, and they will correctly point out that how you obtained the code also matters and you will lose just like they did.
It’s not “underrated”. Everyone is just 50 steps ahead of you.
And this whole “they’re 50 steps ahead of you” nonsense is the same kind of stuff we heard from NFT or crypto bros, that we just couldn’t comprehend the infinite wisdom of a post currency world. Sometimes bad arguments are just bad arguments.
If anything, this is a question of whether you owe royalties to the owner of IP you consumed in your life since it became part of and trained your mind, identity, and outputs too.
According to IP owners, ever since things were digitized you technically own nothing: you simply paid for an authorization to use any given IP for whatever duration the IP owner authorizes, and you keep paying. So pay your monthly meat-AI bill for all the IP your mind has been trained on.
There's even a GUI called claudia for a piecemeal extraction with a PRD.
https://github.com/kristopolous/Claudette
I've got web, Rust, and tkinter versions (for fun) right now, just making sure this approach works.
The answer is... Mostly...
Enjoy
Seems like it would be a nightmare to provide evidence of which parts of a half-million-line codebase were written by humans if no one bothered to track it.
The product hasn't been around long enough to decide whether such an approach is "sustainable". It is currently in a hype state and needs more time for that hype to die down and the true value to show up, as well as to see whether it becomes the 9th circle of hell to keep in working order.
I have come to the conclusion that we just do not know yet. There is a part of me that believes there is a point somewhere on the grand scale where the code quality genuinely does not matter if the outcome is reliably and deterministically achieved. (As an image, I like to think of WALL-E literally compressing garbage into a cube shape.)
This would ignore maintenance costs (time and effort included). Those matter to an established user base (people do not love change, in my experience, even if it solves the problem better).
On the other hand, maybe software is meant to be highly personal and not widely general. For instance, I have had more fun in the past two years than the entire 15 years of coding before it, simply building small custom-fitted tools for exactly what I need. I aimed to please an audience of one. I have also done this for others. Code quality has not mattered all that much, if at all. It will be interesting to see where things go.
It's not. My favorite example: due to vibe-coding overload, literally nobody knows what configuration options OpenClaw now supports. (Not even other LLMs.)
Their "solution" is to build a chat bot LLM that will attempt to configure OpenClaw for you, and hope for the best, fingers crossed. Yes, really.
My setup is very simple too, just two agents, some MD files, and discord. Nothing else. These people using it for real work or managing their email and texts are in for a rough ride.
But there is also quite a lot of confident "code quality" fluff: claims that have nothing to do with maintainability, robustness, or performance. Fairly often, a guy claiming "the code is garbage" is not actually concerned with code quality so much as with asserting dominance. Or is confusing his own preference with quality.
I tried using litellm and it was 700 MB! It does have a lot of features: a Postgres DB, Prometheus, an API layer. Really complicated. But simple help is missing. It assumes you will put passwords in an env file. How do you add chat completions in the UI? How do you add a custom provider?
But what I wanted, a simple proxy, can be written in 30 lines of Python with FastAPI. Memory fetch/push in 50 lines. Different providers can be added with two lines: one import and one client create.
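A minimal sketch of that kind of passthrough proxy, assuming an OpenAI-compatible upstream (UPSTREAM_URL and UPSTREAM_API_KEY are placeholder names of mine, and streaming is left out):

```python
# Minimal OpenAI-compatible passthrough proxy (a sketch, not litellm's design).
import os

import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

UPSTREAM_URL = os.environ.get(
    "UPSTREAM_URL", "https://api.openai.com/v1/chat/completions"
)
API_KEY = os.environ["UPSTREAM_API_KEY"]  # injected server-side; callers never see it

app = FastAPI()

@app.post("/v1/chat/completions")
async def chat_completions(request: Request) -> JSONResponse:
    # Forward the body untouched and attach the upstream credential.
    body = await request.json()
    async with httpx.AsyncClient(timeout=120) as client:
        upstream = await client.post(
            UPSTREAM_URL,
            json=body,
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
    return JSONResponse(status_code=upstream.status_code, content=upstream.json())
```

Run it with uvicorn and point any OpenAI-style client at it; swapping providers really is just a different upstream URL and key.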
Non-trivial things tend to be much more sensitive to code quality in my experience, and will by necessity be kept around for longer and thus be much more sensitive to maintenance issues.
If you are a serious software developer, then you will probably be able to explain both what your code does (i.e. what spec it implements) and how it does it (what does it call, what algorithms does it use, what properties does it rely on?). With the advent of LLMs, people have started to accept not having a clue about the "how", and I fear that we are also starting to sacrifice the "what". Unless our LLMs get context windows large enough to hold the source of the full software stack, including LLM-generated dependencies, I think that sacrificing the "what" is going to lead to disaster. APIs will be designed and modified with only the use cases that fit in the modifying agent's context window in mind, with little regard for downstream consequences and stability of behavior, because there is not even a definition of what the behavior is supposed to be.
I hear this narrative being pushed quite a bit, and it makes my spidey senses tingle every time. Secure programs are a subset of correct programs, and to write and maintain correct programs you need to have a quality mindset.
A 0-day doesn't care if it's in a part of your computer you consider trivial or not.
Code doesn't matter IN THE EARLY DAYS.
This is similar to what I've observed over 25 years in the industry. In a startup, the code doesn't really matter; the market fit does.
But as time goes on your codebase has to mature, or else you end up using more and more resources on maintenance rather than innovation.
> This is similar to what I've observed over 25 years in the industry. In a startup, the code doesn't really matter; the market fit does.
> But as time goes on your codebase has to mature, or else you end up using more and more resources on maintenance rather than innovation.
Counterpoint: Code does matter, in the early days too!
It matters more after you have PMF, but that doesn't mean it doesn't matter pre-PMF.
After all, the code is a step-by-step list of instructions on solving a specific pain point for a specific target market.
I think the crux here is the OP means the "quality of code" doesn't matter until PMF, only the utility matters (to the extent it helps you find PMF), in which case you're both in violent agreement.
But even then you don't need code. I briefly worked for a startup that found PMF by calling people, sending text messages, creating social media posts, measuring engagement to create reports, and sending invoices... all manually. The "code" as such was a bunch of templates in a doc for each of those. Once they actually started getting paid they moved to writing code.
Right, and in that case there is no step-by-step recipe for the product. When all that is implemented in code, that is a set of step-by-step instructions for solving the pain points.
1. Find a potential customer who's excited about the idea of what you're going to build.
2. Build just enough to make them a mostly happy, paying customer while you secure more customers.
3. Now that you have a few customers, you have a better idea of where your architecture and business flow doesn't fit their needs.
4. Adapt to this reality, and make things robust enough that you're not spending too much time on customer support.
What you listed is important, but those findings are distilled into the source code of the product. If you open the source, you are providing step-by-step instructions on solving some problem that other people are prepared to pay to solve.
Basically, you come up with a recipe for success for $FOO - why would you give that recipe away unless you've already capitalised on it?
>> It matters more after you have PMF, but that doesn't mean it doesn't matter pre-PMF.
>> After all, the code is a step-by-step list of instructions on solving a specific pain point for a specific target market.
----------------------------------
> Nope. That’s what self-important engineers will tell themselves, but it doesn’t make it remotely true. You’re patting yourself on the back for throwing together a CRUD app and burning through a bajillion dollars on AWS.
Did you perhaps reply to the wrong comment?
In less than four years the AI coding workflow has been overhauled at least twice: from Chat interface (ChatGPT) to editor integration (Cursor), then to CLI agent harnesses (CC/Codex). It would be crazy to assume that harnesses are the end of evolution.
Except, apparently, Anthropic, who are doing their darndest to get everyone onboard their tools as a moat. Apparently that's their only strategy for AI stickiness.
Claude Code 3.0 (and other agent tools) are not expected to be mature. They'll all be obsolete in two or three years, replaced by the next generation of AI tools. Everyone knows that.
And so on and on and on.
A promise of AI was mature software.
But now everything is, "ship as fast as is humanly possible, literally" from management, and "garbage Claude-written PRs" from devs. Trying to maintain sanity over my monorepo is impossible.
We have nearly a century of examples of somebody who only mostly understands a system making a breaking change. And now we've decided, "what the hell, this thing is called Claude, so it can wreak havoc for as long as corporate decides."
Claude Code is strictly worse than e.g. OpenCode in my experience. Not much to see in the app’s code except how it authenticates itself…
Sure I try and use all my subscription allowance with CC on side tasks, etc. but I still end up burning a bunch of API tokens (via OpenRouter) for more serious work (even the UI and ability to quickly review what the agent has done/is doing is vastly inferior in CC).
What they have done is got me experimenting with cheaper models from other providers with those API credits.
Is it possible to start with something of this size that's vibe coded and refactor your way into something resembling a human codebase?
Given the output speed, it's practically impossible for developers to keep up, which directly impacts maintenance: the knowledge that would previously reside in-house is now becoming dependent on having codebases pre-processed by LLMs.
I hope in the near future local LLMs will gain traction and provide an alternative; otherwise we are on a risky path where businesses are over-reliant on a few big companies.
But you can use AI to improve your codebase too. Plus models are only going to get smarter from here (or stay the same).
Training models on AI-generated content leads to model collapse, so they will hardly become smarter if more and more code comes from AI.
If you're dealing with functionality that is splittable into microfeatures/microservices, then anything you need right now can potentially be vibe-coded, even on the fly (and deleted afterwards). Single-use code.
>But as time goes on your codebase has to mature, or else you end up using more and more resources on maintenance rather than innovation.
Maintenance is a tremendous resource sink in enterprise software. Solving it, or even just making it avoidable (maybe Anthropic goes that way and leads the others), would be a huge revolution.
Have you seen the code generated by AI? These things converge on the "1 million lines to make an API call" pattern. They're a lot of things, but certainly not "micro".
- building functionalities as components that are swappable on a whim requires a level of careful thought, abstraction, and architecture that is essentially the exact opposite of AI slop
- in this day and age we still don't make software for the sake of it, and whoever is financing it doesn't generally require such levels of functional flexibility (the physical world commandeering the coding isn't nearly as volatile as to justify that)
- this comes loaded with the implication that "stuff needs to work": if you are developing software that manages inventory, orders, resources, ... you just can't take the chance to corrupt your customers data or disrupt their business processes. Shipping faster than you can test and with no accountability and no oversight is a solution to a problem I've personally never encountered in the wild
That is only for humans, really. Why do we need this careful thought, abstraction, and architecture? Because otherwise the required code becomes an unmanageable pile of spaghetti handling a myriad of edge cases, abstraction leaks, and unexpected side effects. The human brain can't manage that. AI can, or at least soon will be able to. It will just be a large pile of AI slop.
It may also happen that AI will start generating good component-based architecture if forced to minimize, or in some other measurable way improve, its slop.
Seems like the phrase "clean room" is the new "nonplussed"... how does this make any sense?
[^1]: https://bsky.app/profile/mergesort.me/post/3mihhaliils2y
Then use Anthropic's own argument that LLM output is original work and thus not subject to copyright.
Does this still count as clean-room? Or what if the model wasn't the same exact one, but one trained the same way on the same input material, which Anthropic never owned?
This is going to be a decade of very interesting, and probably often hypocritical lawsuits.
If one person writes the spec from the implementation, and then also writes the new implementation, it is not clean-room design.
There are other details of course (is the old code in the training data?) but I'm not trying to weigh in on the argument one way or the other.
Sure, the weights are where the real value lives, but if the quality is so lax they leak their whole codebase, maybe they are just lucky they didn’t leak customer data or the model weights? If that did happen, the entire business might evaporate overnight.
Actually wait, it's worse than that. The product works, demo looks great. Then someone opens the network tab and ... yeah. "Quality doesn't matter" really just means nothing caught fire yet.
Seriously, if Anthropic were like OpenAI and let you use their subscription plans with any agent harness, how many users would CC instantly start bleeding? They're #39 on Terminal-Bench and they get beaten by a harness that provides a single tool: tmux. You can literally get better results by giving Opus 4.6 only a tmux session and having it do everything with bash commands.
It seems premature to make sweeping claims about code quality, especially since the main reason to desire a well architected codebase is for development over the long haul.
Wasn't it CC itself the one that leaked, well, itself? It's completely vibe-coded, which I assume means it does its own build step too, which means it leaked itself.
The only breach of best-practice I see here is using an LLM for coding.
And frankly, at that point, Claude Code as software doesn't matter anymore. It is about their models; they could throw it away, rewrite it from scratch, etc., and it wouldn't be a big deal.
Claude Code as a harness was never likely to be around for 10 years, because there will be so many of these harnesses, all different, the direction may change, etc.
As I understand it, someone internally quickly vibe-coded it for themselves as a productivity tool; once they realized internally how productive it could be, they decided to release it, and people found it so productive that it became hugely popular. Thanks to that, Anthropic wasn't eaten alive by OpenAI.
Also, if the requirements for this had to come from product, it would never have happened in the first place. As it was, it was an engineer trying to optimize their own workflow.
The only reason they decided to hide the source code, imo, was to delay competitors; it wasn't related to security or anything. But by now OpenCode etc. are objectively better tools anyway.
Yes, exactly. Products.
It seems like all the engineers I've known, myself included, hold this established dichotomy: engineers, who want to write good code and think a lot about user needs, versus project managers/executives/salespeople, who want to make the non-negative numbers on accounting documents larger.
The truth is that to write "good software," you do need to take care, review code, not single-shot vibe code, and not let LLMs run rampant. The other truth is that good software is not necessarily a good product; the converse is also true: a bad product doesn't necessarily mean bad software. There's not really a correlation, and as this article points out, terrible software can be a great product! In fact, if writing terrible software lets you shit out more features more quickly, you'll probably come out ahead in the business world over someone carefully writing good software but releasing more slowly. That's because the priorities and incentives of the business world are often in contradiction to the priorities and incentives of the human world.
I think this is hard to grasp for those of us who have been taught our whole lives that money is a good scorekeeper for quality and efficacy. In reality it's absolutely not. Money is Disney bucks recording who's doing Disney World in the most optimal way. Outside of Disney World, your optimal in-park behavior is often suboptimal for out-of-park needs. The problem is we've mistaken Disney World for all of reality, or, let Walt Disney enclose our globe within the boundaries of his park.
> The object which labor produces confronts it as something alien, as a power independent of the producer.
Its creators clearly care not for the efficiency of how it is built, which translates directly into how it runs.
This blog post is effectively being apologetic about the fact that this is alright, since at least they got product market fit. Except Anthropic is never going to go back and clean up the mess once (if) they become profitable.
I doubt anyone will like how things will be in 5 years time if this trend of releasing badly engineered spaghetti continues.
When I say “it doesn’t matter” I mean more in an existential sense, and that people don’t seem to care. On the other hand people should do things because they care, which is why I personally still review the code that goes into my apps and spend the time to refactor and improve the stability and foundation rather than slopping like there’s no tomorrow.
Maybe I’m growing cynical, but I understand why a business doesn’t care (at least until it comes back to bite them, which may take longer than some have assumed). And most of what you read about the subject is ultimately being driven by the needs and desires of businesses.
The practical question for any CEO: if your developer's machine is running an agent with filesystem access, do you know what it can touch? The leaked code shows the answer is more nuanced than "it only touches what you tell it to."
Wrote a non-technical breakdown of what this means for AI tool policy (specifically the autonomous permissions mode and memory system that were hidden behind feature flags): https://www.aipolicydesk.com/blog/claude-code-leak-what-ceo-...
Code quality tends to have an impact on more than just aesthetics - and Claude Code certainly feels like a buggy mess from an end user's perspective.
Of course people still use Claude Code, but that is certainly because of the underlying models first and foremost. Most products don't have such a moat and would not nearly see as much tolerance from end users. If the Max subscriptions could be used with other harnesses, I am sure Anthropic would have to compete harder on the quality of the harness (to be fair, most AI based tooling seems pretty alpha these days, but eventually things will stabilize).
Polish is not everything, clearly, but it is a factor, and I feel Claude Code is maybe the worst example to use here, as it doesn't at all generalize to most other products.
1. The code is garbage and this means the end of software.
Now try maintaining it.
2. Code doesn’t matter (the same point restated).
No, we shouldn’t accept garbage code that breaks e.g. login as an acceptable cost of business.
3. It’s about product market fit.
OK, but what happens after product market fit when your code is hot garbage that nobody understands?
4. Anthropic can’t defend the copyright of their leaked code.
This I agree with and they are hoist by their own petard. Would anyone want the garbage though?
5. This leak doesn’t matter.
I agree with the author but for different reasons: the value is the models, which are incredibly expensive to train, not the badly written scaffolding surrounding them.
We also should not mistake current market value for use value.
Unlike the author, who seems to have fully signed up for the LLM hype train, I don’t see this as meaning code is dead. It’s an illustration of where fully relying on generative AI will take you: to a garbage, unmaintainable mess which must be a nightmare to work with, for humans or LLMs.
I generally think this will be a very important technology so I teach the subject to make sure people understand how to use it as leverage in their lives. (Yes as paid workshops, but I also volunteer weekly for 3-4 hour sessions at a non-profit where I get nothing more than the joy of helping people learn a valuable skill.)
At the same time just last week I wrote a post decrying the slop people are foisting on their coworkers[^1], because I want people to use this technology in a positive way to create the lives they want, not to create downstream consequences for others. Ultimately I think agentic systems are incredibly powerful but also a technology that lends itself to anti-social behavior because of how independently empowering it can be. And so I hope that with the right exposure, discussion, and teaching we can take advantage of its democratizing nature, while reinforcing that what makes us special as humans is that we care and coordinate to do greater things. Value in this world, not just in the financial sense that we often boil it down to when we talk about this subject.
Hope that context provides a better lens into the piece. I still do care a lot about code and everything else that got me here, but you are also reading the personal reflections of who I am in a time of change, which is making me question (or reinforcing) some of the fundamental things I believed about software, and sometimes the world more widely.
The code from one of the leading companies in the space is a good example of where the reality of what is achieved falls far short of expectations.
This is what I meant by the hype train.
If I had to assign a confidence score for whether agents will change the way we all work and many aspects of how we live, I would put it at a 7/10, maybe 8/10. I felt about the same about the smartphone. While many things we do look the same way they did in 2005 (we still drive on roads, kids still go to school), at the same time it's undeniable that much of our lives are intermediated through a small screen and many societal dynamics have shifted due to that technology's existence.
I will concede that you should read my post with that context and draw your own conclusions about the veracity of my perspective — but I think it is more well-reasoned than what people generally attribute to "LLM hype". (Of course it's a bit tautological that I believe that, but I try to surround myself with people of all kinds technical and non-technical and like to think I stay reasonably grounded.)
All that said, I think the code from a leading company being bad and yet delivering good results is more a sign of the technology's jagged frontier[^1]. Calculators can't write sonnets the same way that LLMs are bad at math, but that doesn't make them useless; it just makes them a tool. This is a tool in our tool belt, and I find it surprisingly useful as a general-purpose technology despite its limitations. (Which is related to the main argument I make in the post: that bad code leading to good results may imply that we're under- and overweighting certain aspects of what is important in software development, and that our expectations of code may need to be recalibrated often as we gather more evidence.)
[^1]: https://www.oneusefulthing.org/p/centaurs-and-cyborgs-on-the...
I feel the author is just stating the obvious: code quality has very little to do with whether a product succeeds.
First, the Twitter quote is standard toxic clapback nonsense. Gambling makes billions and does not add any value. Even Facebook can argue it adds more value than gambling, so this one is a dud.
People use Claude Code because of Claude the model, not Claude the harness. Cursor or a hacked-up agent loop using Opus or whatever are about as good. The magic is in the model, not the harness. This isn't to say the harness doesn't do anything.
The other bit this misses is that, yes, the product matters more than the code, but if the product burns battery/RAM/etc. doing nothing because the AI wrote crappy code, or something leaks or has a security issue, then that impacts the product.
> That's a lot of defensive engineering for a CLI tool. Worth studying if you're building anything that gives an LLM access to a filesystem.
> The permission architecture is more interesting than the leak drama, lol! Anthropic clearly thought hard about what happens when your agent tries to rm -rf. Most agent frameworks just yolo it.
Speaking for myself, I long for the day I can dump the comparatively garbage experience of Claude Code for something more enjoyable and OSS like OpenCode. But the fact is that it is simply not economically viable to do so.
So the PMF is not really for Claude Code alone -- it is for Claude Code + Claude Max.
It was trained on massive codebases that carried valuable information about what works.
That being said, if you're just beginning and looking for your market fit, or pitching to investors with a flashy demo, it doesn't need to be an architectural miracle; in fact, making it one will waste your time.
> I’ve had to question the value of code a lot over the last couple of years, and this leak continues to reinforce the notion that I’ve vastly overestimated it my entire career.
Now we could be moments away from hitting any of the rules described on https://how.complexsystems.fail, but if you’d asked me a year ago how long it would take to get there with people working this way I would have definitely taken the under. That difference in what I believed and what I see with my own two eyes is what has me questioning my priors, because my calibration seems to need readjustment (maybe large or maybe small) for the world of software we’re in right now.
An extra one: we are getting super lenient with major failures, and having services with only one 9 on the reliability charts has become the norm.
Their "product market fit" is the LLM model itself, not the harness. The harness is completely replaceable in my opinion, it's just that Claude is the cheapest way to access the models.
Wut? The value in the ecosystem is the model. Harnesses are simple. Great models work nearly identically in every harness
I tried to build my own harness once. The amount of work required is incredible: from how external memory is managed per session to the techniques for saving on the context window. For example, you do not want the LLM to read in whole files; instead you give it the capability to read chunks from offsets, but then you have to decide what should stay in context and what should be pruned.
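A minimal sketch of what I mean by a chunked-read tool (my own naming and shape, not any particular harness's):

```python
# Let the agent read a byte range instead of slurping whole files into context.
from pathlib import Path

def read_chunk(path: str, offset: int = 0, max_bytes: int = 4096) -> dict:
    """Return at most max_bytes starting at offset, plus bookkeeping so the
    model can decide whether it needs to keep reading."""
    data = Path(path).read_bytes()
    chunk = data[offset : offset + max_bytes]
    return {
        "content": chunk.decode("utf-8", errors="replace"),
        "next_offset": offset + len(chunk),
        "eof": offset + len(chunk) >= len(data),
    }
```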
After that you have to start designing the think-plan-generate-evaluate pipeline. A learning moment for me here was to split out the evaluation step, because the same LLM that did the work should not evaluate itself; that introduces a bias. Then you realize you need subagents too, and start wondering how their context will be handled (maybe return a summarized version to the main LLM?).
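A minimal sketch of that generate/evaluate split, assuming the OpenAI Python SDK (the model name is a placeholder); the key point is that the evaluator gets a fresh conversation with no memory of having produced the draft:

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder; any capable model works here

def generate(task: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content

def evaluate(task: str, draft: str) -> str:
    # Separate call, separate system prompt: the evaluator sees only the
    # task and the artifact, never the generator's conversation.
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a strict reviewer. List defects; do not fix them."},
            {"role": "user", "content": f"Task:\n{task}\n\nDraft:\n{draft}"},
        ],
    )
    return resp.choices[0].message.content
```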
And then you have to start thinking about integration with MCP servers and how the LLM should invoke things like tools, prompts, and resources from each server. I learned that LLMs, especially the smaller ones, tend to hiccup and return malformed JSON.
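The defensive parsing that forces on you looks roughly like this (my own approach, not any framework's): try strict JSON, then a couple of cheap repairs, and only re-prompt if both fail.

```python
import json
import re

def parse_tool_call(raw: str) -> dict | None:
    # Strip the markdown fences smaller models love to wrap JSON in.
    raw = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Common small-model mistake: a trailing comma before } or ].
    repaired = re.sub(r",\s*([}\]])", r"\1", raw)
    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None  # caller re-prompts the model with the parse error
```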
At some point I started wondering about just throwing everything away and looking at PydanticAI or LangChain or LangGraph or Microsoft AutoGen to operate everything between the LLM and the MCP servers. It's quite difficult to make something like this work well, especially for long-horizon tasks.
I agree that good models have more value because a harness can't magically make a bad model good, but there's a lot that would be inordinately difficult without a proper harness.
Keeping models on rails is still important, if not essential. Great models might behave similarly in the same harness, but I suppose the value prop is that they wouldn't behave as well on the same task without a good harness.
The harness matters A LOT.
The model is the engine, the harness is the driver and chassis. Even the best top of the line engine in a shitty car driven by a bad driver won't win any races.
It is not everyone’s experience that models work the same in every harness.
I would just want to test around a bit locally, maybe let it do its thing over a weekend just to see the result, and then stop it again.
That's actually why I built SeqPU.com; I've been at it for about a year. T4 16GB all the way up to 2×B200 384GB, billed by the second so idle costs nothing. Test cheap, scale up only if you need to. I'd love to show you how it works and set you up with some free credits, just reply here.
Seems wrong. Devs will whine, moan, and nitpick about even free software, but they can understand failure modes, navigate around bugs, and file issues on GitHub. The quality bar is 10-100x higher amongst non-techno-savvy folks and enterprise users that are paying for your software. They're far more "picky".
Most corporations never give code a single thought.
In the race to market, quality always suffers, and with such high stakes, it should surprise no one that AI companies are vibe-coding their own slop.
This just validates my theory that open-sourcing old code that people have sentimental attachments to, and that you won't ever make any money off of again, is actually a terrible idea.
Everything about this leak is a long list of arguments why you shouldn't ever open source anything.
We, the developer community, have really dropped the ball here.