0 points · gwerbin · 2mo ago · 0 comments

Or just don't use AI to write code. Use it as a code reviewer assistant along with your usual test-lint development cycle. Use it to help evaluate 3rd party libraries faster. Use it to research new topics. Use it to help draft RFCs and design documents. Use it as a chat buddy when working on hard problems.

I think the AI companies all stink to high heaven and the whole thing being built on copyright infringement still makes me squirm. But the latest models are stupidly smart in some cases. It's starting to feel like I really do have a sci-fi AI assistant that I can just reach for whenever I need it, either to support hard thinking or to speed up or entirely avoid drudgery and toil.

You don't have to buy into the stupid vibecoding hype to get productivity value out of the technology.

You of course don't have to use it at all. And you don't owe your money to any particular company. Heck for non-code tasks the local-capable models are great. But you can't just look at vibecoding and dismiss the entire category of technology.

31 comments · 2 top-level

onlyrealcuzzo · 2mo ago · 21 in thread

> Or just don't use AI to write code.

Anecdata, but I'm still finding CC to be absolutely outstanding at writing code.

It's regularly writing systems-level code that would take me months to write by hand in hours, with minimal babysitting, basically no "specs" - just giving it coherent sane direction: like to make sure it tests things in several different ways, for several different cases, including performance, comparing directly to similar implementations (and constantly triple-checking that it actually did what you asked after it said "done").

For $200/mo, I can still run 2-3 clients almost 24/7 pumping out features. I rarely clear my session. I haven't noticed quality declines.

Though, I will say, one random day - I'm not sure if it was dumb luck - or if I was in a test group, CC was literally doing 10x the amount of work / speed that it typically does. I guess strange things are bound to happen if you use it enough?

Related anecdata: IME, there has been a MASSIVE decline in the quality of claude.ai (the chatbot interface). It is so different recently. It feels like a wanna-be crappier version of ChatGPT, instead of what it used to be, which was something that tried to be factual and useful rather than conversational and addictive and sycophantic.

mlinsey · 2mo ago

My anecdata is that it heavily depends on how much of the relevant code and instructions it can fit in the context window.

A small app, or a task that touches one clear smaller subsection of a larger codebase, or a refactor that applies the same pattern independently to many different spots in a large codebase - the coding agents do extremely well, better than the median engineer I think.

Basically "do something really hard on this one section of code, whose contract of how it interacts with other code is clear, documented, and respected" is an ideal case for these tools.

As soon as the codebase is large and there are gotchas, edge cases where one area of the code affects the other, or old requirements - things get treacherous. It will forget something was implemented somewhere else and write a duplicate version, it will hallucinate what the API shapes are, it will assume how a data field is used downstream based on its name and write something incorrect.

IMO you can still work around this and move net-faster, especially with good test coverage, but you certainly have to pay attention. Larger codebases also work better when you started them with CC from the beginning, because its older code is more likely to actually work how it expects/hallucinates.

onlyrealcuzzo · 2mo ago

> My anecdata is that it heavily depends on how much of the relevant code and instructions it can fit in the context window.

Agreed, but I'm working on something >100k lines of code total (a new language and a runtime).

It helps when you can implement new things as if they're green-field-ish AND THEN implement and plumb them later.

janalsncm · 2mo ago

How can a person reconcile this comment with the one at the root of this thread? One person says Claude struggles to even meet the strict requirements of a spec sheet, another says Claude is doing a great job and doesn’t even need specific specs?

I have my own anecdata but my comment is more about the dissonance here.

oefrha · 2mo ago

One aspect you have to consider is the differences in human beings doing the evaluation. I had a coworker/report who would hand me obvious garbage-tier code with glaring issues even in its output, and it would take multiple iterations to address very specific review comments (once, in frustration, I showed a snippet of their output to my nontechnical mom and even my mom wtf’ed and pointed out the problem unprompted); I’m sure all the AI-generated code I painstakingly spec, review and fix is totally amazing to them and needs very little human input. Not saying it must be the case here, that was extreme, but it’s a very likely factor.

DennisP · 2mo ago

I just read Steve Yegge's book Vibe Coding, and he says learning to use AI effectively is a skill of its own, and takes about a year of solid work to get good at it. It will sometimes do a good job and other times make a mess, and he has a lot of tips on how to get good results, but also says a lot of it is just experience and getting a good feel for when it's about to go haywire.

sarchertech · 2mo ago

One person is rigorously checking to see if Claude is actually following the spec and one person isn’t?

stevenicr · 2mo ago

I think it matters what the project and tech stack is, and how much you try to get done before starting a fresh chat.

I've had interesting chats where it explained that its choice of Tailwind, for example, was because it had a ton of training knowledge on it.

I've also had it try to build more in one chat than it should many times.

For some reason OpenAI Codex handles building too much without failing better - but that is total anecdata from my particular projects and ymmv.

I've had these things try to build big when a little nudge gets them to change direction and not build so much. Explaining which libraries and such, and asking it to change the tech stack and the steps to build at once, seems to make things much better for my use cases.

Also, running extra checks and cleanup later is a thing. Sure, a human might have seen an obvious thing at time of build, but we have a bigger memory context comparatively, imho.

mojuba · 2mo ago

I think it depends on both the complexity and the quality bars set by the engineer.

From my observations, generally AI-generated code is average quality.

Even with average quality it can save you a lot of time on some narrowly specialized tasks that would otherwise take you a lot of research and understanding. For example, you can code some deep DSP thingie (say audio) without understanding much what it does and how.

For simpler things like backend or frontend code that doesn't require any special knowledge other than basic backend or frontend - this is where the bars of quality come into play. Some people will be more than happy with AI generated code, others won't be, depending on their experience, also requirements (speed of shipping vs. quality, which almost always resolves to speed) etc.

sameerds · 2mo ago

It could just be that each of the two reviewers is merely focussing on different sides of the same coin? I use Claude all the time. It saves me a lot of effort that I would have otherwise spent in looking up specific components. The magically autocompleted pieces of boilerplate are a tangible relief. It also catches issues that I missed. But when it is wrong, it can be subtly or embarrassingly or spectacularly wrong depending on the situation.

justinclift · 2mo ago

Note that one person is mentioning they use Claude Sonnet, which is less capable than the higher tiers (Opus, etc).

aforwardslash · 2mo ago

It boils down to scope. I use CC in both very specific one-language systems and broad backend-frontend-db-cache systems. You can guess where the difficulty lies. (Hint: it's the stuff with at least 3 distinct languages)

ghurtado · 2mo ago

> basically no "specs" - just giving it coherent sane direction

This is one variable I almost always see in this discussion: the more strict the rules that you give the LLM, the more likely it is to deeply disappoint you

The earlier in the process you use it (ie: scaffolding) the more mileage you will get out of it

It's about accepting fallibility and working with it, rather than trying to polish it away with care

phatskat · 2mo ago

To me this still feels like it would be a net negative. I can scaffold most any project with a language/stack specific CLI command or even just checking out a repo.

And sure, AI could “scaffold” further into controllers and views and maybe even some models, and they probably work ok. It’s then when they don’t, or when I need something tweaked, that the worry becomes “do I really understand what’s going on under the hood? Is the time to understand that worth it? Am I going to run across a small thread that I end up pulling until my 80% done sweater is 95% loose yarn?”

To me the trade-off hasn’t proven worth it yet. Maybe for a personal pet project, and even then I don’t like the idea of letting something else nondeterministically touch my system. “But use a VM!” they say, but that’s more overhead than I care for. Just researching the safest way to bootstrap this feels like more effort than value to me.

Lastly, I think that a big part of why I like programming is that I like the act of writing code, understanding how it works, and building something I _know_.

prmph · 2mo ago

But, how do you know the code is good?

If you do spot checks, that is woefully inadequate. I have lost count of the number of times when, poring over code a SOTA LLM has produced, I notice a lot of subtle but major issues (and many glaring ones as well), issues a cursory look is unlikely to pick up on. And if you are spending more time going over the code, how is that a massive speed improvement like you make it seem?

And, what do you even mean by 10x the amount of work? I keep saying anybody that starts to spout these sorts of anecdotes absolutely does NOT understand real world production level serious software engineering.

Is the model doing 10x the amount of simplification, refactoring, and code pruning an effective senior level software engineer and architect would do? Is it doing 10x the detailed and agonizing architectural (re)work that a strong developer with honed architectural instincts would do?

And if you tell me it's all about accepting the LLM being in the driver's seat and embracing vibe coding, it absolutely does NOT work for anything exceeding a moderate level of complexity. I have tried that several times. Up to now no model is able to write a simple markdown viewer with certain specific features I have wanted for a long time. I really doubt the stories people tell about creating whole compilers with vibe coding.

If all you see and appreciate is that it is pumping out 10x features, 10x more code, you are missing the whole point. In my experience you are actually producing a ton of sh*t, sorry.

hirvi74 · 2mo ago

> But, how do you know the code is good?

Honestly, this is more of a question about scope of the application and the potential threat vectors.

If the GP is creating software that will never leave their machine(s) and is for personal usage only, I'd argue the code quality likely doesn't matter. If it's some enterprise production software that hundreds to millions of users depend on, software that manages sensitive data, etc., then I would argue code quality should asymptotically approach perfection.

However, I have many moons of programming under my belt. I would honestly say that I am not sure what good code even is. Good to who? Good for what? Good how?

I truly believe that most competent developers (however one defines competent) would be utterly appalled at the quality of the human-written code on some of the services they frequently use.

I apply the Herbie Hancock philosophy when defining good code. When once asked what is Jazz music, Herbie responded with, "I can't describe it in words, but I know it when I hear it."

datavirtue · 2mo ago

Way better than the random India dev output. I seriously don't know what everyone around here is doing. All I see are complaints while I produce the output of ten devs. Clean code, solid design.

Spend a few hours writing context files. Spend the rest of the week sipping bourbon.

sameerds · 2mo ago

> I can still run 2-3 clients almost 24/7 pumping out features.

Honest question. How does one do that? My workflow is to create one git worktree per feature and start one session per worktree. And then I spend two hours in a worktree talking to Opus and reviewing what it is doing.

Planktonne · 2mo ago

> It's regularly writing systems-level code that would take me months to write by hand in hours, with minimal babysitting

Has your output kept pace with the code? Because months in hours works out, even pushing those ratios quite far, to years in days.

Has your roadmap accelerated multiple years in the last few months in terms of verifiable results?

pighive · 2mo ago

Curious to know what you are using $200/mo CC for? New applications? Business? What kind of application needs 2-3 clients running 24/7? How are you coming up with features? Just trying to understand the workflow.

kobe_bryant · 2mo ago

months you say? how incredible. it beggars belief in fact

hirvi74 · 2mo ago

Not sure about ChatGPT, but Claude was (is still?) an absolute ripper at cracking some software if one has even a little bit of experience/low level knowledge. At least, that's what my friend told me... I would personally never ever violate any software ToA.

buredoranna · 2mo ago · 8 in thread

> the whole thing being built on copyright infringement

I am not a lawyer, but am generally familiar with two "is it fair use" tests.

1. Is it transformative?

I take a picture, I own the copyright. You can't sell it. But if you take a copy, and literally chop it to pieces, reforming it into a collage, you can sell that.

2. Does the alleged infringing work devalue the original?

If I have a conversation with AI about "The Lord of the Rings", even if it reproduces good chunks of the original, it does not devalue the original... in fact, I would argue, it enhances it.

Have I failed to take into account additional arguments and/or scenarios? Probably.

But, in my opinion, AI passes these tests. AI output is transformative, and in general, does not devalue the original.

taikahessu · 2mo ago

In order for an LLM to be useful, you need to copy and steal all of the work. Yes, you can argue you don't need the whole work, but that's what they took and fed in.

And they are making money off of other people's work. Sure, you can use mental jiu-jitsu to make it fair use. But fair use for LLMs means you basically copy the whole thing. All of it. It sounds more like a total use to me.

I hope the free market and technology catch up and destroy the VC-backed machinery. But only time will tell.

ragequittah · 2mo ago

I always wonder if anyone out there thinks they're not making money off of other people's work. If you're coding, writing a fantasy novel, taking a photograph or drawing a picture from first principles you came up with yourself, I applaud you though.

jjwiseman · 2mo ago

And in Bartz v. Anthropic, the court found that Anthropic training their LLMs on books was "highly transformative."

verve_rat · 2mo ago

The US is not the only legal jurisdiction these services are being sold in.

idiotsecant · 2mo ago

This is a tiresome and well trod road.

The fact of the matter is that for-profit corporations consumed the sum knowledge of mankind with the intent to make money on it by encoding it into a larger and better organized corpus of knowledge. They cited no sources and paid no fees (to any regular humans, at least).

They are making enormous sums of money (and burning even more, ironically) doing this.

If that doesn't violate copyright, it violates some basic principle of decency.

michaelmrose · 2mo ago

You are assuming intellectual property has an intrinsic basis when it's at best functional, not foundational. It's only useful if the net value to society is positive, which is extremely dubious.

Madmallard · 2mo ago

What in the mental gymnastics?

They just stole everyone's hard work over decades to make this or it wouldn't have been useful at all.

NewsaHackO · 2mo ago

That's a statement. The comment you are replying to had actual reasoning behind his claim. Do you have any actual reasoning behind yours?
