We developed programming languages because coding in machine language is horrible, and over the decades we've refined them into tools that people can use fluently, thinking directly in code whenever they need to make a computer system behave in a certain way.
Only someone who has never built anything of significant complexity and utility can think that putting natural language encoding between you and the bytes is a net positive.
The hype should not be about replacing the typing but about assisting your thinking.
When you code, there's a dialog in your brain that thinks about the code and raises the questions you know you must answer before you can move on to the dialog with the machine, that is, typing the code.
LLMs can be extremely useful in this first part. We will get to the point where you select a line and explain your intent, and while the AI retrieves documentation and possible solutions, you reason about the problem and then pick and choose from what the AI has collected for you.
> Only someone who has never built anything of significant complexity and utility can think that putting natural language encoding between you and the bytes is a net positive.
The question, then, is whether assistants are "sitting next to you", like a secretary and a mentor, or sitting between you and the editor, as the thing you need to control.
An assistant can be a really effective refinement of the programming process. It can even end up motivating you, instead of you constantly getting demotivated by hitting the wall of "not another problem that I need to solve before I can really continue" (which happens all too often).
I think it’s really good on the 101-level academic side. Learning the basics of anything in a conversational manner can be massively helpful.
As soon as your situation exceeds textbook level, I’ve found them to always be a waste of my time, and nothing I’ve seen as of late makes me think they’re trending in a direction to be helpful in this scenario.
À la: I need to write this unit test; it has these checks; it validates these methods.
Or: write a log message for me about what error was encountered here. Those are annoying to write out, but often the LLM has enough context that I just start to write and it completes it appropriately.
All of these are things I can easily do myself, and they are easy to validate for correctness, but if I were to write them they would consume my limited mental energy for the day.
An assistant should not help you think; any AI agent or tool should do what you want with a minimal amount of explanation.
The only way I'll accept the current hype is if I can type in "make a Twitter clone", it does the implementation, I can run it, I write "make it red, silver and yellow color themed", and it does just that. I am the one doing the thinking here; I don't care about technical details. That should be the state of the art.
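For a concrete sense of the log-message boilerplate mentioned above, here is a minimal Python sketch. `load_settings` and the settings-file path are hypothetical; this is just one reasonable way to write such a message with the standard `logging` module.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def load_settings(path: str) -> Optional[str]:
    """Hypothetical helper: read a settings file, logging a descriptive error instead of crashing."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError as exc:
        # logger.exception logs at ERROR level and appends the current traceback.
        logger.exception(
            "Could not read settings file %r (%s); falling back to defaults",
            path,
            exc.strerror,
        )
        return None
```

It's exactly the kind of line that is tedious to phrase by hand but trivial to check once written.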
I can write my own Twitter clone and if I have to write prompt after prompt it is going to take me more time and more typing so it is useless.
A person who cannot write their own Twitter clone is not going to prompt their way to a working and deployed Twitter clone.
> When all is said and told, the "naturalness" with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.
Although this is obviously not about LLMs, it's astonishing how many parallels can be drawn to today's usage of AI systems.
1: https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
That is, the use of formal languages, although an evil (since they are not natural human languages), is essential to avoid nonsense (i.e. hallucinations). While intuition and natural language are more imaginative, formalism (i.e. narrow interfaces) is a forcing function to make things work.
According to Dijkstra, the use of natural languages set civilization back by thousands of years (because of the nonsense and imprecision)! So expect thousands of years of LLM hell if we adopt it to replace our formal languages.
Indeed, math and other symbolism/formalism is a crowning achievement of humans.
Trusting an AI to code an app from start to finish seems crazy to me but hey... if some people can pull it off, good for them I guess.
Where AI starts breaking down is at effectively incorporating a new feature into a complicated existing codebase. That is where we engineers can continue to hold an advantage.
Claude 3.5 did it from the first shot.
I literally just spent a full week on such a project. Respectfully, fuck people who don't read the docs/spec.
Claude 3.5 is pretty good at giving you the right hints, though. If you aren't familiar with a library, it's definitely faster than grepping through docs. If you are an expert in a library, then it's pretty useless.
AppleScript might convince me of your argument... ;-) But, seriously, we've been putting abstractions between "us and the bytes" ever since Fortran and COBOL appeared (and indeed, earlier). We can argue about the quality and expressiveness of those abstractions, and there are a lot of arguments against natural languages for this task, but the broad idea of putting things between developers and machines is sound, so it's worth continuing to explore IMHO.
If you have an LLM between you and the code, there is not even such a thing as “source code”, only a history of prompts. You can’t check your prompts into git and regenerate the same code later.
In fact, it’s more like the antithesis of the reproducible builds movement. It’s introducing a proprietary networked high latency chaos agent into the critical path.
I am extremely LLM-optimistic, though, and largely in favor of abstractions, so that fuels my viewpoint. I still remember my dad, an embedded developer in the 90s and 00s, ranting about how many people were starting to use 'inefficient and unpredictable' C compilers rather than whatever assembly he was using. I reckon he'd be appalled to learn that now even assembly isn't always a reliable model of what's really happening on the CPU, thanks to microcode and optimizations... ;-)
Well (and amusingly) said. The same (or at least a very similar) problem exists at the other end of the pipeline, i.e., whenever a user has to use a natural-language interface to get software to do something they want. Are we really going to tell our AI assistants to take complex actions on our behalf in the real world, and then just sit back? Are we really going to do this when money is involved?
"Have a nice day!" can mean many things (an insult, or sincere, for example).
We already put natural language between us and the bytes. Hence why most keywords and variable names (a hard part of computer science) are in simple English and it is considered a net positive.
Since then, I spent a week trying to get Cursor to work, and after dealing with all the bugs and restarting the composer each time with a new prompt, I was able to get what I would consider quality output for a moderately complex app (a parimutuel betting market).
The issue isn’t that LLMs are terrible; it’s that software like Cursor is buggy and poorly written.
It should know that I don’t want to use code from an old version of the library I am using, because the new version is already in my project’s dependencies.
It should let me set up preferences for different programming languages. And preferences for all programming languages.
So when I give it a prompt, it looks at the dependencies and language rules I already have set up, adds those to the prompt and produces the quality output I’m seeing now without me having to manually specify all those things.
Short version: LLMs rule; the software is just shitty.
There is the story that von Neumann flew off the handle the first time he saw an assembler.
>>How dare you waste compute cycles on this frivolity? Just use machine code like everyone else.
No, AI is a shitty tool that has yet to prove its utility. Autocomplete works by analyzing the official API and interface; it's completely different from AI, which hallucinates meaning between words and draws on whatever it was fed before it met you.
> variable names (a hard part of computer science)
Naming is for software engineering, not CS. One more confusion from people who want to sell us AI at all costs.
But modern AI tools are far beyond "autocomplete". (I actually turn off those inline completions; I feel they ruin my flow state.) The tools now are fully prompted, with multi-file editing, full codebase context, web/search and doc integration, and for "on the rails" development they are producing high-quality code for "easier" tasks.
These modern models and tools can solve nearly every single leet code problem faster than you. They can do every single Advent of Code problem likely 10X-100X faster than you can.
In my professional, high-standards, very legal and contract-driven web app world, AI tools are still very useful for "on the rails" development. Is it architecting entire systems? No, of course not (yet). Is it emulating existing patterns and extending them for new functionality 10X faster than a Jr or Mid? Yes, it is. Is it writing nearly perfect automated tests based on examples? Yes, it is. Is it scaffolding new ideas and putting down a great starting point? Yep. And it's even able to iterate on feature work pretty well, and much faster than a Jr/Mid.
The kind of work I'd give to a Jr/Mid and expect to take 2-3 days before they need serious feedback up and down the change, these AI are doing in about 30 seconds, maybe 90 seconds if you need to iterate a few times on the prompt.
I get that "AI" is a buzzword that is pumping valuations and making business people see $$$.
But coding assistants are not that. For many programmers, they are quickly becoming valuable tools that do in fact speed up development.
You can (and should) give the AI access to your existing codebase and any relevant documentation to use as context if you want good results. If you give the AI zero context for the problem it is trying to solve, of course it will struggle. If you give it all the necessary context, it will do much better.
I've found that just uploading the documentation of the API or library you are working with before asking the AI questions about it makes a huge difference in the quality of its output.
>Naming is for software engineering, not CS.
I figured they were referencing the “two hard problems of computer science”, those two being naming things, cache invalidation and off by one errors.
Everybody knows the hardest problems in software engineering are assembling promo packets and building consensus on number of spaces per indent.
As someone who is not a native English speaker, I disagree. I learned programming long before I got decent at English, and even today I just consider the English keywords in programming languages to be some "abstract mathematical concept" that by mere coincidence is named after some real, existing English word. Even today, being somewhat decent at English, I still think this way when I see program code.
I would actually insist that this is a much more useful way to think about good programming, since this way you have no difficulty constantly asking yourself whether it would make sense to replace some "English-named" concept with something more useful that has no analogue in English (or any other natural language).
"Natural language" is about far more than individual words.
But well, I guess there's a bright side here: those LLMs applied to software development might become the new Genexus, and there are going to be plenty of open positions for humans to rewrite entire systems in the not-so-distant future.
Unless OP is also willing to claim that all the people who are working on LLM dev tools are frauds acting against their better knowledge, OP's claim is obviously false. The entire premise these tool builders operate under is that natural language "between you and the bytes" can be a net positive.
[1] https://addyo.substack.com/p/the-70-problem-hard-truths-abou...
So is this just an illusion they create, or is it really possible to build software with AI, at least at a mediocre level?
I'm looking for open source projects that were built mostly with AI, but so far I couldn't find any bigger projects that were built with AI coding tools.
What you have to do, and what AI cannot do well, is to decide where in the codebase to put those functions, and decide how to structure the code around those functions. You have to decide how and when and why to call each of those functions.
Then you are a much better developer than me (which you may very well be). I'd like to think I'm pretty good, and I've many times spent hours trying to think through complex SQL queries or getting all the details right in some tricky equation or algorithm. Writing the same code with an AI often takes 2-20 minutes.
If it's faster for me, it might not be faster for everybody, but it is probably faster for many people.
This 100%. In my experience (ChatGPT - paid account), it often causes more problems than it solves when I ask it to do anything complex, but for writing functions that I describe in simple English, much like your example here, it has been overall pretty amazing. Also, I love asking it to generate tests for the function it writes (or that I write!). That has also been a huge timesaver for me. I find testing to be so boring and yet it's obviously essential, so it's nice to offload (some of) that to an LLM!
For the simpler cases I think prompting still took about as long as just writing the damn thing myself if I was familiar with the language.
The coding I have found it useful for is small, self-contained, well-defined scripts in bash, where the tedious part is reminding myself of all the command switches and the funky syntax.
There is no one out there building real, marketable production apps where AI "codes everything for them". At least not yet, but even in the future it seems infeasible because of context. I think even the most pro-AI people out there are vastly underestimating the amount of context that humans have and need to manage in order to build fully fledged software.
It is pretty great as a ridealong pair programmer though. I've been using Cursor as my IDE and can't imagine going back to a non-AI coding experience.
That attention does not map well to the important, hard, and more valuable parts of development.
Anecdotally, I still find it to be useful and it’s improving. I do think it’s going to have a huge impact in time.
Hype is part of the industry and it can be distracting to users, developers, and investors BUT it can also be useful (and I don’t know how to replace it) so, we live with it.
Made almost entirely with Cursor and Claude 3.5 Sonnet.
11k lines of C and counting.
https://github.com/jsonresume/jsonresume.org/pull/176
Meets my good-enough standards for sure
Here is some feedback:
There are a bunch of libraries that need to expose an http API. There is a niche for providing an embeddable http server that comes batteries included with all the features such as rate limiting, authentication, access control, etc. Things that constantly have to be reimplemented from scratch, but would not warrant adding a large framework by themselves.
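Rate limiting is a good example of a feature that keeps getting reimplemented. One common technique is a token bucket; it is swapped in here purely as an illustration, and I'm not claiming any particular embeddable server uses it. The `TokenBucket` class and its parameters are hypothetical. A minimal Python sketch:

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Hypothetical token-bucket limiter: up to `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.capacity = capacity
        self.rate = rate
        # Injectable clock so the limiter can be tested deterministically.
        self.clock = clock or time.monotonic
        self.tokens = capacity
        self.last = self.clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True if enough are available, else return False."""
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A real server would presumably keep one bucket per client (e.g. keyed by IP address or API key) and answer HTTP 429 when `allow()` returns False.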
That's where I think the idea of a "WebDSL" would shine the most.
I also don't see myself writing an entire webapp with this either - perhaps small sites or simple API endpoints?
I was mainly scratching an itch I've had for a couple of years. I also really like tuning C code just for the fun of it!
When I am working on something niche, it does not help either. I have tried to make it build modern UI applications for myself using modern Java, but it just can't. It hallucinates libs and functions that do not exist, and I can't really get it to produce what I want. I have had better experiences with languages that are simpler and more predictable (Go), and languages with huge amounts of learning material available (TypeScript/React). But I have been trying to build open source UI apps in JavaFX and GTK, and it just cannot help me when I am stuck.
Basically, it worked, but not without issues:
- The biggest issue was debugging: because the bugs appeared in Xcode, not Cursor, it meant either laboriously describing/transcribing errors into Cursor, or fixing them manually.
- The 'parallel' work between Cursor and Xcode was clunky, especially when Cursor created new files. It took a while to figure out a halfway-decent workflow.
- At one point something screwed up somewhere deep in the confusing depths of Xcode, and the app refused to compile altogether. Neither Cursor nor I could figure it out, but a new project with the files transferred over worked just fine.
But... after a few short hours' chatting, learning, and fixing, I had a functional app. It wasn't free of frustrations, and it's pretty far from the level where a non-coder could do the same, but it impressed me that it's already at the level where it's a decent multiplier of someone's abilities.
For example, my brother. He is what I'd refer to as 'tech-aligned' - he can and has written code before, but does not do it for a living and only ever wrote basic Python scripts every now and then to help with his actual work.
LLMs have enabled him to build out web apps in perhaps 1/5 of the time it would have taken him if he had tried to learn and build them from scratch. I don't think he would have even attempted it without an LLM.
Now, it doesn't 'code everything': he still has to massage the output to get what he wants, and there is still a learning curve to climb. But the springboard that LLMs can give people, particularly those who don't have much experience in software development, should not be underestimated.
Those claims are about being able to create a profitable product with 10x efficiency.
But this is still huge, and shouldn’t be disregarded.
- You have zero engineering background and you use an LLM to build an MVP from scratch. As long as the MVP is sufficiently simple, there is plenty of training data for the LLM to do well, e.g. some kind of React website with a simple REST API backend. This works as long as the app is simple enough, but it'll start breaking down as the app becomes more complex and requires domain-specific business knowledge or more sophisticated engineering techniques. Because you don't understand what the LLM is doing, you can't debug or extend any of it.
- You are an experienced developer and know EXACTLY what you want. You then use an LLM to write all the boilerplate for you. I was surprised at how much of my daily engineering work is actually just boilerplate. Using an LLM has made me significantly more productive. This only works if you know what you're doing, can spot mistakes immediately, and can describe in detail HOW the LLM should be doing the task.
For use cases in middle, LLMs kind of suck.
So I think the comparison to a (very) junior engineer is quite apt. If the task is simple, you can just let them do it. If the task is hard or requires a lot of context, you need to give them step-by-step instructions on how to go about it, and that requires that you know how to do it yourself.
But once a project has more than 20 source files, most AI tools seem unable to grasp the context of the project. In my experience, AI is really bad at multithreaded code and distributed systems. It seems unable to build its "mental model" for those kinds of problems.
But threading and asynchronous code are implicit - there's a lot going on that you can't see on the page, you need to think about what the system is actually doing rather than simply the words to make it do the thing.
I asked him to show me his process[1] after trying my hand at it (20 years, principal) and noticed a big difference in how we used AI: I instruct the AI how to code; he asks the AI to fix problems. In other words, I have a tendency to look at the code and ask the AI to fix it in the more specific and direct ways I want it fixed. On the other hand, if something doesn't work, my friend will copy/paste the error straight out of the dev tools console into the AI and ask it to fix the error. The two approaches are totally different.
My lesson here is that you're not meant to debug AI generated code; hand the error off to the AI and let it fix itself. I think if you're debugging AI generated code, you're doing AI generated code wrong. If you're an experienced dev picking up AI coding, I think you need to shift your mindset entirely. Ideally, someone out there will just create a closed loop where the AI can fix itself when it finds an error (integrate some browser and autonomous test loop into Cursor, for example, and let it fix its own errors).
Conclusion: if you're going to use AI to code, commit to it and use AI to fix the errors as well. Use AI for every aspect of it.
[0] Yes, I'm sure there are security holes and code issues galore, but those can always be fixed later when he's proven the business model.
[1] Yes, I have told him that he should create a YT channel or stream on Twitch, because how well he's been able to use AI is itself super interesting content.
The biggest risk to a startup is that you get the business model wrong or you don't ship code, even if the code is buggy and messy.
So maybe he was lucky or he is using a very good LLM I'm not aware of.
Unfortunately, that’s the most common kind of software in the saas industry anyway.
No better or worse than hiring cheap offshore contractors to do the same, IMO.
However, usually after three or four of those kinds of fixes, I can walk it back to the starting point before the initial error, and I now know how to prompt it to produce correct code, because I now have a better mental model of how the thing is supposed to work.
This has been super helpful in my process of learning new things, as well as relearning things I haven’t worked with in a while.
It's not impossible to fix later. But it's often more effective to scrap and rewrite. Hopefully your proven business model has yielded enough money for that, before someone else has pwned it.
> must be some value
I saw that with users asking for VBA code to be generated while trying to automate parts of their email and Excel work.
Also, it may be the case that the corpus of training data with VBA is not as good as it is with React these days.
Maybe the language your friend is using has more examples for training, or perhaps the dynamism of some languages leads to runtime errors with better details for it to work with.
I also tried it and the biggest issue I ran into is that I'm very specific about what I want. I wanted to use `nanostores` for state and routing. Problem is that the LLM keeps using code from `react-router` instead of `@nanostores/router`. As soon as I point it out, the LLM fixes it, but the first pass code generation is almost always wrong, even using an instruction file (as documented in both Cursor and GH Copilot).
That's when I realized that we are using the AI in two totally different ways: he simply doesn't care about the implementation, prop drilling, any of the technical details. None of that matters to him except that when "this button is clicked, that action happens". So however complex or inefficient or imperfect the code is, he doesn't care whereas I still have a tendency to read the code and try to ask the AI to do it in specific ways.
Doesn't this exist yet? It's such an obvious idea I'd be astonished if no-one has done it.
Code gen -> show the AI an example of how it's supposed to work -> error -> code gen -> AI tries it again by itself -> Code gen
This is only the case for new projects which don’t yet have users. Add users to even the simplest project and it evolves into a special snowflake with never before seen edge cases.
That’s why low code solutions are great for prototyping but eventually always explode into a nightmare of complexity.
To this day, no LLM that I tried passed this task of leading the development while detecting the underlying structure of the data.
I tend to restart chats from the beginning pretty much all the time, because of this.
I wonder how much better or worse things would get, if we took the human factor out of the loop. Give the LLM the ability to run tests and see the results, then iterate on its own output and branch off with different approaches, gradually increase the temperature etc.
Maybe it’d turn out that you need 10 LLMs running in parallel for an hour to fix something, or perhaps even 100 would never stumble upon a solution for a particular type of problem. And even then I wonder whether it’d get better if you fed it your entire codebase, or the codebases of all the libraries or frameworks that you use (though at that point you’re either training it yourself or selectively finding and feeding the correct bits so as not to exceed the context).
A bit like traditional autocomplete, it can help streamline familiarising oneself with various libraries, a clear step ahead of needing to dig through documentation as much.
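The loop described above (generate, run the tests, feed the failure back, raise the temperature, repeat) could look roughly like this minimal Python sketch. `generate` stands in for whatever model call you'd use and `run_tests` for your test runner; both are hypothetical callables, not a real API, and the temperature schedule is an arbitrary assumption.

```python
from typing import Callable, Optional


def iterate_until_green(
    generate: Callable[[str, float], str],      # hypothetical model call: (prompt, temperature) -> code
    run_tests: Callable[[str], Optional[str]],  # hypothetical runner: None on success, else an error message
    task: str,
    max_attempts: int = 5,
    start_temp: float = 0.2,
    temp_step: float = 0.2,
) -> Optional[str]:
    """Generate code, test it, feed failures back, and raise the temperature on each retry."""
    prompt = task
    temperature = start_temp
    for _ in range(max_attempts):
        code = generate(prompt, temperature)
        error = run_tests(code)
        if error is None:
            return code  # tests pass: accept this candidate
        # Put the failure into the next prompt and explore more broadly.
        prompt = f"{task}\n\nPrevious attempt failed with:\n{error}"
        temperature = min(1.0, temperature + temp_step)
    return None  # give up after max_attempts
```

Running several of these loops in parallel with different starting prompts would be the "10 LLMs for an hour" variant; whether that actually converges for hard problems is exactly the open question.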
Maybe there’s a class of code problems that LLMs can be decent at solving, given the ability to iterate, verify solutions and what works or doesn’t, perhaps with 10x more compute than is utilized in the typical chat mode of interaction though.
I've gone 2-6 steps down a path before realizing this isn't going to work or the LLM is stuck in a loop. I just hard reset back to the first commit in that chain and either approach the task differently or skip it if it wasn't really that important.
It's a tool not a person. When was the last time you got mad at a hammer for being smug?
Mine always thinks it nailed it on the first try, and it's pretty hard-headed when you point out mistakes.
If you can't work around those limitations, you're screwed.
True AI should be capable of comprehending problems and devising its own solutions, rather than merely generating statistically likely outputs. Until AI reaches that level of cognitive ability, its applications in the real world remain limited, and much of what we see today is largely hype.
Tokenization and embeddings merely help models predict the most probable next token, a process that is executed at scale using vast computational resources. This is not intelligence but large-scale probabilistic prediction. The terminology used in computer science, especially in recent years, can often be misleading.
I never expect some magic "understanding" to arrive, but doing remedial pattern matching is already a hugely valuable power that frees up humans to do more interesting work. This is how I use current AI: spitting out 5-line functions that I could spend 5 minutes writing, that it can do in 3 seconds, and that take me 10 seconds to review. Like "check for circular references" or "use Django ORM to write a query for all categories that have this flag for users that have this permission".
It doesn't "write the app" or solve difficult problems for me (unless it is some configuration issue). I can paste in an error code and save myself a few minutes of manual debugging. If I add a new parameter to a function, it prefills the correct type definition, and things like that. These are all micro-improvements, but they add up to a lot of saved time. Some people have success with editing across files, but I rarely even try that; it excels at solving discrete, repeatable bits of work with tidy solutions, so I use it for that.
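As an example of the "check for circular references" kind of helper, here is a minimal Python sketch. It assumes the references are available as a plain dict mapping each node to the nodes it points at; the `has_cycle` name and graph shape are illustrative assumptions. It uses a standard depth-first search that reports a cycle when it reaches a node still on the current path.

```python
def has_cycle(graph: dict[str, list[str]]) -> bool:
    """Return True if the directed graph (node -> referenced nodes) contains a cycle."""
    visiting: set[str] = set()  # nodes on the current DFS path
    done: set[str] = set()      # nodes whose descendants are fully explored

    def visit(node: str) -> bool:
        if node in visiting:
            return True   # back edge: reached a node still on the current path
        if node in done:
            return False  # already explored, no cycle through it
        visiting.add(node)
        if any(visit(nxt) for nxt in graph.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(visit(n) for n in list(graph))


print(has_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))  # True
print(has_cycle({"a": ["b"], "b": ["c"]}))              # False
```

For very deep graphs, the recursion would need to become an explicit stack to stay under Python's recursion limit.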
Until AI can return "I don't know" or, better, "did you want it this way or that way?" it will be severely limited. Yes, it acts like a junior dev in some ways, but a junior dev that never asks any questions, which is not the junior dev you ever want to give important work.
It's hard to predict what it will look like. I could write both utopian and dystopian narratives and I can pretty much guarantee they'll both be wrong. Not "in the middle" but something unexpected, the way nobody predicted cat videos or doomscrolling.
But you are almost certainly right that we will not be the inheritors.
That is the part that won’t actually happen, at least pretty quickly.
Except I'd add that as people gain experience working with the AI, I can only assume they'll get much better at making it go smoothly. For example, I wouldn't manually rewrite localhost; I'd tell the AI "Why is localhost everywhere? Will this work if I deploy to a droplet?" and it will fix it for you.
Also I just paste error-messages directly into the AI and it usually knows how to fix them.
Sometimes it's net positive, sometimes it's net-negative due to creating a mess that's really hard to get out of or debug. But I imagine it's only a matter of time until the scopes in which it's cost-effective go up.
I don't like that AI is a threat of huge monopolistic and job-reducing potential, but I don't think downplaying it is a long-term strategy to combat that.
The solution is multi-occur (Emacs), the quickfix list (Vim), or any editor that has project-wide find and replace.
> I just paste error-messages directly into the AI
...
On HN especially, that’s really nothing novel, many of us have (including me) and the only thing that it takes to get into one as a software engineer is memorizing the solution to coding problems.
When I’m hiring - mostly for green field initiatives - coming from BigTech is usually a negative signal for me.
Where the author went wrong in this post is that he tried to interpret an error ("I was asking claude to solve the wrong problem"), was wrong, and then wasted a lot of his own time.
I really think it's best practice when describing a problem to anybody that you start with what you observe and then if you want to hint your suspicions you call those out afterward as such. If you're very confident the LLM is going down a wrong path, you can ask it things like "How would I test the theory that environment variables aren't set in my docker container?"
It's also about responsiveness. LLMs produce junior-level quality code at a rate of hundreds of lines per minute. I need it to produce enough to spot where it's completely wrong as quickly as possible, so I can change the prompt.
It's like an edit-compile-run cycle, which also needs to be fast or you lose attention.
I was tempted to say it's another _step_ in the edit-compile-run but often the code is so bad I don't even bother compiling.
Why is this just like the last cost-cutting exercise, where the cheapest people in India produced a lot of "interesting" code?
The variables, functions and so on had names like:
a aa aaa b bb bbb
It helped me to grasp the basic concept, but was kinda hard to follow, tho. :D
The number of flops a gpu can output on the other hand does.
It makes me think many people haven't taken the time to actually learn to use the tool.
It just feels like they tried Copilot or ChatGPT for 5 minutes last year and concluded that all LLMs are useless and will be useless forever.
It makes me wonder if those people know that Claude 3.5 sonnet projects and/or Cursor with Claude exist?
Do they not appreciate some help to document their code? Do they never need to write or quickly understand scripts or code in one of the 100's of languages/stacks they're not too familiar with that they might encounter in the wild? How to get out of yet another git mess? Build a proof of concept in an hour that would've taken you days? A refresher on how to set up x toolchain to get started asap (the nr 1 hardest thing in programming :p) etc etc.
How does an LLM help there? What the code does should be obvious by looking at it, WHY it was written that way is the interesting question. Answering it often requires more context and domain knowledge.
> Do they never need to write or quickly understand scripts or code in one of the 100's of languages/stacks they're not too familiar with that they might encounter in the wild?
I'd rather take the time to do it myself because if I'm not familiar with a language/stack I won't be able to spot mistakes made by the LLM as easily.
> How to get out of yet another git mess?
Learn to solve the git issue and apply the knowledge in the future so you don't rely on yet another tool.
> Build a proof of concept in an hour that would've taken you days?
I question the premise.
> A refresher on how to set up x toolchain to get started asap (the nr 1 hardest thing in programming :p) etc etc.
How often do you do that? I think it's worth spending the time to do it yourself so you get an understanding of what exactly you're doing there. When you're done you can document the process and come back to it next time.
And what I'm saying is: that's exactly what LLMs are super useful for.
To answer your last question: about every 6 months or so. I'm a freelancer, I do a new project for a new client every 6 months on average. All of their toolchains, build systems, OS of choice for the dev machine, OS of choice for the SoC, documentation methods, PCB design tools, version management systems, release systems, testing frameworks are completely different per client and change constantly (even within the same company) depending on department and moment in time.
Despite my broadly positive view on the usefulness of LLMs, I do not think they are good enough (yet) to build a full system from scratch without an expert supervisor. IMO, this should not be used as 'proof' that they are dumb autocompleters.
I feel like I'm living on another planet when I see this point. I have almost never in my career encountered a situation where actually typing out the code is the time-consuming part. The time-consuming part is knowing what code you want to write, running it in a variety of circumstances to gain confidence that it's correct, and iterating when it isn't.
Please don't think I'm saying you're wrong by the way—if anything this just shows how diverse programming can be as a career. But I see this point raised a lot and it doesn't match my experience at all.
They have open source libraries, Stack Overflow, tutorials, documentation, simple code generator tools, and snippets.
The speed up we’re seeing is from LLMs basically caching all those things into a huge mathematical model and retrieving information in summarized form ready for consumption.
And while speed is always nice, LLMs are expensive, require ongoing maintenance to keep their context relevant, are still error-prone, and are terrible at true innovation.
In a few years we’ll be talking about the big “AI crash” and “what went wrong” when it has been obvious to experts all along. Winter is coming.
Until then it's just nonsense pretending to be something else...
I made a saying about this some weeks ago: "A.I. can make the road for you, but you have to know where you are going". In Greek it sounds a little bit better.
Also, code is the truth, but it is not the only truth. The underlying computer, the network infrastructure and other things have an effect on the code. So there could be a second saying to go with the first: "A.I. can make the road for you, but you have to test the road".
At some point AI will probably be good enough that this won’t matter. But it feels like we’re still a long way off that.
Human language, used to convey ideas to other humans, is imprecise. It's fine that it's imprecise because the media (humans) have both good error correction and a reasonable set of global defaults.
Computer languages require enormous precision because they're mechanically translated into machine code for some runtime.
Perhaps you can train an LLM on lots of code, and it'll find semantic relationships between some clever code it's been trained on and your specific request. Perhaps not, and it'll just give a dumb or incorrect answer (ideally some code copilot will actually try running the candidate answer code against your specific ask?). But once the answer gets complex, you run into the "it's much harder to debug code than to write it, so don't write code that's almost too complex for you to understand" problem.
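The parenthetical idea, a copilot that runs its candidate answer against your specific ask before showing it to you, can be sketched as a simple accept/reject gate. This is a minimal sketch, assuming the candidate is a Python function and the ask is expressed as input/output examples; `accept_candidate` and the median task are hypothetical illustrations, not any real copilot's behavior.

```python
from typing import Any, Callable, Sequence, Tuple

# One example: a tuple of positional arguments and the expected result.
Case = Tuple[tuple, Any]

def accept_candidate(candidate: Callable[..., Any], cases: Sequence[Case]) -> bool:
    """Return True only if the candidate reproduces every expected result.

    An exception raised by the candidate counts as a failure instead of
    crashing the checker.
    """
    for args, expected in cases:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:
            return False
    return True

# Hypothetical ask: "return the median of a non-empty list of numbers".
cases = [(([3, 1, 2],), 2), (([4, 1, 3, 2],), 2.5)]

def plausible_but_wrong(xs):
    # Looks reasonable, but mishandles even-length lists.
    return sorted(xs)[len(xs) // 2]

def correct(xs):
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2

print(accept_candidate(plausible_but_wrong, cases))  # False
print(accept_candidate(correct, cases))              # True
```

Passing the examples is evidence, not proof: the gate only checks what the examples cover, so the harder-to-debug problem the comment raises still applies.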
At work, I constantly have to remind people "don't use math data structures for identities" "but int is smaller" "Are you ever going to want the 95th percentile customerID?" "no that's silly" "then it isn't a number". Or I get to constantly remind people "a string with lots of curly braces and quotes isn't necessarily json; if you're not using a serialization API and are just sending bytes to stdout, someone else has to parse it" "but I'm using a logging library" "does anything else ever send stuff to stdout while your logging library is running?" "oh yes, we're going to open a ticket to debug that." So I'm not optimistic that running code written by a machine is long-term viable.
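Both reminders can be shown in a few lines of Python. In this minimal sketch, `CustomerId` and `log_event` are hypothetical names: the ID is an opaque type that offers no arithmetic (and keeps leading zeros an `int` would drop), and the log line is produced by `json.dumps` rather than by gluing braces and quotes together, so it always parses back. It does not address the other half of the complaint, other code writing to the same stdout; that still needs a dedicated stream or file.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class CustomerId:
    """An identity, not a quantity: comparable and hashable, but no +, -, or averages."""
    value: str

def log_event(event: str, customer: CustomerId, **fields) -> str:
    """Build one log line by serializing a dict, so quoting and escaping are handled for us."""
    record = {"event": event, "customer_id": customer.value, **fields}
    return json.dumps(record, sort_keys=True)

line = log_event("checkout", CustomerId("00042"), note='said "hi" {twice}')
print(line)
print(json.loads(line)["note"])  # said "hi" {twice}
```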
That said -- there are situations where machine generated code works -- I think it's been a long time since anyone manually drew masks for etching dies when making CPUs.
The key issue here was staying on top of the AI's help.
Use AI wisely: as an assistant, not as a drunken lead developer.
One of the interesting things about OpenHands is that you can see what the AI is doing in the terminal window where you launched it. Since it can't really load the whole codebase into its context window, it does a lot of grepping through files, showing 10 lines on either side of the match, and then doing a search and replace based on this. This is pretty similar to what a human might do: attempt to identify the relevant function and change it.
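The grep-then-patch loop described above can be sketched in a few lines. This is not OpenHands' actual implementation; the function names, the regex search, and the rule that the search text must occur exactly once in the window are assumptions made for illustration.

```python
import re
from typing import List, Tuple

def grep_with_context(lines: List[str], pattern: str, context: int = 10) -> List[Tuple[int, List[str]]]:
    """Return (match_index, window) pairs; each window holds up to `context` lines on either side."""
    regex = re.compile(pattern)
    hits = []
    for i, line in enumerate(lines):
        if regex.search(line):
            lo, hi = max(0, i - context), min(len(lines), i + context + 1)
            hits.append((i, lines[lo:hi]))
    return hits

def replace_in_window(lines: List[str], index: int, old: str, new: str, context: int = 10) -> List[str]:
    """Apply a literal search-and-replace only inside the window around one match."""
    lo, hi = max(0, index - context), min(len(lines), index + context + 1)
    window = "\n".join(lines[lo:hi])
    if window.count(old) != 1:
        # Refuse ambiguous edits rather than guessing which occurrence was meant.
        raise ValueError("search text must occur exactly once in the window")
    return lines[:lo] + window.replace(old, new).split("\n") + lines[hi:]

source = [
    "def total(items):",
    "    return sum(items)",
    "",
    "def average(items):",
    "    return sum(items) / len(items)",
]
hits = grep_with_context(source, r"def average", context=1)
idx = hits[0][0]
patched = replace_in_window(source, idx, "/ len(items)", "/ max(len(items), 1)", context=1)
print("\n".join(patched))
```

The narrow window is what keeps the edit local: `total` is outside it, so the replacement cannot touch that function even though it also calls `sum(items)`.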
I think I might have better luck with a simpler project, e.g. a Sinatra or Flask app where each route is relatively self-contained. I might give it or Cursor another try in the future when the tech has progressed a bit.
Seriously. It seems stupid. But AI works a lot better with a written spec.
The incredible thing is that the AI can actually be an excellent resource for writing the spec. And it will actually produce better code when you feed the spec back into said AI!
The current generation of AI seems to have fooled a lot of people into thinking that somehow you can jump straight to coding. (Well, you can, and it will probably work if you want to make something small or limited in scope.) Not so!
But, on the bright side, it’s just as good at design as code if you ask the right questions!
I say this having used GPT-4 and GPT-4o extensively in this manner. I just started using Sonnet 3.5 this way in the last month or so, and it is amazing at this.
Once the quality of training data improves (somehow getting access to high-quality codebases behind corporate walls by promoting these assistants and ingesting the codebase), the output improves.
There was a popular saying: garbage in, garbage out.
The pitch:
- AI generates tons of plausible-looking garbage
- Static types catch garbage at compile time
- OCaml/F#/Haskell fans quietly sipping tea in the corner
The irony? We spent years debating static vs dynamic typing for human developers. But the killer use case may end up being catching AI hallucinations.
Finally, a business case for monads that doesn't require a PhD!
Time to dust off those Haskell books. Who knew safety could be so profitable? Plot twist: Category theory becomes a required interview question by 2025
In short, it got better when I created a project, uploaded its headers and docs as project files, and moved my chat into that project as well.
That said, AI can help you, but it needs a lot of support from you to do things somewhat right.
HN fell for it hard - 156 points, 180 comments (as of this writing).
Well done Nick! :) And congrats on launching Codescribble! Hope to see a "how my post on AI grew my userbase" followup in a few weeks!
What is this supposed to produce other than a mass of bugs and vulnerabilities? "A.I." is utter garbage and always will be, it is foolish to think otherwise.
You also have to get a good feel for when it’s best if you make a change vs the LLM. Aider doesn’t handle new files and moving around massive chunks super well. It can do it, but if I want to rename something everywhere or break out components/types/etc into different files, then I know I should be doing that in my IDE myself. Same for little syntax errors when a diff the LLM makes isn’t quite right.
I spent a few nights last week using LLMs to help build a Chrome extension to match my Amazon transactions with my YNAB transactions, so I could update the memo field in YNAB with the names of the items I bought from Amazon. That speeds up my categorization and serves as a history of what I bought (previously I did this whole process manually). I think it really helped and made the whole process go much faster.
It really excels (for me) in UI. I’d like to think I’m pretty competent at writing code/logic, but I’m not great at UI. In many projects I get bogged down when it comes to UI. If I get stuck coming up with a UI or I don’t like how something looks, I can lose motivation to continue forward on it. With Aider I can ask for UI, and while it might be abhorrent to a designer, I think it looks pretty damn good (better than what I could do) and lets me focus on the logic. Aider also lets me try radical changes knowing I can easily reset back a few steps if it doesn’t work out.
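The comment doesn't show the extension's matching logic. A plausible core, sketched in Python under loudly stated assumptions, pairs each order with an unused budget transaction of the exact same amount within a few days, preferring the closest date. `AmazonOrder`, `BudgetTxn`, `match_memos`, and the 5-day window are all hypothetical; real data (split shipments, refunds, sign conventions for outflows) would need more care.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

@dataclass
class AmazonOrder:
    order_date: date
    total_cents: int
    items: List[str]

@dataclass
class BudgetTxn:
    txn_id: str
    txn_date: date
    amount_cents: int  # assumption: outflows stored as positive cents

def match_memos(orders: List[AmazonOrder], txns: List[BudgetTxn], max_days: int = 5) -> Dict[str, str]:
    """Map txn_id -> memo text for orders matched by exact amount and nearby date.

    Each transaction is used at most once; among candidates the closest date wins.
    """
    memos: Dict[str, str] = {}
    for order in sorted(orders, key=lambda o: o.order_date):
        candidates = [
            t for t in txns
            if t.txn_id not in memos
            and t.amount_cents == order.total_cents
            and abs((t.txn_date - order.order_date).days) <= max_days
        ]
        if candidates:
            best = min(candidates, key=lambda t: abs((t.txn_date - order.order_date).days))
            memos[best.txn_id] = ", ".join(order.items)
    return memos

orders = [
    AmazonOrder(date(2024, 3, 1), 2599, ["USB-C cable", "Notebook"]),
    AmazonOrder(date(2024, 3, 10), 1299, ["Coffee filters"]),
]
txns = [
    BudgetTxn("t1", date(2024, 3, 3), 2599),
    BudgetTxn("t2", date(2024, 3, 11), 1299),
    BudgetTxn("t3", date(2024, 3, 20), 1299),
]
print(match_memos(orders, txns))
```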
I’ve said many times at work that a huge power of LLMs is taking something that would take 30-60min down to <5min, specifically around things like little scripts to investigate a problem or get more details. For example, I might have a log that I can see there is data in that I want to extract. I know I can write a chained/piped command of sed/awk/grep/cut/sort/uniq/etc but it’s going to take some trial and error as well as time. With an LLM I can bang out the full command in 1-3 exchanges.
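For comparison, here is roughly what such a one-off looks like written out in Python: it counts which devices appear most often on ERROR lines, approximating a `grep | cut | sort | uniq -c | sort -rn | head` pipeline. The log format, the regex, and the name `top_error_devices` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical log format: "<timestamp> <LEVEL> device=<id> msg=<text>"
ERROR_DEVICE = re.compile(r"\bERROR\b.*\bdevice=(\S+)")

def top_error_devices(lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n devices that appear most often on ERROR lines, with their counts."""
    counts: Counter = Counter()
    for line in lines:
        match = ERROR_DEVICE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)

log = [
    "2024-05-01T10:00:00 ERROR device=A17 msg=timeout",
    "2024-05-01T10:00:05 INFO device=A17 msg=ok",
    "2024-05-01T10:01:00 ERROR device=B02 msg=timeout",
    "2024-05-01T10:02:00 ERROR device=A17 msg=crc",
]
print(top_error_devices(log))  # [('A17', 2), ('B02', 1)]
```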
Same deal with visualizing some piece of data in the logs (note: yes, we use Prometheus/Grafana but not everything can go in there and for new bugs/issues in the field I’m normally dealing with something we haven’t seen before and thus haven’t setup monitoring/alerting on). I’ve had LLMs churn out simple HTML/JS/CSS files that I can feed data into “graph all instances of this happening if X > Y and time is between A and B, etc”.
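The chart page itself was LLM-generated HTML/JS/CSS; the data-prep half of a request like "graph all instances of this happening if X > Y and time is between A and B" might look like the Python sketch below. The event tuples, the threshold, and the per-minute bucketing are hypothetical choices; the JSON output is what you would paste into such a page.

```python
import json
from datetime import datetime
from typing import Dict, List, Tuple

def per_minute_counts(events: List[Tuple[datetime, float]], threshold: float,
                      start: datetime, end: datetime) -> Dict[str, int]:
    """Count events whose value exceeds `threshold` and whose time falls in [start, end), per minute."""
    counts: Dict[str, int] = {}
    for ts, value in events:
        if value > threshold and start <= ts < end:
            key = ts.strftime("%Y-%m-%d %H:%M")
            counts[key] = counts.get(key, 0) + 1
    # Sorted by minute so a chart can plot the keys left to right.
    return dict(sorted(counts.items()))

events = [
    (datetime(2024, 5, 1, 10, 0, 10), 120.0),
    (datetime(2024, 5, 1, 10, 0, 40), 80.0),
    (datetime(2024, 5, 1, 10, 1, 5), 150.0),
    (datetime(2024, 5, 1, 10, 1, 30), 130.0),
    (datetime(2024, 5, 1, 11, 0, 0), 200.0),
]
series = per_minute_counts(events, threshold=100.0,
                           start=datetime(2024, 5, 1, 10, 0), end=datetime(2024, 5, 1, 10, 30))
print(json.dumps(series))
```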
Again, I can write this stuff from scratch but often don’t do it in practice because the ROI isn’t guaranteed. In the middle of a production issue do I want to waste 10-30+ min writing the script to see if I can prove a theory? No, it’s not worth it if it doesn’t pan out, but if I’m using an LLM and it takes me less than five minutes then I can throw a lot more stuff at the wall to see if it sticks.
1. It’s fun to use it to try unfamiliar languages and frameworks, but that exponentially increases the chance you get firmly stuck in a corner like OP’s deployment issue, where the AI can no longer figure it out and you find yourself needing to learn everything on the fly. I use a Django/Vue/Docker template repo that I’ve deployed many production apps from and know like the back of my hand, and I’m deeply familiar with each of the components of the stack.
2. Work in smaller chunks and keep it on a short leash. Agentic editors like Windsurf have a lot of promise but have the potential to make big sweeping messes in one go. I find the manual file context management of Aider to work pretty well. I think through the project structure I want and I ask it to implement it chunk by chunk—one or two moving pieces at a time. I work through it like I would pair programming with someone else at the keyboard: we take it step by step rather than giving a big upfront ask. This is still extremely fast because it’s less prone to big screwups. “Slow is smooth and smooth is fast.”
3. Don’t be afraid to undo everything it just did and re-prompt.
4. Use guidelines—I have had great success getting the AI to follow my desired patterns, e.g. how and where to make XHRs, by stubbing them in somewhere as an example or explicitly detailing them in a file.
5. Suggest the data structures and algorithms you want it to use. Design the software intentionally yourself. Tell it to make a module that does X with three classes that do A, B and C.
6. Let the AI do some gold plating: sometimes you gotta get in there and write the code yourself, but having an LLM assistant can help make it much more robust than I’d bother to in a PoC type project—thorough and friendly error handling, nice UI around data validation, extensive tests I’m less worried about maintaining, etc. There are lots of areas where I find myself able to do more and make better quality-oriented things even when I’m coding the core functionality myself.
7. Use frameworks and libraries the AI “knows” about. If your goal is speed, using something sufficiently mainstream that it has been trained on lots of examples helps a lot. That said, if something you’re using has had a major API change, you might struggle with it writing 1.0-style code even though you’re using 2.0.
8. Mix in other models. I’ve often had Claude back itself into a corner, only to loop in o1 via Aider’s architect mode and have it figure out the issue and tell Claude how to fix it.
9. Get a feel for what it’s good at in your domain—since I’m always ready to quickly roll back changes, I always go for the ambitious ask and see whether it can pull it off—sometimes it’s truly amazing in one shot! Other times it’s a mess and I undo it. Either way over time you get an intuition for when it will screw up. Just last week I was playing around with a project where I had a need to draw polygons over a photograph for debugging purposes. A nice to have on top of that was being able to add, delete, and drag to reshape them, but I never would have bothered coding it myself or pulling in a library just for that. I asked Claude for it, and got it in one shot.
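Tips 4 and 5 combine well: write the structure yourself as a small skeleton, with one reference implementation per role, then ask the assistant to add new variants that follow it. In this minimal Python sketch, the three roles (`Source`, `Parser`, `Sink`) and the toy "LEVEL message" log format are hypothetical stand-ins for "three classes that do A, B and C".

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Protocol

@dataclass(frozen=True)
class Record:
    level: str
    message: str

# The design you decide on yourself: three roles with narrow interfaces.
class Source(Protocol):
    def lines(self) -> Iterator[str]: ...

class Parser(Protocol):
    def parse(self, line: str) -> Optional[Record]: ...

class Sink(Protocol):
    def write(self, record: Record) -> None: ...

# One reference implementation per role, for the assistant to imitate.
class ListSource:
    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = list(lines)

    def lines(self) -> Iterator[str]:
        return iter(self._lines)

class LevelParser:
    def parse(self, line: str) -> Optional[Record]:
        # "LEVEL message..." -> Record; lines without a message are skipped.
        level, _, message = line.partition(" ")
        return Record(level, message) if message else None

class ListSink:
    def __init__(self) -> None:
        self.records: List[Record] = []

    def write(self, record: Record) -> None:
        self.records.append(record)

def run(source: Source, parser: Parser, sink: Sink) -> None:
    """Wire the three roles together; new variants plug in without touching this."""
    for line in source.lines():
        record = parser.parse(line)
        if record is not None:
            sink.write(record)

sink = ListSink()
run(ListSource(["ERROR disk full", "garbage", "INFO ok"]), LevelParser(), sink)
print(sink.records)
```

A prompt can then be as narrow as "add a `FileSource` that follows `ListSource`", which keeps the assistant on the short leash described in tip 2.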