1. Snake case to camelCase. Even without AI we can already do these tasks easily. VSCode itself has a "Transform to Camel Case" command for the selection. It's nice that the AI can figure out which text to transform based on context, but not too impressive. I could select one ":", use "Select All Occurrences", press left, then ctrl+shift+left to select all the keys.
2. Generate boilerplate from documentation. Boilerplate is tedious, but not really time-consuming. How many of you spend 90% of your time writing boilerplate instead of the core logic of the project? If a language/framework (Java used to be one, not sure about now) requires me to spend that much time on boilerplate, that's a language to be ditched/fixed.
3. Turn a problem description into a block of concurrency code. Unlike boilerplate, this code is more complicated. If I already know the area, I don't need AI's help to begin with. If I don't know it, how can I trust the generated code to be correct? It could miss a corner case that my question didn't specify, one I don't yet know exists. In the end, I still need to spend time learning Python concurrency, and then I'll be writing the same code myself in no time.
In summary, my experience with AI is that if the question is easy (e.g. it's easy to find the exact same question on StackOverflow), the answer is highly accurate. But if it is a unique question, the accuracy drops quickly. And it's the latter case where we spend most of our time.
It’s kinda like having a really smart new grad who works instantly and has memorized all the docs. Yes, I have to code review and guide it. That’s an easy trade-off to make for typing 1000 tokens/s, never losing focus, and double-checking every detail in realtime.
First: it really does save a ton of time for tedious tasks. My best example is test cases. I can write a method in 3 minutes, but Sonnet will write the 8 best test cases in 4 seconds, which would have taken me 10 mins of switching back and forth, looking at branches/errors, and mocking. I can code review and run these in 30s. Often it finds a bug. It’s definitely more patient than me in writing detailed tests.
Instant and pretty great code review: it can understand what you are trying to do, find issues, and fix them quickly. Just ask it to review and fix issues.
Writing new code: it’s actually pretty great at this. I needed a util class for config that had fallbacks to config files, env vars and defaults. And I wanted type checking to work on the accessors. Nothing hard, but it would have taken time to look at docs for YAML parsing, how to find the home directory, which env var API returns null vs errors on blank, typing, etc. All easy, but it takes time. Instead I described it in about 20 seconds and it wrote it (with tests) in a few seconds.
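For flavor, here's roughly the shape of the util class being described; a minimal sketch, assuming PyYAML, a hypothetical ~/.myapp.yaml filename, and env-var-over-file precedence (the commenter's actual choices may differ):

import os
from pathlib import Path
from typing import Optional

import yaml  # PyYAML

class Config:
    # Config lookup with fallbacks: env var, then a YAML file in the
    # home directory, then a hard-coded default.

    def __init__(self, path: Optional[Path] = None) -> None:
        path = path or Path.home() / ".myapp.yaml"  # hypothetical filename
        self._file = (yaml.safe_load(path.read_text()) or {}) if path.exists() else {}

    def get_str(self, key: str, default: str = "") -> str:
        env = os.environ.get(key.upper())
        if env:  # treat unset and blank env vars the same
            return env
        return str(self._file.get(key, default))

    def get_int(self, key: str, default: int = 0) -> int:
        # Typed accessors, so type checking works at the call site.
        return int(self.get_str(key, str(default)))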
It’s moved well past the “it can answer questions on Stack Overflow” stage. If it has been a while (a while = 6 months in ML), try again with the new Sonnet 3.5.
For me it doesn't work. Generated tests either fail to run, or they run and fail.
I work in large C# codebases and in each file I have lots of injected dependencies. I have one public method which can call lots of private methods in the same class.
AI either doesn't properly mock the dependencies, or it ignores what happens in the private methods.
If I take a lot of time guiding it where to look, it can generate unit tests that pass. But it takes longer than if I write the unit tests myself.
- It's gotten way better in the last 6 months. Both models (Sonnet 3.5 and the new October Sonnet 3.5) and tooling (Cursor). If you last tried Copilot, you should probably give it another look. It's also going to keep getting better. [1]
- It can make errors, so expect to do some code review and guiding. However, the error rates are going way, way down [1]. I'd say it's already below humans for a lot of tasks. I'm often doing 2 or 3 iterations before applying a diff, but a quick comment like "close, keep the test cases, but use the test fixture at the top of the file to reduce repeated code" and 5 seconds is all it takes to get a full refactor. Compared to code-review turnaround with a team, it's magic.
- You need to learn how to use it. Setting the right prompts, adding files to the context, etc. I'd say it's already worth learning.
- It just knows the docs, and that's pretty invaluable. I know 10-ish languages, which also means I don't remember the system call to get an env var in any of them. It does, and can insert it a lot faster than I can google it. Again, you'll need to code review, but more and more it's nailing idiomatic error checking in each language.
- You don't need libraries for boilerplate tasks. zero_pad is the extreme/joke example, but a lot more of my code is just using system libraries.
- It can do things other tools can't. Tell it to take the visual style of one blog post and port it to another. Tell it to use a test file I wrote for style reference, and update 12 other files to follow that style. Read the README and tests, then write pydocs for a library. Write a GitHub action to build docs and deploy to GitHub Pages (including suggesting libraries, deploy actions, and offering alternatives). Again: you don't blindly trust anything, you code review, and tests are critical.
[1] https://www.anthropic.com/news/3-5-models-and-computer-use
Cursor’s code review is surprisingly good. It’s caught many bugs for me that would have taken a while to debug, like off by one errors or improperly refactored code (like changing is_alive to is_dead and forgetting to negate conditionals)
-- Get recipient's push token and sender's username
SELECT expo_push_token, p.username
INTO recipient_push_token, sender_username
FROM profiles p
WHERE p.id = NEW.recipient_id;
Seems like the world has truly gone insane and engineers are tuned into some alternate reality a la Fox News. Well… it’ll be a sobering day when the other shoe falls.

Copy and paste the code to the Claude website? Or use an extension? Or something else?
Done in 5 seconds.
I replaced SO with cGPT and it’s the only good use case I found: finding an answer I can build on. But outsourcing my thinking? That’s a dangerous path. I tried it on small projects, building a project from scratch with Cursor just to test it. Sometimes it’s right on the spot, but in many instances it completely misses some cases and edge cases. Impossible to trust blindly. And if I do, and don't take proper time to read and think about the code, the consequences pile up and waste my time in the long run, because it’s prompt over prompt over prompt to refine it, and sometimes it’s still not exactly right. That messes up my thinking, and I prefer to do it myself and use it as documentation on steroids. I never used Google and SO again for docs. I have the feeling that relying on it too much to write even small blocks of code will make us lose some abilities in the long run, and I don’t think that’s a good thing. Will companies allow us to use AI in code interviews for boilerplate?
This leads to your code being littered with problematic edge cases that you still have to learn how to fix. Or in the worst case you don't even notice that there are edge cases, because you just copy-pasted the code and it works for you: those are the edge cases your users will find with time.
If AI tools let us vomit out boilerplate and syntax, I guess that sort of helps with the writing part (maybe. As long as you fully understand what the AI is writing). But it doesn’t make the resulting code any more understandable.
Of course, as is always the case, the tools we have now are the dumbest they’ll ever be. Maybe in the future we can have understandable AI that can be used as a programming language, or something. But AI as a programming language generator seems bad.
- seniors
- copilots
- juniors
- new languages
Wondering: since the seniors pair with LLMs, the world needs far fewer juniors. Some juniors will go away to other industries, but some might start projects in new languages without LLM/business support. Frankly, otherwise I don't see how any new-language corpus might get created.
I'm only 2 weeks in but it's basically impossible for me to imagine going back now.
It's not the same as GH Copilot, or any of the other "glorified auto-complete with a chatbox" tools out there. It's head and shoulders better than everything else I have seen, likely because the people behind it are actual AI experts and have built numerous custom models for specific types of interactions (vs a glorified ChatGPT prompt wrapper).
Only if it's the same occurrences. Cursor can often get the idea of what you want to do with the whole block of different names. Unless you're a vim macro master, it's not easily doable.
> How many of you spend 90% of time writing boilerplate instead of the core logic of the project?
It doesn't take much time, but it's a distraction. I'd rather tab through some things quickly than context switch to the docs, finding the example, adapting it for the local script, then getting back to what I was initially trying to do. Working memory in my brain is expensive.
I still spend a good amount of time on boilerplate. Stuff that's not thinking hard about the problem I'm trying to solve. Stuff like unit tests, error logging, naming classes, methods and variables. Claude is really pretty good at this, not as good as the best code I've read in my career but definitely better than average.
When I review Sonnet's code, the code is more likely to be correct than if I review my own. If I make a mistake, I'll read what I intended to write, not what I actually wrote. Whereas when I review Sonnet's, there are two passes, so the chance an error slips through is smaller.
The major difference is that with Cursor you just hit "tab", and that thing is done. Vs breaking focus to open up a browser, searching SO, finding an applicable answer (hopefully), translating it into your editor, then reloading context in your head to keep moving.
When I'm writing unit tests or integration tests it can guess the boilerplate pretty well.
If I already have an AddUserSucceeds test and I start writing `public void Dele...` it usually fills in the DeleteUserSucceeds function with pretty good guesses on what Asserts I want there - most times it even guesses the API path/function correctly because it uses the whole project as context.
I can also open a fresh project I've never seen and ask "Where is DbContext initialised" and it'll give me the class and code snippet directly.
Oh my god, get ready to waste a full weekend just to set everything up and get a formatted hello world.
In the aggregate, almost no programmer can think up code faster than they can type it in. But being a better typist still helps, because it cuts down on the amount you have to hold in your head.
Similar for automatically generating boilerplate.
> If I don't know, how can I trust the generated code to be correct?
Ask the AI for a proof of correctness. (And I'm only half-joking here.)
In languages like Rust the compiler gives you a lot of help in getting concurrency right, but you still have to write the code. If the Rust compiler approves of some code (AI-generated or artisanally crafted), you are already pretty far along in getting concurrency right.
A great mind can take a complex problem and come up with a simple solution that's easy to understand and obviously correct. AI isn't quite there yet, but getting better all the time.
And thank god! Code is a liability. The price of code is coming down but selling code is almost entirely supplanted by selling features (SaaS) as a business model. The early cloud services have become legacy dependencies by now (great work if you can get it). Maintaining code is becoming a central business concern in all sectors governed by IT (i.e. all sectors, eating the world and all that).
On a per-feature basis, more code means higher maintenance costs, more bugs, and greater demands on developer skill and experience. Validated production code that delivers proven customer value is not something you refactor on a whim (unless you plan to go out of business), and the fact that you did it in an evening thanks to ClippyGPT means nothing: the costly part is always what comes after, demonstrating value or maintaining trust in a competitive market with a much shallower capital-investment moat.
Mo’ code mo’ problems.
I mean, on the big-picture level, sure they can. Or in detail, if it is something they have good experience with. In many cases I get a visual of the whole code blocks, and if I use Copilot I can already predict what it is going to auto-complete for me based on the context, and then I can know in a second whether it was right or wrong. Of course this is more so for side projects, since I know exactly what I want to do, so most of the time it just has to vomit all the code out. And I feel impatient, so Copilot helps a lot with that.
* figuring out how to X in an API - eg "write method dl_file(url, file) to download file from url using requests in a streaming manner" (see the sketch after this list)
* Brainstorming which libraries / tools / approaches exist to do a given task. Google can miss some. AI is a nice complement for Google.
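For the first bullet, a sketch of what that dl_file could look like using requests' streaming API (the chunk size and timeout are my own choices):

import requests

def dl_file(url: str, path: str, chunk_size: int = 8192) -> None:
    # Stream the body to disk instead of loading it all into memory.
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)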
I never understand arguments like this. I have no idea what the shortcut for this command is. I could learn this shortcut, sure, but tomorrow I’ll need something totally different. Surely people can see the value of having a single interface that can complete pretty much any small-to-medium-complexity data transformation. It feels like there’s some kind of purposeful gaslighting going on about this and I don’t really get the motive behind it.
(Just kidding. I’m just making fun of how AI maxis reply to such comments, but they do it more subtly.)
2. The last time I wrote boilerplate-heavy Java code, 15+ years ago, the IDE already generated most of it for me. Nowadays boilerplate comes in two forms for me: new project setup, where I find it far quicker to use a template or just copy and gut an existing project (and it’s not like I start new projects that often anyway), or new components that follow some structure, where AI might actually be useful but I tend to just copy an existing one and gut it.
3. These aren’t tasks I really trust AI for. I still attempt to use AI for them, but 9 out of 10 times come away disappointed. And the other 1 time end up having to change a lot of it anyway.
I find a lot of value from AI, like you, asking it SO-style questions. I do also use it for code snippets, eg “do this in CSS”. Its results for that are usually (but not always) reasonably good. I also use it for isolated helper functions (“write a function to flood fill a grid where adjacent values match” was a recent one; see the sketch below). The results for this range from a perfect solution first try, to absolute trash. It’s still overall faster than not having AI, though. And I use it A LOT for rubber ducking.
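To make that flood-fill helper concrete, here's the kind of isolated function being asked for; a minimal sketch, assuming 4-way adjacency, in-place mutation, and an in-bounds start cell:

from collections import deque

def flood_fill(grid, row, col, new_value):
    # Replace the connected region containing grid[row][col] with new_value.
    old = grid[row][col]
    if old == new_value:
        return grid
    rows, cols = len(grid), len(grid[0])
    queue = deque([(row, col)])
    while queue:
        r, c = queue.popleft()
        if 0 <= r < rows and 0 <= c < cols and grid[r][c] == old:
            grid[r][c] = new_value
            queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return grid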
I find AI a useful tool, but I find a lot of the positive stories to be overblown compared to my experience with it. I also stopped using code assistants and just keep a ChatGPT tab open. I sometimes use Claude, but its conversation length limits turned me off.
Looking at the videos in OP, I find the parallelising task to be exactly the kind of tricky and tedious task that I don’t trust AI to do, based on my experience with that kind of task, and with my experience with AI and the subtly buggy results it has given me.
It's amazing how many naysayers there are about Cursor. There are many here and they obviously don't use Cursor. I know this because they point out pitfalls that Cursor barely runs into, and their criticism is not about Cursor, but about AI code in general.
Some examples:
"I tried to create a TODO app entirely with AI prompts" - Cursor doesn't work like that. It lets you take the wheel at any moment because it's embedded in your IDE.
"AI is only good for reformatting or boilerplate" - I copy over my boilerplate. I use Cursor for brand new features.
"Sonnet is same as old-timey google" - lol Google never generated code for you in your IDE, instantly, in the proper place (usually).
"the constantly changing suggested completions seem really distracting" - You don't need to use the suggested completions. I barely do. I mostly use the chat.
"IDEs like cursor make you feel less competent" - This is perhaps the strongest argument, since my quarrel is simply philosophical. If you're executing better, you're being more competent. But yes some muscles atrophy.
"My problem with all AI code assistants is usually the context" - In Cursor you can pass in the context or let it index/search your codebase for the context.
You all need to open your minds. I understand change is hard, but this change is WAY better.
Cursor is a tool, and like any tool you need to know how to use it. Start with the chat. Start by learning when/what context you need to pass into chat. Learn when Cmd+K is better. Learn when to use Composer.
I don't think you should be upset or worried that people aren't adopting these tools as you think they should. If the tool really lives up to its hype then the non-adopters will fall behind and, for example, be forced to switch to Cursive. This happened with IDEs (e.g. IntelliSense, jump to definition). It may happen with tools like Cursive.
I certainly don't feel this way, but if I'm proven wrong, that's good.
Like OP, I've found using Cursor to be a huge productivity boost. I maintain a few Postgres databases, work as a fullstack developer, and manage Kubernetes configs. When using Cursor to write SQL tables or queries, it adopts my way of writing SQL. It analyzed (context) my database folder, and when I ask it to create a query, a function, or a table, the output is in my style. This blew me away when I first started with Cursor.
Onto React/Next.js projects. In the same fashion, I have my way of writing components, fetching data, and now writing RSA. Cursor analyzed my src folder, and when asked to create components from scratch, the output was again similar to my style. I use raw CSS and class names; what was once the obstacle of naming has become trivial with Cursor ("add an appropriate class to this component with this styling"). Again, it analyzed all my CSS files and spits out CSS/classes in my writing/formatting style. And working on large projects, it is easy to forget the many, many components, packages, etc. that have already been integrated/written. Again, Cursor comes out on top.
Am I good developer or a bad developer? Don't know. Don't care. I'm cranking out features faster than I have ever done in my decades of development. As has been said before, as a software engineer you spend more time reading code than writing. Same applies to genAI. It turns out that I can ask cursor to analyze packages, spit out code, yaml configuration, sql, and it gets me 80% done with writing from scratch. Heck, if I need types to get full client/server type completion experience, it does that too! I have removed many dependencies (tailwind, tRPC, react query, prisma, to name a few) because cursor has helped me overcome obstacles that these tool assisted in (and I still have typescript code hints in all my function calls!).
All in all, Cursor has made a huge difference for me. When colleagues ask me to help them optimize SQL, I ask Cursor to help out. When colleagues ask me to write generic types for their components, I ask Cursor to help out. Whether Cursor or some other tool, integrating AI with the IDE has been a boon for me.
Correct. Because they know they need to use the correct tools for the job.
> If the tool really lives up to its hype then the non-adopters will fall behind
This is already happening. I'm able to out-deploy many of my competitors because I'm using Cursor.
Have you actually spent much time with Cursor? The comparison to "Jump to definition" is pretty bad. You also misspelled its name twice.
OK, I understand. Maybe they can't get much use of them and that's fine. But why they always insist that the tools don't work for everyone is something I can't make any sense of.
I stopped arguing online about this though. If they don't want to use LLMs that's fine too. Others (we) are taking their business.
After trying Cursor, I'd say if I were a JetBrains dev I'd be very worried. It's a true paradigm shift. It feels like JetBrains' competitive edge over other editors/IDEs mostly vanished overnight.
Of course JetBrains has its own AI-based solution and I'm sure they'll add more. But I think the thing JetBrains excels at, the understanding of semantics, is no longer that important for an IDE.
> I don't use it because I already use JetBrains (Pycharm, mostly). Hard to see any value add of Cursor over that.
lol
When people complain about LLMs hallucinating results, that doesn't really apply because it is either guessing wrong on the autocomplete (in which case I just keep typing) or it doesn't instantly point out the bug, in which case I look at the code or jump to Google.
It reminds me of how blackberry users insisted physical keyboards were necessary and smartphone touchscreen users were deluded.
On the other hand, at another company, where the NDAs are stronger and more one-sided, and there's a stronger culture of code silos, "who needs to know" governing read access to individual code repos, even for mundane things like web dashboards, and higher security in general, I expected nobody would be allowed to use these tools, yet I saw people talking about their Copilot and Cursor use openly on the company Slack.
There was someone sitting next to me using Cursor yesterday. I'd consider hiring them, if they're interested, but there's no way they're going to want to join a company that forbids using AI tools that upload code being worked on.
So I don't think companies are particularly consistent about this at the moment.
(Perhaps Apple's Private Cloud Compute service, and whatever equivalents we get from other cloud vendors, will eventually make a difference to how companies see this stuff. We might also see some interesting developments with fully homomorphic encryption (FHE). That's very slow, but the highly symmetric tensor arithmetic used in ML has the potential to work better with FHE than general-purpose compute.)
For a lighter-weight IDE I use Zed
I recently went from an idea for a casual word game (aka Wordle) to a fully polished product in about 2h, which would have taken me 4 or 5 times that if I hadn’t used Cursor. I estimate that 90% of the time was spent thinking about the product, directing the AI, and testing, and about 10% of the time actually coding.
Unless you work in R&D i've got some bad news for you..
Using AI enabled me to spend more time thinking about game mechanics.
They didn't specifically mean they built a Wordle clone, just a game like it. If they wanted just a Wordle clone, they would've gotten one within a few minutes of using codegen tools.
Have you written about your experience anywhere in greater length?
I'm below average in a lot of programming languages and tools. Cursor is extremely useful there because I don't have to spend tens of minutes looking up APIs or language syntax.
On the other hand, in areas I know more about, I feel that I can still write better code than Cursor. This applies to general programming as well. So even if Cursor knows exactly how to write the syntax and which function to invoke, I often find the higher-level code structure it creates sub-optimal.
Overall, Cursor is an extremely useful tool. It will be interesting to see whether it will be able to crawl out of the primordial soup of averages.
It's a great intern, letting me focus on the few but important places that I add specific value. If this is all it ever does, that's still enormously valuable.
You can see in general anything AI produces is pretty average.
But people who buy software don't care that the code behind it is average. As long as it works.
Whereas people who buy text, images and video do care.
I’ve also been logging every interaction with an LLM and the exit status of the build on every mtime of every language mode file and all the metadata: I can easily plot when I lean on the thing and when I came out ahead, I can tag diffs that broke CI. I’m measuring it.
My conclusion is that I value LLMs for coding in exact the same way that the kids do: you have to break Google in order for me to give a fuck about Sonnet.
LLMs seem like magic unless you remember when search worked.
Yikes. I didn’t even think about this, but it’s true.
I’m looking for the kinds of answers that Google used to surface from stack overflow
Fully switched over more than a year ago and never looked back.
I just spend any amount of tokens to build a database of how 4o behaves correlated to everything emacs knows, which is everything. I’m putting down tens of megabytes a day on what exact point they did whatever thing.
They didn’t get ahead by selling you the same thing they do, if they did Continue would be parity.
As long as one prompts it properly with sufficient context, reviews the generated code, and asks it to revise as needed, the productivity boost is significant in my experience.
That will be scary. Until then, it's basically just a better autocomplete for any competent developer.
If there’s no computation then there’s no computer science. It may be the case that Excel with attitude was a bubble in hiring.
But Sonnet and 4o both suck at figuring out why CUDA isn’t detected on this SkyPilot resource.
Personally, I find this kind of workflow totally counter-productive. My own programming workflow is ~90% mental work / doing sketches with pen & paper, and ~10% writing the code. When I do sit down to write the code, I know already what I want to write, don't need suggestions.
I've been in compilers, storage, and data backends for 15ish years, and had to do a little project that required recording audio clips in a browser and sending them over a websocket. Cursor helped me do it in about 5 minutes, while it would've taken at least 30 min of googling to find the relevant keywords like MediaStream and MediaRecorder, learn enough to whip something up, fail, then try to fix it until it worked.
Then I had to switch to streaming audio in near-realtime... here it wasn't as good: it tried sending segments of MediaRecorder audio which are not suitable for streaming (because of media file headers and stuff). But a bit of Googling, finding out about Web Audio APIs and Audio Worklet, and a bit of prompting, and it basically wrote something that almost worked. Sure it had some concurrency bugs like reading from the same buffer that it's overwriting in another thread. But that's why we're checking the generated code, right?
But eventually you get to a point where you've solved variations of the problem hundreds of times before, and it's just hours of time being burnt away writing it again with small adjustments.
It's like getting into making physical things with only a screwdriver and a hammer. Working with your hands on those little projects is fun. Then eventually you level up your skills and realize making massive things is much easier with a power drill and some automated equipment, and gives you time to focus on the design and intricacies of far more complicated projects. Though there are always those times where you just want to spend a weekend fiddling with basics for fun.
The rest is then general design and architecture, which LLMs really don't help much with. What they are really good for is getting an idea of possible options in spaces where you have little experience, or quickly explaining and summarizing specific solutions and their pros and cons. But I tried to make it pick a solution based on the constraints, and even with many tries and careful descriptions, the results were really bad.
I think for the most part it's meant to help you "get past" all the generic code you usually write at the beginning of a project, the generic functions you need in almost all systems, etc.
I'm not saying that I don't like writing code. I'm just saying that doing a lot of it can be mentally exhausting. Sometimes I'd just prefer to ship feature-complete stuff on-time and on-budget, then go back to my kids and wife without feeling like my brain is mush.
If you don't know how to handle a bike, the ebike won't help you in these situations. (You might even get yourself in a tricky spot).
But if you know how to ride, it can be really fun.
Same with code. If you know how to code it can make you much more productive. If you don't know how to code, you get into tricky spots...
In some cases, LLMs act as a Stack Overflow replacement for me, like "sort this with bubble sort, by property X". I'd also ask it to write some test cases around that. I won't import a bubble sort library just for this, but I also don't want to spend any more time than necessary implementing this for the nth time.
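The kind of snippet meant here, sketched with the property passed as a key callable (my framing of "by property X"):

def bubble_sort_by(items, key):
    # Classic in-place bubble sort ordered by key(item);
    # stops early once a pass makes no swaps.
    n = len(items)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if key(items[j]) > key(items[j + 1]):
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:
            break
    return items

# e.g. bubble_sort_by(users, key=lambda u: u.age)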
That said, I think everyone can relate to wasting an awful lot of time on things that are not "interesting" from the perspective of the project you are working on. For example, I can't count the number of hours I've spent trying to get something specific to work in webpack, and there is no payoff because today the fashionable tool is vite and tomorrow it'll be something else. I still want to know my code inside and out, but writing a deploy script for it should not be something I need to spend time on. If I had a junior dev working for me for pennies a day, I would absolutely delegate that stuff to them.
"Walk through this array and pull out every element without an index field and add it to a new array called needsToBeIndexed, send them off to the indexing service, and log any failures to the log file as shown in the function above".

Crap like that, 100 times a day.
Cursor lets me think closer to the level of architecting software.
Sure having a deep knowledge of my language of choices is fun, and very needed at times, but for the 40% or so of code that is boring work of moving data around, Cursor helps a lot.
I have a feeling that blindly building things with AI will actually lead to incomprehensible monstrous codebases that are impossible to maintain over the long run.
Read “Programming as Theory Building” by Peter Naur. Programming is 80% theory-in-the-mind and only about 20% actual code.
Here's an actual example of a task I have at work right now that AI is almost useless in helping me solve. "I'm working with 4 different bank APIs, and I need to simplify the current request and data model so that data stored in disparate sources are unified into one SQL table called 'transactions'". AI can't even begin to understand this request, let alone refactor the codebase to solve it. The end result should have fewer lines of code, not more, and it requires a careful understanding of multiple APIs and careful data modelling design and mapping where a single mistake could result in real financial damage.
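To make the shape of that task concrete: the usual target is one canonical model plus a per-bank adapter, roughly like the sketch below (all field names are hypothetical; the hard part the comment points at is getting four such mappings right against real APIs):

from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

@dataclass
class Transaction:
    # The unified row every bank-specific payload must map onto.
    bank: str
    external_id: str
    amount: Decimal  # normalized sign: negative means debit
    currency: str
    booked_at: datetime

def from_bank_a(payload: dict) -> Transaction:
    # One of four adapters; a single wrong mapping here means real financial damage.
    return Transaction(
        bank="bank_a",
        external_id=payload["txn_id"],
        amount=Decimal(payload["amount"]),
        currency=payload["ccy"],
        booked_at=datetime.fromisoformat(payload["booking_date"]),
    )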
1. Auto-complete makes me type ~20% faster (I type 100+ WPM)
2. Composer can work across a few files simultaneously to update something (e.g. updating a chrome extension's manifest while proposing a code change)
3. Write something that you know _exactly_ how it should work but are too lazy to author yourself (e.g. write a function that takes 2 lists of strings and pair-wise matches the most similar; allow me to pass the similarity function as a parameter; use OpenAI embedding distance to find the most similar pairings between these two results). A sketch of that matcher follows below.

Going the other direction in terms of model size, one tool I've found usable in these scenarios is Supermaven [0]. It's still just one- or multi-line suggestions a la GH Copilot, so it's not generating entire apps for you, but it's much, much better about pulling those one-liners from the rest of the codebase in a logical way. If you have a custom logging module that overloads the standard one, with special functions, it will actually use those functions. Pretty impressive. Also very fast.
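A minimal sketch of the matcher from item 3: greedy pairing, with the similarity function passed as a parameter (an OpenAI embedding similarity would be plugged in as that callable):

from itertools import product

def match_pairs(left, right, similarity):
    # Score every cross pair, then greedily take the best-scoring unused pairs.
    scored = sorted(
        ((similarity(a, b), a, b) for a, b in product(left, right)),
        reverse=True,
    )
    used_left, used_right, pairs = set(), set(), []
    for score, a, b in scored:
        if a not in used_left and b not in used_right:
            pairs.append((a, b, score))
            used_left.add(a)
            used_right.add(b)
    return pairs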
https://forum.cursor.com/t/capped-at-10k-context-no-matter-a...
[1] https://support.anthropic.com/en/articles/7996856-what-is-th...
I work in Rust and I had to start working with several new libraries this month. One example is `proptest-rs`, a Rust property-testing library that defines a whole new grammar for the tests. I am 100% sure that I spent much less time getting on-boarded with the library's best practices and usage. I just quickly went through their book (to learn the vocabulary) and asked the AI to generate the code itself. I was very surprised that it did not make any mistakes, considering that sort of weird custom grammar in the lib. I will at least keep trying for another month.
I suspect some are inspired by Cursor?
https://blog.jetbrains.com/ai/2024/10/complete-the-un-comple...
It was like a second person in the editor with a mind of its own, constantly touching my code even when it should have left it alone. I found myself undoing stuff it made all the time.
- Generating wrappers and simple CRUD APIs on top of database tables, provided only with a DDL of the tables.
- Optimizing SQL queries and schemas, especially for less familiar SQL dialects—extremely effective.
- Generating Swagger comments for API methods. A joy.
- Re-creating classes or components based on similar classes, especially with Next.js, where the component mechanics often make this necessary.
- Creating utility methods for data conversion or mapping between different formats or structures.
- Assisting with CSS and the intricacies of HTML for styling.
- GPT4 o1 is significantly better at handling more complex scenarios in creation and refactoring.
Current challenges based on my experience:
- LLMs lack critical thinking; they tend to accommodate the user’s input even if the question is flawed or lacks a valid answer.
- There’s a substantial lack of context in most cases. LLMs should integrate deeper with data sampling capabilities or, ideally, support real-time debugging context.
- Challenging to use in large projects due to limited awareness of project structure and dependencies.
I thought his "Changes to my workflow" section was the most interesting, coupled with the fact that coding productivity (churning out lines of code) was not something he found to be a benefit. However, IMO, the workflow changes he found beneficial seem a bit questionable in terms of desirability...
1) Having the LLM write support libraries/functions from scratch rather than relying on external libraries seems a bit of a double-edged sword. It's good to minimize dependencies and not be affected by changes to external libraries, but OTOH there's probably a lot of thought and debugging that has been put into those external libraries, as well as support for features you may not need today but may tomorrow. Is it really preferable to have the LLM reinvent the wheel using untested code it's written channeling internet sources?
2) Avoiding functions (couched as excessive abstractions) in favor of having the LLM generate repeated copies of the same code seems like a poor idea, and will affect code readability, debugging and maintenance whereby a bugfix in one section is not guaranteed to be replicated in other copies of the same code.
3) Less hesitancy to use unfamiliar frameworks and libraries is a plus in terms of rapid prototyping, as well as coming up to speed with a new framework, but at the same time it's a liability, since the quality of LLM-generated code is only as good as the person reviewing it for correctness and vulnerabilities. If you are having the LLM generate code using a framework you are not familiar with, then you are at its mercy as to quality, same as if you cut and pasted some code from the internet without understanding it.
I'm not sure we've yet arrived at the best use of "AI" for developer productivity - while it can be used for everything and anything, just as ChatGPT can be asked anything, some uses are going to leverage the best of the underlying technology, while others are going to fall prey to its weaknesses and fundamental limitations.
There is a very vocal old guard who are stubborn about ditching their 10,000+ hours master-level expertise to start from zero and adapt to the new paradigm. There is a lot of skepticism. There are a lot of people who take pride in how hard coding should be, and the blood and sweat they've invested.
If you look at AI from 10,000 feet, I think what you'll see is not AGI ruining the world, but rather LLMs limited by regression, eventually training on their own hallucinations, but good enough in their current state to be amazing tools. I think that Cursor, and products like it, are to coding what Photoshop was to artists. There are still people creating oil paintings, but the industry — and the profits — are driven by artists using Photoshop.
Cursor makes coders more efficient, and therefore more profitable, and anyone NOT using Cursor in a hiring pool of people who ARE using it will be left holding the short straw.
If you are an expert level software engineer, you will recognize where Cursor's output is bad, and you will be able to rapidly remediate. That still makes you more valuable and more efficient. If you're an expert level software engineer, and you don't use Cursor, you will be much slower, and it is just going to reduce your value more and more over time.
It's a specific thing and it doesn't suit me.
I've seen the glittery eyed hype on hn before and it basically means it will become a common tool. Whether it's good or not, that's a different question.
The people who are negative about these things because they need to review the output seem to be missing the massive amount of time saved, imho.
Many users point to the fact that they spend most of their time thinking; I'm glad for them. Most of my time is spent gluing APIs, boilerplate, and refactoring, and on those aspects Cursor helps tremendously.
The biggest killer feature that I get from similar tools (I ditched Copilot recently in favor of it) is that they allow me to stay focused and in the flow longer.
I have a tendency to phase out when tasks get too boring or repetitive, or to get stressed out when I can't come up with a solution. Similarly, going to a search engine to find an answer would often put me in a long loop of looking for answers deeply buried in a very long article (you need to help SEO after all, don't you?), and then it would be more likely that I'd get distracted by messages on my company chat or social media.
I can easily say that Cursor has made me more productive than I was one year ago.
I feel like the criticism many have comes from the wrong expectation that these tools do the work for you, whereas they are more about easing the boring and sometimes the hard parts.
I'm sure we'll get there, but I haven't seen anything even close available.
Oh my, oh my... How have I done this all these years before "AI" was a th- hype?
I did it without wasting even a fraction of the CO2 needed for these toys.
"AI" has some usecases, granted. But selling it as the holy grail and again sh•tting on the environment is getting more and more ridiculous by the day.
Humanity, even the smarter part, truly deserves what is coming.
Apes on a space rock
As an experiment, some time ago, I tried to build a TODO app entirely with AI prompts. I used a special serverless platform on the backend to store the data so that it would persist between page refreshes. I uploaded the platform's frontend components README file to the AI as part of the input.
Anyway, what happened is that it was able to create the TODO app quickly; it was mostly right after the first prompt and the app was storing and loading the TODOs on the server. Then I started asking for small changes like 'Add a delete button to the TODOs'; it got that right. Impressive!
All the code fit in a single file so I kept copying the new code and starting a new prompt to ask for changes... But eventually, in trying to turn it into a real product, it started to break things that it had fixed before and it started to feel like a game of whac-a-mole. Fixing one thing broke another and it often broke the same thing multiple times... I tried to keep conversations longer instead of starting a new one each iteration but the results were the same.
And you go on to say that your experiment was to build a TODO app "some time ago" in a single file of code.
For me it feels like it just starts to break after a certain length, but may not require a breakthrough new architecture to provide more value. Just larger context window sizes, so it can do the same thing it does on smaller pieces of code, on larger pieces of code, too.
The sweet spot seems to be bootstrapping something new from scratch and get all the boilerplate done in seconds. This is probably also where the hype comes from, feels like magic.
But the issue is that once it gets slightly more complicated, things break apart and run into a dead end quickly. For example, yesterday I wanted to build a simple CLI tool in Go (which is outstandingly friendly to LLM codegen as a language + stdlib) that acts as a simple reverse proxy and (re-)starts the original thing in the background on file changes.
AI was able to knock out _something_ immediately that indeed compiled, only it didn't actually work as intended. After lots of iterations back and forth (Claude mostly), the code ballooned in size to figure out what could be the issue, adding all kinds of useless crap that kind-of-looks-helpful-but-isn't. After an hour I gave up, went through the whole code manually (a few hundred lines, single file), and spotted the issue immediately (holding a mutex lock that gets released with `defer` doesn't play well with a recursive function call). After pointing that out, the LLM was able to fix it and produced a version that finally worked, still with tons of crap and useless complexity everywhere. And that's a simple, straightforward coding task that can be accomplished in a single file and only a few hundred lines, greenfield style. And all my Claude chat tokens of the day got burned on this, only for me to end up having to dig in myself.
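That bug class is easy to show. In Python terms, where threading.Lock is non-reentrant just like Go's sync.Mutex:

import threading

lock = threading.Lock()

def restart(attempt: int = 0) -> None:
    with lock:  # released only when this call returns, like defer in Go...
        if attempt < 1:
            restart(attempt + 1)  # ...so the recursive call deadlocks here

# One fix: a reentrant lock, which the same thread may acquire repeatedly.
rlock = threading.RLock()

def restart_ok(attempt: int = 0) -> None:
    with rlock:
        if attempt < 1:
            restart_ok(attempt + 1)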
LLMs are great to produce things in small limited scopes (especially boilerplate-y stuff) or refactor something that already exists, when it has enough context and essentially doesn't really think about a problem but merely changes linguistic details (rewrites text to a different format ultimately) - its a large LANGUAGE model after all.
But full blown autonomous app building? Only if you do something that has been done exactly thousands of times before and is simple to begin with. There is lots of business value in that, though. Most programmers at companies don't do rocket science or novel things at all. It won't build any actual novelty - the ideal case would be building an X for Y (like Uber for Catsitting), but never an initial X.
Personal productivity of mine went through the roof since GPT4/Cursor though, but I guess I know how/when to use it properly. And developer demand will surge when the wave of LLM-coded startups get their funding and realize the codebase cannot be extended anymore with LLMs due to complexity and the raw amount of garbage in there.
That's what experience with current-generation LLMs looks like. But you don't get points for making the code look perfect in the LLM; you get points for what you check in to git and PR. So the skill is in realizing the LLM is running itself in circles, before you run out of tokens and burn an hour, and doing it yourself.
Why use an LLM at all if you still have to do it yourself? Because it's still faster than without, and also that's how you'll remain employable - by covering the gaps that an LLM can't handle (until they can actually do full blown autonomous app development, which is still a while away, imo).
Using Cody currently with our company enterprise API key
This sounds like a nightmare.
I think the biggest problem with AI at the moment is that it incorrectly assumes that coding is the difficult part of developing software, but it's actually the easiest part. Debugging broken code is a lot harder and more time consuming than writing new code; especially if it's code that someone else wrote. Also, architecting a system which is robust and resilient to requirement changes is much more challenging than coding.
It boggles the mind that many developers who hate reading and debugging their team members' code love spending hours reading and debugging AI-generated code. AI is literally an amalgamation of other peoples' code.
Continue Dev is a different extension which continues with the same name.
> For example, suppose I have a block of code with variable names in under_score notation that I want to convert to camelCase. It is sufficient to rename one instance of one variable, and then tab through all the lines that should be updated, including the other related variables.
For me that would be :%s/camel_case/camelCase/gc then yyyyyynyyyy as I confirm each change. Or if it's across a project, then put cursor on word, SPC p % (M-x projectile-replace-regexp), RET on the word, camelCase, yyyynyyynynn as it brings me through all files to confirm. There's probably even better smarter ways than that, I learn "just enough to get it done" and move on, to my continual detriment.
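For comparison, the programmatic version of that rename is nearly a one-liner, which is part of why the editor-macro crowd shrugs at it (a minimal sketch):

import re

def to_camel(name: str) -> str:
    # under_score -> camelCase: drop each underscore and upcase what follows.
    return re.sub(r"_([a-z0-9])", lambda m: m.group(1).upper(), name)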
> Many times it will suggest imports when I add a dependency in Python or Go.
I can write a function in python or typescript and if I define a type like:
function(post: Po
and wait a half second, I'll be given a dropdown of options that includes the Post type from Prisma. If I navigate to that (C-J) and press RET, it'll auto-complete the type, and then add the import to the top, either as a new line or included with the import object if something's already being imported from Prisma. The same works for Python; I haven't tried others. I'm guessing this functionality comes from LSP? Actually not sure lol. Like I said, least knowledgeable emacs user.

As for boilerplate, my understanding is most emacs people have a bunch of snippets they use, or they use snippet libraries for common languages like tsx or whatever. I don't know how to use these so I don't.
I still intend to try cursor since my boss thinks it might improve productivity and thus wants me to see if that's true, and I don't want to be some kind of technophobe, however I remain skeptical that these built-in tools are at least any more useful than me just quickly opening a chatgpt window in my browser and pasting some code in, with the downside of me losing all my bindings.
My spacemacs config: https://github.com/komali2/Configs/blob/master/emacs/.spacem...
Meanwhile, the Compose mode gives code changes a good shot either in one file or multiple, and you can easily direct it towards specific files. I do wish it could be a bit smarter about which files it looks at since unless you tell it about that file you have types in, it'll happily reimplement types it doesn't know of. And another big issue with Compose mode is that the product is just not really complete (as can be seen by how different the 3 UXes of applying edits are). It has reverted previous edits for me, even if they were saved on disk and the UI was in a fresh state (and even their "checkout" functionality lost the content).
The Cmd+K "edit these lines" mode has the most reliable behavior since it's such a self-contained problem where the implementation uses the least amount of tricks to make the LLM faster. But obviously it's also the least powerful.
I think it's great that companies are trying to figure this out but it's also clear that this problem isn't solved. There is so much to do around how the model gets context about your code, how it learns about your codebase over time (.cursorrules is just a crutch), and a LOT to do about how edits to code are applied when 95% of the output of the model is the old code and you just want those new lines of code applied. (On that last one, there are many ways to reduce output from the LLM but they're all problematic – Anthropic's Fast Edit feature is great here because it can rewrite the file super fast, but if I understand correctly it's way too expensive).
Why are both functions inlined, and why is FastAPI used at all? I’m also not seeing any network bindings. Is it bound to localhost (I doubt it) or does it immediately bind to all interfaces?
It’s a 3-second thought from looking at Python code, a language I know only well enough to write small and buggy utilities (yet Python is widely popular, so LLMs have a ton of training data for it). I know it’s a demo only, but this video strengthens my feeling about a drop in critical thinking and a rise of McDonald’s productivity.
McDonald’s is not bad: it’s great when you are somewhere you don’t know and not feeling well. They have the same menu almost everywhere and it’s like 99% safe due to process standardization. Plus they can get you satiated in 20 minutes or less. It’s still the type of food that you can’t really live on for long; and if you do, there will be consequences.
The silver lining is that it has most likely cut off from the field the people who hate the domain but love the money, as that’s exactly the kind of problem it automates away.
Also did I understand correctly that you’re not very well versed with python in general but still decided to criticize the decisions the model and the author made in that language?
It's the best I've tried so far. I only tried copilot when it came out, so it might have improved a lot since then, perhaps someone using both now can chime in.
There's two things I like about cursor - the overall multi-file edit -> I use this for scaffolding whatever I need, at a high level, and claude 3.5 seems a bit better than 4o for this.
The tab thing really works, and it will surprise you. They don't do just autocomplete, they also do auto-action complete so after you edit something, your cursor (heh) goes to the next predicted place where you're likely to make an edit. And it works ootb most times, and a few times I've gone "huh, might have missed that, if it weren't for the cursor going there".
So that's my current workflow. Zoomed out scaffolding and then going in and editing / changing whatever it is that I need, and cursor has two really strong features for both cases.
I would like to use it, but I literally cannot because of this bug!
i had two csvs. one had an ISBN column, the other had isbn10, isbn13, and isbn. i tried to tell it to write python code that would merge these two sheets by finding the matching isbns. didn't work very well. it was trying to do pandas then i tried to get it to use pure python. it took what felt like an hour of back and forth with terrible results.
in a new chat, i told it that i wanted to know different algorithms for solving the problem. once we weighed all of the options, it wrote perfect python code. took like 5 minutes. like duh use a hashmap, why was that so hard?
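The "duh" version, sketched: index one CSV by its ISBN variants in a dict, then stream the other through it (column names as described above; the output handling is my assumption):

import csv

def merge_by_isbn(books_csv, details_csv, out_csv):
    # Index the second sheet by every ISBN variant it carries (the hashmap).
    with open(details_csv, newline="") as f:
        by_isbn = {}
        for row in csv.DictReader(f):
            for col in ("isbn10", "isbn13", "isbn"):
                if row.get(col):
                    by_isbn[row[col]] = row
    # Stream the first sheet through the index: O(1) lookups, no quadratic scan.
    with open(books_csv, newline="") as f:
        merged = [{**row, **by_isbn.get(row["ISBN"], {})} for row in csv.DictReader(f)]
    # Union of keys, since unmatched rows carry fewer columns.
    fieldnames = []
    for row in merged:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(merged)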
So for making changes to existing code, Copilot isn't helpful, and neither, it seems, is Cursor.
I can use Copilot to write some new methods to parse strings or split strings or convert to/from JSON or make HTTP calls. But anything that involves using or changing existing code doesn't yield good results.
Certainly for writing my emails for me, I would quite like an AI fine-tuned to my written voice.
> Copilot seems to only be aware of code in current file
Cursor does not have this limitation
* Cursor is massively faster than Copilot
* Cursor is absolutely aware of the codebase when you add it to the workspace (ctrl/cmd+b)
I have also used cursor to write Kubernetes client code with great success because the API space is so large it doesn’t fit into my head that well (not often writing such code these days) so that has been incredibly helpful.
So it’s not revolutionising my workflow but certainly a useful tool in some situations.
I'm not trying to generate programs with it, I find it still far too weak for it.
However, and while I believe the current models can't really reach it, there is nothing in my eyes that prevents creating an AI good enough for that.
Shared my 8 pro tips in this post towards the bottom https://betaacid.co/blog/cursor-dethrones-copilot
As a learning tool and for one offs AI can be very nice.
My theory is that in that case it’s hard to predict what to do from context, and the libraries are at the same time hyper-specialized and similar.
Example: creating a node and attaching a volume using Ansible looks similar for different cloud providers, but there are subtle differences in how to specify the location etc.
I've tried various tools (including Cursor) and my problem is that they often (more than 50% of the time) generate non-working code. Why? Because the ecosystems change so fast, and the models have, by definition, been trained on old versions of various libraries; when I use the latest version it's a constant uphill battle. And there are so many different combinations of how to use libraries together....
I can't be the only one facing this issue but I didn't see a lot of discussion around this.
So, as someone said: it's generating at most average code, and most of it is outdated and sometimes vulnerable because... well, that's what's out there.
I still use these tools but mostly to know how a solution could look roughly and then start to do my own research and to avoid the black page problem. Sometimes just to learn what to even Google for especially in ecosystems I'm unfamiliar with.
If you have these skills, the productivity gains from tools like Cursor are insane.
If you lack any of these, it makes sense that you don't get the hype; you're missing a critical piece of the new development paradigm, and should work on that.
Cline, on the other hand, is a whole different beast. It edits multiple files, runs the program, checks the shell for errors, goes back to the files, edits them, runs it again, and even accesses localhost:8000 to check. It's incredible! But if you use the recommended Sonnet 3.5, it'll eat your money very fast.
I've been using Cline + 4o-mini for a few days. Sometimes it's magical; it's the first time I truly feel that I have an assistant. I tell it 'run, check for errors, and correct them' and I leave to grab some coffee.
The bad side: I got lazy, and once I almost let it delete rows in a table because it misunderstood what I said.
Flutter/Dart isn't something I'm actually familiar with though I have a lot of dev experience, so I was able to distinguish between poor advice and good advice from the agent. There was a good share of frustrating code deletion happening that required stern instruction adjustment.
For work I'd be concerned about sharing our IP and would want to be on an LLM tier that promises privacy, or maybe even a local LLM.
All in all I was more impressed than I thought I would be.
If cursor made those margins, humans 1 cursor 0
the architecture of what I am building.
Sure, you probably don't want to blindly copy or accept suggested changes, but when the tools work, they're like a pretty good autocomplete for various snippets and I guess quite a bit more in the case of Cursor.
If that helps you focus on problem solving and lets the tooling, language and boilerplate get out of the way a little bit more, all the better! For what it's worth, I'm probably sticking with JetBrains IDEs for the foreseeable future, since they have a lot of useful features and are what I'm used to (with VS Code for various bits of scripting, configuration etc.).
But to summarize, Copilot is an okay ChatGPT wrapper for single line code completion. Cursor is a full IDE with AI baked in natively, with use-case specific models for different tasks, intelligent look-ahead, multi-line completion, etc.
If Copilot is a horse, Cursor is a Model T.
Show code.
I work in an environment right now where feeding proprietary code/docs into 3rd party hosted LLMs is a hard no-go, and we don't have any great locally hosted solution set up yet, so I haven't really taken the dive into actively writing code with LLM assistance. I feel like I should practice this skill, but the idea of using a tool like Cursor on personal projects just seems so antithetical to the point that I can't bring myself to actually do it.
It is much, much more than a ChatGPT wrapper. I'd encourage everyone to give it a shot with the free trial. If you're already a VSCode user, it only takes a minute to setup with your exact same devenv you already have.
Cursor has single-handedly changed the way I think about AI and its capabilities/potential. It's truly best-in-class and by a wide margin. I have no affiliation with Cursor, I'm just blown away by how good it is.
You can do most of the things the author showed with your carefully set-up IDE and magic tricks, but that's not the point. I don't want to spend a lifetime setting these things up only for them to break when moving to another language.
Also, where the tab-completion shines for me in Cursor is exactly the edge case where it knows when _not_ to change things. In the camel casing example, if one of them were already camel cased, it would know not to touch it.
For the chat and editing, I've gotten a pretty good sense as to when I can expect the model to give me a correct completion (all required info in context or something relatively generic). For everything else I will just sit down and do it myself, because I can always _choose_ to do so. Just use it for when it suits you and don't for when it doesn't. That's it.
There's just so many cases where Cursor has been an incredible help and productivity boost. I suspect that the complainers either haven't used it at all or dismissed it too quickly.
Wrong: you can do most of the things the author showed with a fresh install of vim/emacs, or by logging in to a fresh install of VS Code/IntelliJ. In other words, no lifetime was spent on this. I like having as bare an experience as possible so I can use the same setup on any computer.
> I don't want to spend a lifetime setting up these things only to break when moving to another language.
Editor configs don't break across languages?
> For the chat and editing, I've gotten a pretty good sense as to when I can expect the model to give me a correct completion (all required info in context or something relatively generic). For everything else I will just sit down and do it myself, because I can always _choose_ to do so. Just use it for when it suits you and don't for when it doesn't. That's it.
A lot of people don't have this level of wisdom or the skills to pick up and continue without AI. Would I be wrong for assuming you've been programming for at least 10 years? I don't think AI is bad for a senior who has already earned their scars, but for a junior/no-skill developer it stunts their growth, simply because they do expect the model to give them a correct completion, and the thought/action of doing it without an AI is painful (because they lack the requisite skills), so they avoid it.
- the type of project you are working on (what are you writing)
- who are you writing for: is this meant to be bulletproof corporate code, a personal project, a throwaway prototype, etc
- the experience level of the developer
If your use case plays to the strength of the tool/technology, then obviously you will have a better experience than others trying to see if it can do things that it is not really capable of.
"AI positive Hacker News" or something like that.
There is just really not much point in reading anything on AI here. I get it, AI sucks. Next.
Haha, no. A web interface won't make a full-fledged IDE irrelevant. Canvas is really cool for a quick session in the browser, when travelling, while in bed, on a plane, etc. It's neat and works. But it's nowhere near the IDE experience.
Cursor is still a full IDE in the background, with all the bells and whistles that come with it. So if you're working on anything more complicated than one-off scripts, you'll still benefit a lot from having it, over a web interface.