I see claims that it makes you 7-8x more effective. That hasn't been my experience. Maybe 5-10% improvement at best. Instead of Googling myself, LLMs can sometimes give me the answer more quickly than I could find it.
Once you ask it anything not easily scraped on the internet it hallucinates like crazy and sounds so confident about it. Maybe I'm just horrible at prompting, but I can't help but feel we're still N breakthroughs away from having it really impact dev jobs.
People who are 5-10x more effective at hard skills like programming? I'm just convinced they weren't good programmers to begin with and are doing easy problems. Basically, anyone who says that is telling on themselves.
There are plenty of tasks I'd normally procrastinate on, or be reluctant to do, because they're tiring, boring, or emotionally difficult for reasons specific to myself. That is, cases where I have the knowledge and the skills, but lack the willpower or composure (or glucose / caffeine in my bloodstream). Using GPT-4 for that kind of work isn't saving much time vs. what I could do, but it is compared to what would actually happen, which is either me procrastinating, delaying it for a better time (next morning, a day with fewer meetings, etc.), or suffering a 2-10x performance penalty from having to fight through my own emotional blocks.
On net, this isn't making me 5-10x more effective at work. It's probably not even 2x, short-term. Mid-term, 2-5x would be possible, because all the things I did sooner rather than later add up. Time will tell.
Am I telling on myself here? Maybe. Sorry not sorry. I am a human being, with a human brain, which means some things that should be easy for me become hard for unrelated reasons. GPT-4 is one of many tools I have to overcome such challenges, but it's a particularly powerful and versatile one, so I'm happy that I can use it.
(Also, INB4, I have access to company-approved deployment on Azure, so I'm in the clear with using it at work.)
It can make devs 5-10x more effective at certain self-contained programming tasks, even if they aren't 5-10x more effective at their job overall. If you have to write a script in a language you haven't used before, and it needs to interface with an API whose documentation you haven't read, and the script isn't mission critical but a "nice to have", GPT can massively shorten the learning curve required to just get this work done. Being a good dev doesn't excuse you from having to pay the cost of learning the language's syntax and the basics of the documentation, and without GPT you'd have to pay that cost much more thoroughly.
Similar to anyone who says, “When you'd have a hard time googling something, you ask the LLM.”
Just my opinion.
I'm not saying it will never happen, just that it will be very obvious if it does. 5-10% sounds about right to me.
This means I went from 0 open-source commits to 4000 since ChatGPT came out.
Not just that: I've become more adventurous, and I have the time to consider drastic refactors and to spend much more time thinking about my software.
I won't call it 10x or 100x, because that wouldn't mean anything, but surely it is a paradigm shift for me, completely world changing.
I feel like Google result quality has declined though, so chatGPT is just bridging the gap; not sure if I'm more productive than I was when Google worked better. ChatGPT often gives me concrete things to google which give me the results I want in the end, and save me a bit of time, but that's only because Googling the thing in the abstract doesn't give me those results any more.
...and I'm not enough of a maniac to trust what ChatGPT tells me without reference checking.
Even that 5-10% boost is mostly just from how horrible Google has become. I used to be able to find the proper sources much faster, but now I might as well ask LLMs since even that wait to generate the answer is technically "faster".
It seems to be better at helping me with stuff I don't know anything about and that has more flexibility in the results. I got it to make a website for me (something I don't know how to do) and it did ok, though required a lot of back and forth, and it picked a lot of defaults, meaning if I'd had more rigid specifications it would have been harder.
I tried to get help writing a pretty simple Python function (some conditional formatting of text) and quickly realized it would have been much better to look up what I wanted on SO and write something myself than to get caught in an awkward cycle of taking its versions, which had missed various requirements, and trying to get it to correct them.
Anyway it seems to be best at giving minimal examples of things I don't know about, which is helpful. I'm still trying to find the right balance between when I should use it and when I'm better doing things the "old fashioned" way.
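For what it's worth, the kind of "conditional formatting of text" function described above is small enough to sketch directly. This hypothetical Python version (the name, behavior, and requirements are invented for illustration, not the commenter's actual task) wraps chosen keywords in markers:

```python
import re

def highlight_keywords(text: str, keywords: list[str], marker: str = "**") -> str:
    """Wrap each whole-word occurrence of `keywords` in `marker` pairs,
    case-insensitively. Purely illustrative of a 'simple conditional
    formatting' helper."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"{marker}{m.group(1)}{marker}", text)
```

A function of this size is exactly the scale where writing it yourself (or from an SO answer) can be faster than iterating on a model's near-misses.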
This is what working with an expert looks like. They can’t read your mind.
However unlike an LLM, an expert knows what questions to ask to help you express what you want. This is currently lacking in AI, it doesn’t know how to guide you towards a solution you can’t verbalize because you are not the expert and don’t know what to specify.
GitHub Copilot is also very good because it has the context of your open files or selected text and will pretty regularly give you exactly what you want for the next line of code.
I will say it's much weaker for less popular libraries and programming languages, but for most full stack engineering tasks it gives a huge productivity boost
I don't think the grand OP is downplaying anything at all.
And generally speaking, a 10% increase in productivity is extremely (repeat 3 times) significant. Remember most developed countries' GDP only grows <3% annually.
```
Given these X files, help me assess how best to implement feature Y.
Ask any clarifying questions that may assist with writing a best practice solution.

*File 1*
*File 2*
*File 3*
...
```
If the context length is a problem I'll use claude-100k.
Pretty much always results in good quality code.
I'll then typically open a new context window and get it to review the code against best practices, highlighting e.g. missing error handling or possible edge cases.
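The workflow above amounts to filling a string template with file contents, which is easy to script. A minimal Python sketch (the function name and layout are my own, not part of the commenter's setup):

```python
from pathlib import Path

def build_review_prompt(feature: str, paths: list[str]) -> str:
    """Assemble the 'help me assess how best to implement feature Y'
    prompt from a feature description and a list of source files."""
    header = (
        f"Given these {len(paths)} files, help me assess how best to "
        f"implement {feature}. Ask any clarifying questions that may "
        "assist with writing a best practice solution.\n"
    )
    sections = [f"*{Path(p).name}*\n{Path(p).read_text()}" for p in paths]
    return header + "\n".join(sections)
```

If the assembled string overruns the context window, switching to a longer-context model is the obvious fallback.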
We will always be N breakthroughs away. Why? Because as our uses of computers improve, our work will become ever more complex. And our efforts will become even longer shots than they were before.
I needed to set up a websocket server; ChatGPT saved me tons of hours, because Google results are now all useless blogspam, so even finding simple resources has become hard.
I needed a custom UI component built; ChatGPT gave me a good base, and I was then able to ask it to add accessibility keyboard handlers, something I do not know how to do myself.
I was having some CSS issues and chatGPT was able to tell me what was going on.
IMHO it is a huge productivity booster, but only because I already know how to architect good software.
I’ve held off on some projects (but not others) completely because I needed help but didn't trust the help not to put in backdoors or compete directly. I also needed to learn some concepts in advance.
With ChatGPT it has definitely been a 10x boost, because I can trust it not to put in backdoors, at least not for its own benefit. It's a great pair programmer. I don't take its solutions out of the box; I walk through them with it and iterate on improvements together. Instead of waiting for a response on StackOverflow and getting into a moderation war in the process, I just ask it and get instant responses that are far better than someone chasing digital karma points.
I’m coding in ways I wouldn't have before: lots of bit-shifting, assembly. I can just ask “how would this same procedure be done in assembly?”, “why is the syntax like that?”
things that are both hard to ask and hard to google.
it predicts what it needs to explain, unprompted, instead of someone online that wants you to prove that you’ll take initiative in scouring documentation and trial and error yourself.
these are just some of the ways that it's a productivity boost for me, to say nothing of the subcontractors and developers I simply didn't seek out to hire.
Do you want a dystopia? Because that's how you get a dystopia :(.
I use ChatGPT when form matters but content doesn't, e.g. emails, letters, etc. I tell it the topic or the information that should be present, and most of the time it generates a good email or letter.
If I need to find a specific piece of information I use FastGPT by Kagi, I'm consistently surprised by how good it is at finding exactly what I ask it, and it provides real references too.
"Given this json blob what is the typescript type?" "Given this type, make a function that loops over X and makes a map of Y?"
Or I use it to write code in languages I'm not familiar with. Heck, I even used it to make some GitHub Actions because I severely dislike YAML.
It did hallucinate a bit on the GitHub Actions (especially since it didn't know the latest API), but it put me on the right track. So I don't know, I think it enables me to do work in areas I normally wouldn't, like Java. It isn't always right, but it helps me immediately test ideas / iterate.
I think of it like reference I can ask questions without getting sass or having to worry about some toxic coworker wondering why I don't just know "x". It's not always right, but neither are my coworkers / humans. So I guess, treat it like a human aka fallible?
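As an illustration of the first prompt quoted above ("Given this json blob what is the typescript type?"), here is a toy Python sketch that infers a simplified TypeScript type expression from parsed JSON. The real answers came from the model; this helper is hypothetical and deliberately ignores unions, optionals, and heterogeneous arrays:

```python
import json

def ts_type(value) -> str:
    """Return a simplified TypeScript type expression for a parsed
    JSON value. Illustrative only: no unions, dates, or optionals."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        inner = ts_type(value[0]) if value else "unknown"
        return f"{inner}[]"
    if isinstance(value, dict):
        fields = "; ".join(f"{k}: {ts_type(v)}" for k, v in value.items())
        return "{ " + fields + " }"
    return "unknown"
```

For example, `'{"id": 1, "tags": ["a"]}'` maps to an object type with a `number` field and a `string[]` field, which is the shape of answer the prompt asks the model for.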
Lately, I've revised my view a bit. AI can get novices started on a project, and it can be a source of information for the project. Googling works, but I think it's much faster to query an LLM. It's good for certain types of projects, but I wouldn't trust it for important ones unless I can double-check, and the double-checking would make it a drag to use. I think LLMs do have a place for novices, but you have to be careful.
For simple things LLMs can excel, but as you add complexity, you really need to double-check.
The produced playbooks passed the syntax check, that much I can admit. It went downhill from there, producing files with completely invalid keywords.
I'm assuming you are talking more about coding, in which case I agree.
10% is disruptive:)
GPT-4 is not immune to these issues, but it does much, much better.
Whenever I exceed my GPT-4 quota (OpenAI has a hardware shortage) and switch to GPT-3, I remember how useless it is, and I'm better off waiting a few hours for more GPT-4 quota.
Give it basic directions: I'm in my car turning right from Spring Garden onto Broad Street. I travel north 10 miles on the same road. Where am I? It has no clue.
GPT4 nails your question about going east from New York in a boat, even though I think that’s a very poor test of ‘intelligence.’
I have no idea what you’re asking with your second question (what city are you talking about? There is no context), so I guess I, too, am not intelligent.
> My prompt:
What happens if you go due east exactly from NYC in a boat. Eventually you'll hit land where does it hit?
Start your response with "Let's work this out in a step by step way to be sure we have the right answer:" and work this out in a step by step way to be sure you have the right answer
> GPT-4 answer:
Let's work this out in a step by step way to be sure we have the right answer:
1. Firstly, the direction "due east" means you would be following a line of latitude, rather than a straight line on a globe. This is because the Earth is a sphere (more accurately, an oblate spheroid) rather than a flat surface.
2. New York City is located approximately at 40.71 degrees North latitude. If you were to go due east from NYC, you would be following this line of latitude across the Atlantic Ocean.
3. The first landmass you would likely encounter on this journey would be the westernmost points of mainland Europe, specifically the Iberian Peninsula which comprises Spain and Portugal. However, due to the curvature of the Earth and the specific latitude of New York, it is more likely that you'd hit the northern coast of Spain, around the region of Galicia.
4. So, based on this understanding, if you started in New York City and traveled due east, you would likely end up hitting the northern coast of Spain.
However, it's important to note that this answer assumes idealized conditions and that you're strictly adhering to the 40.71-degree North line of latitude. In reality, ocean currents, winds, and other factors might cause a vessel to drift from this line. Plus, the exact landfall can vary depending on the precise starting point in New York City, which is spread out over a fairly large geographical area.
---
> My prompt:
I'm in my car turning right on spring garden onto Broad street. I Travel north 10 miles on the same road. Where am I?
Start your response with "Let's work this out in a step by step way to be sure we have the right answer:" and work this out in a step by step way to be sure you have the right answer
> GPT-4 answer:
Let's work this out in a step by step way to be sure we have the right answer:
1. First, we need to identify where Spring Garden Street and Broad Street intersect. These are streets located in Philadelphia, Pennsylvania.
2. From the intersection, you said you would be turning right onto Broad Street, indicating that you are now heading north on Broad Street.
3. You then proceed to travel 10 miles north on Broad Street.
Determining the exact location after these steps is a bit more challenging because street maps generally don't provide mileage measurements. The scale can vary, and the actual distance can depend on the specific path taken.
However, to give a rough idea, Broad Street in Philadelphia is approximately 13 miles long from its southern end to its northern end. Given that Spring Garden Street is relatively central, traveling 10 miles north could possibly place you near or past the city limits of Philadelphia, but without more detailed map information, it's difficult to be more precise.
You'd likely end up in the vicinity of Cheltenham or Elkins Park, which are northern suburbs of Philadelphia, if the distance traveled is relatively straight, but again this could vary based on the specific path taken and the precise start location. This is a rough estimate and the exact location may vary. For precise navigation, it's recommended to use GPS or a reliable map application.
Getting it to code, it can save me, a fairly experienced coder, quite a lot of time because it can do some boring things - boilerplate stuff, or things I can’t be bothered to look up. And I can tell if it’s done it right or not.
I’ve seen people with less coding experience use it and just take its suggestions at face value with painful results and definite detriment to their learning.
The best test I’ve seen for what it’s good at is things that are hard to do, but easy to check.
But in any knowledge economy type job, there is no ceiling on skill. Creative, curious, intelligent people who are already high skill are also going to be the most effective at using a new and weird and tricky tool in interesting ways to develop their skills.
Generative AI will make some old skill obsolete, but people who are generically good at picking up new skills will be much more effective at wielding it. Those are the same people who were good at picking up the old skills, too.
Other commenters have pointed out the difficulties users with limited domain expertise have in parsing the output and distinguishing what is useful, novel, or interesting from what is trivial or incorrect.
This slows down the iterative process of learning how to use the tool as well as refining queries because they don’t have a big enough database of ground truth knowledge to calibrate against.
That being said these can work well as educational tools to rapidly get non-experts to a point where they can participate in conversations with experts. In that regard one might say that the bulk of the value is in the non-expert users because they are far more prevalent than experts.
The biggest advantage I saw for the expert researchers was that they knew (1) which big questions to ask, and (2) how to break down those big questions into smaller, more precise questions.
I don't think the paper matches the assertion made in the title of this article - it's talking about a very specific use-case for generative AI, which won't necessarily generalize to "in all cases, workers with less experience gain the most from these tools".
I grew up without a dishwasher (when we asked our parents why we didn't have one my dad responded, "what do you mean? We have six!" referring to us kids). As an adult we have one in the house but it is rarely used. Washing dishes is a menial task but I enjoy it. It's hard to articulate exactly why. Needless to say I'm quite an experienced dish washer, there's an art to it. My process is to collect any dishes that aren't already stacked, fill the sink with just enough water, wash everything well (cutlery in first, out second last, washed individually), then wipe down all the benches and sink. I leave the dishes to drip dry. They get put away later.
When we use the dishwasher I still have to do most of the steps above. Except the washing, which is 40-120 mins of free time for me. But then I have an extra step of checking each dish or piece of cutlery while I'm putting it away. Most things get washed well, but about 10-15% of items have food baked onto them that I then need to soak or rewash (with a more abrasive scrubber than I'd usually use). Maybe this is down to my lack of experience stacking? The whole process takes longer, but it's (arguably) lower touch and (I hear) uses less water.
Which is better? In a commercial kitchen a dishwasher for sure, efficiency at all costs.
But do we need to remove all menial tasks from our workflows? I'm not so sure.
Also, just like getting a manual dishwashing routine figured out, dishwashers are all different and need to be learned like any tool. Understanding where pieces are, which cycles run when, which compartments open and close and even how they do it, are all important. It really is best to think of it as a tool.
Some useful videos:
I'm thinking patience and, I guess, the other meta skills and behaviours that aren't directly related to the washing itself.
That said, your pro-dishwasher explanation talks to some of these too.
When people claim that technology disadvantages workers, it usually just reflects their prejudices about working people. So-called knowledge workers like to think only they can leverage tech, while call center workers are fixed automatons. But in reality, people at a lower skill level have a much easier time leveling up.
ChatGPT when it works well is basically a Young Lady's Illustrated Primer (https://en.wikipedia.org/wiki/The_Diamond_Age). Giving underskilled people free access to information will always benefit them the most.
For example, for places where English isn't the first language, LLMs are widely used for content-based tasks (posts, emails, support), because waiting 10s for Bard or GPT to output something higher quality is worth their time. Perhaps these folks end up being more attuned than usual, more in harmony and sync with the LLMs?
Speaking from experience, it took a while for me to move from vim to vscode, from Javascript to Typescript. We probably prefer our silos and feel uber comfortable in existing setups.
Start your response with "Let's work this out in a step by step way to be sure we have the right answer:" and work this out in a step by step way to be sure you have the right answer
It will improve your answers significantly (for example, on word math problems, the solve rate went from 18% to 79%). Source: https://github.com/openai/openai-cookbook/blob/main/techniqu...
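The quoted preamble can be bolted onto any prompt mechanically; a trivial Python helper (the constant and function names are hypothetical, not from the cookbook):

```python
COT_PREAMBLE = (
    'Start your response with "Let\'s work this out in a step by step '
    'way to be sure we have the right answer:" and work this out in a '
    "step by step way to be sure you have the right answer"
)

def with_cot(question: str) -> str:
    """Append the chain-of-thought instruction to a user question."""
    return f"{question}\n\n{COT_PREAMBLE}"
```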
If your team can't autocomplete APIs, if you can't run everything locally so you can step through your code with a debugger, and if you don't have good tests that tell you when you broke something (see also profiling, telemetry, releasing quickly and confidently, etc.), you're going to get a million percent more mileage out of doing these basics correctly (and fast) than out of investing in AI, because the open secret is that most engineering time is spent reading and debugging code rather than writing it. This should be table stakes, but so many engineering teams (even at bigtech companies) can't do the above. So they're all signing up for Copilot, but they still iterate with logging statements and catch regressions in production.
1. Google replacement for simple queries. Like "how do I HTTP POST with go?". Saves like 30 seconds of browsing.
2. Google replacement for vague questions. Like "Which ingress controllers are popular and what are their strengths?".
3. Procrastination eliminators. Like "I need an identifier for ..., can you suggest some proper names?".
4. Boring tasks which could be solved with enough bash scripts or regexes, but with ChatGPT it's often faster.
5. Most impressive to me: "I want to format the following YAML according to the following rules: <vague imprecise rules>. Example: ....". This sometimes saves me a lot of time, because it acts like a human and is able to follow imprecise instructions. I could either do it myself, spending lots of time, or write some formatter, again spending lots of time.
I was never able to utilize it for "smart" tasks, like having it solve something I couldn't solve myself. It's not smart yet. But it's a useful tool in my toolbelt, definitely worth $20, and it saves me time and sanity on some boring tasks.
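As a sense of scale for the first category above, here is roughly the snippet such a query replaces, shown with Python's standard library rather than Go (the helper name is invented, and it builds the request without sending it, so nothing here touches the network):

```python
import json
import urllib.request

def make_post_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request with only the standard library.
    Sending it would be a separate step via urllib.request.urlopen."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```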
However, in all cases I have seen a very substantial decline in the capabilities of GPT-4 over the last few releases. E.g. it used to get code snippets right most of the time; now it tends to be wrong most of the time. Usually it conflates the capabilities of several distinct libraries, just hallucinating (extrapolating) nonexistent functions or attributes.
I personally suspect they are 'cleansing' the training data, and/or driving severe 'schizophrenia' into the model through conflicting RLHF.
feels like there is a viewpoint floating around that chatgpt is valuable in part because of the RLHF work
but does this make it generally capable? or just prove that you can hire essentially the people you would hire to staff a call center (language skills + good generalists) to create a language model that behaves essentially like a call center (refuses to take any actions that are not allowed)?
I’ve been in tech ~18 years and get a lot out of Copilot and ChatGPT every day.
Take for example this automated help that I found only minutes ago, which seems to prove that generators can be racist (...I am joking):
For example, we all have different weights for how much we trust information from, e.g., Wikipedia, Stack Overflow, high-profile bloggers in your domain, conspiracy websites, listicles, blog-spam sites, marketing material, etc. If the info is on a social site, is it highly upvoted? Is the info from an academic journal, arxiv.org, an academic blog, or a Word doc you downloaded from Scribd [1]?
Yes, this is judging a book by its cover, but it's a heuristic that tends to work well.
By contrast, LLMs present all information to you with the same confidence in the same homogeneous interface. There is no external context, so all your normal heuristics for judging reliability are broken.
When I asked "are you sure about that?" it owned up and said sorry. wtf?