Diff Models – A New Way to Edit Code

(carper.ai)

237 points · sadiq · 3y ago · 199 comments

pavlov3y ago· 42 in thread

Somehow these GitHub-trained ML code assistants sadden me.

My idea of enjoyable high-quality programming isn’t to dip a spoon into an ocean of soup made of other people’s random design decisions and bugs accumulated over fifteen years, hoping to get a spoonful without hidden crunchy insect bits.

I know the soup is nutritious and healthy 98% of the time, and eating it saves so much time compared to preparing a filet mignon myself. But it’s still brown sludge.

credit_guy3y ago

Take a look at the average faces of women across different countries [1]. They are all strikingly beautiful.

By averaging, a lot of imperfections get diluted away.

Like in Anna Karenina: "Happy families are all alike; every unhappy family is unhappy in its own way." The defects are idiosyncratic, the commonalities are good.

[1] https://fstoppers.com/portraits/average-faces-women-around-w...

kristopolous3y ago

That's unrelated.

Multi-sourced accumulated unmaintained amateur software without clear provenance or ownership is more like creating a feature I'll call "insta-legacy": now you're responsible for a bunch of code you didn't write that by definition nobody you have access to understands.

This is absurd.

It's not going to stop people from doing it. The industry is clinically insane.

It allows people who do bad work to do more of it quickly. Before they had to manually shovel garbage into projects but now they have a dumptruck.

You know what? It might be fine. Maybe we're going to have a world of fast food programming where minimum wage coders pump out trash and there's going to be Michelin star programmers where you go to for the real stuff.

If that's the case, we'll have to somehow educate the public on the difference so they don't think it's the same thing. McDonald's and The French Laundry are both successful restaurants. That world is possible in programming as well.

It might already be like that. The shady contracting firms that do trash work at cheap rates are probably already using these things.

RjQoLCOSwiIKfpm3y ago

Failure is an extremely common and accepted thing in biological systems - your offspring may just die if you have incompatible genes.

Software on the other hand is a logical environment with clear, logical requirements. It ought to work, not just fall apart randomly.

There is no guarantee that sticking the average of one software into a completely different one will satisfy the logical requirements by any means whatsoever.

lou13063y ago

I'm afraid this analogy doesn't hold much water: facial features inhabit a continuum with very nice smoothness properties, whereas program behaviour changes dramatically when you perturb its source code, even minimally (this is why mutation testing is effective, for instance).

Also, when you average you don't really kill "defects", but rather outliers. An "outlier" statement within a program is very likely to do something important, e.g. taking care of a corner case, otherwise it wouldn't be there.
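lou1306's point about minimal perturbations can be illustrated with a toy mutation test. This is only a sketch: the function names and the age threshold are invented for illustration, and the "mutant" is written by hand rather than produced by a mutation-testing tool.

```python
def is_adult(age: int) -> bool:
    """Original program: 18 and over counts as adult."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Mutant: one character removed (>= became >)."""
    return age > 18


def passes_boundary_test(fn) -> bool:
    """Return True if fn behaves correctly right at the boundary."""
    return fn(18) is True and fn(17) is False


if __name__ == "__main__":
    # The two versions agree on almost every input...
    print(is_adult(30) == is_adult_mutant(30))
    # ...but a single boundary check separates them ("kills" the mutant).
    print(passes_boundary_test(is_adult))
    print(passes_boundary_test(is_adult_mutant))
```

The two functions differ by one character and agree everywhere except at age 18, which is exactly the kind of "outlier" case the comment says averaging would wash out.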

amelius3y ago

"You look average" will be my new pickup line :)

gavinray3y ago

I like this analogy, and I'd never seen this before, thanks for sharing.

(My experience with Copilot and ML-assisted programming has been extremely positive, I would not choose to go without it at this point)

hoosieree3y ago

On the other hand, you can pick the average and be wrong every time: https://www.thestar.com/news/insight/2016/01/16/when-us-air-...

SergeAx3y ago

In my opinion, they are not strikingly beautiful. They are really average good-looking. To be strikingly beautiful a face needs some outstanding features (thus, in fact, "strikingly").

Also, unrelated: Leo Tolstoy had his share of problems with his wife and wrote them into Anna Karenina. In fact, it is quite the contrary: most dysfunctional families fall into several textbook scenarios, while happy families have their own distinct inner dynamics, just looking the same on the outside.

agilob3y ago

> The study also does not reveal how the participants were selected or how large the sample size actually is.

This "study" and your argument are meaningless.

avgcorrection3y ago

Wait. Facial symmetry is beautiful now? Dang it, I didn’t get the memo.

pjc503y ago

Given all the discussion about "supply chain security", heading in this direction is surprising. I guess it means we automate away the creative part and leave the humans to the duller work of validation. Everyone's going to become a software tester.

krono3y ago

  > Futurists in 1950: Automation will free mankind from meaningless tedium to focus on creative pursuits only humans can master.
  > Techbros in 2023: We coded AI to write all your books, music, and TV so you can focus on the meaningless tedium of your cubicle farm.

From this popular tweet by @stealthygeek https://twitter.com/stealthygeek/status/1618997354199400449

yowzadave3y ago

Isn’t this the same pattern that other fields have followed as they gradually became more automated? A hundred years ago, a cobbler was a skilled craftsperson who combined creative problem-solving with a variety of techniques to hand-make shoes to the precise specifications of each foot. Today, shoes are made in a factory, with humans limited to watching the line to catch the occasional defect.

franga20003y ago

The main problems with software supply chain security are that developers using libraries don't read their code and that that code can be changed later by the author. Neither of those are problems with "AI"-generated code - it lives right in your source files and you have to be actively avoiding reading it to miss critical issues that you're familiar with.

DennisP3y ago

For now at least, I think it's more likely to automate the boring parts of programming so we can stay focused on the creative stuff. It'd be like the first result of a quick google search always gives a concise blog post covering the exact language features and library calls we're looking for.

skybrian3y ago

Well, code reviewer anyway. Might be a good idea to add “includes unit tests” in the prompt?

rileymat23y ago

I agree with you, however, the work I see is not that.

What I see is a person who copies and pastes crap around until it works and calls it a day. I think code assistants can and will compete with them.

boredemployee3y ago

Since everyone has different goals and opinions on this, etc, many people will see it in a different way.

I love to solve _problems_ and to help people with it, but sometimes I just hate to write code to solve them. I wish my computer could have a clear picture of the solution that is in my mind so I didn't have to write a single line of code, so I could focus on the creative part of the problem solving.

sanderjd3y ago

Unfortunately, I'm skeptical that this is going to end up looking much like "my computer has a clear picture of the solution in my mind so that I don't have to write a single line of code". I fear it's a siren song that's going to drag many of us into the shoals.

But I totally agree with you that it would be a positive outcome if we spent less time writing lines of code and more time using better tools to direct computers in solving problems and (I think just as critically) understanding the dynamics of those solutions. A major facet of my skepticism is that I think progress on that second part seems to be lagging way behind...

I foresee a lot of "we had a team, who have all now left, that used AI to write this system and it's mostly working right except in all these ways, and you need to fix it, good luck!" in all of our futures.

Pandabob3y ago

OpenAI is reportedly hiring dev contractors to teach the new version of their Codex model [0].

[0]: https://www.semafor.com/article/01/27/2023/openai-has-hired-...

echelon3y ago

There goes the software career.

The six figure salaries won't last another decade. For some of us, maybe, but certainly not most of us.

Learn AI now.

Good luck, everyone.

GuB-423y ago

I am not much into ML code assistants either, though it may change in the future as technology becomes better and more reliable.

But I don't buy the "joy of writing code" argument. Coding is all about making a computer work for you, and I think that taming AIs to be more efficient without letting them introduce random crap will become both important and enjoyable. I think the techniques we have now are too crude for that, but it will improve. Keep in mind that even if you are writing C, you are already at a high level, using libraries and compilers other people wrote, bugs included.

Now there is a certain charm being close to "hands on" programming, but if that's the case, go get an Amiga and make a few demos. It won't pay the bills, but it can be fun.

discreteevent3y ago

>But I don't buy the "joy of writing code" argument. Coding is all about making a computer work for you

That is an absolutely valid point of view. But it doesn't apply to everyone. Programming is something that can take me into the zone like nothing else. And it has the added side effect of making me think more precisely about higher level problems as well. It's one of those exercises that help me stop fooling myself (in the Feynman sense).

netr0ute3y ago

I don't know about this. I would actually expect the opposite, where the soup is just fine 98% of the time and sweet the last 2% because that's the "good stuff" that you can get creative on because the AI doesn't know how to help you.

derefr3y ago

> an ocean of soup made of other people’s random design decisions and bugs accumulated over fifteen years

As a human programmer, is this not what your own brain looks like? What are you doing to the information you take in that allows you to avoid regurgitating the "crunchy insect bits" of your own training corpus?

gfodor3y ago

These tools don’t introduce design decisions or other things that really constitute most of the “art” of programming. They just help you with the lowest grain bits of moving data around. This is probably a temporary condition but your concern here seems misplaced given where the tools are at today.

Der_Einzige3y ago

You didn't write your prompt well enough. I can ask it to design the software component and code it.

carlbarrdahl3y ago

What if you could inspire the assistant with code you like and it would generate in that style? For example choose a few repos with code-bases you want to mimic, give it a set of instructions (and perhaps structure), and it generates code for it.

Maybe something like GPT, style transfer, and OpenAPI combined.

indeyets3y ago

Well, these are not tools for art-level programming. But they help improve the productivity of commercial programming a lot. Different genre.

nbardy3y ago

This isn't how large language models work. Deep features are much richer than this. It's not random: the models have their own sense of taste, and you can easily control it with comments specifying what you care about in the code you are going to write.

SergeAx3y ago

It's okay, you don't have to use AI assistants to program.

neximo643y ago

And yet it is so useful. It is just an assistant. It's quite unlike soup, since you can easily alter it.

indeyets3y ago

I think it might be compared to "cook it yourself" ingredient kits. A good base, but you can alter it to your liking.

nikau3y ago

It's a bold assumption that most code these days isn't just a series of copy-paste fragments from Stack Overflow anyway.

Zetobal3y ago

Eh... I don't have the desire to do all the plumbing in my house and neither in my code.

mstade3y ago

I'm afraid you may have chosen the wrong profession, to be honest. Programmers – myself included – are essentially glorified plumbers, piping data from one end to another. As much as we'd all like to think we're all architects, let's just be honest.

That's not to say plumbing doesn't take skill, it certainly does, but the point of it is that nobody except the next plumber cares how the pipes are laid out, so long as it works and works well. It's when one blows and you have to fix it, or install a second bathroom, that shit really tends to come out. If I'm the one that has to do the work, I can only hope the previous plumber had some idea of what they were doing, and didn't just leave it entirely to automation.

But if they did I'd hope they trust, but verify.

avgcorrection3y ago

Programmers when the plumbing malfunctions: Darn it, why are all abstractions leaky! Why can’t I just plug A and B together and have them work seamlessly! Why is everything BROKEN

Programmers when the plumbing works: But I don’t want to just stitch components together! This is boring.

Jeff_Brown3y ago

Plumbing is tricky. The only people I've heard denigrate plumbers don't know anything about it.

fortyseven3y ago

So don't use it?

williamcotton3y ago

How much of your identity is made up of “programmer”? Are you proud or hesitant to tell people you’re a programmer? Do you identify as a “painter” or anything else? How often do you compare yourself to other programmers and feel bad?

Der_Einzige3y ago

Americans absolutely identify themselves first by their job and second by everything else.

There's a reason they call Europeans "europoors". There are advantages to identifying yourself by your work.

thuuuomas3y ago

Do you truly believe yr StableDiffusion Americana is “not like what anyone else is making”? You’ve got some knots to untangle :)

RjQoLCOSwiIKfpm3y ago· 11 in thread

Prepare for household appliances - washing machines etc. - doing strange things randomly.

Prepare for the same thing with electronics which you didn't consider as containing much software before - central heating units, AC units, fridges, stoves, light switches, LED light bulbs, vacuum cleaners, electric shavers, electric toothbrushes, kids toys, microwave ovens, really anything which consumes electricity.

Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.

Prepare for the support not understanding the random problems you encounter.

Prepare for the answers you get from support being similarly random.

And maybe, with an unknown probability, prepare for your house burning down and nobody can tell you why.

ly3xqhl8g93y ago

Perhaps certain consumer electronics should come with a label "Programmed by Humans", such as the "Free-Range/Cage-Free" labels.

vagabund3y ago

It's funny seeing the same people who blithely told blue collar workers to "just learn how to code" now act like luddites when innovation comes for their skillset.

Just learn how to be a plumber.

albert_e3y ago

Was thinking exactly the same thing

"No GMO"!

dvngnt_3y ago

yeah I want human compilers none of that GCC crap

oldgradstudent3y ago

> Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.

Even worse, prepare for them to enthusiastically take calls.

napier3y ago

Prepare for your in-house-mind-core -- $10,000 of neural SoCs running a society-chain of LLM based OS and included as standard with all $400,000 or more home purchases -- conversationally debugging and providing pseudo-psychological support to your toaster, fridge, cleaning/security-bot-hive and other assistant golem appliances imbued with pseudo-sapience.

roarcher3y ago

> Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.

Don't worry, someone will plug ChatGPT into a text-to-speech model soon enough, and market it as a way to put the personal touch back into customer support. Maybe they'll even give it a folksy accent.

indeyets3y ago

You imply, that such tools would lead to lower quality of code. I actually hope for the opposite.

This is not a tool for generating applications using statistical methods (we have a lot of tools which do that already), but a tool for assisting human persons by taking boring/repetitive tasks from them and letting us focus on the meaning, the goal

RjQoLCOSwiIKfpm3y ago

If my house burns down due to random bugs in a big appliance, do you think the random underpaid 3rd world developers who will be used will care about that?

I think this will lead to extreme cost cutting measures in choice of the developers which are used.

People who would have previously been totally ineligible to develop software will happily be chosen.

And they won't care about the garbage code they produce as long as it somehow seems to work from the outside.

They'll care about feeding their families in the dire situation they are in, not more.

sanderjd3y ago

I hope for the opposite as well, but I think it's a false hope.

I think my intuition is that the average quality of software may well improve (good!) but that when issues arise they will be more obscure and harder to debug and fix, because nobody will know what the system is actually doing.

agumonkey3y ago

> Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.

this is a fun pattern I've seen play in other industries

abhijeetpbodas3y ago· 6 in thread

On a philosophical level, AI for writing code has always seemed redundant to me. Here's why:

1. Humans create programming languages which machines can understand. OK.

2. Humans build tools (LSP, treesitter, tags, type checkers and others) to help humans understand code better. OK.

3. Humans build (AI) programs which run on machines so that the computer can understand... computer programs???

Aren't computers supposed to be able to understand code already? Wasn't the concept of "computer code" created so as to have something which the computer could understand? Isn't making a (AI) program to help the computer understand computer programs re-inventing the wheel?

(Of course, I get that I use the terms "understand" and "computer programs" very loosely here!)

manmal3y ago

As long as we don’t have „level 5“ code generation (no human oversight necessary), we need the code to be human readable. Afterwards, sure, why not produce assembly directly. Still it might be more practical to produce platform-independent code instead: you’ll only need to train one model instead of one per platform.

semitones3y ago

The benefit here is that the machine can execute what the AI produces, and humans can understand it / modify it if they need to.

wankle3y ago

It can be seen as a benefit or a cautionary tale. Earlier in the comments, someone claimed ChatGPT gave a recipe for an omelet made with 2 to 3 cow eggs. If instead of putting a recipe on the screen, the AI was connected to a cow and a frying pan...OW!

jrvarela563y ago

The impact of context in LLM performance makes higher level languages a must for AI to generate programs. The AI doesn't 'understand' code like a 'computer' does - it understands it like we do using text to express logic.

Arguably, we would benefit from even higher level abstractions so the LLM can fit more logic in a single prompt/output.

divs12103y ago

Good point!

Maybe a future AI could generate machine code that could be "disassembled" into higher level languages.

Not sure if that would be better.

elcomet3y ago

Yeah you do, machines execute code but don't understand it.

indeyets3y ago· 5 in thread

So, it is loosely the same as Copilot? I understand that the approach is a tad different, but the result of converting natural language descriptions into code-changes should be comparable.

And both are trained on a large corpus of GitHub sources.

Is there a way to test it somehow? Public API maybe?

Kiro3y ago

> converting natural language descriptions into code-changes

Do people actually use Copilot for that? I just let it work its magic uninstructed. I guess it sometimes uses comments and function/variable names for its suggestions but that's about it. 99% of the time it just looks at my code, the context and neighboring files to predict what I'm trying to do.

noncovalence3y ago

I've found writing a temporary comment can be particularly useful when working with Unicode. For example, something similar to

//insert a unicode dot between each character in the string, and convert the numbers to subscript

saved me a lot of copy-pasting.
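For reference, the kind of helper such a comment might yield can be sketched in a few lines. This is an illustration only: it assumes "unicode dot" means the middle dot U+00B7, it is written in Python rather than whatever language noncovalence was using, and the function name is hypothetical.

```python
# Assumption: the "unicode dot" is the middle dot, U+00B7.
MIDDLE_DOT = "\u00b7"

# Map ASCII digits 0-9 to the subscript digits U+2080..U+2089.
SUBSCRIPT_DIGITS = {ord(str(d)): chr(0x2080 + d) for d in range(10)}


def dot_and_subscript(text: str) -> str:
    """Convert digits to subscripts, then put a dot between all characters."""
    return MIDDLE_DOT.join(text.translate(SUBSCRIPT_DIGITS))


if __name__ == "__main__":
    print(dot_and_subscript("H2O"))
```

Converting the digits first and joining afterwards keeps the two steps independent, so either can be changed without touching the other.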

indeyets3y ago

I use both. Sometimes it feels easier to write five words of text than starting to write code.

elcapitan3y ago

I use it most of the time as smart auto-completion as well, but sometimes for boilerplate it helps to just write a comment what you want to achieve, basically like a ChatGPT prompt.

bil73y ago

for my day job, no, not frequently. When I'm writing in an unfamiliar language like bash or something, I'll do a little # implement a function that does x, y and z

Kwantuum3y ago· 3 in thread

A lot of the comments seem to talk about the inevitable AI event horizon but unless I'm misreading this article the results are flat out bad. Even the 6-billion-parameter model barely scratches a 50% success rate on a tiny problem that is trivial to fix for any human with basic knowledge of programming. Note the log scale of the graph.

hellodanylo3y ago

Yeah, I am also struggling to interpret the metrics in this post positively.

The 50% success rate is also best out of 3200 completions. For best out of 1 completion, the success rate is in low single digits.

I think the lesson here is that these models bring a lot more value when: 1. you have unit tests, 2. can afford compute/time to let the model try many solutions, 3. have enough isolation to run unverified code.
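The gap between "best of 3200" and "best of 1" is what the pass@k metric measures. Below is a minimal sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n completions were sampled and c of them passed the unit tests. The sample counts in the usage lines are made up for illustration and are not the article's numbers.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: total completions sampled for the problem
    c: how many of those completions passed the tests
    k: sampling budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k failing samples exist,
    # in which case any draw of k samples must contain a pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical problem: 3200 samples, only 2 of them correct.
    for budget in (1, 100, 3200):
        print(budget, pass_at_k(n=3200, c=2, k=budget))
```

Counting c requires running every candidate against tests, which is why unit tests, spare compute, and a sandbox for unverified code all show up in the list above.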

zaidhaan3y ago

They do note that the models "tend to do better when prompted with longer code generation tasks".

But yes, the choice of scales for the graph was rather peculiar.

kdnvk3y ago

6 billion is by no means large.

spapas823y ago· 3 in thread

I'd really like to see how this would work with my commits... 99% of the messages on my commits are single word, similar to:

- ok

- fix

- done

- test

- nice

prettyStandard3y ago

Garbage in garbage out.

You should fix that.

alchemist1e93y ago

Yeah I bet the people working with them really love the commit messages /s

Or more likely they are working alone.

dizhn3y ago

There's a character limit to commit messages in their training data.

DominikPeters3y ago· 2 in thread

It would have been helpful to show some example generations of the model, unless I've missed them.

bogwog3y ago

This is all I could find: https://twitter.com/carperai/status/1619082410213404672

jerpint3y ago

Yes I agree. My understanding is you have in training dataset the original code, diff + commit message.

So you train the LM to:

Input: code+commit output: diff
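That input/output split can be sketched as a prompt builder. The `<NME>`, `<BEF>`, `<MSG>` and `<DFF>` tags are taken from the script mortehu posts elsewhere in this thread; whether this matches the training format exactly is an assumption, and the function name and the trailing-newline handling are choices made here for illustration.

```python
def build_diff_prompt(filename: str, before: str, commit_message: str) -> str:
    """Assemble the model input: file name, original code, commit message.

    The model is then expected to continue after <DFF> with a unified diff.
    """
    # Assumption: the original code should end with a newline before <MSG>.
    if not before.endswith("\n"):
        before += "\n"
    return (
        f"<NME> {filename}\n"
        f"<BEF> {before}"
        f"<MSG> {commit_message}\n"
        f"<DFF>\n"
    )


if __name__ == "__main__":
    print(build_diff_prompt("a.py", "x = 1\n", "Rename x to count"))
```

Everything up to `<DFF>` is the input (code plus commit message); whatever the model generates after it is the diff.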

moconnor3y ago· 1 in thread

All that to end with “no meaningful improvement over the salesforce codegen model” is a bit disappointing.

Negative results are interesting in their own right. I’d rather read about why this isn’t better at the 6B parameter level than see a hand wave that, well, the samples are more diverse and, look, the 350M model is better.

youssefabdelm3y ago

Yeah I felt the same way. Although perhaps at a higher scale the fine-tuning can make a bigger difference? The results go against this hypothesis but at least OpenAI states that GPT-3 only needs 200 examples, so who knows. In fact I wonder how well GPT-3 would do against this when fine-tuned on just 200 examples.

indeyets3y ago· 1 in thread

Related discussion: https://news.ycombinator.com/item?id=33271750

return_to_monke3y ago

while from the same company, same lab, I don't think this is what the article you linked is about. To me, that seems like a general purpose LLM and this just for code.

startupsfail3y ago

From the safety perspective (may get important soon), it is perhaps a very bad idea to allow easy execution/injection of arbitrary code into random places with little review.

One of the first steps of a misaligned/unhelpful/virus type of a system, attempting to secure its presence would likely be inference/GPU/TPU compute access. And code injection is a vector. There are multiple other vectors.

When designing such systems, please do keep that in mind. Make sure code changes are properly signed and the originating models are traceable.

Same applies to datasets generated by models.

jakear3y ago

Excellent. This is the beginning of the end for the cohort of people writing clear, descriptive commit messages. All your knowledge is soon to be acquisitioned and commodified by the Man with the GPU.

I on the other hand will survive: what sense is an AI to make of such classic messages as David Bowie's excellent "ch-ch-changes!", the five "fix CI maybe???"s in a row, or the eternal "fuck this shit"?

PoignardAzur3y ago

We're still in the beginning for these tools, but already they're demonstrating some really exciting capacity.

Something I haven't seen explored too much: navigation help. One of the things that takes me the most time when coding is remembering what was the next file / module / function I need to edit and jumping to it.

An autocomplete engine that would suggest jump locations instead of tokens could help me stay in the flow much longer, with fewer worries about whether I'm introducing subtle bugs because I'm relying on the AI too much.

lettergram3y ago

I view programming as a trade. I’ve spent years honing my skills, I pass wisdom to junior engineers as I can. I review code and provide detailed alternatives.

My concern with AI across all fields is that people won’t gain the fundamental skills necessary for moving the bounds of what’s possible. Certainly, tools like this AI could produce good results. However, the underlying human is still providing the training data. More importantly, humans are producing the trajectory of development.

If humans are no longer capable of pushing the AI systems, then the AI systems will either cease to improve, or the AI systems will learn to play off each other. In highly complex systems like many programs, I suspect they’ll play off each other and settle into local minima/maxima. I.e., because the “game” (program development) can be iterative, they’ll constantly improve code. However, because the AI systems don’t interact with all data (particularly real-world data), when a customer shows a sad face at some UI/UX, it won’t completely develop a new feature that matches the desires of the customer.

Where I fear this will leave us is a class of less-skilled engineers and overly optimized AI. Basically, stuck in development.

ilaksh3y ago

Since I am building a website https://aidev.codes to do programming based on natural language descriptions, this is extremely relevant to me.

OpenAI has an 'edit' endpoint but it's 'in beta' and limited to 10-20 requests per minute. They do not acknowledge support requests about this. Azure OpenAI also has this endpoint I think but they ignore me as well.

So for my edits just like everything else I have been relying on text-davinci-003 since it has much more feasible rate limits. I have just been having it output the full new file but maybe this Unified Diff thing is possible to leverage.

Does anyone know, what would be the easiest way to try to run their 6B Diff Models thing against my own prompts for my service? Maybe Hugging Face?

mortehu3y ago

I wrote the program between <BEF> and <MSG>, and it generated the following output:

  <NME> diff_model.py
  <BEF> import argparse

  import torch
  import transformers

  def main():
      argparser = argparse.ArgumentParser()
      argparser.add_argument('--checkpoint', default='CarperAI/diff-codegen-2b-v2', choices=['CarperAI/diff-codegen-6b-v2', 'CarperAI/diff-codegen-2b-v2', 'CarperAI/diff-codegen-350m-v2'], help='Model to use')
      args = argparser.parse_args()

      model = transformers.AutoModelForCausalLM.from_pretrained(args.checkpoint)
      tokenizer = transformers.AutoTokenizer.from_pretrained(args.checkpoint)

      # Use CUDA.
      model = model.cuda()

      text = '<NME> diff_model.py\n<BEF> '
      text += open("diff_model.py").read()
      text += '<MSG> Print a message after loading the model\n<DFF>\n'

      completion = model.generate(tokenizer.encode(text, return_tensors="pt").cuda(), max_new_tokens=400)

      completion = tokenizer.decode(completion[0], skip_special_tokens=True)

      print(completion)


  if __name__ == '__main__':
      main()
  <MSG> Print a message after loading the model
  <DFF>
  <DFF> @@ -1,4 +1,5 @@
   '''
  +import sys
   import argparse

   import torch
  @@ -8,6 +9,7 @@ def main():
       tokenizer = transformers.AutoTokenizer.from_pretrained(args.checkpoint)

       # Use CUDA.
  +    print("CUDA:", torch.cuda.is_available())
       model = model.cuda()

       text = '<NME> diff_model.py\n<BEF> '
  @@ -10,3 +12,4 @@ def main():
       completion = tokenizer.decode(completion[0], skip_special_tokens=True)

       print(completion)
  +    sys.exit(0)
  \ No newline at end of file
  [\ No newline at end of file repeated many times]

It takes about 150 seconds to run on a 3090 Ti when the model is already on disk.
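The generated output above uses unified-diff hunk headers such as `@@ -1,4 +1,5 @@`. A quick sanity check on model-written diffs is to compare those declared line counts against the hunk body. The sketch below does only that; the helper names are hypothetical, and it does not check whether the context lines actually match the file.

```python
import re

# Matches "@@ -old_start[,old_len] +new_start[,new_len] @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str):
    """Return (old_start, old_len, new_start, new_len), or None."""
    match = HUNK_HEADER.match(line)
    if match is None:
        return None
    old_start, old_len, new_start, new_len = match.groups()
    # An omitted length means the range covers exactly one line.
    return (
        int(old_start),
        int(old_len) if old_len is not None else 1,
        int(new_start),
        int(new_len) if new_len is not None else 1,
    )


def count_hunk_lines(body):
    """Count how many old and new lines a hunk body describes."""
    old = new = 0
    for line in body:
        if line.startswith("\\"):  # e.g. "\ No newline at end of file"
            continue
        if line.startswith("+"):
            new += 1
        elif line.startswith("-"):
            old += 1
        else:  # context line (possibly blank after whitespace stripping)
            old += 1
            new += 1
    return old, new


if __name__ == "__main__":
    # First hunk of the output quoted in the comment above.
    header = parse_hunk_header("@@ -1,4 +1,5 @@")
    body = [" '''", "+import sys", " import argparse", "", " import torch"]
    print(header, count_hunk_lines(body))
```

A mismatch between the header and the body counts is a cheap signal that a generated patch should be rejected before trying to apply it.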

Epa0953y ago

Maybe this can give a boost to languages like Idris or F*, where you can specify much stronger types than in normal languages (with the price that you might have to prove the types manually). The types can help "tame" the AI generated code, and the AI can help generate the proofs.

I also wonder if it could be useful in creating Coq proofs!

wslh3y ago

Very opportune. I am working on security diffs before and after security audit commits [1]. Reading the whole piece.

[1] https://news.ycombinator.com/item?id=34360102

parasti3y ago

I skimmed the post, but it seems not much was said about how the original diffs are generated. Git generates diffs only on request with varying levels of accuracy depending on the options given. Sometimes the diff completely fails to capture the intent of the change - it shows the path from A to B but not in any semantically meaningful way.
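The point that a diff is computed on request from two snapshots, and changes with the options used, can be shown with Python's standard `difflib`. This is a stand-in for illustration: it is not Git's diff implementation, and the sample file contents are invented. In Git itself the analogous knobs include `-U<n>` for context lines and algorithm flags such as `--patience` and `--histogram`.

```python
import difflib

# Two snapshots of a made-up file; no diff is stored anywhere yet.
BEFORE = ["def add(a, b):\n", "    return a + b\n"]
AFTER = ["def add(a, b):\n", "    # Sum two numbers.\n", "    return a + b\n"]


def make_unified_diff(before, after, filename="example.py", context=3) -> str:
    """Compute a unified diff between two lists of lines."""
    return "".join(
        difflib.unified_diff(
            before, after, fromfile=filename, tofile=filename, n=context
        )
    )


if __name__ == "__main__":
    # Same change, two different textual diffs depending on the context option.
    print(make_unified_diff(BEFORE, AFTER, context=3))
    print(make_unified_diff(BEFORE, AFTER, context=0))
```

Both outputs describe the same edit, yet a model trained on one would see different text than a model trained on the other, which is why the diff-generation settings matter for a dataset like this.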

ec1096853y ago

2022: engineers with 3 jobs

2023: engineers with their own AI model, typing “#fixed bugs” and spending the rest of the day by the pool.

Jackson__3y ago

I'm not sure if I'm just imagining it, but there seems to be a lot more negative push-back online to this than there was for copilot.

It makes me wonder if it's related to recent protests in other creative fields in response to AI models, or just a weird dislike of openly released model weights?

abdnafees3y ago

Why now? I mean it's been only 20 odd years or so since modern programming became popular. And, it's not a lot. Let people learn how to code, make mistakes and then learn from those mistakes. Pre-cooked meals are not as good as home cooked goodness.

pklausler3y ago

How good are these LLMs going to be at debugging code, as opposed to writing it?

tbrownaw3y ago

Sounds like basically the inverse of what was on here the other day about automatically generating commit messages from a diff.

Sounds kinda cool, even if trusting it would be a terrible idea.

leo20233y ago

The next idea after this could be: developers draw a system diagram of the architecture, then AI writes the whole system E2E, high performance, distributed.

shul3y ago

Why all the hate? I for one welcome our AI overlords

shireboy3y ago

If this thing is trained on my commit messages we’re all doomed. Or else we’ll be able to type “fixed the thing” and have a whole app written.

j / k navigate · click thread line to collapse

199 comments

100 comments · 26 top-level

pavlov3y ago· 42 in thread

Somehow these GitHub-trained ML code assistants sadden me.

I know the soup is nutritious and healthy 98% of the time, and eating it saves so much time compared to preparing a filet mignon myself. But it’s still brown sludge.

credit_guy3y ago

Take a look at the average faces of women across different countries [1]. They are all strikingly beautiful.

By averaging, a lot of imperfections get diluted away.

Like in Anna Karenina "happy families are all alike, unhappy ones are each in its own way". The defects are idiosyncratic, the commonalities are good.

[1] https://fstoppers.com/portraits/average-faces-women-around-w...

kristopolous3y ago

That's unrelated.

This is absurd.

It's not going to stop people from doing it. The industry is clinically insane.

It allows people who do bad work to do more of it quickly. Before they had to manually shovel garbage into projects but now they have a dumptruck.

It might already be like that. The cheap rates for shady contracting firms that do trash work are probably already using these things

RjQoLCOSwiIKfpm3y ago

Failure is an extremely common and accepted thing in biological systems - your offspring may just die if you have incompatible genes.

Software on the other hand is a logical environment with clear, logical requirements. It ought to work, not just fall apart randomly.

There is no guarantee that sticking the average of one software into a completely different one will satisfy the logical requirements by any means whatsoever.

lou13063y ago

amelius3y ago

"You look average" will be my new pickup line :)

gavinray3y ago

I like this analogy, and I'd never seen this before, thanks for sharing.

(My experience with Copilot and ML-assisted programming has been extremely positive, I would not choose to go without it at this point)

hoosieree3y ago

On the other hand, you can pick the average and be wrong every time: https://www.thestar.com/news/insight/2016/01/16/when-us-air-...

SergeAx3y ago

In my opinion, they are not strikingly beautiful. They are really average good-looking. To be strikingly beautiful a face needs some outstanding features (thus, in fact, "strikingly").

agilob3y ago

> The study also does not reveal how the participants were selected or how large the sample size actually is.

This "study" and your argument are meaningless.

avgcorrection3y ago

Wait. Facial symmetry is beautiful now? Dang it, I didn’t get the memo.

pjc503y ago

krono3y ago

  > Futurists in 1950: Automation will free mankind from meaningless tedium to focus on creative pursuits only humans can master.
  > Techbros in 2023: We coded AI to write all your books, music, and TV so you can focus on the meaningless tedium of your cubicle farm.

From this popular tweet by @stealthygeek https://twitter.com/stealthygeek/status/1618997354199400449

yowzadave3y ago

franga20003y ago

DennisP3y ago

skybrian3y ago

Well, code reviewer anyway. Might be a good idea to add “includes unit tests” in the prompt?

rileymat23y ago

I agree with you, however, the work I see is not that.

What I see is a person who copy and pastes crap around until it works and calls it a day. I think code assistants can and will compete with them.

boredemployee3y ago

Since everyone has different goals and opinions on this, etc, many people will see it in a different way.

sanderjd3y ago

Pandabob3y ago

OpenAI is reportedly hiring dev contractors to teach the new version of their Codex model[0].

[0]: https://www.semafor.com/article/01/27/2023/openai-has-hired-...

echelon3y ago

There goes the software career.

The six figure salaries won't last another decade. For some of us, maybe, but certainly not most of us.

Learn AI now.

Good luck, everyone.

GuB-423y ago

I am not much into ML code assistants either, though it may change in the future as technology becomes better and more reliable.

Now there is a certain charm being close to "hands on" programming, but if that's the case, go get an Amiga and make a few demos. It won't pay the bills, but it can be fun.

discreteevent3y ago

>But I don't buy the "joy of writing code" argument. Coding is all about making a computer work for you

netr0ute3y ago

derefr3y ago

> an ocean of soup made of other people’s random design decisions and bugs accumulated over fifteen years

gfodor3y ago

Der_Einzige3y ago

You didn't write your prompt well enough. I can ask it to design the software component and code it.

carlbarrdahl3y ago

Maybe something like GPT, style transfer, and OpenAPI combined.

indeyets3y ago

Well, these are not tools for the art-level programming. But it helps to improve productivity of commercial programming a lot. Different genre

nbardy3y ago

SergeAx3y ago

It's okay, you don't have to use AI assistants to program.

neximo643y ago

And yet it is so useful. It is just an assistant. It's quite unlike soup, since you can easily alter it.

indeyets3y ago

I think it might be compared to "cook it yourself" kits of ingredients. Good base, but you can alter it to your liking.

nikau3y ago

It's a bold assumption that most code these days isn't just a series of copy-pasted fragments from Stack Overflow anyway.

Zetobal3y ago

Eh... I don't have the desire to do all the plumbing in my house and neither in my code.

mstade3y ago

But if they did I'd hope they trust, but verify.

avgcorrection3y ago

Programmers when the plumbing malfunctions: Darn it, why are all abstractions leaky! Why can’t I just plug A and B together and have them work seamlessly! Why is everything BROKEN

Programmers when the plumbing works: But I don’t want to just stitch components together! This is boring.

Jeff_Brown3y ago

Plumbing is tricky. The only people I've heard denigrate plumbers don't know anything about it.

fortyseven3y ago

So don't use it?

williamcotton3y ago

Der_Einzige3y ago

Americans absolutely identify themselves first by their job and second by everything else.

There's a reason they call Europeans "europoors". There are advantages to identifying yourself by your work.

thuuuomas3y ago

Do you truly believe yr StableDiffusion Americana is “not like what anyone else is making”? You’ve got some knots to untangle :)

RjQoLCOSwiIKfpm3y ago· 11 in thread

Prepare for household appliances - washing machines etc. - doing strange things randomly.

Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.

Prepare for the support not understanding the random problems you encounter.

Prepare for the answers you get from support being similarly random.

And maybe, with an unknown probability, prepare for your house burning down and nobody can tell you why.

ly3xqhl8g93y ago

Perhaps certain consumer electronics should come with a label "Programmed by Humans", such as the "Free-Range/Cage-Free" labels.

vagabund3y ago

It's funny seeing the same people who blithely told blue collar workers to "just learn how to code" now act like luddites when innovation comes for their skillset.

Just learn how to be a plumber.

albert_e3y ago

Was thinking exactly the same thing

"No GMO"!

dvngnt_3y ago

yeah I want human compilers none of that GCC crap

oldgradstudent3y ago

> Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.

Even worse, prepare for them to enthusiastically take calls.

napier3y ago

roarcher3y ago

> Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.

indeyets3y ago

You imply, that such tools would lead to lower quality of code. I actually hope for the opposite.

RjQoLCOSwiIKfpm3y ago

If my house burns down due to random bugs in a big appliance, do you think the random underpaid 3rd world developers who will be used care about that?

I think this will lead to extreme cost cutting measures in choice of the developers which are used.

People who would have previously been totally ineligible to develop software will happily be chosen.

And they won't care about the garbage code they produce as long as it somehow seems to work from the outside.

They'll care about feeding their families in the dire situation they are in, not more.

sanderjd3y ago

I hope for the opposite as well, but I think it's a false hope.

agumonkey3y ago

> Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.

this is a fun pattern I've seen play in other industries

abhijeetpbodas3y ago· 6 in thread

On a philosophical level, AI for writing code has always seemed redundant to me. Here's why:

1. Humans create programming languages which machines can understand. OK.

2. Humans build tools (LSP, treesitter, tags, type checkers and others) to help humans understand code better. OK.

3. Humans build (AI) programs which run on machines so that the computer can understand... computer programs???

(Of course, I get that I use the terms "understand" and "computer programs" very loosely here!)

manmal3y ago

semitones3y ago

The benefit here is that the machine can execute what the AI produces, and humans can understand it / modify it if they need to.

wankle3y ago

jrvarela563y ago

Arguably, we would benefit from even higher level abstractions so the LLM can fit more logic in a single prompt/output.

divs12103y ago

Good point!

Maybe a future AI could generate machine code that could be "disassembled" into higher level languages.

Not sure if that would be better.

elcomet3y ago

Yeah you do, machines execute code but don't understand it.

indeyets3y ago· 5 in thread

So, it is loosely the same as Copilot? I understand that the approach is a tad different, but the result of converting natural language descriptions into code changes should be comparable.

And both are trained on a large corpus of GitHub sources.

Is there a way to test it somehow? Public API maybe?

Kiro3y ago

> converting natural language descriptions into code-changes

noncovalence3y ago

I've found writing a temporary comment can be particularly useful when working with Unicode. For example, something similar to

//insert a unicode dot between each character in the string, and convert the numbers to subscript

saved me a lot of copy-pasting.
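
For reference, a hand-written version of what that comment asks for might look like the sketch below. It is in Python, which is an assumption since the comment does not say which language was used, and it takes "unicode dot" to mean the middle dot U+00B7, which is also a guess.

```python
# Map ASCII digits to their Unicode subscript forms (U+2080..U+2089).
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def dotted_subscript(s, dot="\u00b7"):
    """Convert digits to subscript, then join all characters with a dot.

    The default dot (U+00B7, middle dot) is an assumption; pass another
    character if a different "unicode dot" is wanted.
    """
    return dot.join(s.translate(SUBSCRIPT_DIGITS))


print(dotted_subscript("H2O"))
```

The tedious part is the digit table, which is exactly the kind of thing that otherwise gets copy-pasted from a character map.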

indeyets3y ago

I use both. Sometimes it feels easier to write five words of text than starting to write code.

elcapitan3y ago

I use it most of the time as smart auto-completion as well, but sometimes for boilerplate it helps to just write a comment what you want to achieve, basically like a ChatGPT prompt.

bil73y ago

for my day job, no, not frequently. When I'm writing in an unfamiliar language like bash or something, I'll do a little # implement a function that does x, y and z

Kwantuum3y ago· 3 in thread

hellodanylo3y ago

Yeah, I am also struggling to interpret the metrics in this post positively.

The 50% success rate is also best out of 3200 completions. For best out of 1 completion, the success rate is in low single digits.
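
To make the gap between "best of 1" and "best of 3200" concrete, here is a small sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k). Whether the post computed its numbers exactly this way is an assumption, and the sample counts below are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the chance that at least one of k samples passes.

    n: total completions sampled for a task
    c: how many of those completions passed
    k: how many completions we are allowed to draw
    """
    if n - c < k:
        # Fewer than k failures exist, so any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 3200 completions sampled, only 2 of them pass.
print(pass_at_k(3200, 2, 1))     # chance with a single try
print(pass_at_k(3200, 2, 3200))  # chance when all 3200 tries count
```

A task counts as solved at k = 3200 if even one completion passes, so a high best-of-3200 rate is compatible with a very low single-try rate.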

zaidhaan3y ago

They do note that the models "tend to do better when prompted with longer code generation tasks".

But yes, the choice of scales for the graph was rather peculiar.

kdnvk3y ago

6 billion is by no means large.

spapas823y ago· 3 in thread

I'd really like to see how this would work with my commits... 99% of the messages on my commits are single word, similar to:

- ok

- fix

- done

- test

- nice

prettyStandard3y ago

Garbage in garbage out.

You should fix that.

alchemist1e93y ago

Yeah I bet the people working with them really love the commit messages /s

Or more likely they are working alone.

dizhn3y ago

There's a character limit to commit messages in their training data.

DominikPeters3y ago· 2 in thread

It would have been helpful to show some example generations of the model, unless I've missed them.

bogwog3y ago

This is all I could find: https://twitter.com/carperai/status/1619082410213404672

jerpint3y ago

Yes, I agree. My understanding is that the training dataset contains the original code, the diff, and the commit message.

So you train the LM to:

Input: code + commit message; output: diff
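
A minimal sketch of how such an input could be assembled, following the `<NME>` / `<BEF>` / `<MSG>` / `<DFF>` layout that appears in mortehu's script further down the thread. The helper name `build_diff_prompt` is hypothetical, and adding a trailing newline to the code is an assumption.

```python
def build_diff_prompt(filename, code, commit_message):
    """Assemble a prompt: file name, code before the change, commit message.

    The model is then expected to continue the text after <DFF> with a diff.
    """
    if not code.endswith("\n"):
        # Assumption: keep the <MSG> tag on its own line.
        code += "\n"
    return f"<NME> {filename}\n<BEF> {code}<MSG> {commit_message}\n<DFF>\n"


prompt = build_diff_prompt("a.py", "x = 1", "Rename x to count")
print(prompt)
```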

moconnor3y ago· 1 in thread

All that to end with “no meaningful improvement over the salesforce codegen model” is a bit disappointing.

youssefabdelm3y ago

indeyets3y ago· 1 in thread

Related discussion: https://news.ycombinator.com/item?id=33271750

return_to_monke3y ago

while from the same company, same lab, I don't think this is what the article you linked is about. To me, that seems like a general purpose LLM and this just for code.

startupsfail3y ago

From the safety perspective (may get important soon), it is perhaps a very bad idea to allow easy execution/injection of arbitrary code into random places with little review.

When designing such systems, please do keep that in mind. Make sure code changes are properly signed and the originating models are traceable.

Same applies to datasets generated by models.
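
One way to picture "signed and traceable" is the sketch below. It uses an HMAC over a small record as a stand-in for a real signing scheme; a shared secret key is an assumption made for brevity, and a production setup would more likely use asymmetric signatures. The function names and record fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_change(diff_text, model_id, key):
    """Return a record tying a diff to the model that produced it, plus a tag."""
    record = {
        "model": model_id,
        "diff_sha256": hashlib.sha256(diff_text.encode("utf-8")).hexdigest(),
    }
    # Sort keys so the signed payload is byte-for-byte reproducible.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["signature"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_change(diff_text, record, key):
    """Recompute the tag for this diff and model, and compare in constant time."""
    expected = sign_change(diff_text, record["model"], key)
    return hmac.compare_digest(expected["signature"], record["signature"])


key = b"example-secret"  # hypothetical key, for illustration only
record = sign_change("+print('hi')\n", "CarperAI/diff-codegen-2b-v2", key)
print(verify_change("+print('hi')\n", record, key))
```

Changing either the diff text or the recorded model name makes verification fail, which is the traceability property being asked for.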

jakear3y ago

PoignardAzur3y ago

We're still in the beginning for these tools, but already they're demonstrating some really exciting capacity.

lettergram3y ago

I view programming as a trade. I’ve spent years honing my skills, I pass wisdom to junior engineers as I can. I review code and provide detailed alternatives.

Where I fear this will leave us is a class of less-skilled engineers and overly optimized AI. Basically, stuck in development.

ilaksh3y ago

Since I am building a website https://aidev.codes to do programming based on natural language descriptions, this is extremely relevant to me.

Does anyone know, what would be the easiest way to try to run their 6B Diff Models thing against my own prompts for my service? Maybe Hugging Face?

mortehu3y ago

I wrote the program between <BEF> and <MSG>, and it generated the following output:

  <NME> diff_model.py
  <BEF> import argparse

  import torch
  import transformers

  def main():
      argparser = argparse.ArgumentParser()
      argparser.add_argument('--checkpoint', default='CarperAI/diff-codegen-2b-v2', choices=['CarperAI/diff-codegen-6b-v2', 'CarperAI/diff-codegen-2b-v2', 'CarperAI/diff-codegen-350m-v2'], help='Model to use')
      args = argparser.parse_args()

      model = transformers.AutoModelForCausalLM.from_pretrained(args.checkpoint)
      tokenizer = transformers.AutoTokenizer.from_pretrained(args.checkpoint)

      # Use CUDA.
      model = model.cuda()

      text = '<NME> diff_model.py\n<BEF> '
      text += open("diff_model.py").read()
      text += '<MSG> Print a message after loading the model\n<DFF>\n'

      completion = model.generate(tokenizer.encode(text, return_tensors="pt").cuda(), max_new_tokens=400)

      completion = tokenizer.decode(completion[0], skip_special_tokens=True)

      print(completion)


  if __name__ == '__main__':
      main()
  <MSG> Print a message after loading the model
  <DFF>
  <DFF> @@ -1,4 +1,5 @@
   '''
  +import sys
   import argparse

   import torch
  @@ -8,6 +9,7 @@ def main():
       tokenizer = transformers.AutoTokenizer.from_pretrained(args.checkpoint)

       # Use CUDA.
  +    print("CUDA:", torch.cuda.is_available())
       model = model.cuda()

       text = '<NME> diff_model.py\n<BEF> '
  @@ -10,3 +12,4 @@ def main():
       completion = tokenizer.decode(completion[0], skip_special_tokens=True)

       print(completion)
  +    sys.exit(0)
  \ No newline at end of file
  [\ No newline at end of file repeated many times]

It takes about 150 seconds to run on a 3090 Ti when the model is already on disk.
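
Since the decoded completion echoes the prompt and ends in a run of repeated trailer lines, a small post-processing step helps before trying to apply the diff. This is only a sketch: the helper name `extract_diff` is hypothetical, and it assumes the text right after the last `<DFF>` marker starts with a hunk header, as in the output above.

```python
TRAILER = "\\ No newline at end of file"


def extract_diff(completion):
    """Return the text after the last <DFF> marker with repeated trailers collapsed."""
    _, _, diff = completion.rpartition("<DFF>")
    cleaned = []
    # Drop surrounding whitespace, then walk the diff line by line.
    for line in diff.strip().splitlines():
        # Skip a trailer line that merely repeats the previous kept line.
        if line.strip() == TRAILER and cleaned and cleaned[-1].strip() == TRAILER:
            continue
        cleaned.append(line)
    return "\n".join(cleaned)


sample = (
    "<NME> a.py\n<BEF> x\n<MSG> m\n<DFF>\n"
    "<DFF> @@ -1 +1 @@\n-x\n+y\n"
    + (TRAILER + "\n") * 3
)
print(extract_diff(sample))
```

This does not validate the hunk offsets, which in the output above are not consistent with the input file, so the result still needs review before it is applied.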

Epa0953y ago

I also wonder if it could be useful in creating Coq proofs!
