My idea of enjoyable high-quality programming isn’t to dip a spoon into an ocean of soup made of other people’s random design decisions and bugs accumulated over fifteen years, hoping to get a spoonful without hidden crunchy insect bits.
I know the soup is nutritious and healthy 98% of the time, and eating it saves so much time compared to preparing a filet mignon myself. But it’s still brown sludge.
By averaging, a lot of imperfections get diluted away.
Like in Anna Karenina: "Happy families are all alike; every unhappy family is unhappy in its own way." The defects are idiosyncratic, the commonalities are good.
[1] https://fstoppers.com/portraits/average-faces-women-around-w...
Multi-sourced accumulated unmaintained amateur software without clear provenance or ownership is more like creating a feature I'll call "insta-legacy": now you're responsible for a bunch of code you didn't write that by definition nobody you have access to understands.
This is absurd.
It's not going to stop people from doing it. The industry is clinically insane.
It allows people who do bad work to do more of it quickly. Before they had to manually shovel garbage into projects but now they have a dumptruck.
You know what? It might be fine. Maybe we're going to have a world of fast food programming where minimum wage coders pump out trash, and Michelin star programmers you go to for the real stuff.
If that's the case, we'll have to somehow educate the public on the difference so they don't think it's the same thing. McDonald's and The French Laundry are both successful restaurants. That world is possible in programming as well.
It might already be like that. The shady contracting firms that do trash work at cheap rates are probably already using these things.
Software on the other hand is a logical environment with clear, logical requirements. It ought to work, not just fall apart randomly.
There is no guarantee that sticking the average of one software into a completely different one will satisfy the logical requirements by any means whatsoever.
Also, when you average you don't really kill "defects", but rather outliers. An "outlier" statement within a program is very likely to do something important, e.g. taking care of a corner case, otherwise it wouldn't be there.
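A toy illustration of that point, assuming a hypothetical "averaging" that keeps only the statements most variants agree on. Real models do nothing this literal, and the three `div` variants below are made up for the example:

```python
from collections import Counter

# Three hypothetical variants of a division helper, as lists of statements.
# Only one of them guards against the b == 0 corner case.
variants = [
    ["def div(a, b):", "    return a / b"],
    ["def div(a, b):", "    return a / b"],
    ["def div(a, b):", "    if b == 0: return None", "    return a / b"],
]

def majority_statements(variants):
    """Keep each statement that appears in more than half of the variants."""
    counts = Counter(stmt for v in variants for stmt in set(v))
    threshold = len(variants) / 2
    # Preserve the statement order of the longest variant.
    longest = max(variants, key=len)
    return [stmt for stmt in longest if counts[stmt] > threshold]

averaged = majority_statements(variants)
print("\n".join(averaged))
```

The guard line is the "outlier", so it is the one that gets voted out, and the averaged function divides by zero again.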
(My experience with Copilot and ML-assisted programming has been extremely positive, I would not choose to go without it at this point)
Also, unrelated: Leo Tolstoy had his share of problems with his wife and wrote them into Anna Karenina. In fact, it is quite the contrary: most dysfunctional families fall into several textbook scenarios, while happy families have their own distinct inner dynamics, just looking the same on the outside.
This "study" and your argument are meaningless.
> Futurists in 1950: Automation will free mankind from meaningless tedium to focus on creative pursuits only humans can master.
> Techbros in 2023: We coded AI to write all your books, music, and TV so you can focus on the meaningless tedium of your cubicle farm.
From this popular tweet by @stealthygeek https://twitter.com/stealthygeek/status/1618997354199400449
What I see is a person who copies and pastes crap around until it works and calls it a day. I think code assistants can and will compete with them.
I love to solve _problems_ and to help people with them, but sometimes I just hate to write code to solve them. I wish my computer could have a clear picture of the solution that is in my mind so I didn't have to write a single line of code, so I could focus on the creative part of the problem solving.
But I totally agree with you that it would be a positive outcome if we spent less time writing lines of code and more time using better tools to direct computers in solving problems and (I think just as critically) understanding the dynamics of those solutions. A major facet of my skepticism is that I think progress on that second part seems to be lagging way behind...
I foresee a lot of "we had a team, who have all now left, that used AI to write this system and it's mostly working right except in all these ways, and you need to fix it, good luck!" in all of our futures.
[0]: https://www.semafor.com/article/01/27/2023/openai-has-hired-...
The six figure salaries won't last another decade. For some of us, maybe, but certainly not most of us.
Learn AI now.
Good luck, everyone.
But I don't buy the "joy of writing code" argument. Coding is all about making a computer work for you, and I think that taming AIs to be more efficient without letting them introduce random crap will become both important and enjoyable. I think the techniques we have now are too crude for that, but they will improve. Keep in mind that even if you are writing C, you are already at a high level, using libraries and compilers other people wrote, bugs included.
Now there is a certain charm being close to "hands on" programming, but if that's the case, go get an Amiga and make a few demos. It won't pay the bills, but it can be fun.
That is an absolutely valid point of view. But it doesn't apply to everyone. Programming is something that can take me into the zone like nothing else. And it has the added side effect of making me think more precisely about higher level problems as well. It's one of those exercises that help me stop fooling myself (in the Feynman sense).
As a human programmer, is this not what your own brain looks like? What are you doing to the information you take in that allows you to avoid regurgitating the "crunchy insect bits" of your own training corpus?
Maybe something like GPT, style transfer, and OpenAPI combined.
That's not to say plumbing doesn't take skill, it certainly does, but the point of it is that nobody except the next plumber cares how the pipes are laid out, so long as it works and works well. It's when one blows and you have to fix it, or install a second bathroom, that shit really tends to come out. If I'm the one that has to do the work, I can only hope the previous plumber had some idea of what they were doing, and didn't just leave it entirely to automation.
But if they did I'd hope they trust, but verify.
Programmers when the plumbing works: But I don’t want to just stitch components together! This is boring.
There's a reason they call Europeans "europoors". There are advantages to identifying yourself by your work.
Prepare for the same thing with electronics which you didn't consider as containing much software before - central heating units, AC units, fridges, stoves, light switches, LED light bulbs, vacuum cleaners, electric shavers, electric toothbrushes, kids toys, microwave ovens, really anything which consumes electricity.
Prepare for the support of the vendors of those appliances not taking phone calls anymore, only text communication.
Prepare for the support not understanding the random problems you encounter.
Prepare for the answers you get from support being similarly random.
And maybe, with an unknown probability, prepare for your house burning down and nobody can tell you why.
Just learn how to be a plumber.
Even worse, prepare for them to enthusiastically take calls.
Don't worry, someone will plug ChatGPT into a text-to-speech model soon enough, and market it as a way to put the personal touch back into customer support. Maybe they'll even give it a folksy accent.
This is not a tool for generating applications using statistical methods (we have a lot of tools which do that already), but a tool for assisting humans by taking boring/repetitive tasks from them and letting us focus on the meaning, the goal.
I think this will lead to extreme cost cutting in the choice of which developers get hired.
People who would have previously been totally ineligible to develop software will happily be chosen.
And they won't care about the garbage code they produce as long as it somehow seems to work from the outside.
They'll care about feeding their families in the dire situation they are in, not more.
I think my intuition is that the average quality of software may well improve (good!) but that when issues arise they will be more obscure and harder to debug and fix, because nobody will know what the system is actually doing.
This is a fun pattern I've seen play out in other industries.
1. Humans create programming languages which machines can understand. OK.
2. Humans build tools (LSP, treesitter, tags, type checkers and others) to help humans understand code better. OK.
3. Humans build (AI) programs which run on machines so that the computer can understand... computer programs???
Aren't computers supposed to be able to understand code already? Wasn't the concept of "computer code" created so as to have something which the computer could understand? Isn't making an (AI) program to help the computer understand computer programs re-inventing the wheel?
(Of course, I get that I use the terms "understand" and "computer programs" very loosely here!)
Arguably, we would benefit from even higher level abstractions so the LLM can fit more logic in a single prompt/output.
Maybe a future AI could generate machine code that could be "disassembled" into higher level languages.
Not sure if that would be better.
And both are trained on a large corpus of GitHub sources.
Is there a way to test it somehow? Public API maybe?
Do people actually use Copilot for that? I just let it work its magic uninstructed. I guess it sometimes uses comments and function/variable names for its suggestions but that's about it. 99% of the time it just looks at my code, the context and neighboring files to predict what I'm trying to do.
//insert a unicode dot between each character in the string, and convert the numbers to subscript
saved me a lot of copy-pasting.
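For reference, a minimal sketch of what that comment asks for, written in Python rather than whatever language the original prompt was in. It assumes "unicode dot" means U+00B7 MIDDLE DOT; the name `dot_and_subscript` is made up here:

```python
MIDDLE_DOT = "\u00b7"  # assumption: "unicode dot" means U+00B7 MIDDLE DOT
# Map ASCII digits 0-9 to SUBSCRIPT ZERO..NINE (U+2080..U+2089).
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

def dot_and_subscript(s: str) -> str:
    """Subscript the digits, then join the characters with a middle dot."""
    return MIDDLE_DOT.join(s.translate(SUBSCRIPTS))

print(dot_and_subscript("H2O"))  # H·₂·O
```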
The 50% success rate is also best out of 3200 completions. For best out of 1 completion, the success rate is in the low single digits.
I think the lesson here is that these models bring a lot more value when: 1. you have unit tests, 2. can afford compute/time to let the model try many solutions, 3. have enough isolation to run unverified code.
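A rough sketch of that loop, assuming the completions arrive as source strings and the unit test is a plain function over the resulting namespace. `first_passing`, `completions` and `unit_test` are hypothetical names, and the bare `exec` is exactly the missing isolation from point 3, so treat this as an illustration only:

```python
from typing import Callable, Iterable, Optional

def first_passing(candidates: Iterable[str],
                  test: Callable[[dict], bool]) -> Optional[str]:
    """Return the first candidate source whose namespace passes `test`."""
    for source in candidates:
        namespace: dict = {}
        try:
            # WARNING: exec runs unverified code in-process. A real setup
            # would run this in a sandbox (container, separate VM, ...).
            exec(source, namespace)
            if test(namespace):
                return source
        except Exception:
            continue  # a crashing or failing candidate is simply discarded
    return None

# Hypothetical stand-ins for model completions of "add two numbers".
completions = [
    "def add(a, b): return a - b",   # wrong result
    "def add(a, b) return a + b",    # syntax error
    "def add(a, b): return a + b",   # correct
]

def unit_test(ns: dict) -> bool:
    return ns["add"](2, 3) == 5 and ns["add"](-1, 1) == 0

best = first_passing(completions, unit_test)
print(best)
```

The more candidates you can afford to try, the better the odds that one passes, which is why best-of-3200 looks so much better than best-of-1.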
But yes, the choice of scales for the graph was rather peculiar.
- ok
- fix
- done
- test
- nice
You should fix that.
Or more likely they are working alone.
So you train the LM to:
Input: code+commit output: diff
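A minimal sketch of building one such training pair, assuming the `<NME>`/`<BEF>`/`<MSG>`/`<DFF>` tag layout that the script further down in this thread uses for its prompt. `make_example` is a made-up helper, and using `difflib` for the target diff is my assumption, not necessarily how the real dataset was produced:

```python
import difflib

def make_example(filename: str, before: str, message: str, after: str):
    """Build an (input, target) pair: code + commit message -> unified diff."""
    body = before if before.endswith("\n") else before + "\n"
    prompt = f"<NME> {filename}\n<BEF> {body}<MSG> {message}\n<DFF>\n"
    diff_lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        n=1,  # lines of context around each change
    )
    # Drop the two "---"/"+++" file header lines; keep only the hunks.
    target = "".join(list(diff_lines)[2:])
    return prompt, target

prompt, target = make_example(
    "demo.py",
    "import os\nprint('hi')\n",
    "Import sys",
    "import os\nimport sys\nprint('hi')\n",
)
print(prompt + target)
```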
Negative results are interesting in their own right. I’d rather read about why this isn’t better at the 6B parameter level than see a hand wave that, well, the samples are more diverse and look, the 350M model is better.
One of the first steps of a misaligned/unhelpful/virus type of a system, attempting to secure its presence would likely be inference/GPU/TPU compute access. And code injection is a vector. There are multiple other vectors.
When designing such systems, please do keep that in mind. Make sure code changes are properly signed and the originating models are traceable.
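A toy sketch of what "signed and traceable" could look like, assuming a shared secret key and an HMAC tag over the diff plus a model identifier. The record layout and the names `sign_change` and `verify_change` are made up for illustration; a real setup would more likely use asymmetric signatures such as signed commits:

```python
import hashlib
import hmac
import json

def _payload(model_id: str, diff: str) -> bytes:
    # Canonical serialization so signer and verifier hash the same bytes.
    return json.dumps({"model": model_id, "diff": diff}, sort_keys=True).encode("utf-8")

def sign_change(diff: str, model_id: str, key: bytes) -> dict:
    """Attach the originating model id and an HMAC-SHA256 tag to a diff."""
    tag = hmac.new(key, _payload(model_id, diff), hashlib.sha256).hexdigest()
    return {"model": model_id, "diff": diff, "sig": tag}

def verify_change(record: dict, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, _payload(record["model"], record["diff"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["sig"])

record = sign_change("+print('hello')\n", "hypothetical-model-v1", b"secret-key")
print(verify_change(record, b"secret-key"))
```

Changing either the diff or the claimed model id after signing makes verification fail.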
Same applies to datasets generated by models.
I on the other hand will survive: what sense is an AI to make of such classic messages as David Bowie's excellent "ch-ch-changes!", the five "fix CI maybe???"s in a row, or the eternal "fuck this shit"?
Something I haven't seen explored too much: navigation help. One of the things that takes me the most time when coding is remembering what was the next file / module / function I need to edit and jumping to it.
An autocomplete engine that would suggest jump locations instead of token could help me stay in the flow much longer, with fewer worries about whether I'm introducing subtle bugs because I'm relying on the AI too much.
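As a crude baseline for that idea (not the AI version, just counting), here is a sketch that suggests jump targets from how often one location followed another in past editing sessions. `JumpSuggester` and the file names are hypothetical:

```python
from collections import Counter, defaultdict

class JumpSuggester:
    """Suggest the next location from counts of past consecutive jumps."""

    def __init__(self):
        self.transitions = defaultdict(Counter)

    def record(self, history):
        # Count each (current -> next) pair in an ordered edit history.
        for current, nxt in zip(history, history[1:]):
            self.transitions[current][nxt] += 1

    def suggest(self, current, k=3):
        # Most frequently visited locations right after `current`.
        return [loc for loc, _ in self.transitions[current].most_common(k)]

suggester = JumpSuggester()
suggester.record([
    "models.py", "views.py", "urls.py",
    "models.py", "views.py", "tests.py",
    "models.py", "views.py", "urls.py",
])
print(suggester.suggest("views.py"))
```

Since it only proposes places to go and never writes code, a bad suggestion costs a keystroke rather than a subtle bug.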
My concern with AI across all fields is that people won’t gain the fundamental skills necessary for moving the bounds of what’s possible. Certainly, tools like this AI could produce good results. However, the underlying human is still providing the training data. More importantly, humans are producing the trajectory of development.
If humans are no longer capable of pushing the AI systems, then the AI systems will either cease to improve, or the AI systems will learn to play off each other. In highly complex systems like many programs, I suspect they’ll play off each other and settle into local minima/maxima. I.e., because the “game” (program development) can be iterative, they’ll constantly improve code. However, because the AI systems don’t interact with all data (particularly real-world data), when a customer shows a sad face at some UI/UX, it won’t completely develop a new feature that matches the desires of the customer.
Where I fear this will leave us is a class of less-skilled engineers and overly optimized AI. Basically, stuck in development.
OpenAI has an 'edit' endpoint but it's 'in beta' and limited to 10-20 requests per minute. They do not acknowledge support requests about this. Azure OpenAI also has this endpoint I think but they ignore me as well.
So for my edits just like everything else I have been relying on text-davinci-003 since it has much more feasible rate limits. I have just been having it output the full new file but maybe this Unified Diff thing is possible to leverage.
Does anyone know what would be the easiest way to try running their 6B diff model against my own prompts for my service? Maybe Hugging Face?
<NME> diff_model.py
<BEF> import argparse
import torch
import transformers
def main():
    argparser = argparse.ArgumentParser()
    argparser.add_argument('--checkpoint', default='CarperAI/diff-codegen-2b-v2', choices=['CarperAI/diff-codegen-6b-v2', 'CarperAI/diff-codegen-2b-v2', 'CarperAI/diff-codegen-350m-v2'], help='Model to use')
    args = argparser.parse_args()
    model = transformers.AutoModelForCausalLM.from_pretrained(args.checkpoint)
    tokenizer = transformers.AutoTokenizer.from_pretrained(args.checkpoint)
    # Use CUDA.
    model = model.cuda()
    text = '<NME> diff_model.py\n<BEF> '
    text += open("diff_model.py").read()
    text += '<MSG> Print a message after loading the model\n<DFF>\n'
    completion = model.generate(tokenizer.encode(text, return_tensors="pt").cuda(), max_new_tokens=400)
    completion = tokenizer.decode(completion[0], skip_special_tokens=True)
    print(completion)
if __name__ == '__main__':
    main()
<MSG> Print a message after loading the model
<DFF>
<DFF> @@ -1,4 +1,5 @@
'''
+import sys
import argparse
import torch
@@ -8,6 +9,7 @@ def main():
tokenizer = transformers.AutoTokenizer.from_pretrained(args.checkpoint)
# Use CUDA.
+ print("CUDA:", torch.cuda.is_available())
model = model.cuda()
text = '<NME> diff_model.py\n<BEF> '
@@ -10,3 +12,4 @@ def main():
completion = tokenizer.decode(completion[0], skip_special_tokens=True)
print(completion)
+ sys.exit(0)
\ No newline at end of file
[\ No newline at end of file repeated many times]
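Given that repeated tail, a small post-processing step seems useful before doing anything with the output. A minimal sketch, assuming the completion echoes the prompt and the generated diff follows the last `<DFF>` tag; `extract_diff` is a made-up helper and it does not check that the hunks are valid:

```python
def extract_diff(completion: str) -> str:
    """Return the text after the last <DFF> tag, minus the repeated tail."""
    marker = "\\ No newline at end of file"
    diff = completion.rsplit("<DFF>", 1)[-1]
    lines = diff.strip().split("\n")
    # Drop the run of trailing marker lines the model keeps emitting.
    while lines and lines[-1].strip() == marker:
        lines.pop()
    return "\n".join(lines)

sample = (
    "<NME> a.py\n<BEF> x\n<MSG> Add y\n<DFF>\n"
    "<DFF> @@ -1 +1,2 @@\n x\n+y\n"
    "\\ No newline at end of file\n"
    "\\ No newline at end of file\n"
)
print(extract_diff(sample))
```

Note that this drops every trailing marker, including a single legitimate one, which is fine for eyeballing results but not for applying patches blindly.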
It takes about 150 seconds to run on a 3090 Ti when the model is already on disk.
I also wonder if it could be useful in creating Coq proofs!
2023: engineers with their own AI model, typing “#fixed bugs” and spending the rest of the day by the pool.
It makes me wonder if it's related to recent protests in other creative fields in response to AI models, or just a weird dislike of openly released model weights?
Sounds kinda cool, even if trusting it would be a terrible idea.