It’s surprisingly useful when you’re not sure how you want to proceed. E.g., while I was trying to write a simple function for printing all palindromic numbers under 10,000, Copilot inferred from the function name what I was trying to do and suggested a function using threading macros (something I hadn’t yet come across in Clojure). The result was a much neater affair than what I came up with on my own. I feel it could be a fantastic way to build familiarity with a new language.
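For anyone curious how small that task is, here is a minimal Python sketch. It is not the Clojure that Copilot suggested, and the names `is_palindrome` and `palindromes_below` are made up for illustration.

```python
def is_palindrome(n: int) -> bool:
    """Return True if the decimal digits of n read the same both ways."""
    digits = str(n)
    return digits == digits[::-1]


def palindromes_below(limit: int) -> list[int]:
    """Collect every palindromic number n with 0 <= n < limit."""
    return [n for n in range(limit) if is_palindrome(n)]


if __name__ == "__main__":
    # Print all palindromic numbers under 10,000.
    print(palindromes_below(10_000))
```

Single digits count as palindromes here; whether they should is a judgment call.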
Won’t this result in more junk code?
I am honestly pleased I am not using a Microsoft IDE and won't be part of all this crap. I have already started creating my new projects on GitLab.
I fear hordes of inexperienced developers, mindlessly clicking accept on the first suggestion, and then on the second, third, and so on, until one seems to work. And the more advanced tools like this get, the harder it'll be to review the code and make sure nothing's messed up.
I guess we'll have to think long and hard about what safeguards to build against these risks.
(This is what hiring in ML is like.)
As an educational tool, being able to generate code and then pick it apart is as important as learning to put code together from nothing.
If the problem is trivial enough that you can completely trust the AI coded solution, then you could have either done it yourself very easily or used a premade solution from a good library or toolkit.
If the problem is not trivial, then you have the outsourcing challenges (which apply to a lot more scenarios than using AI to help you code).
If you are not personally capable of judging the outsourced work, then whether you use AI or type it yourself, you will end up with errors or misfeatures.
If you are capable of judging, then you must pay attention and read/review. So your job shifts from defining the problem and programming a solution to defining the problem and reviewing potential solution(s). Either way, you must focus and think. But again, perhaps you would be better off building a solution composed of known good blocks. <- This should be the future of software development...
Sadly, I think that open source and freedom to (re)invent have worked against us in the long run. If instead of each of us going off and thinking, "I can make a better language/framework", we had built on existing technologies, I daresay we would be further along. To be fair, of course, some level of dissatisfaction and divergence is necessary or we would still be using assembly.
GitHub does have one thing right though (from a business perspective) - they are making a remedy for a symptom, and in that they can expect longer term revenue than if they actually solved the core problem.
With that in mind, I think this problem is totally real and worth solving. If Copilot can save me those mundane moments during my day where I have to figure out “how to do this common thing that I already did 100 times” then it’s a win for everyone.
It’s not trying to solve the whole of programming. It’s just a nice tool to let you actually concentrate on the non-automated parts of coding such as: actually translating requirements into code.
We don’t need more code. In general, code is a liability. A tool that helps create more configurations of the same terrible boilerplate incantations that we already have to maintain in a million code bases is just adding to the problem.
You don't do it 100 times - you do it once, test it thoroughly, and put it in a library.
I think what's actually going to happen is that Copilot will be successful, but will be a much worse version of poor quality outsourcing. There are going to be people "writing code" who don't even have the ability to evaluate the implementation.
You see the same thing in some industries where the institutional knowledge of the baby boomers is disappearing. No one stays at the same job for more than a few years and you can see people making mistakes with things that are out by a factor of 10x or 100x sometimes. Very often people don't have the ability to grasp the basic concepts of the work they're doing. I think the same thing will bleed over to software development a lot over the next decade.
I also have a huge objection to having any code I write used to develop an "AI" for the benefit of a huge corporation. Do I get a cut? I doubt it. That alone should be enough for people to quit using GitHub. They're training their own replacement and are too dumb to see it IMO.
It's called Lisp. You know, Lisp, the language for AI written way back when brand-new Cadillac Eldorados with tailfins were still on the road that is self-reflective enough to make writing "programs that write programs" an absolute doddle? :)
Good luck getting bigco to sign off on a Lisp project, though. Even if it is of demonstrably profound practical utility.
You're right -- we have a huge problem with trying to trowel the new hotness (currently, "AI/ML" aka statistics) instead of taking advantage of tools that are highly suited to purpose. Instead of letting us use those tools, bigcos instead subscribe to the "programmer-clerk" myth in which programming is a menial task of mostly rote coding undertaken by minions in legion strength. And this affects not only the tools we use but our processes and professional values.
I just want us to stop reinventing the wheel hundreds of times each year. At least paper shuffling and fax sending took long enough that you could get a coffee and have a chat while it was happening. Now instead we toil over configuration files (which ironically Rails, my daily toolbox, aimed to solve a decade ago).
I should be able to define a few data models and relationships, processes, and some business rules. The rest absolutely should be generated for me. If cars worked like this, we would still be custom building wooden wheels.
Will it also learn (i.e. feed GPT) from the code we're writing, which it is also helping to write? How do you think it'll learn to deprecate bad practices or keep up with the evolution of a language (think writing concurrency code in Java 5 vs. Java 9, or any other relevant programming language evolution)?
Most likely not. And at least right now Tabnine's selling point is that it does inference locally (so no need to send code elsewhere) and can be trained on your own code (see https://www.tabnine.com/tabnine-vs-github-copilot).
Exact same problem to solve, because it is an important and interesting one, but a totally different approach. Maybe AI assisted somehow, but human curated.
Otherwise, imagine how bad legacy codebases of the future will be when they are full of autocomplete code that nobody understands or cared enough to think through even originally.
The way I see this, GitHub Copilot and the like are true next-gen compilers, translating English into code. They'll only get better with time.
This is a repeated theme of the article. I think it’d be simpler to write the non-boilerplate code, no? Plus, who’s going to be excited to have a job as “AI code reviewer”?
At the same time, code reviews are hard, much harder than writing code.
So making it faster to write boilerplate code at the cost of harder code reviews does not seem like a good trade-off to me.
I've got a sneaking suspicion that this is what we'll be doing in the future. I see Copilot as a taste of what future compilers will be like.
In a few years I expect to have most of an application built by an AI, with me developing the business/core logic by hand.
I can see that too, but preferably not with the approach taken by GitHub.
So basically shift the cognitive burden onto your coworkers? And what if they are also using Copilot, hoping that you will sanity-check its output for them? A tit-for-tat prisoner’s dilemma, and no one even realises they are playing the game.
In the article, the author included this example:
alternate_word_mapping = {words[i]: words_in_english[i] for i in range(len(words))}
That line is probably more readable than the Pythonic enumerate: alternate_word_mapping = {word: words_in_english[i] for i, word in enumerate(words)}
But it superficially reminded me of a style that I see on Leetcode, which rarely uses Python features like enumerate or dict.items.
CoPilot was trained on GitHub repos and many repos are mini-projects for learning a new language. Does that mean CoPilot will tend to suggest more generic, less language specific, implementations? If it does, will that change perceptions of what’s idiomatic? And will the volume of old code on GitHub influence CoPilot’s suggestions, making us slow to adopt new language features?
alternate_word_mapping = dict(zip(words, words_in_english))
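To make the comparison concrete, here is a small sketch showing that the three spellings discussed above build the same dict. The sample lists are made up for illustration; the article's real data is not shown in this thread.

```python
# Hypothetical sample data, standing in for the article's word lists.
words = ["uno", "dos", "tres"]
words_in_english = ["one", "two", "three"]

# Index-based version, as in the article's example.
by_index = {words[i]: words_in_english[i] for i in range(len(words))}

# enumerate-based version.
by_enumerate = {word: words_in_english[i] for i, word in enumerate(words)}

# zip-based version.
by_zip = dict(zip(words, words_in_english))

# All three produce the same mapping when the lists have equal length.
print(by_index == by_enumerate == by_zip)
```

One behavioural difference worth knowing: if `words_in_english` is shorter than `words`, the two index-based versions raise IndexError, while `zip` silently stops at the shorter list. On Python 3.10+, `zip(words, words_in_english, strict=True)` raises ValueError instead.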
That’s actually one reason why I don’t think its current incarnation is a good fit for learning new APIs: it’s been trained on a lot of code, not all of it good.
Seems like something a future version might be able to fix, perhaps by training a new layer using just demonstrably ‘good’ code?
(V late reply as still learning how to keep track of replies on HN. But perhaps you’ll see it anyway.)
As an aside, I really enjoyed the writing style. The subtle humour is better at signalling competence and friendliness than any CV ever could.
I would certainly hope not, since there is barely any code to write:
for line in my_file: ...
For even more convenience in common cases, there is the fileinput module[1] in the standard library.
I can see how a boilerplate-generating AI could be helpful in a more boilerplate-heavy language like Java, but a better solution is to use a language that better suits your use case and lets you express it without the boilerplate.
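A minimal sketch of the fileinput route mentioned above, assuming you just want to loop over the lines of one or more files. The helper name `count_lines` and the throwaway temp file are only there so the snippet runs on its own.

```python
import fileinput
import os
import tempfile


def count_lines(paths):
    """Count lines across the given files by iterating with fileinput."""
    total = 0
    with fileinput.input(files=paths) as stream:
        for _line in stream:
            total += 1
    return total


# Create a throwaway three-line file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")

try:
    line_count = count_lines([tmp.name])
finally:
    os.remove(tmp.name)

print(line_count)
```

Called with no arguments, `fileinput.input()` reads the files named in `sys.argv[1:]`, falling back to standard input, which is the usual convenience in small scripts.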
This worries me.
Potential attackers would have two problems: 1) getting malicious code checked into many repos and 2) making sure that these repos find their way into future deployed versions of GPT-3/Codex/CoPilot.
CoPilot generates enough vulnerable code as-is [1], so the extra effort isn't even required.
[0] https://www.bleepingcomputer.com/news/security/linux-bans-un...
[1] https://cyber-reports.com/2021/07/14/devsecai-github-copilot...
I’ll agree with faster and more code, but from the many examples I’ve seen, it’s not better.
I spend much more time figuring out what the requirements even are, refining them, figuring out what that even means in terms of code, figuring out the overall architecture, how it fits in with other systems or existing code, what data formats it uses, how it handles faults, persistence, scale, security. How it interfaces with the outside world (UI or API). Besides the code itself, I also spend a lot of my time on writing tests (which I wouldn't want to pawn off on an AI outside of fuzzing or generating data for property-based tests; unit tests should mirror what the spec dictates and need to test the correct things) and on writing documentation.
Yes, the code does take up a good chunk of time, but really, it's the easy part of my day!
Also, speeding through the code means I'm not thinking about it very deeply. That's when I introduce the most bugs, design flaws or shortcomings that bite me later. I wonder if we'll end up with a situation like the old quote about code reviews: a ten line code review gets a hundred comments/suggestions/questions, a thousand line code review gets a "ship it". If much of the code is written for us, will we have the attention span to scrutinize it and understand it deeply? Or will our eyes eventually just glaze over as we go "yeah, it's probably fine, ship it"?
GitHub Copilot and its iterations are the future.
You can complain and whine about what problems are being solved, how it'll affect human developers (making them weaker instead of stronger over time). And to some extent, that's probably true.
But it's still the future and it's coming. It's already here.
Here is the relevant section from GitHub's Terms of Service [1]:
> 6. Contributions Under Repository License
> Whenever you add Content to a repository containing notice of a license, you license that Content under the same terms, and you agree that you have the right to license that Content under those terms. If you have a separate agreement to license that Content under different terms, such as a contributor license agreement, that agreement will supersede.
From GPLv2, "When distributing derived works, the source code of the work must be made available under the same license."
------
This is not about technology; it is a legal end run that allows using open source code without open-sourcing the derived work. It is using AI as a form of "license laundering".
"OpenAI" is not open at all. Truly open AI means the code, the data and the model are all open. OpenAI sold the source to GPT-3 to Microsoft, received $1 billion from them in 2019 and does not make most of their work available except behind a highly exclusive, paid API - https://beta.openai.com/pricing/. It's a joke to call that "open". I urge you to read up on OpenAI and look at what they have actually done.
Their plan in the future is to sell access to Copilot, directly monetizing work they stole from others for free:
> According to GitHub, “If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future.”
I've deleted all my code from GitHub and hope others do the same. Maybe if some higher-profile project starts doing this, we can start to organize around opposing Copilot and OpenAI.
Others have also pointed out similar concerns - see https://news.ycombinator.com/item?id=27687450 for example.
[1] https://docs.github.com/en/github/site-policy/github-terms-o...
At work, we store all our code in our GitHub repo, some public, some private. As-is, I think there's a lot of legal ambiguity around using Copilot, but if all that code just served to teach the model structure of programs and common syntactical constructs, but then it had another layer with our code and its idioms, modules, names, then maybe it would regurgitate our code in a way that's useful and doesn't run afoul of licenses.
I'm thinking of a fast.ai course I did where I took a base model trained on generic image data, and then did transfer learning on top where I fed it labeled images of Go games and Chess games, and with only maybe 100 of those images it learned to distinguish the two with shocking accuracy. As I understand it, the base model taught it how to look for things like lines, corners, contrast, etc, and then it could be easily specialized. Could something similar be the case here?
I don't have a lot of faith in the author's code, if that is their opening statement ("better").
I don't think it will go much further than that and I don't know what that would even look like anyway, unless I could actually start discussing architectural decisions with it like I do with a human pair programmer. I guess you could say that this is what the comments are for, so who knows.
This writes more code and doesn't help design the application. It's a function autocompleter but it doesn't take my abstractions into account.
> ... or just learning to code
I've done some teaching, and based on my mental model of what is necessary to learn a language and, more generally, to learn how to program, my view is that the rudimentary boilerplate-y stuff this tool seems to excel at, while mindless to most, is essential for beginners as part of their learning.
Any educators here with different ideas or thoughts?
I spend much, much more time figuring out how to convert requirements to code and how to structure it non-locally (as in how it fits into the whole codebase).
Also, if you’re concerned with writing clean code, copy-pasting boilerplate everywhere is not the right approach; you have to actually think about interfaces and abstractions.