Security starts with deep understanding.
Some standards and practices can help avoid some types of problems, and some are even rather effective (like airgapping your systems), but there isn't any way to assure security in general other than truly understanding what you are doing.
**
I feel like Copilot is the wrong direction in which to optimize development. This is mostly going to help people who already have a poor understanding of what they are doing create even more crap.
For a good developer, those low-level, low-engagement activities are not a problem (except maybe at the learning stage, where you actually want people engaged rather than copy/pasting). What it does not help with are the important parts of development -- defining the domain of your problem, designing good APIs and abstractions, understanding how everything works and fits together, understanding what your client needs, etc.
Also, I feel this is going to help increase complexity by making more copies of the same structures throughout the codebase.
My working theory is that this is going to hinder new developers even more than Google and Stack* already do. Every time you give new developers an easier way to copy/paste code without understanding it, you rob them of an opportunity to gain a deeper understanding of what they are doing, and in effect prevent them from learning and growing.
It is a little bit like giving your kids the answers to their homework without giving them a chance to arrive at the answers themselves or explaining anything about them.
**
Another way I feel this is going to hurt developers is by fueling competition over who can produce the greatest volume of code.
I have already noticed this trend where developers (especially more junior ones aspiring to advance) try to outcompete others by producing more code, closing more tickets, etc. Right now that means skipping understanding what is going on in favor of getting easy answers from the Internet.
These guys can produce huge amounts of code with relatively little actual engagement.
To management (especially management with the wrong incentives) this seems like the perfect worker, because management usually doesn't understand the connection between a lack of engagement and planning at design/development time and the problems that come later (or they don't feel they are the ones who will pay the price).
Copilot is probably going to make it even more difficult for people who want to do things the right way, because the difference in false productivity measurements will be even starker.
I've seen so much boilerplate in the Java or classic .NET Framework world, it's incredible. So many layers of DTOs, Request/Response Models, and so on, that could simply be generated -- or, most of the time, removed completely (that would cost some "architects" their job, though).
This is also true for a lot of Redux or Angular/NgRx applications. So much boilerplate that you can't find the relevant code anymore.
Java is not the culprit here.
I think it is something that happened along the way, something to do with J2EE and the patterns craze we had a decade or two ago.
It doesn't help that frameworks like Spring and their documentation go out of their way to propagate these boilerplate-heavy patterns.
Copying these lazy patterns is the shortest, easiest way to get to a working solution for a person who doesn't want to put in any extra effort. And you can't get punished for doing it. Most developers don't even know there are any possibilities other than the mandatory controller calling a service calling a database layer, plus hordes of DTOs some people call a "model".
The evil is that someone trained an AI on random text, not even on an AST, so it's garbage in -- no surprise you get garbage out.
A true AI would understand that "the dev wants to find all lines of text in a file that have this property"; this AI just does "this code string is similar to this other code string according to this `black box metric`".
I see more and more juniors pasting code or shell commands from StackOverflow with careless ease, without even pretending anymore that they're interested in how it actually works.
A store in Vue 3 can basically be:
export default { state: readonly(state), ...setterFunctions }
It doesn't get easier to read and more streamlined than that. I still doubt that that's a result of DTOs.
I wonder if the way we are approaching it is wrong. We are basically putting text through a deep learning black box. The model might have learned some abstractions, but all in all it is just playing word games and trying to guess the most likely continuation of a string. Maybe we should go in the other direction and base such an AI on a really massive ontology. Instead of unstructured strings, put highly structured facts into the model.
For example, just like in Copilot you'd start with:
def login_user(username, password):
But the ontology would also know things like:
- This is a web application and this function is going to be called after submitting a form
- Security specialist Bob says you should always hash your passwords
- Specialist Anne says you should use bcrypt
- Tom says Anne is 95% trustworthy
... and thousands of facts more. And then it would take them all into consideration, build a representation of the problem you are trying to solve, find a strategy, and only in the end generate code.
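For concreteness, the kind of completion those facts point toward might look something like this sketch. It is purely illustrative: bcrypt (as "Anne" recommends) is a third-party package, so this version substitutes the standard library's PBKDF2, and the `USERS` store, `register_user`, and `_hash_password` are invented names.

```python
import hashlib
import hmac
import os

USERS = {}  # username -> (salt, derived key); in-memory stand-in for a real user store

def _hash_password(password, salt):
    # Derive a key from the password; a real app would prefer bcrypt/argon2.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def register_user(username, password):
    salt = os.urandom(16)
    USERS[username] = (salt, _hash_password(password, salt))

def login_user(username, password):
    record = USERS.get(username)
    if record is None:
        return False
    salt, key = record
    # Compare in constant time to avoid timing side channels.
    return hmac.compare_digest(key, _hash_password(password, salt))
```

The point is that the facts ("hash your passwords", "called after a form submit") shape the generated code, rather than string similarity alone.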
I have a feeling that there was a qualitative leap going from simple neural networks and multivariate methods to "deep learning" and modern machine learning, and that this was mainly driven by scale and available computing power. Now what if we try the same thing for ontologies, expert systems, and triple store databases? I think the difference would be between an AI parroting what it read on Wikipedia (direct speech), and a smarter AI being able to reason about what it read on Wikipedia (indirect speech).
There are already services that do this for you, and I actually find them useful. For example, I might be trying to use a function from some library and it fails. If I get pointed to some public repositories that use the same library in a function for a similar purpose, I may learn that I am missing some critical setup. I can also browse different uses of this function/library and see how it is, at the very least, used successfully by others.
https://en.wikipedia.org/wiki/Cyc
Supposedly an attempt to assemble a database of "common sense" facts and reasoning.
It has always been controversial and it's not clear what kind of success it's had.
https://en.wikipedia.org/wiki/Neats_and_scruffies
From the "Scruffy" side, there's Charles Rich's classic work on "Programmer's Apprentice".
https://dspace.mit.edu/handle/1721.1/6054
https://dspace.mit.edu/bitstream/handle/1721.1/6054/AIM-1004...
>The Programmer's Apprentice Project: A Research Overview
>MIT AI Lab Memo No. 1004, November 1987.
>Rich, Charles; Waters, Richard C.
>Abstract: The goal of the Programmer's Apprentice project is to develop a theory of how expert programmers analyze, synthesize, modify, explain, specify, verify, and document programs. This research goal overlaps both artificial intelligence and software engineering. From the viewpoint of artificial intelligence, we have chosen programming as a domain in which to study fundamental issues of knowledge representation and reasoning. From the viewpoint of software engineering, we seek to automate the programming process by applying techniques from artificial intelligence.
https://dspace.mit.edu/handle/1721.1/41967
https://dspace.mit.edu/bitstream/handle/1721.1/41967/AI_WP_1...
>Plan Recognition in a Programmer's Apprentice. Ph.D. Thesis proposal.
>MIT AI Lab Working Paper 147, May 1977.
>Rich, Charles
>Abstract: Brief Statement of the Problem: Stated most generally, the proposed research is concerned with understanding and representing the teleological structure of engineered devices. More specifically, I propose to study the teleological structure of computer programs written in LISP which perform a wide range of non-numerical computations. The major theoretical goal of the research is to further develop a formal representation for teleological structure, called plans, which will facilitate both the abstract description of particular programs, and the compilation of a library of programming expertise in the domain of non-numerical computation. Adequacy of the theory will be demonstrated by implementing a system (to eventually become part of a LISP Programmer's Apprentice) which will be able to recognize various plans in LISP programs written by human programmers and thereby generate cogent explanations of how the programs work, including the detection of some programming errors.
Copilot doesn't bypass peer review, code review, unit testing, and so on and so forth.
Amateur vs professional and novice vs expert are completely separate things.
You can be a professional novice just as you can be an expert amateur.
Now, the answer to your question is an obvious "NO". To be an expert you have to be a novice first.
The problem rather is "Are you making progress towards being an expert or are you just learning to more efficiently execute your novice workflow?"
> The way I see it, that happens because programming is still way more complex than it should be - and copilot will help with that.
No, it is just an illusion of help.
Just as your son may thank you for helping when you give him the answer to his homework. From his point of view you have helped him, true, but from another point of view the point of the task wasn't to deliver an answer to the teacher; it was to imprint something valuable on the mind of the child.
I love that software is an accessible discipline to hobbyists and that it empowers people. But it needs to be a discipline, top to bottom. We need deep understanding with security and robustness as fundamentals, good practices, and all of that baked into our tools.
Another parallel: language learning. You learn more by speaking and writing than by merely reading and listening, because the former actually requires you to actively associate grammar rules with your physical actions, whereas consumption has a lower bar of effort since you can infer things from context, gloss over things, etc.
Sometimes you don't need an expert to produce highly secure, highly optimized code.
Have you seen the crap that people buy at Walmart? The furniture is not heirloom furniture, the food is not a 3-star artisanal experience. Have you bought tools at Harbor Freight? They're not the lifetime companion of a tradesman, kept in wood boxes and wrapped in cosmoline after each use. But an awful lot of work gets done with them. Common homeowner wisdom is: if you need a tool, buy it at Harbor Freight; if you use it enough to wear it out, spend 10x to buy a really good one. But most tools you'll only use once or twice.
At workplaces across the country, right this minute, there are human beings doing rote transcription from one application to another, copy-pasting if they're lucky. That's a waste of effort and intellectual potential, and a hodgepodge of Excel equations or a crappy bit of Copilot glue code could be just the ticket. Yes, if those become the business' secret sauce and are sold to customers on the Internet, they ought to put some effort into doing it properly, but there's a ton of work that could be accomplished with low-quality code.
The difference is that your sofa isn't programmable and networked into every other appliance in your house, underpinned by a general-purpose computer ripe for abuse.
Virtually every piece of software you install is an access point to your machine or your sensitive data. One isolated thing in the analog world breaks down: not a problem. One misconfigured password in a VPN client, and whoops, part of your national oil infrastructure goes offline.
https://www.reuters.com/business/colonial-pipeline-ceo-tells...
This is one for the ages.
I can't wait until we start seeing Copilot-native devs, who had it enabled from the moment they first opened VSCode at their "become an engineer in 3 months" bootcamp.
> To management (especially management with the wrong incentives) this seems like the perfect worker, because management usually doesn't understand the connection between a lack of engagement and planning at design/development time and the problems that come later (or they don't feel they are the ones who will pay the price).
That's something I really want my competitors to do. Honestly it makes finding stocks to short much easier (or poaching talent...)
This is how management is in most places, I feel, especially when it comes to evaluating junior and early-senior engineers.
> This is mostly going to help people who already have a poor understanding of what they are doing create even more crap.
I can see how people who haven't used it at length might come to that conclusion, but my experience with it calls the "mostly" part into question. I'm sure there will be cases of that. But as someone who deeply understands my craft, I'm finding significant benefits.
> What it does not help with are the important parts of development -- defining the domain of your problem, designing good APIs and abstractions, understanding how everything works and fits together, understanding what your client needs, etc.
Quite the contrary! The last time a new tool helped me with those parts as much was when I moved from C++ to Python in 1997. What I experienced in my C++ -> Python transition was that an enormous chunk of my brainpower could shift from language gymnastics to the problem domain. Copilot gives me a similar feeling. It frequently suggests exactly the 1-3 lines of code I was about to type and saves me 30-60 seconds (easily 20 minutes in a full day of coding). Much better than that, it lets my focus stay on better abstractions, APIs, etc.
> Also, I feel this is going to help increase complexity by making more copies of the same structures throughout the codebase.
We, as engineers, are still responsible for what we produce. Any tool needs to be used with critical thought. Of course there will be those who don't think enough. And it might even make them look better in the short term. But that will be exposed in the medium to long term - `git blame` will point to them as the authors of problematic code and not Copilot. When such problems arise (or even better, before they arise), some of us who are more experienced need to step up and mentor less experienced folks so that they develop good habits.
A small sample of areas it's helping me...
When I decide that I want to use different representations internally and externally for some data in a class, I initialize the internal member variables. Part way through typing Python's `@property` decorator, it's suggesting the name of the property and exactly how to use the member variables to generate the external representation I want. Over half the time, it's exactly what I was about to type. Maybe a quarter of the time it's not and I just don't accept the suggestion (or do a quick edit). And 5-10% of the time it suggests an approach that is better than what I was thinking. And that's in a very simple use case.
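To make that concrete, here's an invented illustration of the shape of that pattern -- the class and the cents/dollars representations are made up, not from my actual code:

```python
class Price:
    def __init__(self, cents):
        self._cents = cents  # internal representation: integer cents

    @property
    def dollars(self):
        # External representation, derived from the internal member variable.
        # This body is the kind of thing Copilot suggests mid-decorator.
        return self._cents / 100
```

The suggestion arrives as soon as the property name makes the intent clear.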
In other scenarios, it often sets up my loops just as I want them. Sometimes it picks column major when I want row major. I just keep typing and as soon as it's clear I want row major, it's suggesting that. Again, occasionally it surprises me with something better - if I just use that one function I rarely have a need for, the inner loop melts away. Why didn't I think of that? Well, now "I" did. The code I'm producing with Copilot is better than the code I would have written without because I'm thinking as I use it.
Where it really saves me time / focus is when I have some tricky calculation or API call that isn't hard, but there's a bunch of little details to get right. One I did yesterday... lookup a value in a dict, but the key needs to be mapped through another dict. Between the original key, the two dicts, and the variable receiving the result there are four variable names, plus one more for the mapped key (to spread it across two statements for readability). Before typing anything, I paused for a second to get the names straight in my head. Before I finished my thought, it suggested the lines, I looked at it for a second to make sure it was right, laughed because it was, and hit tab. It wasn't a hard task, but it helped me stay focused on the bigger picture.
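With hypothetical names standing in for the real ones, the shape of that two-statement lookup is roughly:

```python
color_codes = {"crimson": "#dc143c", "navy": "#000080"}  # target dict
key_mapping = {"red": "crimson", "blue": "navy"}         # key-translation dict

requested = "red"
mapped_key = key_mapping[requested]   # statement 1: translate the original key
hex_value = color_codes[mapped_key]   # statement 2: look up the mapped key
```

Not hard, just four or five names to keep straight, which is exactly where the suggestion saves focus.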
Most of the time this doesn't feel at all like boilerplate. It's picking up my variable names and properly using the data structures I set up in other parts of the code. There's a big misconception that it's just pasting snippets in. It feels very different from that in real usage. Also, it rewards good naming habits. In the example above, how did it know I wanted to map the key through that dict? `key_mapping` was in the variable name. Easier for others to read later and for Copilot to read now.
The system I'm building is definitely better designed because of Copilot. Not because Copilot did any of the design, but because it freed me up to focus on the design more. It will have downsides, but in experienced hands it can be a great tool. I'm not affiliated with Microsoft / Github / OpenAI in any way. I'm just doing better work because I'm using it and doing better work makes me feel good. When the time comes, I'll pay for Copilot out of my own pocket if my company doesn't pay for it.
It's a tragedy of the commons of a sort.
The tragedy, IMHO, is that AI models like this encourage centralizing decision making into a single black box (to the extent that external research then benefits the owner of the AI model rather than advancing public commons), whereas in pretty much every other aspect of life, we consider decentralization/redundancy of autonomy to be the solution to robustness problems.
I wish there were a “robots.txt” file for Git to disallow certain bots from training on anything I have written.
It’s simple. If you are concerned by this, don’t host your repositories on GitHub.
But as long as you give the public access to your code, they can study it and learn from it. Humans and machines.
If Copilot were to reproduce a larger part of, say, an MIT-licensed codebase or almost any other permissive licence, then they should legally provide attribution. I'm pretty sure that they don't even have an option to provide such specific attribution, which means that either they believe that the code copied from any one source is below the relevant threshold or they're just ignoring copyright.
Although judging from the results of this test it kind of seems like for a lot of accounts that's already happened.
If it’s just helping you crank out the same bad code more quickly, without learning anything in the process, that’s useful to know. Some people might still want a tool like that, I wouldn’t.
Like, if your average dev will produce insecure code in 80% of samples, then Copilot starts to look really good! But if it's closer to 0.01% of code samples, then Copilot looks more like an intriguing novelty, not to be brought too near serious work. Much like Dippin' Dots in this regard.
Copilot shouldn't be able to generate code destined for prod without review any more than should any line of code written by a human.
Yeah, how did they measure? Did static and dynamic analysis find design bugs too?
Maybe - as part of a Copilot-assisted DevSecOps workflow with static and dynamic analysis run by GitHub Actions CI - create Issues with CWE "Common Weakness Enumeration" URLs (e.g. from the CWE Top 25) to train the team, and Pull Requests to fix each issue? https://cwe.mitre.org/top25/
Which bots send PRs?
The only time it really helped was when I needed to create a named list of char codes.
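For context, a guess at what such a named list might look like -- the names and entries here are invented, not my actual code:

```python
from collections import namedtuple

# A named list of char codes: each entry pairs a readable name with its code.
NamedCode = namedtuple("NamedCode", ["name", "code"])
CONTROL_CHARS = [
    NamedCode("NUL", 0x00),
    NamedCode("TAB", 0x09),
    NamedCode("LF", 0x0A),
    NamedCode("CR", 0x0D),
]
```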
When it comes to more complex code, checking Copilot's output takes the same time as writing it myself. 90% of the time I needed to correct Copilot.
For me, tools like linters are way more helpful. If I could use only ESLint or Copilot, I would go with ESLint 100% of the time.
Whether that is better or not, I suppose, it depends.
Copilot only helps with boilerplate code, which could be handled by good IntelliSense.
When it tries to generate a function from the function name, it fails so hard that it is more in your way than helpful.
So far GitHub Copilot is more feasible as a tool for humans doing code coverage of its input code, "given enough eyeballs, all bugs are shallow" style. A developer could go, "huh, Copilot generated insecure code, better report it to the original project it learned it from" - if only Copilot were able to link to the original project, it would all be great and useful.
1. How many times do people write insecure code when not using Copilot?
2. How many times do people write insecure code when using Copilot?
In any case, if Copilot can generate code as well as the average programmer without supervision, that means it can already take the job of 50% of programmers. A more useful metric though is how many programmers can a person using Copilot replace by having greater productivity?
Also, in how many programming jobs does security matter? In my job for example it doesn't matter at all.
I'd still not use it. But it's an impressive trick.
Nothing more. Nothing less.
Jesus Christ, please make them stop. Stop using AI as a buzzword.
Either you call both AI or you call neither AI.
(A previous version of the comment stated that it was tuned from GPT-3. This is incorrect; the simpler GPT was used for faster convergence.)
If you picked any smaller company with a dev team, a freelancer, or an agency, your chances of finding a developer who understands and upholds quality code would be vastly reduced.
Not to mention a lot of beginners will just push their practice projects to GitHub and never look at it again. I'm also guilty of this, but I never realized Microsoft was training AI with this code. If Copilot is learning from these projects then I'd say the code it regurgitates is not average, but even below average.
It’s interacting with GCS to scan a bucket for an extension, load the data with pandas, and concat some dataframes. It’s something dumb but mildly finicky that’s going to eat up so much time I could be using for higher value work.
Copilot would be very welcome as I do this, instead of annoyingly going off to Google 3 different python libraries and getting it all to work nicely together.
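For concreteness, a rough sketch of that glue code. Real GCS access needs credentials and a client library (e.g. google-cloud-storage or gcsfs), so the bucket listing here is faked with an in-memory dict; all names and sample data are invented.

```python
import io

import pandas as pd

def filter_by_extension(names, extension=".csv"):
    """The 'scan a bucket for an extension' step, as a pure helper."""
    return [n for n in names if n.endswith(extension)]

# Stand-in for blobs found in the bucket; a real version would list them
# via the GCS client instead.
fake_bucket = {
    "jan.csv": "id,value\n1,10\n2,20\n",
    "feb.csv": "id,value\n3,30\n",
    "readme.txt": "not data",
}

names = filter_by_extension(sorted(fake_bucket))
frames = [pd.read_csv(io.StringIO(fake_bucket[n])) for n in names]
combined = pd.concat(frames, ignore_index=True)
```

Exactly the kind of dumb-but-finicky plumbing where a suggestion that's 90% right already saves the Googling.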
I'm guessing the ranking features are based on the repo stats, contributor stats, etc. Even "good" contributors will make rookie mistakes in certain areas.
Interesting to imagine how GH will try to solve this issue.
https://arxiv.org/abs/2108.09293
previous discussion including comments from lead author:
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion
https://deepai.org/publication/you-autocomplete-me-poisoning...
https://edition.cnn.com/2020/09/27/tech/elon-musk-tesla-bill...
It's so transformative that people may allow it to circumvent licenses.