Demo of an OpenAI language model applied to code generation [video]

Bert is a language model. It's trained to predict the next character in a sequence. It does not have any capacity to "understand" programming, or anything at all. It can also not produce outputs that are not similar to the examples it's been trained on. Like all neural net models it can interpolate between its examples, but it can't extrapolate to regions of the sample space it's never seen. This is why I say it lacks the ability to innovate.

I'm not sure how you would combine AutoML-Zero with Bert. How do you mean?

yazr 6y ago

What do you think is a more productive path leading to "AutoCode" ?!

A. Add external definitions or reward formalism to make the code-space easier to search?

B. Keep adding code trees, execution traces, comments, memory dumps and learn from those?

My own instinct is that AlphaZero was a lot more convincing than AlphaStar, so lots of (A) is definitely needed

DJHenk 6y ago

> In other words: everyone can relax. This will not take your job. Or mine

Of course not. This technology converts writing code into bug hunting in pre-written code. Finding bugs in code that you did not write is way harder than writing the code yourself.

So if anything, this makes programming harder, not easier, and we will need more programmers, not less.

OOPMan 6y ago

Oh dear.

And then the model trains itself on the buggy code written and poorly debugged by these extra coders and then so on and so forth.

Codepocalypse.

Kill it with fire!

westurner 6y ago

> At best this is like having an exceptionally smart autocomplete function that can look up code snippets on SO for you (provided those code snippets are no longer than one line).

Yeah, all it could do for you is autocomplete around what it thinks the specification might be at that point in time.

> But what if Andy gets another dinosaur, a mean one? -- Toy Story (1995)

joshuak 6y ago

I agree completely with your expectation of the abilities of such a system.

However, I think very little programming labor is employed in the construction of new algorithms or even most business logic; even a casual stroll through GitHub reveals a staggering amount of reimplementation.

I think the promise here is the ability to code in a more conceptual way with less fiddling with the finicky details.

Swizec 6y ago

> I think the promise here is the ability to code in a more conceptual way with less fiddling with the finicky details.

This is basically how product managers code. Or former engineers turned engineering managers. Or even team leads. Hell, maybe like an architect?

You come up with a rough sketch, design the system, think through a couple edge cases, tell the computer what you need, and the computer figures out the details for you. Similar to being a high level engineer that designs/defines/codes the broad strokes of something and then lets the lower level minions handle details.

We made a similar leap when compilers were invented.

I agree; that's why I think such a system is not capable of innovation.

In the same way, that's why I think it would be a useful tool: it promises to automate away the kind of coding that most programmers can do with eyes closed and that's the most boring and repetitive part of the job.

Like, without trying to demean it, it sounds like a great boilerplate generator.

gradys 6y ago

I'd put it differently. This is going to take your job, just like an assembly programmer from the 70s might consider Python to have basically taken their job. In software, the job is constantly eating itself and transforming.

It's part of the job to continually incorporate new capabilities and lever yourself up.

BaronSamedi 6y ago

I agree. While this is well done, it seems to be copying human programming techniques rather than allowing the AI to create code that it thinks is optimal. I think there is the potential to evolve efficient and secure code that is free from the constraints we impose on it due to the way our minds work. Such code may not be intelligible to us but could very well be much better than what we could write.

random32840 6y ago

An AI like this can hold a hell of a lot more information in its head at one point than a human. Each decision it makes is based on way more context, it can manipulate the problem using much more information, much faster. The problem is that it can't think in abstractions.

If AI gets to the point where it has a reasonable understanding of the shape of the data & the basic spatial manipulations being applied (not far off IMO), I'd expect it to be waaaaaay better at discovering certain types of new algorithms than humans. It can handle thinking about algorithms that have millions of independently moving parts in a way a human can't.

Humans have the edge deriving algorithms that require a sequence of high-level steps on an abstraction. "Do this, then we get a thing, then we do some stuff to the thing, stretch it, squash it, massage it." AI sucks at that, it doesn't think in the same kind of flexible abstractions.

But imagine if you build an understanding of how the code will be compiled & how that will interact with the cache into the AI. That's very difficult for humans because you can't think about all those mechanics at once, we have to focus on one at a time. An AI that really gets it? I could see it writing a better sorting algorithm for a specific, complex datatype than a human could, or at the very least having the competitive edge because it can do it basically instantly.

izabera 6y ago

How often does the average programmer come up with a new sorting algorithm?

pharke 6y ago

Yeah I'm thinking it would be more useful to have a really well indexed library of functions accessible by search.

gameswithgo 6y ago

alphago and alphastar were certainly creative. this project in its current state may not have that capacity but it also may not be a huge leap to get there.

bobly_today 6y ago · 8 in thread

So are we all going to be out of a job?

ben_w 6y ago

When AI can reliably convert business-speak into efficient bug-free code, everyone will be out of a job, because the business owners will ask it to write them another AI to replace every other task their business does.

hendzen 6y ago

Until the AI starts generating business-speak...

tanilama 6y ago

No, not remotely.

If people are true novices, with zero programming experience, how would they know the code is correct? And if it isn't, how would they debug it?

I would say this is more promising for generating more formulaic things, like business reports. But even that requires an in-depth understanding of what the data/tables really mean, how to handle exceptions, etc.

gnramires 6y ago

Kind of. A theme in programming since the beginning has been automation. You're using a computer anyway to do a more or less well defined task, naturally programming itself is one of the prime targets for automation.

Programming languages are automation tools. Libraries. Frameworks. It should be pretty clear this is a lasting trend, and doesn't necessarily mean programmers will have more or less jobs (due to well known effects of automation such as enabling new applications and increasing demand from increased productivity). It does mean you probably need to keep learning to stay relevant, and use those tools to your advantage!

Not entirely, but it might well lower the barrier to entry and mean you require fewer developers.

cjlovett OP 6y ago

Not yet, but I think our role will get more and more "meta"

0xdeadbeefbabe 6y ago

Programming in yaml happened somewhere today.

not if we can get a head start on the market for artisanal code developers.

mring33621 6y ago · 6 in thread

Amazing!

So the developer's role will shift to:

1) writing good enough descriptions of the code to be generated by the AI model

2) fixing any little issues in the generated code

cjlovett OP 6y ago

Yeah, I guess developers will have to all become data scientists to help train these A.I's to write better code :-) Perhaps there will be a new business model around selling your higher quality code to help train the A.I to be better and better... so we need to label code "good" and "crap" so the A.I. can avoid learning from crappy code :-)

swiley 6y ago

It’s not that we don’t have enough code it’s that the code doesn’t do what we want.

To get this you can just grab any random 18 year old who knows js and have them hack something out. No one hires 18 year old js hackers though and there’s a reason for that.

detay 6y ago

that would be supervised ai training, until 3) programmers become obsolete.

ttul 6y ago

I don't think programmers will become obsolete. They will just waste less time on boilerplate and reading Stackoverflow articles to figure out how to do XYZ. Why not have an AI do that for you so that you can focus on the creative stuff? Programming tools help us work more productively, which leads to larger and more complex systems in less time - a win for everyone.

mooseburger 6y ago

Once programmers become obsolete, it's likely a matter of days until the human race becomes obsolete.

3) writing the in-house or market competitive service alternative ;)

IdiocyInAction 6y ago · 5 in thread

How does this do compared to other models? Is this a totally cutting edge result? On the surface, it seems quite impressive, but sans an environment to try it out with, I cannot be entirely sure. Still, this does make me question whether I chose a safe career, haha.

The thing is, I'd really need to see a live demo to see how good this is. Making mistakes is actually kind of a big issue; as most people know, debugging code is harder than writing it. And a lot of the language models which can write impressive-seeming text also generate masses of garbage. There's no way to know whether this was cherrypicked or not.

The mere fact that it can extract meaning from text like this is already really impressive though.

bglazer 6y ago

I've read a fair number of papers on neural program synthesis lately. To me, these seemed to be obviously cherry picked examples, so you can't really evaluate the whole system based on them.

However, this is fairly impressive for a couple reasons. First, the system constructs programs from natural language descriptions, rather than examples of input-output pairs or a formal specification, which are the most common settings for program synthesis. Second, they're generating full blown python, not a smaller, domain specific language.

Finally, and this is pretty mind-blowing, is the seamless, idiomatic use of loops, branches, and function calls. I haven't seen previous program synthesis tools able to generate such complex code. They're typically limited to simple linear programs with less than about 100 lines. Complex control flow and function calls are still beyond their reach for the most part.

I'm not an active researcher in neural program synthesis, so my statements may not reflect the current state of the art.

I honestly thought that the most promising route forward for program synthesis would be a model that incorporated knowledge of the syntax and semantics of code. Most likely, a model that manipulated, or at least had some view of, the program's AST. This seems to be just throwing a giant Transformer model at github.

Fine tuning a vanilla language model on a giant corpus of code feels like a dead end for the field, long-term. It seems obvious to me that humans are doing something more than just statistical pattern recognition and generation when we write and reason about code.

Then again, it's hard to argue with results. I'm sure lots of pre-neural network voice recognition researchers were in love with the elegance of their hidden markov models.

Edit: Also, everyone should go try the FlashFill feature in Microsoft Excel. As far as I know, it's the only example of program synthesis shipped in a consumer facing production system, and it works shockingly well.

> Fine tuning a vanilla language model on a giant corpus of code feels like a dead end for the field, long-term. It seems obvious to me that humans are doing something more than just statistical pattern recognition and generation when we write and reason about code.

Yeah, this is the main reason why I would be interested in more examples. But, if this thing was trained on all of GitHub, I could imagine that it could come up with decent-looking code for a lot of examples; a beefy, smarter Google with some rudimentary contextual understanding, if you will. Still, the presence of any mistakes is a no-go and I'd be really interested in how it reacts to more realistic, specific requirements.

But yeah, I'd figure a model for code generation would have to have some kind of knowledge of syntax and semantics, rather than doing pure statistical pattern matching, to be of any real use. It would not only have to generate, but also to debug its code (I wonder whether you could do that purely with statistical pattern recognition). I might be wrong, of course, but I would be surprised if that is enough to write complex code.

>> Edit: Also, everyone should go try the FlashFill feature in Microsoft Excel. As far as I know, it's the only example of program synthesis shipped in a consumer facing production system, and it works shockingly well.

And it's not a giant language model trained on a gigantic dataset. Rather, if memory serves, it's a bunch of task-specific DSLs and rules, all hand-written from scratch.
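To make the contrast with a giant language model concrete, here is a minimal sketch of programming by example over a hand-written DSL. It is not how FlashFill is implemented; the primitive names, the `run` and `synthesize` helpers, and the brute-force enumeration are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical mini-DSL: each primitive maps a string to a string.
PRIMITIVES = {
    "lower": str.lower,
    "upper": str.upper,
    "strip": str.strip,
    "first_word": lambda s: s.split()[0] if s.split() else "",
    "last_word": lambda s: s.split()[-1] if s.split() else "",
    "initial": lambda s: s[:1],
}


def run(program, text):
    """Apply the named primitives to text, left to right."""
    for name in program:
        text = PRIMITIVES[name](text)
    return text


def synthesize(examples, max_len=3):
    """Return the shortest primitive sequence consistent with all examples, or None."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None


# Two input-output pairs stand in for the rows a user would fill in by hand.
examples = [("Ada Lovelace", "LOVELACE"), ("alan turing", "TURING")]
program = synthesize(examples)
```

The synthesized `program` can then be applied to unseen rows with `run(program, "Grace Hopper")`. The search space grows exponentially with `max_len`, which is one reason hand-written rules and narrow DSLs matter in this style of synthesis.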

MauranKilom 6y ago

I am also hedging my hopes of this working on "more realistic" scenarios. It does produce code that looks natural to us, but I expect it to show clear "seams" where its understanding of something isn't deep enough.

But maybe this is just a question of how much compute (and network size/"depth") you invest. On a certain level we're also just some recurrent LSTM :)

bo1024 6y ago

Ha. You hit the nail on the head. There is no rigorous way to measure AI-generated anything. (to my knowledge) So every demo is "ooh look at this" and actual performance is not scientifically evaluated, because we don't know how. This includes images, text, etc.

neil_s 6y ago · 4 in thread

I had trouble accessing the relevant video snippet even after going through the conference registration, so here's a summary.

You can view the demo at https://twitter.com/i/broadcasts/1OyKAYWPRrWKb starting around 29:00.

It's Sam Altman demoing a massive OpenAI model that was trained on GitHub OSS repos using a Microsoft supercomputer. It's not IntelliCode, but the host says that they're working on compressing the models to a size that could be feasible in IntelliCode. The code model uses English-language comments, or simply function signatures, to generate entire functions. Pretty cool.

modeless 6y ago

Link to timestamp: https://www.pscp.tv/Microsoft/1OyKAYWPRrWKb?t=29m19s

sama 6y ago

Thanks, but it's Sam McCandlish doing the demo (and the project).

KhoomeiK 6y ago

I'm confused. Is that not you doing the OpenAI demo around 29:00?

BubRoss 6y ago

Great, even more lag coming in the next version of visual studio.

grensley 6y ago · 4 in thread

Wow, this has the ability to be a total gamechanger. You have to be really observant about the bugs though, I would have totally missed the one with the price discount without executing it.

netsec_burn 6y ago

By lowering the barrier of entry of programming further, I wonder if we'll see more bugs (like the price discount) as a result of this?

colordrops 6y ago

Similar problem to automated driving - as long as it's better than most humans, occasional bugs will be ok. Virtually no software is bug free.

It's a much more difficult problem than automated driving though - for software, the space of intents of the user is orders of magnitude greater in size. It's the job of the model to determine the intent of the "programmer". Perhaps we could meet the model half way and come up with heavily-structured natural language to communicate intent.

bufferoverflow 6y ago

You still need a programmer to find the bugs. I think it's actually harder to spot and fix a bug, than to write a simple method that involves one.

joshuak 6y ago

I notice that the bug was in the user's failure to communicate the intent of the scalar. Presumably with regular use users would learn to be more clear and/or anticipate the likely fixes to ambiguous labels.

Also, since it would be used to build tests as well, I'd expect such misunderstandings to be pretty obvious. I would be willing to bet you'd see a net reduction in bugs, and a substantial reduction in typo related bugs.

But if by lowered barrier of entry you mean the population of programmers would be less competent, then yes, bugs in the design might increase. However, being able to more quickly get to the point of evaluating a design is a great way to learn better design.
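As a hypothetical illustration of an ambiguous scalar (this is not the code from the demo, and both function names are invented here), the two functions below read the same "discount" number in two ways. Naming the unit in the parameter and validating its range makes the intent explicit.

```python
def apply_discount_fraction(price: float, discount: float) -> float:
    """Interpret discount as a fraction of the price, e.g. 0.2 means 20% off."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be a fraction between 0 and 1")
    return price * (1.0 - discount)


def apply_discount_percent(price: float, discount_percent: float) -> float:
    """Interpret discount as a percentage, e.g. 20 means 20% off."""
    if not 0.0 <= discount_percent <= 100.0:
        raise ValueError("discount_percent must be between 0 and 100")
    return price * (1.0 - discount_percent / 100.0)
```

A one-line test such as checking that a 20% discount on 50 yields 40 is enough to expose a mix-up between the two readings.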

symplee 6y ago · 4 in thread

Can this freaky A.I. also generate the corresponding unit tests?

Or, for TDD, generate the unit tests first based on the function name and description. Then, if the dev updates any of those tests, or adds more tests, use that information in auto generating the appropriate code.

simonhughes22 6y ago

Towards the end of that section, he mentions they have also used it to generate unit tests. I doubt it's doing full TDD, but it seems they are part of the way there.

swiley 6y ago

That actually sounds bad.

cjlovett OP 6y ago

Great question, I actually think writing test code is harder than writing the product code.

pseudosudoer 6y ago

At the end of the chat with OpenAI they mention that their model can be used for generating unit tests as well.

tanilama 6y ago · 3 in thread

I mean it is cool.

But here is the thing: the natural-language description of a function is not always this unambiguous.

When you are telling a function to 'compute XYZ', what you are actually doing is 'check whether X.a exists, if so execute branch 1), else branch 2)'.

If the logic gets really complicated, then describing it accurately in human language isn't necessarily faster than doing it in code directly. Otherwise, we wouldn't need to invent programming languages at all; we could just write compilers to interpret and execute human languages.

I am also interested in whether the model itself is conditioned on the type constraints of a class. It is neat that they picked Python in this case. But if it were Java or another statically typed language, would this system condition its generation not only on the natural-language text, but also on the resulting type system? My bet, per my understanding of the language modeling approach they use, is that they are not doing this, due to the very high complexity and cost of training and domain adaptation.

Overall, this again is an interesting demo. But I think for code generation based on human language to be useful, we are really in a scenario where you need to be 99% accurate for it to be remotely practical.

nerdponx 6y ago

This might be more useful for a task like "read files off a list, and download them in parallel, with no more than 20 concurrent downloads." That particular task might be a one-liner in some programming languages, but there are a lot of programs like that which need significant bookkeeping and/or boilerplate even though their plain-language description of intended behavior is not complicated.

Or implementing a sophisticated protocol that has a formal specification. If you can express the correct behavior in some kind of pithy pseudocode, a tool like this could "compile" that to code in various programming languages. Like a super-powered version of SWIG.
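The parallel download task described above can be sketched as follows, as one possible hand-written version rather than anything the demoed model produced. The names `fetch` and `download_all`, the one-URL-per-line file format, and the choice to return results in memory are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Download one URL and return its body."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_all(list_file, fetch_one=fetch, max_concurrent=20):
    """Read one URL per line from list_file and fetch them with a bounded pool.

    Returns a dict mapping each URL to its bytes, or to the exception raised.
    """
    lines = Path(list_file).read_text().splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    results = {}
    # max_workers caps how many downloads run at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = {url: pool.submit(fetch_one, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # record the failure, keep the rest
                results[url] = exc
    return results
```

Passing `fetch_one` as a parameter keeps the bookkeeping testable without network access. Even in this short form, most of the lines are the bookkeeping the comment refers to: parsing the list, bounding concurrency, and collecting errors.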

MiroF 6y ago

I agree that code generation of complex functions is hard.

But I think the example given of unit testing - i.e. natural language description of specific behavior of function -> code - is extremely useful.

tanilama 6y ago

Unit testing is a good use case.

But that would require conditioning on the type system, meaning the code-gen needs to understand the object's interface, which, while not impossible with current techniques, is hard enough due to computational complexity.

Again, I don't dispute that this tool is interesting. But claiming it to be groundbreaking or game-changing is simply not right.

The majority of a programmer's time is not spent typing out code. It is spent looking at the comment/description, thinking about it, editing some code, then rethinking and editing again.

This tool has the potential to save some typing time, but it is still not going to change things fundamentally.

swalsh 6y ago · 3 in thread

These are just baby steps, but holy shit is that impressive. It kind of feels like working with offshore devs, but it's in real time.

nnq 6y ago

...that's mildly insulting

LAMike 6y ago

Only if you assume what shore he's referring to.

29athrowaway 6y ago

I've worked with developers from all around the globe. While it's true that some cannot even write fizzbuzz, some others can be extremely brilliant individuals with an excellent work ethic.

corbins 6y ago · 3 in thread

Mirror: https://twitter.com/i/broadcasts/1OyKAYWPRrWKb

dang 6y ago

Ok, we've changed to that from https://mybuild.microsoft.com/sessions/6c6ecd46-c39c-49d8-ba.... Thanks!

If anyone knows a way to link to the start of the demo at 28m30s, or thereabouts, we can modify it again. (Edit: maybe https://blog.twitter.com/en_us/topics/product/2018/video-tim... can be used to make that work?)

modeless 6y ago

This sort of works: https://www.pscp.tv/Microsoft/1OyKAYWPRrWKb?t=29m19s

It's not obvious that it works until you hit the play button and it starts at the right time. Seems like the only way to get it to play automatically is to embed it in a tweet like this: https://twitter.com/modeless/status/1263222139840167936

yread 6y ago

Pretty amazing. Starts at around 28:00

datlife 6y ago · 3 in thread

I can't see anything

dang 6y ago

We've changed from https://mybuild.microsoft.com/sessions/6c6ecd46-c39c-49d8-ba... to a more accessible video.

spery 6y ago

Video should be there but their site is frustrating. Both for visibility and reliability.

alchemyromcom 6y ago

Maybe their AI can whip up something better soon.

ipsum2 6y ago · 3 in thread

Is there a way to watch this without an account?

dang 6y ago

We've changed from https://mybuild.microsoft.com/sessions/6c6ecd46-c39c-49d8-ba... to a URL that anyone can view.

ipsum2 6y ago

Thanks dang!

pjmlp 6y ago

Eventually they are uploaded to YouTube and Channel9.

Vysero 6y ago · 3 in thread

I would much rather have an AI that is capable of interpreting what I say as code. So if I say:

Build me a class which computes the larger of two integers.

The AI is smart enough to write it.

reducesuffering 6y ago

Did you watch the video? The AI is interpreting the comments as instructions on what code to generate. That's 95% of the solution, since the comments are just english and there already exists an abundance of NLU models in things like Alexa, Google, etc, that take speech input and produce english output, like the code comments.

digeverything 6y ago

This is exactly how it works. Incredible. Check out the mirror on Twitter shared by +corbins. https://twitter.com/i/broadcasts/1OyKAYWPRrWKb

nilkn 6y ago

You should watch the video. It's closer to what you're suggesting than I think you realize.

parksy 6y ago · 2 in thread

I have thought about this before but I can see that logical errors are introduced which must be manually tested and reviewed anyway, so what if a more reliable approach could be achieved by training these data sets on test cases alongside passing code?

This way developers just write unit tests or functional tests, and the AI generates code and retrains itself until the code passes for all tests. This could happen silently in the background as the developer defines the tests.
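A minimal sketch of the selection half of that loop, assuming the model's attempts are available as plain Python callables. It does not retrain anything; it only keeps the first candidate that passes every test. The helper `first_passing` and the two hand-written "attempts" are hypothetical stand-ins for generated code.

```python
def first_passing(candidates, tests):
    """Return the first candidate function that passes every test, or None."""
    for candidate in candidates:
        try:
            if all(test(candidate) for test in tests):
                return candidate
        except Exception:
            continue  # a crashing candidate counts as failing
    return None


# Hypothetical stand-ins for generated attempts at "larger of two integers".
def attempt_1(a, b):
    return a  # wrong: ignores b


def attempt_2(a, b):
    return a if a > b else b


# Each test takes a candidate and returns True when the candidate behaves.
tests = [
    lambda f: f(1, 2) == 2,
    lambda f: f(5, 3) == 5,
    lambda f: f(-1, -1) == -1,
]

chosen = first_passing([attempt_1, attempt_2], tests)
```

A candidate that passes here is only as good as the tests: code that hard-codes the three expected answers would also be accepted, so the tests need enough variety to rule that out.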

A number of natural language test frameworks exist, Behat for example lets you define tests such as:

Feature: Multiple site support

  Background:
    Given a global administrator named "Greg"
    And a blog named "Greg's anti-tax rants"
    And a customer named "Wilson"
    And a blog named "Expensive Therapy" owned by "Wilson"

  Scenario: Wilson posts to his own blog
    Given I am logged in as Wilson
    When I try to post to "Expensive Therapy"
    Then I should see "Your article was published."

  Scenario: Greg posts to a client's blog
    Given I am logged in as Greg
    When I try to post to "Expensive Therapy"
    Then I should see "Your article was published."

It could still fit the dream of describing to a computer what kind of program you want and having it figure out the plumbing.

Anyway interesting work. Very interesting. I remember a few colleagues laughed at me no more than 5 years ago when I suggested that AI would eventually write code. And here it is, in an early version, flawed surely but only set to improve.

Edit to add: This subject while insanely interesting to me is well out of my wheelhouse. I'm guessing there's possibly semantic structure to the above that the type of model being used in the demo can't deal with? Like this one use-case has to co-exist in an entire ecosystem of dependencies and related entities... Could the model cope with that or is it just calculating the likelihood of the next character like other models I've seen, but with insane accuracy when it comes to code?

BaronSamedi 6y ago

Instead of Test Driven Development, Test Only Development? I like that idea. This reminds me of an article I read a while ago on co-evolutionary training in genetic programming: one algorithm evolving to do something, with another evolving to break it.

parksy 6y ago

Yeah that's a good way of putting it. Also has a catchy name, "TOD".

Ultimately as well we don't care what the code looks like, if it passes all tests then it "works". You probably don't even need to generate the code in a high level language, if people aren't ever going to really read it.

You'd probably need tests designed to ensure the code executes quickly enough, and automatically generated edge-case test data, so you don't end up with a blog where you can only post articles with the titles in the exact test data heh.

The future seems interesting for us developer types anyway. If a product designer could express their requirements in plain language developers would only really need to be around for cases where the models failed and more training data was needed to improve them.

sailingparrot 6y ago · 2 in thread

I'm a bit confused, is this built by OpenAI or Microsoft? Microsoft released the paper IntelliCode Compose: Code Generation Using Transformer [1] 4 days ago and there is no attribution to anyone from OpenAI in it.

Are those two entirely separate and yet exactly similar initiatives?

[1]: https://arxiv.org/abs/2005.08025v1

p1esk 6y ago

IntelliCode Compose is built around a multi-layer generative pretrained transformer model for code (GPT-C), which is a variant of the GPT-2

GPT-2 is built by OpenAI

sailingparrot 6y ago

I am aware of this, I am referring to the video, where Sam Altman (CEO of OpenAI) is presenting the demo and saying "we have built", while Kevin Scott (CTO of MSFT) is saying that it's the first time he has seen that. So this is clearly marketed as OpenAI's work, not just saying that the model is based on their work.

chrisco255 6y ago · 2 in thread

Is this a demo of their AI 'autocomplete' tech that they've built into Visual Studio and VS Code?

nickysielicki 6y ago

It includes a segment with Sam Altman doing python code generation from nothing other than signatures and comment strings. Pretty incredible -- assuming the demo isn't entirely smoke and mirrors.

raggi 6y ago

You can get an equivalent demo now with tabnine

brenden2 6y ago · 2 in thread

I can't even imagine what it's like to have so much money that you can spend time working on things like this which are so incredibly unlikely to ever become useful. Congrats and I hope you guys discover a great product some day.

ignoranceprior 6y ago

> incredibly unlikely to ever become useful

Want to bet on that?

brenden2 6y ago

I don't have any money left for gambling.

monkeydust 6y ago · 2 in thread

As a product person, I'm wondering how much more productive this will make my engineers. On the surface it looks impressive.

MauranKilom 6y ago

I assume you have read a few scientific papers before.

This tool is the programming equivalent of an AI writing scientific papers based on an abstract. It can follow all the formalities really well. It can write beautiful English sentences. It might write formulas or produce graphs. It will dot the i's and cross the t's.

But it's unclear whether what it says is actually correct and logically coherent when used in the main part of the paper (and not just for the introduction of the paper) or just pleasing-sounding nonsense.

It can certainly help make engineers look more productive, just like it could help someone write papers at record speed. Whether the results can/will have any deeper value is yet to be determined. Maybe it will just be used for the "boring" tasks - like the paper introduction.

My personal fear is that it will be very good at writing code that looks ok, even though there is a serious flaw. Essentially, programmers tend to become good at spotting irregularities in the code corresponding to common human errors. The mistakes of this AI might be much harder to spot because they don't stand out in the same way.

nmfisher 6y ago

The samples are pretty impressive, and that's coming from someone who's intimately familiar with the internals of these kinds of models.

From bitter experience, though, I also know how unreliable these models can be. It's possible (indeed, likely) they generated 20 samples, threw away 19 that were garbage and just showed you the one that looked nice.

If - and that's a big if - it reliably generates good-quality code, then it would probably be a nice productivity boost (~25%, particularly when it comes to tests).

Based on what I've seen over the past few years, though, I'm skeptical.

alpb 6y ago · 2 in thread

I tried signing in with my Microsoft account as well, nope, they want you to definitely go ahead and fill out a registration form for Build conference https://register.build.microsoft.com/, not gonna happen. Hope they learn not to paywall conferences of this kind, their competition just puts it out on YouTube live.

dang 6y ago

We've since changed the URL from https://mybuild.microsoft.com/sessions/6c6ecd46-c39c-49d8-ba... to one that anyone can view.

pjmlp 6y ago

No problem, there are plenty of us already on the sessions.

gradys 6y ago · 1 in thread

I worked on a project very much like this last summer, a transformer language model applied to code completion.

You'd be surprised how easy it is to get a model that performs as well as what you see in the video. And it's even easier now that people have built great libraries for fine-tuning generative language models.

I encourage you to try it yourself! There are many interesting extensions for people to explore:

- Use bi-directional context (vanilla GPT-2 only sees backward context)

- Integrate with semantic analysis tools.

- Experiment with different context representations. You condition the model on an arbitrary sequence of N tokens. It's not necessarily the case that you should spend that whole budget on the N tokens that came immediately before. What about including the imports at the top of the file? What about the docstrings for functions that were just used? What about the filepath of the current file?
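The context-representation idea in the last item can be sketched as a budgeted prompt builder. This is an illustrative assumption, not the approach used in the project described: the whitespace `tokenize` stands in for a real subword tokenizer, and the priority order (file path, imports, docstrings, then recent code) is just one of many orderings worth trying.

```python
def tokenize(text):
    """Whitespace tokenizer, a stand-in for a real subword tokenizer."""
    return text.split()


def build_context(filepath, imports, docstrings, preceding_code, budget):
    """Assemble at most `budget` tokens of conditioning context.

    The header (file path, imports, docstrings of recently used functions)
    is added first; the most recent preceding tokens fill what is left.
    """
    header = tokenize(filepath)
    for chunk in list(imports) + list(docstrings):
        header += tokenize(chunk)
    header = header[:budget]  # the header alone never exceeds the budget
    remaining = budget - len(header)
    recent = tokenize(preceding_code)
    tail = recent[-remaining:] if remaining > 0 else []
    return header + tail


context = build_context(
    "src/app.py",
    ["import os"],
    ['"""Join paths."""'],
    "a b c d e f",
    budget=8,
)
```

With a budget of 8, five tokens go to the header and only the last three tokens of the preceding code survive, which makes the trade-off visible: every token spent on imports or docstrings is one less token of immediate history.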

Don't look at something like this as though watching your job be automated away. Look at it as a tool that you can master and use to move up the stack.

arielroth 6y ago

Did you explore all of these things? What were your results?

cjlovett OP 6y ago · 1 in thread

Hey, now we have a reason to write proper unambiguous code comments :-)

tanilama 6y ago

I'd rather take code in that case.

Avi-D-coder 6y ago · 1 in thread

GPT2 is known to be unable to track and bind variables. Scaling purely associative models beyond the trivial examples is going to be difficult or, more likely, impossible.

This will end up being a better tabnine. Models like GPT2 are still just approximating intelligence, they are not rationally cognizing.

gradys6y ago

Curious what analysis you're referencing here.

While I don't doubt people have shown that various transformer models have certain limitations, I'm pretty bullish on transformer models in general.

Here's a post exploring the application of transformers to symbolic mathematics for instance: https://medium.com/analytics-vidhya/solving-differential-equ...

unixhero6y ago· 1 in thread

Uhm. What if you could use this to produce code to improve ML libraries? Quite recursive, or what?

spookyuser6y ago

  def build_agi(CEV):
    """Build an AGI with the specified CEV"""

neatze6y ago· 1 in thread

Not going to build my hopes up, but looking forward to automated test generation.

adeledeweylopez6y ago

Honestly I think that will be harder for this kind of system than writing this sort of code. It takes a certain kind of creativity to think of ways that code could potentially fail, and it also requires a fairly deep understanding of the intent behind a feature.

boolcow6y ago· 1 in thread

When is OpenAI planning to actually solve a hard problem? They have spent a huge amount of money and time creating useless demos so far.

Creating flashy AI demos is relatively easy. Creating important AI products that actually operate in the real world is the difficulty.

forgotmyhnacc6y ago

Does it matter? OpenAI is run as a research lab, not a startup. If they run out of money, the investors will eat the loss.

consultutah6y ago· 1 in thread

Is there any way to get to the video? I'm registered for Build, but the page is all but empty...

dang6y ago

We've switched the URL from https://mybuild.microsoft.com/sessions/6c6ecd46-c39c-49d8-ba... to a video that anyone can view.

cjlovettOP6y ago· 1 in thread

Kevin Scott demos a new AI that is writing code in collaboration with a developer... very cool!

dang6y ago

Please post these in a form that doesn't require jumping through complex hoops to watch or read. If that means waiting until they're uploaded to a more accessible place, that's fine. On HN there's no harm in waiting. https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

Also, please don't rewrite titles to make them baity. That's against the site guidelines: https://news.ycombinator.com/newsguidelines.html

Edit: other users have helpfully posted a link to the video that anyone can view, so we've switched to that from https://mybuild.microsoft.com/sessions/6c6ecd46-c39c-49d8-ba... and restored the submission.

simonhughes226y ago

This is really cool. However, I doubt it can write more than very simple functions. That may be enough to be useful however. It would be nice if they created a demo page where we could try this out. This use case is a little different than the auto-complete one.

jfoster6y ago

I wonder if this could be trained on just bug fix commits from GitHub in order to produce a model that could suggest bug fixes for an existing code base.
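
As a rough illustration of how such a training set might be selected, here is a minimal sketch. It assumes commits are already available as dicts with hypothetical "message" and "diff" keys, and it uses a simple keyword heuristic on the commit message; neither the data layout nor the heuristic comes from the demo.

```python
import re

# Assumed heuristic: commit messages containing these words describe bug fixes.
BUGFIX_PATTERN = re.compile(r"\b(fix(es|ed)?|bug|regression)\b", re.IGNORECASE)


def select_bugfix_commits(commits):
    """Return (message, diff) pairs for commits that look like bug fixes.

    `commits` is an iterable of dicts with hypothetical "message" and
    "diff" keys; how they are fetched from GitHub is left out.
    """
    return [
        (commit["message"], commit["diff"])
        for commit in commits
        if BUGFIX_PATTERN.search(commit["message"])
    ]
```

A keyword filter like this will be noisy (it misses fixes with vague messages and keeps refactors that merely mention a bug), so it is a starting point at best.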

Jach6y ago

I don't see it replacing (or even much augmenting) professional programming any time soon... My predicted use case for this is mostly with non-programmers. They'll be instructed to write in English what they want to be done, and behind the scenes this will attempt to generate code, execute it, and give the results. A fun demo would be writing "Download the recipe on this webpage (paste link) and order the ingredients from Safeway". If it could generate its own billing and shipping storage to remember indefinitely after getting it from the user, then generate the relevant web scraping / web driving or API code for various websites, that'd be pretty sweet.

f47il6y ago

Relevant section https://youtu.be/fZSFNUT6iY8

rpiguy6y ago

Donald Knuth would be proud! (it appears proper commenting is very important to the AI's ability to generate code)

imranq6y ago

Where this would be most useful is automated testing suites just by specifying what you are testing for. A product manager looking to test portions of a system that absolutely need to work can specify code comments and generate 1000s of tests this way.

This is a gamechanger for ensuring the reliability of software. Many more people can be involved in the software development process, and inject their domain knowledge into it.

Are there any plans to open source the model? I would love to play around with it.

Debonnys6y ago

Glad to see it learned to use spaces instead of tabs.

In all seriousness, the demo really looks amazing. I'm curious to see more elaborate, real world examples though.

raghavgoyal146y ago

Imagine all the Stackoverflow accepted answers funneled into your code just because the answers were repeatedly used multiple times in the training data.

AJRF6y ago

Very cool work.

However, I fear this moves software engineering closer to the role of something like plumbing.

I've despaired at the state of most software I've used since as far back as I can remember, except when it comes to tools that have the maturity of something like linux, git, emacs, vim and the unix tools.

For software to get good, it needs to be deeply understood by at least one person working on it. If you train an army of warrior drones who get full-line autocompletion, first they'll start forgetting what types a method takes as its parameters, then they'll be less likely to explore codebases, instead plugging in the first autocompletion that comes to their editor.

Their bosses will of course want this in the name of "Getting Shit Done". We already have this sort of divide between developers, those who heavily lean on their tools and those who use minimal editor help. Once you are forced to learn a tool because your tool isn't spoon feeding you, you have a chance to better reason from first principles using the code you have available. I don't think it's a shock that a very high percentage of the very best developers use emacs or vim with minimal tooling.

I am aware that this whole comment has subtle tones of superiority and elitism and I am genuinely sorry for that but in my experience it's just true that people who lean really hard on their IDEs to do everything for them are less able to develop creative solutions and you can tell from having conversations with them that they don't really understand what they are doing.

random328406y ago

Is there an example of something like this, but trained on the actual abstract syntax tree manipulations that are going on behind the scenes?

That seems like it would be considerably more effective, because you're removing the noise/overhead of parsing the text and giving a much clearer model of what's being manipulated to the AI.
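
To make the idea concrete, here is a small sketch using Python's standard `ast` module. It only shows what a tree-level view of code and a single tree manipulation look like; it is not how the model in the video works, and the helper names `node_type_sequence` and `rename_function` are made up for illustration.

```python
import ast


def node_type_sequence(source):
    """Parse Python source and return the type names of all AST nodes.

    ast.walk visits every node; its traversal order is not specified.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def rename_function(source, old, new):
    """Apply one tree edit: rename a function definition, then regenerate code."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == old:
            node.name = new
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+
```

A model trained on sequences of edits like `rename_function` would never see whitespace or formatting at all, which is the noise reduction the comment above is pointing at.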

yeldarb6y ago

I was very surprised how well it did mimicking the StackOverflow archives when I trained GPT-2 on them last year: https://stackroboflow.com (Only the 345M weights were released back then; now I'm curious how much better 1.5B would do.)

Bjorkbat6y ago

Looks cool. If you want to temper your expectations though, play some AI Dungeon.

woile6y ago

Can someone explain to me how this kind of software is shared? Would I need to train it again, or are the trained models usually provided?

Is this one in particular open source?

sabujp6y ago

I think having something autogenerate tests would be a good first start

debbiedowner6y ago

How can I try it? And what is the compute cost?

mirekrusin6y ago

Comment driven development, nice.

master_yoda_16y ago

the title is misleading

pdeligia6y ago

This is super cool!

darepublic6y ago

I don't want to believe

testeur6y ago

def is_even(x):

rauf116y ago

find odd numbers from list


YeGoblynQueenne6y ago· 16 in thread

So that's basically program synthesis from natural language (ish) specifications (i.e. the comments).

In other words: everyone can relax. This will not take your job. Or mine.

____________

[1] I apologise to the people who know me and who will now be falling off their chairs. OK down there?

gwern6y ago

raghavgoyal146y ago

I agree on the tab-completion part. Something like Gmail's smart-compose could have potentially huge benefits here.

Edit: typo

joshuak6y ago

I agree completely with your expectation of the abilities of such a system.

I think the promise here is the ability to code in a more conceptual way with less fiddling with the finicky details.

Swizec6y ago

> I think the promise here is the ability to code in a more conceptual way with less fiddling with the finicky details.

This is basically how product managers code. Or former engineers turned engineering managers. Or even team leads. Hell, maybe like an architect?

We made a similar leap when compilers were invented.

I agree- that's why I think such a system is not capable of innovation.

Like, without trying to demean it, it sounds like a great boilerplate generator.

gradys6y ago

It's part of the job to continually incorporate new capabilities and lever yourself up.

BaronSamedi6y ago

random328406y ago

izabera6y ago

How often does the average programmer come up with a new sorting algorithm?

pharke6y ago

Yeah I'm thinking it would be more useful to have a really well indexed library of functions accessible by search.

gameswithgo6y ago

alphago and alphastar were certainly creative. this project in its current state may not have that capacity but it also may not be a huge leap to get there.

bobly_today6y ago· 8 in thread

So are we all going to be out of a job?

ben_w6y ago

hendzen6y ago

Until the AI starts generating business-speak...

tanilama6y ago

No, not remotely.

If people are truly novices, with zero programming experience, how would they know the code is correct? If it's not, how would they debug it?

gnramires6y ago

Not entirely, but it might lower the barrier to entry and mean you require fewer developers.

cjlovettOP6y ago

Not yet, but I think our role will get more and more "meta"

0xdeadbeefbabe6y ago

Programming in yaml happened somewhere today.

not if we can get a head start on the market for artisanal code developers.

mring336216y ago· 6 in thread

Amazing!

So the developer's role will shift to:

1) writing good enough descriptions of the code to be generated by the AI model

2) fixing any little issues in the generated code

cjlovettOP6y ago

swiley6y ago

It’s not that we don’t have enough code it’s that the code doesn’t do what we want.

To get this you can just grab any random 18 year old who knows js and have them hack something out. No one hires 18 year old js hackers though and there’s a reason for that.

detay6y ago

that would be supervised ai training, until 3) programmers become obsolete.

ttul6y ago

mooseburger6y ago

Once programmers become obsolete, it's likely a matter of days until the human race becomes obsolete.

3) writing the in-house or market competitive service alternative ;)

IdiocyInAction6y ago· 5 in thread

The mere fact that it can extract meaning from text like this is already really impressive though.

bglazer6y ago

I've read a fair number of papers on neural program synthesis lately. To me, these seemed to be obviously cherry picked examples, so you can't really evaluate the whole system based on them.

I'm not an active researcher in neural program synthesis, so my statements may not reflect the current state of the art.

Then again, it's hard to argue with results. I'm sure lots of pre-neural network voice recognition researchers were in love with the elegance of their hidden markov models.

And it's not a giant language model trained on a gigantic dataset. Rather, if memory serves, it's a bunch of task-specific DSLs and rules, all hand-written from scratch.

MauranKilom6y ago

But maybe this is just a question of how much compute (and network size/"depth") you invest. On a certain level we're also just some recurrent LSTM :)

bo10246y ago

neil_s6y ago· 4 in thread

I had trouble accessing the relevant video snippet even after going through the conference registration, so here's a summary.

You can view the demo at https://twitter.com/i/broadcasts/1OyKAYWPRrWKb starting around 29:00.

modeless6y ago

Link to timestamp: https://www.pscp.tv/Microsoft/1OyKAYWPRrWKb?t=29m19s

sama6y ago

Thanks, but it's Sam McCandlish doing the demo (and the project).

KhoomeiK6y ago

I'm confused. Is that not you doing the OpenAI demo around 29:00?

BubRoss6y ago

Great, even more lag coming in the next version of visual studio.

grensley6y ago· 4 in thread

Wow, this has the ability to be a total gamechanger. You have to be really observant about the bugs though, I would have totally missed the one with the price discount without executing it.

netsec_burn6y ago

By lowering the barrier of entry of programming further, I wonder if we'll see more bugs (like the price discount) as a result of this?

colordrops6y ago

Similar problem to automated driving - as long as it's better than most humans, occasional bugs will be ok. Virtually no software is bug free.

bufferoverflow6y ago

You still need a programmer to find the bugs. I think it's actually harder to spot and fix a bug than to write a simple method that involves one.

joshuak6y ago

symplee6y ago· 4 in thread

Can this freaky A.I. also generate the corresponding unit tests?

simonhughes226y ago

Towards the end of that section, he mentions they have also used it to generate unit tests. I doubt it's doing full TDD, but it seems they are part of the way there.

swiley6y ago

That actually sounds bad.

cjlovettOP6y ago

Great question, I actually think writing test code is harder than writing the product code.

pseudosudoer6y ago

At the end of the chat with OpenAI they mention that their model can be used for generating unit tests as well.

tanilama6y ago· 3 in thread

I mean it is cool.

But here's the thing: the natural description of a function is not always this unambiguous.

When you are telling a function to 'compute XYZ', what you are actually doing is 'check whether X.a exists, if so execute branch 1), else branch 2)'.
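
A made-up example of that gap, not taken from the demo: the one-line description "compute the total for an order" says nothing about an optional field, yet the code has to branch on it. The `order` layout and the `discount` key are assumptions for illustration only.

```python
def compute_total(order):
    """'Compute the total' hides a branch the description never mentions."""
    subtotal = sum(item["price"] * item["qty"] for item in order["items"])
    # Branch 1: a discount fraction is present, so apply it.
    if order.get("discount") is not None:
        return subtotal * (1 - order["discount"])
    # Branch 2: no discount field, so the subtotal is the total.
    return subtotal
```

Whether a missing discount should mean "no discount" or "error" is exactly the kind of decision the natural-language description leaves open.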

nerdponx6y ago

MiroF6y ago

I agree that code generation of complex functions is hard.

But I think the example given of unit testing - ie. natural language description of specific behavior of function -> code is extremely useful.

tanilama6y ago

Unit testing is a good use case.

Again, I don't dispute that this tool is interesting. But claiming it to be groundbreaking or game-changing is simply not right.

The majority of a programmer's time is not spent typing out code. It is spent looking at the comment/description, thinking about it, editing some code, then rethinking and editing again.

This tool has the potential to save some typing time, but it is still not going to change things fundamentally.

swalsh6y ago· 3 in thread

These are just baby steps, but holy shit is that impressive. It kind of feels like working with offshore devs, but it's in real time.

nnq6y ago

...that's mildly insulting

LAMike6y ago

Only if you assume what shore he's referring to.

29athrowaway6y ago

I've worked with developers from all around the globe. While it's true that some cannot even write fizzbuzz, some others can be extremely brilliant individuals with an excellent work ethic.

corbins6y ago· 3 in thread

Mirror: https://twitter.com/i/broadcasts/1OyKAYWPRrWKb

dang6y ago

Ok, we've changed to that from https://mybuild.microsoft.com/sessions/6c6ecd46-c39c-49d8-ba.... Thanks!

modeless6y ago

This sort of works: https://www.pscp.tv/Microsoft/1OyKAYWPRrWKb?t=29m19s

yread6y ago

Pretty amazing. Starts at around 28:00

datlife6y ago· 3 in thread

I can't see anything

dang6y ago

We've changed from https://mybuild.microsoft.com/sessions/6c6ecd46-c39c-49d8-ba... to a more accessible video.

spery6y ago

Video should be there but their site is frustrating. Both for visibility and reliability.

alchemyromcom6y ago

Maybe their AI can whip up something better soon.