All these approaches just seem like brute-force approaches: Let's just throw our transformer on this problem and see if we can get anything useful out of this.
Whatever it is, you can't deny that these unsupervised models learn some semantic representations, but we have no clue at all what those representations actually are or how these models learn them. But I'm also very sceptical that you can actually get anywhere close to human (expert) capability in any sufficiently complex domain by using this approach.
And next year they can filter out 99.99%. And the year after that, 99.9999%. So literally, an exponentially greater number of monkey/typewriting units. (An AI produced Shakespeare play coming soon).
>> we have no clue at all what that actually is and how these model learn
This is why I'm super cool-to-cold about the AI/deep learning classes being sold to young people who would otherwise be learning fundamental programming skills. It appears to me like trying to teach someone to ride a horse before they understand what skin, bones, muscles, animals, and horses are.
>>get anywhere close to human (expert) capability in any sufficiently complex domain
You can get close enough to scalp a lot of billionaires, but at the end of the day it's always going to be human coders banging our heads against management, where they ask for shit they can't visualize and it's our job to visualize how their employees/customers will use it. Yes it involves domain specific knowledge, but it also requires, er, having eyeballs and fingers, and understanding how a biological organism uses a silicon-based device. That's kind of the ultimate DS knowledge, after all. Now, lots of coders just copy-pasta a front end, but after all the hooplah here I'd be extremely surprised if in ten years an AI has caught up to your basic web mill in Indonesia when it comes to building a decent website.
To be fair, a lot of creative work requires plenty of trial and error. And since no problem is solved truly from scratch, all things considered, the prior work you build on and your own attempts may together have iterated through dozens of possibilities before arriving at your result.
My advantage as a human is I can often tell you why I am eliminating this branch of the search space. The catch is my reasoning can be flawed. But we do ok.
> just copying previous solutions with slight adjustments.
It's not just doing that, Copilot can do a workable job providing suggestions for an invented DSL. A better analogy than autocomplete is inpainting missing or corrupted details based on a surrounding context. Except instead of a painting we are probabilistically filling in patterns common in solutions to leetcode style problems. Novelty beyond slight adjustments comes in when constraints are insufficient to pin down a problem to a known combination of concepts. The intelligence of the model is then how appropriate its best guesses are.
The limitations of GPT-3 Codex and AlphaCode seem to be that they're relatively weak at selection, and that they require problem spaces with enough data to distill a sketch of, and to learn how to inpaint well in. Leetcode-style puzzles are constructed to be soluble in a reasonable number of lines, are not open ended, and have a trick to them. One can complain that while we're closer to real-world utility, we're still restricted to the closed worlds of verbose APIs, games and puzzles.
While lots of commenters seem concerned about jobs, I look forward to having the dataset oliphaunt and ship computer from A Fire Upon the Deep someday soon.
I also think that in ML and DL generally, the overarching progress gets hyped while in the background there are murmurs in the research community about the limitations. That's how we end up with people in 2012 saying FSD is a couple of years away, while in 2022 we know we aren't even close yet. We tend to oversell how capable these systems are.
Yes, it's the size of the search space for each problem. The search space for arbitrary programs in a language with Universal Turing Machine expressivity is infinite. Even worse, for any programming problem there are an infinite number of candidate programs that may or may not solve it and that differ in only minute ways from each other.
For Go and protein structure prediction from sequences the search space is finite, although obviously not small. So there is a huge difference in the complexity of the problems right there.
Btw, I note yet again that AlphaCode performs abysmally badly on the formal benchmark included in the arxiv preprint (see Section 5.4, and table 10). That makes sense because AlphaCode is a very dumb generate-and-test, brute-force search approach that doesn't even try to be smart and tries to make up for the lack of intelligence with an awesome amount of computational resources. Most work in program synthesis is also basically a search through the space of programs, but people in the field have come up with sophisticated techniques to avoid having to search an infinite number of programs- and to avoid having to generate millions of program candidates, like DeepMind actually brags about:
> At evaluation time, we create a massive amount of C++ and Python programs for each problem, orders of magnitude larger than previous work.
They say that as if generating "orders of magnitude more" programs than previous work is a good thing, but it's not. It means their system is extremely bad at generating correct programs. It is orders of magnitude worse than earlier systems, in fact.
(The arxiv paper linked from the article quantifies this "massive" amount as "millions"; see Section 4.4).
It is clear that writing code will soon be a thing of the past; maybe it is a bad idea to train our children to code. Let's make sure we milk every penny before the party is over!
I say maybe because so far the code that Copilot has generated for me has been impressive for what it is, but riddled with obvious and subtle bugs. It’s like outsourcing my function implementations to a C-student undergraduate intern. I definitely wouldn’t use any of its code without close scrutiny.
AI will make some software engineering tasks more efficient and more accessible but human programmers are not going anywhere any time this side of the Singularity.
And then I remember that the thing I bring to the table is the ability to turn domain knowledge into code.
Being able to do competitive coding challenges is impressive, but a very large segment of software engineering is about eliciting what the squishy humans in management actually want, putting it into code, and discovering as quickly as possible that it’s not what they really wanted after all.
It’s going to take a sufficiently long time for AI to take over management that I don’t think oldies like me need to worry too much.
Now, I completely agree with you that a significant part of our job is understanding and structuring the problem, but I'm not sure it can't be done in another way. We usually get taken in, when thinking about what machines will be able to do, by assuming that just because we use intelligence (general/human intelligence) to solve a task, intelligence must be a requirement. Think chess. Or even calculating (as in, with numbers). Or Go. Etc.
The funny thing is that we don't know, until someone does it. I've been thinking for a while that a lot of what I do could be done by a chat bot. Asking clarification questions. Of course, I do have a lot of background knowledge and that's how I can come up with those questions, but that knowledge is probably easy to acquire from the internet and then use it as training data. (Just like we have an awful lot of code available, we have a lot of problem descriptions, questions, comments and some requirement specifications/user guides.)
The hard part would probably be not what we have learned as a software developer, but the things we have learned while we were small kids and also the things that we have learned since, on the side. I.e. being a reasonable person. Understanding what people usually do and want. So the shared context. But I'm not sure it's needed that much.
So yeah, I can imagine a service that will talk to a user about what kind of app they want (first just simpler web sites, web shops, later more and more complicated ones) and then just show them "here is what it does and how it works". And then you can say what you'd like to be changed. The color or placement of a button (earlier versions) or even the association type between entities (oh, but a user can have multiple shipping addresses).
The job of programmers is to have machines do stuff so that humans don't have to, and of course, they do it for themselves too. Scripts, libraries, compilers: they are just tools to avoid flipping bits by hand. If something like Copilot is not embraced by all programmers, it is because it is often less than helpful; even so, some have adopted it. If we get a super-advanced AI that can have a high-level understanding of a problem and write the app for you, then it is not much more than a super-compiler, and there will be programmers who tell the super-compiler what to do. Think of it as a new, super high level programming language. The job will evolve, but there will always be someone who tells the computer what to do.
And if there is no one needed to tell the computer what to do, that's what some people call "the singularity". Programming, or its evolution will probably be the last technical job. Social jobs may continue further, simply because humans like humans because they are human. Maybe the oldest profession will also be the last profession.
Well it’d a curious day when an AlphaGo moment hits coding. Would be funny if it happened at the same time as Fed rate increases and destabilizing world events this year (the path from median human to top human is shallow). Mass firing of a few million highly paid redundancies out of the blue? Would be quite a sight.
Or maybe it wouldn’t happen that way, but rather it would pave the way for a leaner set of startups that were built with the power to do the same thing at the same or better velocity with an order of magnitude or fewer people.
Most surprisingly I can quickly tackle domains that require libraries I don't know because a combination of code generation and IDE hinting means I can write comments and pseudo code and the tool then provides at least a first pass best method to use.
Can't say if I write better code with Copilot but it's worth experiencing!
It's very good at handling boilerplate and making contextual suggestions.
I don't see it eating my cake, but it's definitely a very useful tool for saving time.
Lower-level coding could become more and more automated, raising the values and wages of complementary skills such as requirements elicitation and understanding of business impact from technological decisions. [1]
Some of these, however, can be done by businesspeople who know how to think and express their ideas precisely, such that a neural model can turn them into a decent draft of code. (These days, many more youths learn to code before going into other fields. They have training for thinking precisely.) There can be fewer job opportunities for some groups of developers.
Thus, a hedge against possible job loss is still required. Owning substantial equity in a company/startup and other assets would be one good strategy.
- a very well defined problem (one of the things I like about competitive programming and the like is just getting to implement a clearly articulated problem, not something I experience on most days)
- existing test data
This is definitely a great accomplishment, but I think those two features of competitive programming are notably different than my experience of daily programming. I don’t mean to suggest these will always be limitations of this kind of technology, though.
There's also the open problem of verifying correctness in solutions and providing some sort of flag when the model is not confident in its correctness. I give it another 5 years in the optimistic case before AlphaCode can reliably compete at the top 1% level.
That's wildly overstating the promise of this technology, and I'd be very surprised if the authors of this wouldn't agree.
I have a suspicion it would - kinda like Stack Overflow, problems/solutions are not that different "in the small". It'd have almost certainly given us the fast square root trick verbatim, like Github's AI is doing routinely.
(Side note: I find that many people skip this step, and go straight from fuzzy-requirement-only-discussed-on-zoom-with-Bob to code; open a pull request without much context or comments; and then a code reviewer is supposed to review it properly without really knowing what problem is actually being solved, and whether the code is solving a proper problem at all).
Fuzzy business requirements -> programmer specifies and writes tests -> AI codes
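A minimal sketch of that middle step, with hypothetical names: the programmer pins the fuzzy requirement down as executable tests, and whatever the AI emits must pass them.

```python
# Fuzzy requirement: "usernames should look clean in the UI".
# The programmer turns it into a precise, executable spec:
def check_spec(impl):
    assert impl("  Alice ") == "alice"
    assert impl("BOB") == "bob"
    assert impl("carol") == "carol"

# One candidate implementation (human- or AI-written) that passes the spec:
def normalize_username(name):
    return name.strip().lower()

check_spec(normalize_username)
```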
English versions of Codeforces problems may be well-defined but they are often very badly articulated and easy to misunderstand as a human reader. I still can't understand how they got AI to be able to generate plausible solutions from these problem statements.
Software is, ultimately, always about humans. Software is always there to serve a human need. And the "intelligence" that designs software will always, at some level, need to be intelligence that understands the human mind, with all its knowledge, needs, and intricacies. There are no shortcuts to this.
So, I think AI as a replacement for software development professionals, that's currently more like a pipe dream. I think AI will give us powerful new tools, but I do not think it will replace, or even reduce, the need for software development professionals. In total it might even increase the need for software development professionals, because it adds another level to the development stack. Another level of abstraction, and another level of complexity that needs to be understood.
Having used Copilot I can assure you that this technology won't replace you as a programmer but it will make your job easier by doing things that programmers don't like to do as much like writing tests and comments.
It appears to me that when it comes to language models, intelligence = experience * context, where experience is what's encoded in the model, and context is the prompt. And the biggest limitation on Copilot currently is context. It behaves as an "advanced autocomplete" because all it has to go on is what regular autocomplete sees, e.g. the last few characters and lines of code.
So, you can write a function name called createUserInDB() and it will attempt to complete it for you. But how does it know what DB technology you're using? Or what your user record looks like? It doesn't, and so you typically end up with a "generic" looking function using the most common DB tech and naming conventions for your language of choice.
But now imagine a future version of Copilot that is automatically provided with a lot more context. It also gets fed a list of your dependencies, from which it can derive which DB library you're using. It gets any locatable SQL schema file, so it can determine the columns in the user table. It gets the text of the Jira ticket, so it can determine the requirements.
As a programmer a great deal of time is spent checking these different sources and synthesising them in your head into an approach, which you then code. But they are all just text, of one form or another, and language models can work with them just as easily, and much faster, than you can.
And once the ML coding train gets rolling, it'll only get faster. Sooner or later Github will have a "Copilot bot" that can automatically make a stab at fixing issues, which you then approve, reject, or fix. And as thousands of these issues pile up, the training set will get bigger, and the model will get better. Sooner or later it'll be possible to create a repo, start filing issues, and rely on the bot to implement everything.
I haven't found that reading largely-correct-but-still-often-wrong code is a good experience for me, or that it adds any efficiency.
It does do a very good job of intelligently synthesising boilerplate for you, but be it Copilot or this AlphaCode, these models still don't understand the fundamentals of coding, in the causal sense of how one instruction affects the space of program states.
Still, these are exciting technologies, but again, it's a big if whether such a machine learning model will ever materialise.
I see it continuing to evolve and becoming a far superior auto-complete with full context, but, short of actual general AI, there will always be a step that takes a high-level description of a problem and turns it into something a computer can implement.
So while it will make the remaining programmers MUCH more productive, thereby reducing the needed number of programmers, I can't see it driving that number to zero.
This sort of boilerplate code is best solved by the programming language. Either via better built-in syntax or macros. Using an advanced machine learning model to generate this code is both error-prone and a big source of noise and code bloat. This is not an issue that will go away with better tooling; it will only get worse.
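As a toy illustration of that point (an assumed example, not from the thread): the kind of boilerplate an assistant happily completes can instead be absorbed by the language itself, e.g. Python's dataclasses.

```python
from dataclasses import dataclass

# Hand-written boilerplate an assistant might happily autocomplete:
class PointVerbose:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f"PointVerbose(x={self.x}, y={self.y})"
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

# The same behaviour absorbed by a language feature, no generation needed:
@dataclass
class Point:
    x: int
    y: int
```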
Anyway: programming is automation; automation of programming is abstraction. Using AI to write your code is just a bad abstraction - and we are used to bad abstractions.
Seriously though, I do doubt I can be fully replaced by a robot any time soon, but it may be the case that soon enough I can write high-level descriptions of programs and hand them off to an AI to do most of the work. This wouldn't completely replace me, but it could make developers 50x more productive. The question is how elastic the market is... can it grow in step with our increase in productivity?
Also, please remember that as with anything, within 5 years we should see vast improvements to this AI. I think it will be an important thing to watch.
I just hope LMs will prove to be just as useful in software development as they are in their own field.
More likely it will translate the abstraction level by some vector of 50 elements.
It does look like we've entered an era where programmers who don't use AI assistants will be disadvantaged, and that this era has an expiration date.
Apparently the bot would have a rating of 1300. Although Elo ratings between sites are not comparable, for some perspective, Mark Zuckerberg had a rating of ~1k when he was in college on TopCoder: https://www.topcoder.com/members/mzuckerberg
To clarify, this is a HUGE leap in AI and computing in general. I don't mean to play it down.
Sorry, but it's nothing of the sort. The approach is primitive, obsolete, and its results are very poor.
I've posted this three times already but the arxiv preprint includes an evaluation against a formal benchmark dataset, APPS. On that more objective measure of performance, the best performing variant of AlphaCode tested, solved 25% of the easiest tasks ("introductory") and less than 10% of the intermediary ("interview") and advanced ("competition") tasks.
What's more, the approach that AlphaCode takes to program generation is primitive. It generates millions of candidate programs and then it "filters" them by running them against input-output examples of the target programs taken from the problem descriptions. The filtering still leaves thousands of candidate programs (because there are very few I/O examples and the almost random generation can generate too many programs that pass the tests, but still don't solve the problem) so there's an additional step of clustering applied to pare this down to 10 programs that are finally submitted. Overall, that's a brute-force, almost random approach that is ignoring entire decades of program synthesis work.
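The pipeline described above (generate candidates, filter on the public I/O examples, cluster the survivors, submit 10) can be sketched in toy Python; everything here is hypothetical, with candidate programs modelled as callables rather than generated source:

```python
from collections import defaultdict

def filter_and_cluster(candidates, examples, extra_inputs, k=10):
    # Step 1: filter candidates on the few public input/output examples.
    survivors = [c for c in candidates
                 if all(c(inp) == out for inp, out in examples)]
    # Step 2: cluster survivors by their behaviour on extra generated
    # inputs; programs behaving identically are likely semantic duplicates.
    clusters = defaultdict(list)
    for c in survivors:
        signature = tuple(c(inp) for inp in extra_inputs)
        clusters[signature].append(c)
    # Step 3: submit one representative from each of the largest clusters.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:k]]
```

Note that step 1 alone cannot distinguish a correct program from one that merely memorised the examples, which is exactly why so many candidates survive the filter.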
To make an analogy, it's as if DeepMind had just published an article boasting of its invention of a new sorting algorithm... bubblesort.
I am rated at 2100+ so I do agree that 1300 rating is low. But at the same time it solved https://codeforces.com/contest/1553/problem/D which is rated at 1500 which was actually non-trivial for me already. I had one wrong submit before getting that problem correct and I do estimate that 50% of the regular competitors (and probably the vast majority of the programmers commenting in this thread right now) should not be able to solve it within 2hrs.
My rating is 1562.
IIUC, AlphaCode was trained on Github code to solve competitive programming challenges on Codeforces, some of which are "difficult for a human to do". Suppose AlphaCode was trained on Github code that contains the entire set of solutions on Codeforces, is it actually doing anything "difficult"? I don't believe it would be difficult for a human to solve problems on Codeforces when given access to the entirety of Github (indexed and efficiently searchable).
The general question I have been trying to understand is this: is the ML model doing something that we can quantify as "difficult to do (given this particular training set)"? I would like to compute a number that measures how difficult it is for a model to do task X given a large training set Y. If the X is part of the training set, the difficulty should be zero. If X is obtained only by combining elements in the training, maybe it is harder to do. My efforts to answer this question: https://arxiv.org/abs/2109.12075
In recent literature, the RETRO Transformer (https://arxiv.org/pdf/2112.04426.pdf) talks about "quantifying dataset leakage", which is related to what I mentioned in the above paragraph. If many training samples are also in the test set, what is the model actually learning?
Until deep learning methods provide a measurement of "difficulty", it will be difficult to gauge the prowess of any new model that appears on the scene.
They tested it on problems from recent contests. The implication being: the statements and solutions to these problems were not available when the Github training set was collected.
From the paper [0]: "Our pre-training dataset is based on a snapshot of selected public GitHub repositories taken on 2021/07/14" and "Following our GitHub pre-training dataset snapshot date, all training data in CodeContests was publicly released on or before 2021/07/14. Validation problems appeared between 2021/07/15 and 2021/09/20, and the test set contains problems published after 2021/09/21. This temporal split means that only information humans could have seen is available for training the model."
At the very least, even if some of these problems had been solved exactly before, you still need to go from "all of the code in Github" + "natural language description of the problem" to "picking the correct code snippet that solves the problem". Doesn't seem trivial to me.
> I don't believe it would be difficult for a human to solve problems on Codeforces when given access to the entirety of Github (indexed and efficiently searchable).
And yet, many humans who participate in these contests are unable to do so (although I guess the issue here is that Github is not properly indexed and searchable for humans?).
[0] https://storage.googleapis.com/deepmind-media/AlphaCode/comp...
Yes, and I would like to know how similar the dataset(s) were. Suppose the models were trained only on greedy algorithms and then I provided a dynamic programming problem in the test set, (how) would the model solve it?
> And yet, many humans who participate in these contests are unable to do so (although I guess the issue here is that Github is not properly indexed and searchable for humans?).
Indeed, so we don't know what "difficult" means for <human+indexed Github>, and hence we cannot compare it to <model trained on Github>.
My point is, whenever I see a new achievement of deep learning, I have no frame of reference (apart from my personal biases) of how "trivial" or "awesome" it is. I would like to have a quantity that measures this - I call it generalization difficulty.
Otherwise the datasets and models just keep getting larger, and we have no idea of the full capability of these models.
But similar ones were, because the amount of code puzzles suitable for such contests is finite. There are differences, but when you have seen a few string-compare problems, you have a pretty good grasp of what they look like, what common paths to solutions are, etc.
And yet, what a garbage solution it produces.
To illustrate the difference between intelligence and regurgitation, someone tell me what CoPilot generates for this:
  // A Go function to swap the sixth bit and seventeenth bit of a 32-bit signed integer.

Here is a human solution:

  func swap(x int32) int32 {
      const mask = 1 << 5
      var (
          xor1 = (x>>11 ^ x) & mask
          xor2 = xor1 << 11
      )
      return x ^ xor1 ^ xor2
  }
CoPilot cannot reason numerically like this (understand "seventeenth bit" and "sixth bit" and generate the right code for that combination). It needs to understand the size of the gap between the bits, i.e., 11, and that's too hard.

  # A function to swap the sixth bit and seventeenth bit of a 32-bit signed integer.
  def swap_bits(x):
      # Get the value of the sixth bit.
      bit6 = x & (1 << 5)
      # Get the value of the seventeenth bit.
      bit17 = x & (1 << 16)
      # Swap the values of the sixth and seventeenth bit.
      bit6 = bit6 >> 5
      bit17 = bit17 >> 16
      # Combine the swapped values of the sixth and seventeenth bit.
      x = x ^ (bit6 << 16)
      x = x ^ (bit17 << 5)
      return x
It actually synthesised about 10 viable solutions. Some interesting ones:

  def swap_six_seventeen(x):
      # Get the binary representation of the integer.
      binary = bin(x)[2:]
      # Add zeros to the beginning of the binary representation.
      binary = '0' * (32 - len(binary)) + binary
      # Swap the sixth and seventeenth bit.
      binary = binary[:5] + binary[17] + binary[5:17] + binary[18:]
      # Convert the binary back to an integer.
      return int(binary, 2)

Exercising the first solution:

  >>> bin(swap_bits(0b_1_0000000000_0_00000))
  '0b10000000000100000'
  >>> bin(swap_bits(0b_0_0000000000_1_00000))
  '0b10000000000100000'
  >>> bin(swap_bits(0b_1_0000000000_1_00000))
  '0b0'
  >>> bin(swap_bits(0b_0_0000000000_0_00000))
  '0b0'
The second one converts the value to a string and uses string operations, which is wildly inefficient and a very common mistake made by inexperienced programmers unaware of bitwise operations (so presumably common in the training set). It also attempts to swap the 6th and 17th most significant bits rather than the 6th and 17th least significant bits, i.e. it counts in the opposite direction to the first one (the comment doesn't specify, but typically you count from the least significant bit in these situations).

Worse, though, it gets the string manipulation completely wrong. I think it's trying for `binary[:5] + binary[16] + binary[6:16] + binary[5] + binary[17:]`, i.e. characters 1-5, then character 17, then characters 7-16, then character 6, then characters 18-32. The manipulation it does just completely mangles the string.
I'm very keen to try Github Copilot if they ever admit me to the beta (I've been waiting forever) and will adopt it enthusiastically if it's useful. However, this is exactly what I've pessimistically expected. Analysing these truly awful implementations to identify the subtle and bizarre misbehaviours has taken me far, far longer than it would have taken me to just write and test a working implementation myself. And I'm supposed to evaluate 10 of these to see if one of them might possibly do the right thing?!?!
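For reference, a correct bitwise version in Python (a sketch using the same XOR trick as the human Go solution upthread; bit positions are counted 0-indexed from the least significant bit):

```python
def swap_bits_correct(x):
    # 1 iff bit 5 and bit 16 differ; 0 otherwise.
    diff = ((x >> 5) ^ (x >> 16)) & 1
    # XOR the difference back in at both positions: this swaps the bits
    # when they differ and leaves x untouched when they are equal.
    return x ^ (diff << 5) ^ (diff << 16)
```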
  package main

  import (
      "fmt"
      "math"
  )

  func main() {
      var a int32 = 0b1010101010101010101010101010101010101010101010101010101010101010
      fmt.Printf("%b\n", a)
      fmt.Printf("%b\n", swapBits(a, 6, 17))
  }

  func swapBits(a int32, i int, j int) int32 {
      // convert to binary
      bin := fmt.Sprintf("%b", a)
      // get the bits
      bit1 := bin[i-1 : i]
      bit2 := bin[j-1 : j]
      // swap the bits
      bin = bin[:i-1] + bit2 + bin[i:]
      bin = bin[:j-1] + bit1 + bin[j:]
      // convert back to int
      return int32(bin2int(bin))
  }

  func bin2int(bin string) int64 {
      var sum int64
      for i, v := range bin {
          if v == '1' {
              sum += int64(math.Pow(2, float64(len(bin)-i-1)))
          }
      }
      return sum
  }

CoPilot is regurgitating some solution from its training set, the solution of an inept programmer who would manipulate bits via conversion to string... yikes.
  unsigned int swapbits(unsigned int a) {
      bool bit6 = a & (1 << 5);
      bool bit17 = a & (1 << 16);
      if (bit6 == bit17) return a; // bits are the same, do nothing
      return (a ^ (1 << 5) ^ (1 << 16)); // flip both 6th and 17th bits
  }

Not as efficient as mine, but kudos.
Solutions are posted, and they're wrong.
But the CoPilot user can't see the code is wrong.
[edit] Is "10 recent contests" a large enough sample size to prove whatever point is being made?
There's more objective measures of performance, like a good, old-fashioned, benchmark dataset. For such an evaluation, see table 10 in the arxiv preprint (page 21 of the pdf), listing the results against the APPS dataset of programming tasks. The best performing variant of AlphaCode solves 25% of the simplest ("introductory") APPS tasks and less than 10% of the intermediary ("interview") and more advanced ones ("competition").
So it's not very good.
Note also that the article above doesn't report the results on APPS. Because they're not that good.
As others say in comments, it might be the case that we meet in the middle: us writing some form of tests for AI-produced code to pass.
The models regurgitate solutions to problems already encountered in the training set. This is very common with Leetcode problems and seems to still happen with harder competitive programming problems.
I think someone else in this thread even pointed out an example of AlphaCode doing the same thing.
It's the next step. Binary code < assembly < C < Python < AlphaCode
Historically its always been about abstracting and writing less code to do more.
I.e. as soon as it starts replacing humans, it will not have enough human-generated training data, since all programming will be done by models like itself.
Second, AlphaCode was specifically trained for competitive programming:
1. Short programs. 2. Each problem has hundreds of human-generated solutions.
However, commercial programs:
1. Are long. 2. Have no predefined answer, or even a correct answer. 3. Need to use/reuse a lot of legacy code.
As a natural born pessimist, I can't help but feel that by the time we get to that point we'll just keep blundering forward and adapting our world around the wild nonsense garbage code the model ends up producing in this scenario.
After all, that's basically what we've done with the entire web stack.
Let me know when the AI engine is able to do complex refactoring or adding features that keeps backwards compatibility, find a bug in a giant codebase by debugging a test case or write code that's performant but also maintainable.
And yet, despite the fact that we have programs to help calculate all the things, test code-required load-combinations, even run simulations and size individual components... it turns out that, it doesn't actually save that much work, and you still need an engineer to do most of it. And not just because of regulatory requirements. It's just, that's not the hard part. The hard part is assembling the components and specifications, specifying the correct loads based on location-specific circumstances, coming up with coherent and sensible design ideas, chasing down every possible creative nook and cranny of code to make something that was originally a mistake actually work, and know when the model is just wrong for some reason and the computer isn't simulating load paths accurately.
Specifying the inputs and interpreting results is still about as much work as it was before you started with all the fancy tools. Those tools still have advantages mind you, and they do make one slightly more efficient. Substantially so in some cases, but most of the time it still comes out as a slight assist rather than a major automation.
Machine Learning also has a long way to go before it can take a long, rambling mess of a meeting and somehow generate a halfway usable spec from it. I mean, the customer says they want X, but X is silly in this context, so we'll give them Y and tell them it's "X-like, but faster". For example, SQL is "Blockchain-like, but faster" for a lot of buzzword use-cases of blockchain.
But surely they'll never be able to do this new reference class you have just now come up with, right?
https://en.wikipedia.org/wiki/Algorithmic_program_debugging
Of course, all this targeted only Prolog programs, so it's not well known at all.
True, but if you relax your hard requirements of optimality to admit "good enough" solutions, you can use heuristic approaches that are much more tractable. High quality heuristic solutions to NP-hard problems, enabled by ML, are going to be a big topic over the next decade, I think.
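As a toy illustration of that tradeoff (the city coordinates here are randomly generated, not any benchmark instance), a greedy nearest-neighbor heuristic produces a "good enough" traveling-salesman tour in O(n²) time, even though finding the optimal tour is NP-hard:

```python
import math
import random

def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbor_tour(points):
    """Greedy heuristic: always hop to the closest unvisited city."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(50)]
tour = nearest_neighbor_tour(cities)
print(len(tour), tour_length(cities, tour))
```

The result carries no optimality guarantee, but for many practical instances it lands within a modest factor of optimal, which is often all "good enough" requires.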
I disagree; I think the core of programming is analyzing things people want and expressing solutions to those wants clearly, unambiguously, and in a way that is easy to change in the future. I'd say algorithms and math are a very small part of this work.
Assuming ANNs resemble the way human brains function, you'd also expect them to introduce bugs. And so actual human beings would partake in debugging too.
[1]: https://breandan.net/public/programming_with_intelligent_mac...
The programming languages of the future are going to make Rust look like Python. That’ll be in part because you as an individual programmer aren’t weighed down by as much boilerplate as you were pre-copilot, pre-alphacode and pre- the more advanced coding assistants of the future.
That's what code is.
In the future, code-writing AI could be tasked with generating the most reliable and/or optimized code to pass your unit tests. Human programmers will decide what we want the software to do, make sure that we find all the edge cases and define as many unit tests as possible, and let the AI write significant portions of the product. Not only that, but you could include benchmarks that pit AI against itself to improve runtime or memory performance. Programmers can spend more time thinking about what they want the final product to do, rather than getting mired in mundane details, and be guaranteed that portions of software will perform extremely well.
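A toy sketch of that workflow; the candidate functions below are hand-written stand-ins for model output, not anything a real system produced. Candidates are kept only if they pass every unit test, then the benchmark picks the fastest survivor:

```python
import timeit

# Spec: deduplicate a list while preserving first-seen order.

def candidate_a(xs):
    # Correct but quadratic: membership test scans the output list.
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out

def candidate_b(xs):
    # Correct and linear: dict preserves first-seen insertion order.
    return list(dict.fromkeys(xs))

def candidate_c(xs):
    # Buggy: deduplicates but destroys the original order.
    return sorted(set(xs))

unit_tests = [(([3, 1, 3, 2],), [3, 1, 2]), (([],), [])]

def passes(fn):
    return all(fn(*args) == want for args, want in unit_tests)

# Filter by correctness, then rank survivors by a runtime benchmark.
survivors = [f for f in (candidate_a, candidate_b, candidate_c) if passes(f)]
best = min(survivors,
           key=lambda f: timeit.timeit(lambda: f(list(range(500)) * 4), number=20))
print(best.__name__)
```

Note how the buggy candidate is only caught because a test encodes the ordering requirement; any property the tests don't pin down is a property the "AI" is free to get wrong.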
Is this a naive fantasy on my part, or actually possible?
Possible, yes, desirable, no.
The issue I have with all these end-to-end models is that they're a massive regression. Practitioners fought tooth and nails to get programmers to acknowledge correctness and security aspects.
Mathematicians and computer scientists developed theorem solvers to tackle the correctness part. Practitioners proposed methodologies like BDD and "Clean Code" to help with stability and reliability (in terms of actually matching requirements now and in the future).
AI systems throw all this out of the window by just throwing a black box at the wall and keeping whatever sticks. Unit tests will never be proof of correctness - they can only show the presence of errors, not their absence.
You'd only shift the burden from the implementation (i.e. the program) to the tests. What you actually want is a theorem prover that proves functional correctness, in conjunction with integration tests that demonstrate the runtime behaviour if need be (i.e. profiling) and references that link implementation to requirements.
The danger lies in the fact that we already have a hard time getting security issues and bugs under control with software that we should be able to understand (i.e. fellow humans wrote and designed it). Imagine trying to locate and fix a bug in software that was synthesised by some elaborate black box that emitted inscrutable code in absence of any documentation and without references to requirements.
EDIT: with in-memory DBs I can imagine an AI-assisted mainframe that can solve 90% of business problems.
Actually I think Meta AI had some interesting discovery recently that could possibly improve NNs in general, so probably this as well.
I am not in the field, but I wonder if some other approaches like Tsetlin machines would be more useful for programming.
TL;DR: In 2020, a community of 169 people and the best forecasters were assigning ~15% that it would happen by July 2021.
More specifically, on Dec 31, 2016 in partnership with Center for the Study of Existential Risk, Machine Intelligence Research Institute, and The Future of Life Institute they asked:
How long until a machine-learning system can take a simple text description and turn it into a program coded in C/Python?
https://www.metaculus.com/questions/405/when-will-programs-w...
First 19 forecasters in March 2017 were predicting mid-2021, the best forecasters were predicting late 2024. When the question closed in 2020 the community was predicting January 2027 and the best forecasters were predicting March 2030.
The question resolved in July 2021 when Codex was published.
Community and the best forecasters were assigning ~15% that it will happen by July 2021.
I'm currently the 14th best forecaster there and I was predicting 33% before July 2021. It was my last prediction, and it was made in October 2018.
I'm also predicting 75% that we will have AGI by 2040 as defined in this question:
https://www.metaculus.com/questions/3479/when-will-the-first...
20% that it will happen before 2030.
There is also stronger operationalization:
https://www.metaculus.com/questions/5121/when-will-the-first...
My prediction here is 60% before 2040 and 5% before 2030.
I have also "canary in the coal mine" questions:
When will AI achieve competency on multi-choice questions across diverse fields of expertise? Community predicts 50% before 2030, I agree.
https://www.metaculus.com/questions/5276/ai-competence-in-di...
When will AI be able to learn to play Montezuma's Revenge in less than 30 min? Community predicts 50% before 2025, I think 50% before 2027.
https://www.metaculus.com/questions/5460/ai-rapidly-learning...
Deepmind or openAI will do it. If not them, it will be a Chinese research group on par with them.
I’ll be considering a new career. It will still be in computer science but it won’t be writing a lot of code. There’ll be several new career paths made possible by this technology as greater worker productivity makes possible greater specialization.
This viewpoint seems to me very similar to the idea of 3rd-generation languages replacing developers because programming would be so easy. It isn't about how easy it is to write code: I function as a limited mentat, taking all the possible requirements, tradeoffs, and constraints, analyzing them, building the model, and then writing out the code. The code artifact is not the value I add. The artifact is how I communicate the value to the world.
This doesn't make programmers redundant any more than Ruby, PHP, or Java made developers redundant because they freed them from having to manually remember and track memory usage and pointers. It is at most a tool to reduce the friction of getting what is in my head into the world.
I control the code and whoever controls the code controls the business. I possess the ability to make out the strands of flow control and see the future state of the application. For I am the Sr. Software Engineer and I have seen where no Project Manager can see.
Apologies to Frank Herbert, I just finished listening to Dune.
EDIT:
I got off track at the end, but my point is that no matter how good the tools for developing the code are, they will never replace a software engineer any more than electric drills and power saws replace home builders. They merely elevate our work.
As humans we have a coherent world model that current AI systems are nowhere near close to having.
That coherent world model is a necessary precondition for both understanding a business goal and implementing a program to solve it. AlphaCode can do the second part but not the first.
AlphaCode doesn’t have that world model and even if it did it still wouldn’t autonomously act on it, just follow orders from humans.
Competitive programming is going to be solved much earlier than programming in a business context will, because it’s completely independent of business requirements. It’s at most half as hard a problem.
Analyzing the requirements is a hard problem when we do it with our brains. But our job would be very different if all we had to do is write down the constraints and press a button to see an error: invalid requirements, can't support this and that at the same time.
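That button is essentially a satisfiability check over the requirements. A minimal brute-force sketch (the feature names and requirements are made up for illustration):

```python
from itertools import product

# Hypothetical boolean features; each requirement is a predicate over them.
features = ["offline_mode", "realtime_sync", "zero_local_storage"]
requirements = [
    lambda f: f["offline_mode"] or f["realtime_sync"],              # need at least one
    lambda f: not (f["offline_mode"] and f["zero_local_storage"]),  # offline needs a cache
    lambda f: f["offline_mode"],                                    # customer wants offline
    lambda f: f["zero_local_storage"],                              # ...and no local storage
]

def satisfiable(reqs):
    """Try every assignment; return one that meets all requirements, else None."""
    for bits in product([False, True], repeat=len(features)):
        assignment = dict(zip(features, bits))
        if all(r(assignment) for r in reqs):
            return assignment
    return None  # "invalid requirements: can't support this and that at once"

print(satisfiable(requirements))       # the last two requirements conflict
print(satisfiable(requirements[:3]))   # drop one and a solution exists
```

Real requirements aren't boolean toggles, of course, which is exactly why the hard part is formalizing them in the first place; once they are formal, SAT/SMT solvers already do this at scale.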
> in 5 years will there be an AI that's better than 90% of unassisted working programmers at solving new leetcode-type coding interview questions posed in natural language?
and getting pooh-poohed. https://news.ycombinator.com/item?id=29020401 (And writing that, I felt nervous that it might not be aggressive enough.)
There's this general bias in discussions of AI these days, that people forget that the advance they're pooh-poohing was dismissed in the same way as probably way off in the indefinite future, surprisingly recently.
It will take a far, far more advanced AI to write such descriptions for real-world problems.
Writing requirements for a project is difficult work, and not for technical reasons, but for human reasons (people don't know what they want exactly, people have trouble imagining things they haven't seen yet, people are irrational, people might want something that is different from what they need, etc.)
In this regard, we are safe for a few more decades at least.
You need an agent with a large and coherent world model, in order to understand how your programs relate to the real world, in order to solve business tasks.
This isn’t something any program synthesis tech currently available can do, because none of it has a coherent world model.
GPT-3 comes closest to this, but isn’t able to engage in any kind of planning or abstract modeling, beyond semi coherent extrapolations from training data.
Maybe scaling up GPT by a few more orders of magnitude would work, by generating an emergent world model along the way.
If we become mechanics of the software AI vehicles of the future, so be it.
Programmers and data scientists might find ourselves among the first half of knowledge workers to be replaced and not among the last as we previously thought.
Essentially handling large language models.
Early prompt engineers will probably be drawn from “data science” communities and will be similarly high status, well but not as well paid, and require less mathematical knowledge.
I’m personally expecting an “Alignment Engineer” role monitoring AI systems for unwanted behavior.
This will be structurally similar to current cyber security roles but mostly recruited from Machine Learning communities, and embedded in a broader ML ecosystem.
Automating the software development profession proper is going to be much harder and will require autonomous agents with coherent world models, because that’s what you need to act in a business context.
To reach average level at codeforces you need to be able to apply a standard operation like a sort, or apply a standard math formula, as the first 1-2 problems in the easy contests are just that. It is impressive that they managed to get this result in real contests with real, unaltered questions and see that it works. But generalizing this to harder problems isn't as easy: there you need to start to devise original algorithms instead of just applying standard ones, and for such problems the model needs to understand computer science instead of just mapping language to algorithms.
I wouldn't be surprised if a specifically engineered system ten years from now wins an ICPC gold medal but I'm pretty sure that a general purpose specification -> code synthesizer that would actually threaten software engineering would require us to settle a lot of technical debts first -- especially in the area of verifying code/text generation using large language models.
Let's say AI only gets to 10% (or 20% or 30% or whatever, it doesn't really matter), that's a huge number of jobs being lost.
Imagine having a machine write all the "simple/boring" code for you. Your productivity will go through the roof. The smartest programmer who can most effectively leverage the machine could replace many hundreds of programmers.
I should brush up on my plumbing and apply for a plumbing license soon. (I think plumbing is safer than electricians, because many CS people have good EE foundations).
Can you list a few?
Inventing relational DBs hasn't replaced programmers, we just write custom DB engines less often. Inventing electronic spreadsheets hasn't deprecated programmers, it just means that we don't need programmers for corresponding tasks (where spreadsheets work well).
AI won't replace programmers until it grows to replace the humanity as a whole.
Yes, but after seeing this progress in the former, my estimate of the time remaining until the latter has just significantly shortened.
There is progress in certain domains (such as image recognition) but (outside specialized tasks) gigantic language models look like no more than impressive BS generators.
Elsewhere ITT I’ve claimed that to fully automate programming you also need a model of the external world that’s on par with a humans.
Otherwise you can’t work a job because you don’t know how to do the many other tasks that aren’t coding.
You need to understand what the business goals are and how your program solves them.
In many programming contests, a large number of people can't solve the problem at all, and drop out without submitting anything. Frequently that means the median scoring solution is a blank file.
Therefore, without further information, this statement shouldn't be taken to be as impressive as it sounds.
If this is true then a lot of the people I know lack human intelligence...
I think many people are uncomfortable with the idea that their own "intelligent" behavior is not that different from pattern recognition.
I do not enjoy running deep learning experiments. Doing resource-hungry empirical work is not why I got into CS. But I still believe it is very powerful.
30 years ago, the end of programming was prophesised, because 5th generation languages (5GL) and visual programming would enable everybody to design and build software.
20 years ago, low-code and application builders were said to revolutionise the industry and allow people in business roles to build their applications using just a few clicks. End-to-end model-driven design and development (e.g. using Rational Rose and friends) were to put an end to bugs and maintenance problems.
10 years ago it was new programming languages (e.g. Rust, Go, Swift, ...) and a shift to functional programming that was advertised as being "the future".
Today it's back to "no code", e.g. tool-(AI-)driven development that's all the rage.
It's not so much being "uncomfortable" or clinging to the exceptionalism of the human mind. It's just experience. Every decade saw its great big hype and technological breakthrough, but the lofty promises didn't materialize.
Note that this doesn't mean nothing changed - model driven development still has its niche, visual programming is widely used in video production, rendering and game development. Features of functional programming have been added to many "legacy" languages and many of the newly introduced programming languages have become mainstream.
The same will happen with AI-generated software: a large portion of the "mechanical" process of programming will be done by AI. Large and complex software systems with changing requirements, however, will still be designed and implemented primarily by people.
Programming is a conversation between humans and machines. AI will in many cases shift the conversation closer to the human side, but fundamentally it'll still be the same thing.
I like to think of it as the difference between writing your program in assembly and writing it in Haskell; different approaches, same basic activity.
You're saying a lot of so-called technological breakthrough is more hype than substance. The GP is saying that people tend to dismiss actual breakthroughs as mundane stuff. Once $method is published that solves $hardproblem, people comment as if $hardproblem was never hard in the first place, and moves the goalposts a bit saying "if $harderproblem can be solved, then that would be profound".
I think the truth is (obviously) somewhere in between. Btw, I dare you go back to a 1980s programming environment and tell me that the programming paradigm shifts are just hype :D My one-liner python scripts can probably do much more than an average coder writing assembly... and given modern hardware my code runs faster too!
But it generated 10 solutions which it ran against the example inputs, and picked the one that passed.
Actually I'm not sure if it ran the solutions against the example inputs or the real inputs.
Maybe the novelty here is working from the English language specification, but I am dubious just how useful that really is. Specifications are themselves hard to write well too.
And what if the “specification” was some Lisp code testing a certain goal? Is this any better than existing Genetic Programming?
Maybe it is better but in my mind it is kind of suspicious that no comparison is made.
I love Deep Learning but nobody does the field any favors by over promising and exaggerating results.
Most of the code generated by my genetic programming algos doesn't compile. Very occasionally the random conditions exist to allow it to jump over a "local maximum" and come up with useful candidate source code. Sometimes the candidates compile, run, and produce correct results.
The time it takes to run varies vastly with parameters (like population, how the mutation function works, how the fitness function weights/scores, etc).
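For readers unfamiliar with the setup, here is a minimal caricature of such a loop: evolving arithmetic expressions toward a target behaviour, with non-compiling mutants scored as unfit. The token set, population size, and mutation rates are arbitrary choices, not anyone's tuned parameters:

```python
import random

random.seed(1)
TOKENS = list("x+-*123456789")
TARGET = [(x, 3 * x + 2) for x in range(-5, 6)]  # desired behaviour: 3x + 2

def fitness(expr):
    """Lower is better; expressions that don't compile score infinity."""
    if "**" in expr or len(expr) > 20:   # guard against huge exponentials
        return float("inf")
    try:
        code = compile(expr, "<gp>", "eval")
        return sum(abs(eval(code, {"x": x}) - y) for x, y in TARGET)
    except Exception:                    # SyntaxError, NameError, ...
        return float("inf")

def mutate(expr):
    i = random.randrange(len(expr))
    r = random.random()
    if r < 0.4:                                          # point mutation
        return expr[:i] + random.choice(TOKENS) + expr[i + 1:]
    if r < 0.7 and len(expr) > 1:                        # deletion
        return expr[:i] + expr[i + 1:]
    return expr[:i] + random.choice(TOKENS) + expr[i:]   # insertion

population = ["x"] * 60
for generation in range(300):
    population.sort(key=fitness)
    if fitness(population[0]) == 0:      # exact behavioural match found
        break
    parents = population[:20]            # elitism: keep the fittest
    population = parents + [mutate(random.choice(parents)) for _ in range(40)]

print(population[0], fitness(population[0]))
```

Most mutants indeed fail to compile and get culled immediately, matching the parent comment's experience; whether the loop actually converges within the generation budget depends heavily on the parameters, also as described.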
Personally I really like that these DeepMind announcements don't get lost in performance comparisons, because inevitably those would get bogged down in complaints like "the other thing wasn't tuned as well as this one was". Let 3rd party researchers who have access to both do that work, independently.
Make me a sandwich -> two weeks and $10k isn't viable
Make me a sandwich -> 2 seconds and free, totally viable
And yet, I am starting to see (with GitHub’s Copilot, and now this) a sort of “GPT-4 for code”. I do see many problems with this, including:
1. It doesn’t actually “invent” solutions on its own like AlphaZero, it just uses and remixes from a huge body of work that humans put together,
2. It isn’t really ever sure if it solved the problem, unless it can run against a well-defined test suite, because it could have subtle problems in both the test suite and the solution if it generated both
This is a bit like readyplayer.me trying to find the closest combination of noses and lips to match a photo (do you know any open source alternatives to that site btw?)
But this isn’t really “solving” anything in an imperative language.
Then again, perhaps human logic is just an approach with operations on low-dimensional vectors, able to capture simple “explainable” models, while AI classifiers and adversarial training produce far bigger vectors that help model the “messiness” of the real world and also find simpler patterns as a side effect.
In this case, maybe our goal shouldn’t be to get solutions in the form of imperative language or logic, but rather to unleash the computer on “fuzzy” inputs and outputs where things are “mostly correct 99.999% of the time”. The only place this could fail is when some intelligent adversarial network exploits weaknesses in that 0.001% and makes it more common. But for natural phenomena it should be good enough!
AI will eat any and all knowledge work because there's very little special a human can do that a machine won't be able to do eventually, and much faster and better. It won't be tomorrow, but the sands are inevitably shifting this way.
I guess this makes sense though, from a practical point of view. Verifying correctness would be difficult in other intellectual disciplines like physics and higher mathematics.
We have AI to generate reasonable code from text problem description.
Now what if the problem description text is to generate such a system in the first place?
Would it be possible to close the loop, so to speak, so that over many iterations:
- text description is improved
- output code is improved
Would it be possible to create something that converges to something better?
I would really like to see more effort in the AI/ML code generation space being put into things like code review, and system observation. It seems significantly more useful to use these tools to augment human software engineers rather than trying to tackle the daunting and improbable task of completely replacing them.
*Note: as a human software engineer I am biased
Additionally, people should REALLY rethink their coding interviews if they can be solved by a program.
If you're using a large corpus of code chunks from working programs as symbols in your alphabet, I wonder how much entropy there actually is in the space of syntactically correct solution candidates.
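One way to make the question concrete: estimate the Shannon entropy of the chunk distribution. Heavy reuse means far fewer effective bits per symbol than the raw alphabet size suggests. A toy sketch with a made-up chunk corpus:

```python
import math
from collections import Counter

# Hypothetical corpus: programs flattened into reusable chunks ("symbols"),
# with a skewed reuse distribution as you'd expect from real code.
corpus_chunks = (["for i in range(n):"] * 50 +
                 ["return result"] * 30 +
                 ["if x is None:"] * 15 +
                 ["yield from walk(child)"] * 5)

counts = Counter(corpus_chunks)
total = sum(counts.values())

# Shannon entropy in bits per chunk: -sum(p * log2(p)).
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
print(f"{len(counts)} chunk types, {entropy:.2f} bits/chunk")
```

Here the skew drops the entropy to about 1.65 bits/chunk against the 2-bit maximum for four symbols; real code corpora are vastly more skewed, which is part of why language models can predict code so well.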
https://opensea.io/assets/0x495f947276749ce646f68ac8c2484200...
Perhaps many problems are something like finite automata, and the program discovers the structure of the finite automaton and also an algorithm for better performance.
Critical thinking? Oh, wow. That sounds amazing!
Let's read further on...
>> At evaluation time, we create a massive amount of C++ and Python programs for each problem, orders of magnitude larger than previous work. Then we filter, cluster, and rerank those solutions to a small set of 10 candidate programs that we submit for external assessment.
Ah. That doesn't sound like "critical thinking", or any thinking. It sounds like massive brute-force guessing.
A quick look at the arxiv preprint linked from the article reveals that the "massive" number of programs generated is in the millions (see Section 4.4). These are "filtered" by testing them against program input-output (I/O) examples given in the problem descriptions. This "filtering" still leaves a few thousand candidate programs, which are further reduced by clustering to "only" 10 (which are finally submitted).
So it's a generate-and-test approach rather than anything to do with reasoning (as claimed elsewhere in the article) let alone "thinking". But why do such massive numbers of programs need to be generated? And why are there still thousands of candidate programs left after "filtering" on I/O examples?
The reason is that the generation step is constrained by the natural-language problem descriptions, but those are not enough to generate appropriate solutions because the generating language model doesn't understand what the problem descriptions mean; so the system must generate millions of solutions hoping to "get lucky". Most of those don't pass the I/O tests so they must be discarded. But there are only very few I/O tests for each problem so there are many programs that can pass them, and still not satisfy the problem spec. In the end, clustering is needed to reduce the overwhelming number of pretty much randomly generated programs to a small number. This is a method of generating programs that's not much more precise than drawing numbers at random from a hat.
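The pipeline just described can be caricatured in a few lines. Here the "model" merely samples random linear functions and the problem statement exposes a single public example, which is exactly why so many wrong candidates survive the filter. All names and numbers are illustrative, not taken from the paper:

```python
import random
from collections import defaultdict

random.seed(0)

def sample_program():
    """Stand-in 'model': emits a random linear program f(x) = a*x + b."""
    a, b = random.randint(-3, 3), random.randint(-3, 3)
    return lambda x, a=a, b=b: a * x + b

# True spec is f(x) = 2x + 1, but the problem only shows one public example.
public_examples = [(0, 1)]
hidden_tests = [(3, 7), (-2, -3)]

# 1) Generate a massive batch of candidates.
candidates = [sample_program() for _ in range(10_000)]

# 2) Filter: keep candidates consistent with the public example(s).
filtered = [f for f in candidates if all(f(x) == y for x, y in public_examples)]

# 3) Cluster by behaviour on fresh inputs; submit one program per cluster.
clusters = defaultdict(list)
for f in filtered:
    clusters[tuple(f(x) for x in range(-5, 6))].append(f)
submissions = [fs[0] for fs in clusters.values()][:10]

correct = [f for f in submissions if all(f(x) == y for x, y in hidden_tests)]
print(len(filtered), len(clusters), len(correct))
```

With one public example, over a thousand sampled programs pass the filter even though only one behaviour is actually correct; the clustering step exists purely to squeeze that mess under the submission limit.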
Inevitably, the results don't seem to be particularly accurate, hence the evaluation against programs written by participants in coding competitions, which is not any objective measure of program correctness. Table 10 in the arxiv preprint lists results on a more formal benchmark, the APPS dataset, where it's clear that the results are extremely poor (the best performing AlphaCode variant solves 20% of the "introductory" level problems, though outperforming earlier approaches).
Overall, pretty underwhelming and a bit surprising to see such lackluster results from DeepMind.
BUT, our jobs have a lot more complexity
- Local constraints - We almost always work in a large, complex existing code base with specific constraints
- Correctness is hard - writing lots of code is usually not the hard part; it's proving it correct against amorphous requirements communicated in a variety of human social contexts.
- Precision is extremely important - Even if 99% of the time CoPilot can spit out a correct solution, the 1% of the time it doesn't will create a bevy of problems.
Are those insurmountable problems? We'll see I suppose, but we begin to verge on general AI if we can gather and understand half a dozen modalities of social context to build a correct solution.
Not to mention much of the skill needed in our jobs has much more to do with soft skills, and the bridge between the technical and the non technical, and less to do with hardcore heads-down coding.
Exciting times!
And, have you tried polling? I hear it keeps the CPU warm in winter. Interrupts are so ... this just in, Nike's stock jump 3% ... Where was I? Did I save my task context properly? Did I reenable interrupts?
I'm not quite sure what you're asking, but my reason is that I do not enjoy working on/with ML. I'd personally rather quit the industry.
But I work in embedded/driver development. I do not worry about ML models replacing me yet, but if I were just gluing together API calls I would be a bit worried and try to specialize.