All these approaches just seem like brute-force approaches: Let's just throw our transformer on this problem and see if we can get anything useful out of this.
Whatever it is, you can't deny that these unsupervised models learn some semantic representations, but we have no clue at all what those representations actually are or how these models learn them. But I'm also very sceptical that you can actually get anywhere close to human (expert) capability in any sufficiently complex domain by using this approach.
And next year they can filter out 99.99%. And the year after that, 99.9999%. So literally, an exponentially greater number of monkey/typewriting units. (An AI produced Shakespeare play coming soon).
>> we have no clue at all what that actually is and how these model learn
This is why I'm super cool-to-cold about the AI/deep learning classes being sold to young people who would otherwise be learning fundamental programming skills. It appears to me like trying to teach someone to ride a horse before they understand what skin, bones, muscles, animals, and horses are.
>>get anywhere close to human (expert) capability in any sufficiently complex domain
You can get close enough to scalp a lot of billionaires, but at the end of the day it's always going to be human coders banging our heads against management, where they ask for shit they can't visualize and it's our job to visualize how their employees/customers will use it. Yes it involves domain specific knowledge, but it also requires, er, having eyeballs and fingers, and understanding how a biological organism uses a silicon-based device. That's kind of the ultimate DS knowledge, after all. Now, lots of coders just copy-pasta a front end, but after all the hooplah here I'd be extremely surprised if in ten years an AI has caught up to your basic web mill in Indonesia when it comes to building a decent website.
To be fair, a lot of creative work requires plenty of trial and error. And since no problem is solved truly from scratch, all things considered, the prior work you build on and your own attempts may together have iterated through dozens of possibilities before arriving at your result.
My advantage as a human is I can often tell you why I am eliminating this branch of the search space. The catch is my reasoning can be flawed. But we do ok.
> just copying previous solutions with slight adjustments.
It's not just doing that, Copilot can do a workable job providing suggestions for an invented DSL. A better analogy than autocomplete is inpainting missing or corrupted details based on a surrounding context. Except instead of a painting we are probabilistically filling in patterns common in solutions to leetcode style problems. Novelty beyond slight adjustments comes in when constraints are insufficient to pin down a problem to a known combination of concepts. The intelligence of the model is then how appropriate its best guesses are.
The limitations of GPT-3 Codex and AlphaCode seem to be that they're relatively weak at selection, and that they require problem spaces with enough data to distill a sketch of, and to learn how to inpaint well in. Leetcode-style puzzles are constructed to be soluble in a reasonable number of lines, are not open ended, and have a trick to them. One can complain that while we're closer to real-world utility, we're still restricted to the closed worlds of verbose APIs, games and puzzles.
While lots of commenters seem concerned about jobs, I look forward to having the dataset oliphaunt and ship computer from A Fire Upon the Deep someday soon.
I also think that in ML and DL generally, the overarching progress gets hyped while in the background there are murmurs in the research community about the limitations. That's how we end up with people in 2012 saying FSD is a couple of years away, while in 2022 we know we aren't even close yet. We tend to oversell how capable these systems are.
Yes, it's the size of the search space for each problem. The search space for arbitrary programs in a language with Universal Turing Machine expressivity is infinite. Even worse, for any programming problem there are an infinite number of candidate programs that may or may not solve it and that differ in only minute ways from each other.
For Go and protein structure prediction from sequences the search space is finite, although obviously not small. So there is a huge difference in the complexity of the problems right there.
Btw, I note yet again that AlphaCode performs abysmally badly on the formal benchmark included in the arxiv preprint (see Section 5.4, and table 10). That makes sense because AlphaCode is a very dumb generate-and-test, brute-force search approach that doesn't even try to be smart and tries to make up for the lack of intelligence with an awesome amount of computational resources. Most work in program synthesis is also basically a search through the space of programs, but people in the field have come up with sophisticated techniques to avoid having to search an infinite number of programs- and to avoid having to generate millions of program candidates, like DeepMind actually brags about:
> At evaluation time, we create a massive amount of C++ and Python programs for each problem, orders of magnitude larger than previous work.
They say that as if generating "orders of magnitude more" programs than previous work is a good thing, but it's not. It means their system is extremely bad at generating correct programs. It is orders of magnitude worse than earlier systems, in fact.
(The arxiv paper linked from the article quantifies this "massive" amount as "millions"; see Section 4.4).
It is clear that writing code will soon be a thing of the past; maybe it is a bad idea to train our children to code. Let's make sure we milk every penny before the party is over!
I say maybe because so far the code that Copilot has generated for me has been impressive for what it is, but riddled with obvious and subtle bugs. It’s like outsourcing my function implementations to a C-student undergraduate intern. I definitely wouldn’t use any of its code without close scrutiny.
AI will make some software engineering tasks more efficient and more accessible but human programmers are not going anywhere any time this side of the Singularity.
And then I remember that the thing I bring to the table is the ability to turn domain knowledge into code.
Being able to do competitive coding challenges is impressive, but a very large segment of software engineering is about eliciting what the squishy humans in management actually want, putting it into code, and discovering as quickly as possible that it’s not what they really wanted after all.
It’s going to take a sufficiently long time for AI to take over management that I don’t think oldies like me need to worry too much.
Now, I completely agree with you that a significant part of our job is understanding and structuring the problem, but I'm not sure it can't be done in another way. We usually get taken in, when thinking about what machines will be able to do, by assuming that just because we use intelligence (general/human intelligence) to solve a task, intelligence must be a requirement. Think chess. Or even calculating (as in, with numbers). Or Go. Etc.
The funny thing is that we don't know, until someone does it. I've been thinking for a while that a lot of what I do could be done by a chat bot. Asking clarification questions. Of course, I do have a lot of background knowledge and that's how I can come up with those questions, but that knowledge is probably easy to acquire from the internet and then use it as training data. (Just like we have an awful lot of code available, we have a lot of problem descriptions, questions, comments and some requirement specifications/user guides.)
The hard part would probably be not what we have learned as a software developer, but the things we have learned while we were small kids and also the things that we have learned since, on the side. I.e. being a reasonable person. Understanding what people usually do and want. So the shared context. But I'm not sure it's needed that much.
So yeah, I can imagine a service that will talk to a user about what kind of app they want (first just simpler web sites, web shops, later more and more complicated ones) and then just show them "here is what it does and how it works". And then you can say what you'd like to be changed. The color or placement of a button (earlier versions) or even the association type between entities (oh, but a user can have multiple shipping addresses).
The job of programmers is to have machines do stuff so that humans don't have to, and of course, they do it for themselves too. Scripts, libraries, compilers: they are just tools to avoid flipping bits by hand. If something like Copilot is not embraced by all programmers, it is because it is often less than helpful; even so, some have adopted it. If we get a super-advanced AI that can have a high-level understanding of a problem and write the app for you, then it is not much more than a super-compiler, and there will be programmers who tell the super-compiler what to do. Think of it as a new, super high level programming language. The job will evolve, but there will always be someone who tells the computer what to do.
And if there is no one needed to tell the computer what to do, that's what some people call "the singularity". Programming, or its evolution will probably be the last technical job. Social jobs may continue further, simply because humans like humans because they are human. Maybe the oldest profession will also be the last profession.
Well it’d a curious day when an AlphaGo moment hits coding. Would be funny if it happened at the same time as Fed rate increases and destabilizing world events this year (the path from median human to top human is shallow). Mass firing of a few million highly paid redundancies out of the blue? Would be quite a sight.
Or maybe it wouldn’t happen that way, but rather it would pave the way for a leaner set of startups that were built with the power to do the same thing at the same or better velocity with an order of magnitude or fewer people.
Most surprisingly I can quickly tackle domains that require libraries I don't know because a combination of code generation and IDE hinting means I can write comments and pseudo code and the tool then provides at least a first pass best method to use.
Can't say if I write better code with Copilot but it's worth experiencing!
It's very good at handling boilerplate and making contextual suggestions.
I don't see it eating my cake, but it's definitely a very useful tool for saving time.
Lower-level coding could become more and more automated, raising the values and wages of complementary skills such as requirements elicitation and understanding of business impact from technological decisions. [1]
Some of these, however, can be done by businesspeople who know how to think and express their ideas precisely, such that a neural model can turn them into a decent draft of code. (These days, many more youths learn to code before going into other fields. They have training for thinking precisely.) There can be fewer job opportunities for some groups of developers.
Thus, a hedge against possible job loss is still required. Owning substantial equity in a company/startup and other assets would be one good strategy.
- a very well defined problem (one of the things I like about competitive programming and the like is just getting to implement a clearly articulated problem, not something I experience on most days)
- existing test data
This is definitely a great accomplishment, but I think those two features of competitive programming are notably different than my experience of daily programming. I don’t mean to suggest these will always be limitations of this kind of technology, though.
There's also the open problem of verifying correctness in solutions and providing some sort of flag when the model is not confident in its correctness. I give it another 5 years in the optimistic case before AlphaCode can reliably compete at the top 1% level.
That's wildly overstating the promise of this technology, and I'd be very surprised if the authors of this wouldn't agree.
I have a suspicion it would - kinda like Stack Overflow, problems/solutions are not that different "in the small". It'd have almost certainly given us the fast square root trick verbatim, like Github's AI is doing routinely.
(Side note: I find that many people skip this step, and go straight from fuzzy-requirement-only-discussed-on-zoom-with-Bob to code; open a pull request without much context or comments; and then a code reviewer is supposed to review it properly without really knowing what problem is actually being solved, and whether the code is solving a proper problem at all).
Fuzzy business requirements -> programmer specifies and writes tests -> AI codes
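A minimal sketch of that middle step, with hypothetical names: the programmer pins the fuzzy requirement down as executable tests, and whatever the AI emits must pass them.

```python
# Fuzzy requirement: "usernames should look clean in the UI".
# The programmer turns it into a precise, executable spec:
def check_spec(impl):
    assert impl("  Alice ") == "alice"
    assert impl("BOB") == "bob"
    assert impl("carol") == "carol"

# One candidate implementation (human- or AI-written) that passes the spec:
def normalize_username(name):
    return name.strip().lower()

check_spec(normalize_username)
```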
English versions of Codeforces problems may be well-defined but they are often very badly articulated and easy to misunderstand as a human reader. I still can't understand how they got AI to be able to generate plausible solutions from these problem statements.
Software is, ultimately, always about humans. Software is always there to serve a human need. And the "intelligence" that designs software will always, at some level, need to be intelligence that understands the human mind, with all its knowledge, needs, and intricacies. There are no shortcuts to this.
So, I think AI as a replacement for software development professionals, that's currently more like a pipe dream. I think AI will give us powerful new tools, but I do not think it will replace, or even reduce, the need for software development professionals. In total it might even increase the need for software development professionals, because it adds another level to the development stack. Another level of abstraction, and another level of complexity that needs to be understood.
Having used Copilot I can assure you that this technology won't replace you as a programmer but it will make your job easier by doing things that programmers don't like to do as much like writing tests and comments.
It appears to me that when it comes to language models, intelligence = experience * context, where experience is what's encoded in the model, and context is the prompt. And the biggest limitation on Copilot currently is context. It behaves as an "advanced autocomplete" because all it has to go on is what regular autocomplete sees, e.g. the last few characters and lines of code.
So, you can write a function name called createUserInDB() and it will attempt to complete it for you. But how does it know what DB technology you're using? Or what your user record looks like? It doesn't, and so you typically end up with a "generic" looking function using the most common DB tech and naming conventions for your language of choice.
But now imagine a future version of Copilot that is automatically provided with a lot more context. It also gets fed a list of your dependencies, from which it can derive which DB library you're using. It gets any locatable SQL schema file, so it can determine the columns in the user table. It gets the text of the Jira ticket, so it can determine the requirements.
As a programmer a great deal of time is spent checking these different sources and synthesising them in your head into an approach, which you then code. But they are all just text, of one form or another, and language models can work with them just as easily, and much faster, than you can.
And once the ML coding train gets rolling, it'll only get faster. Sooner or later Github will have a "Copilot bot" that can automatically make a stab at fixing issues, which you then approve, reject, or fix. And as thousands of these issues pile up, the training set will get bigger, and the model will get better. Sooner or later it'll be possible to create a repo, start filing issues, and rely on the bot to implement everything.
I haven't found that reading largely-correct-but-still-often-wrong code is a good experience for me, or that it adds any efficiency.
It does do a very good job of intelligently synthesising boilerplate for you, but be it Copilot or this AlphaCode, these models still don't understand the fundamentals of coding, in the causal sense of how one instruction affects the space of program states.
Still, these are exciting technologies, but again, it's a big if whether such a machine learning model will ever materialise.
I see it continuing to evolve and becoming a far superior auto-complete with full context, but, short of actual general AI, there will always be a step that takes a high-level description of a problem and turns it into something a computer can implement.
So while it will make the remaining programmers MUCH more productive, thereby reducing the needed number of programmers, I can't see it driving that number to zero.
This sort of boilerplate code is best solved by the programming language. Either via better built-in syntax or macros. Using an advanced machine learning model to generate this code is both error-prone and a big source of noise and code bloat. This is not an issue that will go away with better tooling; it will only get worse.
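As a toy illustration of that point (an assumed example, not from the thread): the kind of boilerplate an assistant happily completes can instead be absorbed by the language itself, e.g. Python's dataclasses.

```python
from dataclasses import dataclass

# Hand-written boilerplate an assistant might happily autocomplete:
class PointVerbose:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f"PointVerbose(x={self.x}, y={self.y})"
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

# The same behaviour absorbed by a language feature, no generation needed:
@dataclass
class Point:
    x: int
    y: int
```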
Anyway: programming is automation; automation of programming is abstraction. Using AI to write your code is just a bad abstraction - and we are used to bad abstractions.
Seriously though, I do doubt I can be fully replaced by a robot any time soon, but it may be the case that soon enough I can write high-level descriptions of programs and hand them off to an AI to do most of the work. This wouldn't completely replace me, but it could make developers 50x more productive. The question is how elastic the market is... can it grow in step with our increase in productivity?
Also, please remember that as with anything, within 5 years we should see vast improvements to this AI. I think it will be an important thing to watch.
I just hope LMs will prove to be just as useful in software development as they are in their own field.
More likely it will translate the abstraction level by some vector of 50 elements.
It does look like we've entered an era where programmers who don't use AI assistants will be disadvantaged, and that this era has an expiration date.
Apparently the bot would have a rating of 1300. Although Elo ratings between sites are not comparable, for some perspective, Mark Zuckerberg had a rating of ~1k when he was in college on TopCoder: https://www.topcoder.com/members/mzuckerberg
To clarify, this is a HUGE leap in AI and computing in general. I don't mean to play it down.
Sorry, but it's nothing of the sort. The approach is primitive, obsolete, and its results are very poor.
I've posted this three times already but the arxiv preprint includes an evaluation against a formal benchmark dataset, APPS. On that more objective measure of performance, the best performing variant of AlphaCode tested, solved 25% of the easiest tasks ("introductory") and less than 10% of the intermediary ("interview") and advanced ("competition") tasks.
What's more, the approach that AlphaCode takes to program generation is primitive. It generates millions of candidate programs and then it "filters" them by running them against input-output examples of the target programs taken from the problem descriptions. The filtering still leaves thousands of candidate programs (because there are very few I/O examples and the almost random generation can generate too many programs that pass the tests, but still don't solve the problem) so there's an additional step of clustering applied to pare this down to 10 programs that are finally submitted. Overall, that's a brute-force, almost random approach that is ignoring entire decades of program synthesis work.
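The pipeline described above (generate candidates, filter on the public I/O examples, cluster the survivors, submit 10) can be sketched in toy Python; everything here is hypothetical, with candidate programs modelled as callables rather than generated source:

```python
from collections import defaultdict

def filter_and_cluster(candidates, examples, extra_inputs, k=10):
    # Step 1: filter candidates on the few public input/output examples.
    survivors = [c for c in candidates
                 if all(c(inp) == out for inp, out in examples)]
    # Step 2: cluster survivors by their behaviour on extra generated
    # inputs; programs behaving identically are likely semantic duplicates.
    clusters = defaultdict(list)
    for c in survivors:
        signature = tuple(c(inp) for inp in extra_inputs)
        clusters[signature].append(c)
    # Step 3: submit one representative from each of the largest clusters.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:k]]
```

Note that step 1 alone cannot distinguish a correct program from one that merely memorised the examples, which is exactly why so many candidates survive the filter.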
To make an analogy, it's as if DeepMind had just published an article boasting of its invention of a new sorting algorithm... bubblesort.
I am rated at 2100+ so I do agree that 1300 rating is low. But at the same time it solved https://codeforces.com/contest/1553/problem/D which is rated at 1500 which was actually non-trivial for me already. I had one wrong submit before getting that problem correct and I do estimate that 50% of the regular competitors (and probably the vast majority of the programmers commenting in this thread right now) should not be able to solve it within 2hrs.
My rating is 1562.
IIUC, AlphaCode was trained on Github code to solve competitive programming challenges on Codeforces, some of which are "difficult for a human to do". Suppose AlphaCode was trained on Github code that contains the entire set of solutions on Codeforces, is it actually doing anything "difficult"? I don't believe it would be difficult for a human to solve problems on Codeforces when given access to the entirety of Github (indexed and efficiently searchable).
The general question I have been trying to understand is this: is the ML model doing something that we can quantify as "difficult to do (given this particular training set)"? I would like to compute a number that measures how difficult it is for a model to do task X given a large training set Y. If the X is part of the training set, the difficulty should be zero. If X is obtained only by combining elements in the training, maybe it is harder to do. My efforts to answer this question: https://arxiv.org/abs/2109.12075
In recent literature, the RETRO Transformer (https://arxiv.org/pdf/2112.04426.pdf) talks about "quantifying dataset leakage", which is related to what I mentioned in the above paragraph. If many training samples are also in the test set, what is the model actually learning?
Until deep learning methods provide a measurement of "difficulty", it will be difficult to gauge the prowess of any new model that appears on the scene.
They tested it on problems from recent contests. The implication being: the statements and solutions to these problems were not available when the Github training set was collected.
From the paper [0]: "Our pre-training dataset is based on a snapshot of selected public GitHub repositories taken on 2021/07/14" and "Following our GitHub pre-training dataset snapshot date, all training data in CodeContests was publicly released on or before 2021/07/14. Validation problems appeared between 2021/07/15 and 2021/09/20, and the test set contains problems published after 2021/09/21. This temporal split means that only information humans could have seen is available for training the model."
At the very least, even if some of these problems had been solved exactly before, you still need to go from "all of the code in Github" + "natural language description of the problem" to "picking the correct code snippet that solves the problem". Doesn't seem trivial to me.
> I don't believe it would be difficult for a human to solve problems on Codeforces when given access to the entirety of Github (indexed and efficiently searchable).
And yet, many humans who participate in these contests are unable to do so (although I guess the issue here is that Github is not properly indexed and searchable for humans?).
[0] https://storage.googleapis.com/deepmind-media/AlphaCode/comp...
Yes, and I would like to know how similar the dataset(s) were. Suppose the models were trained only on greedy algorithms and then I provided a dynamic programming problem in the test set, (how) would the model solve it?
> And yet, many humans who participate in these contests are unable to do so (although I guess the issue here is that Github is not properly indexed and searchable for humans?).
Indeed, so we don't know what "difficult" means for <human+indexed Github>, and hence we cannot compare it to <model trained on Github>.
My point is, whenever I see a new achievement of deep learning, I have no frame of reference (apart from my personal biases) of how "trivial" or "awesome" it is. I would like to have a quantity that measures this - I call it generalization difficulty.
Otherwise the datasets and models just keep getting larger, and we have no idea of the full capability of these models.
But similar ones were, because the amount of code puzzles suitable for such contests is finite. There are differences, but when you have seen a few string-compare problems, you have a pretty good grasp of what they look like, what common paths to solutions are, etc.
And yet, what a garbage solution it produces.
To illustrate the difference between intelligence and regurgitation, someone tell me what CoPilot generates for this:
  // A Go function to swap the sixth bit and seventeenth bit of a 32-bit signed integer.

Here is a human solution:

  func swap(x int32) int32 {
      const mask = 1 << 5
      var (
          xor1 = (x>>11 ^ x) & mask
          xor2 = xor1 << 11
      )
      return x ^ xor1 ^ xor2
  }
CoPilot cannot reason numerically like this (understand "seventeenth bit" and "sixth bit" and generate the right code for that combination). It needs to understand the size of the gap between the bits, i.e., 11, and that's too hard.

  # A function to swap the sixth bit and seventeenth bit of a 32-bit signed integer.
  def swap_bits(x):
      # Get the value of the sixth bit.
      bit6 = x & (1 << 5)
      # Get the value of the seventeenth bit.
      bit17 = x & (1 << 16)
      # Swap the values of the sixth and seventeenth bit.
      bit6 = bit6 >> 5
      bit17 = bit17 >> 16
      # Combine the swapped values of the sixth and seventeenth bit.
      x = x ^ (bit6 << 16)
      x = x ^ (bit17 << 5)
      return x
It actually synthesised about 10 viable solutions. Some interesting ones:

  def swap_six_seventeen(x):
      # Get the binary representation of the integer.
      binary = bin(x)[2:]
      # Add zeros to the beginning of the binary representation.
      binary = '0' * (32 - len(binary)) + binary
      # Swap the sixth and seventeenth bit.
      binary = binary[:5] + binary[17] + binary[5:17] + binary[18:]
      # Convert the binary back to an integer.
      return int(binary, 2)

Exercising the first solution:

  >>> bin(swap_bits(0b_1_0000000000_0_00000))
  '0b10000000000100000'
  >>> bin(swap_bits(0b_0_0000000000_1_00000))
  '0b10000000000100000'
  >>> bin(swap_bits(0b_1_0000000000_1_00000))
  '0b0'
  >>> bin(swap_bits(0b_0_0000000000_0_00000))
  '0b0'
The second one converts the value to a string and uses string operations, which is wildly inefficient and a very common mistake made by inexperienced programmers unaware of bitwise operations (so presumably common in the training set). It also attempts to swap the 6th and 17th most significant bits rather than the 6th and 17th least significant bits, i.e. it counts in the opposite direction to the first one (the comment doesn't specify, but typically you count from the least significant bit in these situations).

Worse, though, it gets the string manipulation completely wrong. I think it's trying for `binary[:5] + binary[16] + binary[6:16] + binary[5] + binary[17:]`, i.e. characters 1-5, then character 17, then characters 7-16, then character 6, then characters 18-32. The manipulation it does just completely mangles the string.
I'm very keen to try Github Copilot if they ever admit me to the beta (I've been waiting forever) and will adopt it enthusiastically if it's useful. However, this is exactly what I've pessimistically expected. Analysing these truly awful implementations to identify the subtle and bizarre misbehaviours has taken me far, far longer than it would have taken me to just write and test a working implementation myself. And I'm supposed to evaluate 10 of these to see if one of them might possibly do the right thing?!?!
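For reference, a correct bitwise version in Python (a sketch using the same XOR trick as the human Go solution upthread; bit positions are counted 0-indexed from the least significant bit):

```python
def swap_bits_correct(x):
    # 1 iff bit 5 and bit 16 differ; 0 otherwise.
    diff = ((x >> 5) ^ (x >> 16)) & 1
    # XOR the difference back in at both positions: this swaps the bits
    # when they differ and leaves x untouched when they are equal.
    return x ^ (diff << 5) ^ (diff << 16)
```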
  package main

  import (
      "fmt"
      "math"
  )

  func main() {
      var a int32 = 0b1010101010101010101010101010101010101010101010101010101010101010
      fmt.Printf("%b\n", a)
      fmt.Printf("%b\n", swapBits(a, 6, 17))
  }

  func swapBits(a int32, i int, j int) int32 {
      // convert to binary
      bin := fmt.Sprintf("%b", a)
      // get the bits
      bit1 := bin[i-1 : i]
      bit2 := bin[j-1 : j]
      // swap the bits
      bin = bin[:i-1] + bit2 + bin[i:]
      bin = bin[:j-1] + bit1 + bin[j:]
      // convert back to int
      return int32(bin2int(bin))
  }

  func bin2int(bin string) int64 {
      var sum int64
      for i, v := range bin {
          if v == '1' {
              sum += int64(math.Pow(2, float64(len(bin)-i-1)))
          }
      }
      return sum
  }

CoPilot is regurgitating some solution from its training set, the solution of an inept programmer who would manipulate bits via conversion to string... yikes.
  unsigned int swapbits(unsigned int a) {
      bool bit6 = a & (1 << 5);
      bool bit17 = a & (1 << 16);
      if (bit6 == bit17) return a; // bits are the same, do nothing
      return (a ^ (1 << 5) ^ (1 << 16)); // flip both 6th and 17th bits
  }

Not as efficient as mine, but kudos.
Solutions are posted, and they're wrong.
But the CoPilot user can't see the code is wrong.
[edit] Is "10 recent contests" a large enough sample size to prove whatever point is being made?
There's more objective measures of performance, like a good, old-fashioned, benchmark dataset. For such an evaluation, see table 10 in the arxiv preprint (page 21 of the pdf), listing the results against the APPS dataset of programming tasks. The best performing variant of AlphaCode solves 25% of the simplest ("introductory") APPS tasks and less than 10% of the intermediary ("interview") and more advanced ones ("competition").
So it's not very good.
Note also that the article above doesn't report the results on APPS. Because they're not that good.
As others say in comments, it might be the case that we meet in the middle: us writing some form of tests for AI-produced code to pass.
The models regurgitate solutions to problems already encountered in the training set. This is very common with Leetcode problems and seems to still happen with harder competitive programming problems.
I think someone else in this thread even pointed out an example of AlphaCode doing the same thing.
It's the next step. Binary code < assembly < C < Python < AlphaCode
Historically its always been about abstracting and writing less code to do more.
I.e. as soon as it starts replacing humans, it will not have enough human-generated training data, since all programming will be done by models like itself.
Second, AlphaCode was specifically trained for competitive programming:
1. Short programs. 2. Each problem has hundreds of human-generated solutions.
However, commercial programs:
1. Are long. 2. Have no predefined answer, or even a correct answer. 3. Need to use/reuse a lot of legacy code.
As a natural born pessimist, I can't help but feel that by the time we get to that point we'll just keep blundering forward and adapting our world around the wild nonsense garbage code the model ends up producing in this scenario.
After all, that's basically what we've done with the entire web stack.
Let me know when the AI engine is able to do complex refactoring or adding features that keeps backwards compatibility, find a bug in a giant codebase by debugging a test case or write code that's performant but also maintainable.
And yet, despite the fact that we have programs to help calculate all the things, test code-required load-combinations, even run simulations and size individual components... it turns out that, it doesn't actually save that much work, and you still need an engineer to do most of it. And not just because of regulatory requirements. It's just, that's not the hard part. The hard part is assembling the components and specifications, specifying the correct loads based on location-specific circumstances, coming up with coherent and sensible design ideas, chasing down every possible creative nook and cranny of code to make something that was originally a mistake actually work, and know when the model is just wrong for some reason and the computer isn't simulating load paths accurately.
Specifying the inputs and interpreting results is still about as much work as it was before you started with all the fancy tools. Those tools still have advantages mind you, and they do make one slightly more efficient. Substantially so in some cases, but most of the time it still comes out as a slight assist rather than a major automation.
Machine Learning also has a long way to go before it can take a long, rambling mess of a meeting and somehow generate a halfway usable spec from it. I mean, the customer says they want X, but X is silly in this context, so we'll give them Y and tell them it's "X-like, but faster". For example, SQL is "Blockchain-like, but faster" for a lot of buzzword use-cases of blockchain.
But surely they'll never be able to do this new reference class you have just now come up with, right?
https://en.wikipedia.org/wiki/Algorithmic_program_debugging
Of course, all this targeted only Prolog programs, so it's not well known at all.
True, but if you relax your hard requirements of optimality to admit "good enough" solutions, you can use heuristic approaches that are much more tractable. High quality heuristic solutions to NP-hard problems, enabled by ML, are going to be a big topic over the next decade, I think.
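As a toy illustration of that tradeoff (the city coordinates here are randomly generated, not any benchmark instance), a greedy nearest-neighbor heuristic produces a "good enough" traveling-salesman tour in O(n²) time, even though finding the optimal tour is NP-hard:

```python
import math
import random

def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbor_tour(points):
    """Greedy heuristic: always hop to the closest unvisited city."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(50)]
tour = nearest_neighbor_tour(cities)
print(len(tour), tour_length(cities, tour))
```

The result carries no optimality guarantee, but for many practical instances it lands within a modest factor of optimal, which is often all "good enough" requires.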
I disagree; I think the core of programming is analyzing things people want and expressing solutions to those wants clearly, unambiguously, and in a way that is easy to change in the future. I'd say algorithms and math are a very small part of this work.
Assuming ANNs resemble the way human brains function, you'd also expect them to introduce bugs. And so actual human beings would partake in debugging too.
[1]: https://breandan.net/public/programming_with_intelligent_mac...
The programming languages of the future are going to make Rust look like Python. That’ll be in part because you as an individual programmer aren’t weighed down by as much boilerplate as you were pre-copilot, pre-alphacode and pre- the more advanced coding assistants of the future.
That's what code is.
In the future, code-writing AI could be tasked with generating the most reliable and/or optimized code to pass your unit tests. Human programmers will decide what we want the software to do, make sure that we find all the edge cases and define as many unit tests as possible, and let the AI write significant portions of the product. Not only that, but you could include benchmarks that pit AI against itself to improve runtime or memory performance. Programmers can spend more time thinking about what they want the final product to do, rather than getting mired in mundane details, and be guaranteed that portions of software will perform extremely well.
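A toy sketch of that workflow; the candidate functions below are hand-written stand-ins for model output, not anything a real system produced. Candidates are kept only if they pass every unit test, then the benchmark picks the fastest survivor:

```python
import timeit

# Spec: deduplicate a list while preserving first-seen order.

def candidate_a(xs):
    # Correct but quadratic: membership test scans the output list.
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out

def candidate_b(xs):
    # Correct and linear: dict preserves first-seen insertion order.
    return list(dict.fromkeys(xs))

def candidate_c(xs):
    # Buggy: deduplicates but destroys the original order.
    return sorted(set(xs))

unit_tests = [(([3, 1, 3, 2],), [3, 1, 2]), (([],), [])]

def passes(fn):
    return all(fn(*args) == want for args, want in unit_tests)

# Filter by correctness, then rank survivors by a runtime benchmark.
survivors = [f for f in (candidate_a, candidate_b, candidate_c) if passes(f)]
best = min(survivors,
           key=lambda f: timeit.timeit(lambda: f(list(range(500)) * 4), number=20))
print(best.__name__)
```

Note how the buggy candidate is only caught because a test encodes the ordering requirement; any property the tests don't pin down is a property the "AI" is free to get wrong.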
Is this a naive fantasy on my part, or actually possible?
Possible, yes, desirable, no.
The issue I have with all these end-to-end models is that they're a massive regression. Practitioners fought tooth and nails to get programmers to acknowledge correctness and security aspects.
Mathematicians and computer scientists developed theorem solvers to tackle the correctness part. Practitioners proposed methodologies like BDD and "Clean Code" to help with stability and reliability (in terms of actually matching requirements now and in the future).
AI systems throw all this out of the window by just throwing a black box at the wall and keeping whatever sticks. Unit tests will never be proof of correctness - they can only show the presence of errors, not their absence.
You'd only shift the burden from the implementation (i.e. the program) to the tests. What you actually want is a theorem prover that proves functional correctness, in conjunction with integration tests that demonstrate the runtime behaviour if need be (i.e. profiling) and references that link implementation to requirements.
The danger lies in the fact that we already have a hard time getting security issues and bugs under control with software that we should be able to understand (i.e. fellow humans wrote and designed it). Imagine trying to locate and fix a bug in software that was synthesised by some elaborate black box that emitted inscrutable code in absence of any documentation and without references to requirements.
EDIT: with in-memory DBs I can imagine an AI-assisted mainframe that can solve 90% of business problems.
Actually I think Meta AI had some interesting discovery recently that could possibly improve NNs in general, so probably this as well.
I am not in the field, but I wonder if some other approaches like Tsetlin machines would be more useful for programming.
TL;DR: In 2020, a community of 169 people and the best forecasters were assigning ~15% that it would happen by July 2021.
More specifically, on Dec 31, 2016 in partnership with Center for the Study of Existential Risk, Machine Intelligence Research Institute, and The Future of Life Institute they asked:
How long until a machine-learning system can take a simple text description and turn it into a program coded in C/Python?
https://www.metaculus.com/questions/405/when-will-programs-w...
First 19 forecasters in March 2017 were predicting mid-2021, the best forecasters were predicting late 2024. When the question closed in 2020 the community was predicting January 2027 and the best forecasters were predicting March 2030.
The question resolved in July 2021 when Codex was published.
Community and the best forecasters were assigning ~15% that it will happen by July 2021.
I'm currently the 14th best forecaster there and I was predicting 33% before July 2021. It was my last prediction, and it was made in October 2018.
I'm also predicting 75% that we will have AGI by 2040 as defined in this question:
https://www.metaculus.com/questions/3479/when-will-the-first...
20% that it will happen before 2030.
There is also stronger operationalization:
https://www.metaculus.com/questions/5121/when-will-the-first...
My prediction here is 60% before 2040 and 5% before 2030.
I have also "canary in the coal mine" questions:
When will AI achieve competency on multi-choice questions across diverse fields of expertise? Community predicts 50% before 2030, I agree.
https://www.metaculus.com/questions/5276/ai-competence-in-di...
When will AI be able to learn to play Montezuma's Revenge in less than 30 min? Community predicts 50% before 2025, I think 50% before 2027.
https://www.metaculus.com/questions/5460/ai-rapidly-learning...
Deepmind or openAI will do it. If not them, it will be a Chinese research group on par with them.
I’ll be considering a new career. It will still be in computer science but it won’t be writing a lot of code. There’ll be several new career paths made possible by this technology as greater worker productivity makes possible greater specialization.
This viewpoint seems to me very similar to the idea of 3rd-generation languages replacing developers because programming would be so easy. It isn't about how easy it is to write code: I function as a limited mentat, taking all the possible requirements, tradeoffs, and constraints, analyzing them, building the model, and then writing out the code. The code artifact is not the value I add. The artifact is how I communicate the value to the world.
This doesn't make programmers redundant any more than Ruby, PHP, or Java made developers redundant because they freed them from having to manually remember and track memory usage and pointers. It is at most a tool to reduce the friction of getting what is in my head into the world.
I control the code and whoever controls the code controls the business. I possess the ability to make out the strands of flow control and see the future state of the application. For I am the Sr. Software Engineer and I have seen where no Project Manager can see.
Apologies to Frank Herbert, I just finished listening to Dune.
EDIT:
I got off track at the end, but my point is that no matter how good the tools for developing the code are, they will never replace a software engineer any more than electric drills and power saws replace home builders. They merely elevate our work.
As humans we have a coherent world model that current AI systems are nowhere near close to having.
That coherent world model is a necessary precondition for both understanding a business goal and implementing a program to solve it. AlphaCode can do the second part but not the first.
AlphaCode doesn’t have that world model and even if it did it still wouldn’t autonomously act on it, just follow orders from humans.
Competitive programming is going to be solved much earlier than programming in a business context will, because it’s completely independent of business requirements. It’s at most half as hard a problem.
Analyzing the requirements is a hard problem when we do it with our brains. But our job would be very different if all we had to do is write down the constraints and press a button to see an error: invalid requirements, can't support this and that at the same time.
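That button is essentially a satisfiability check over the requirements. A minimal brute-force sketch (the feature names and requirements are made up for illustration):

```python
from itertools import product

# Hypothetical boolean features; each requirement is a predicate over them.
features = ["offline_mode", "realtime_sync", "zero_local_storage"]
requirements = [
    lambda f: f["offline_mode"] or f["realtime_sync"],              # need at least one
    lambda f: not (f["offline_mode"] and f["zero_local_storage"]),  # offline needs a cache
    lambda f: f["offline_mode"],                                    # customer wants offline
    lambda f: f["zero_local_storage"],                              # ...and no local storage
]

def satisfiable(reqs):
    """Try every assignment; return one that meets all requirements, else None."""
    for bits in product([False, True], repeat=len(features)):
        assignment = dict(zip(features, bits))
        if all(r(assignment) for r in reqs):
            return assignment
    return None  # "invalid requirements: can't support this and that at once"

print(satisfiable(requirements))       # the last two requirements conflict
print(satisfiable(requirements[:3]))   # drop one and a solution exists
```

Real requirements aren't boolean toggles, of course, which is exactly why the hard part is formalizing them in the first place; once they are formal, SAT/SMT solvers already do this at scale.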
> in 5 years will there be an AI that's better than 90% of unassisted working programmers at solving new leetcode-type coding interview questions posed in natural language?
and getting pooh-poohed. https://news.ycombinator.com/item?id=29020401 (And writing that, I felt nervous that it might not be aggressive enough.)
There's this general bias in discussions of AI these days, that people forget that the advance they're pooh-poohing was dismissed in the same way as probably way off in the indefinite future, surprisingly recently.
It will take a far, far more advanced AI to write such descriptions for real-world problems.
Writing requirements for a project is difficult work, and not for technical reasons, but for human reasons (people don't know what they want exactly, people have trouble imagining things they haven't seen yet, people are irrational, people might want something that is different from what they need, etc.)
In this regard, we are safe for a few more decades at least.
You need an agent with a large and coherent world model, in order to understand how your programs relate to the real world, in order to solve business tasks.
This isn’t something any program synthesis tech currently available can do, because none of it has a coherent world model.
GPT-3 comes closest to this, but isn’t able to engage in any kind of planning or abstract modeling, beyond semi coherent extrapolations from training data.
Maybe scaling up GPT by a few more orders of magnitude would work, by generating an emergent world model along the way.
If we become mechanics of the software AI vehicles of the future, so be it.
Programmers and data scientists might find ourselves among the first half of knowledge workers to be replaced and not among the last as we previously thought.
Essentially handling large language models.
Early prompt engineers will probably be drawn from “data science” communities and will be similarly high status, well but not as well paid, and require less mathematical knowledge.
I’m personally expecting an “Alignment Engineer” role monitoring AI systems for unwanted behavior.
This will be structurally similar to current cyber security roles but mostly recruited from Machine Learning communities, and embedded in a broader ML ecosystem.
Automating the software development profession proper is going to be much harder and will require autonomous agents with coherent world models, because that’s what you need to act in a business context.
To reach average level at codeforces you need to be able to apply a standard operation like a sort, or apply a standard math formula, as the first 1-2 problems in the easy contests are just that. It is impressive that they managed to get this result in real contests with real, unaltered questions and see that it works. But generalizing this to harder problems isn't as easy: there you need to start to devise original algorithms instead of just applying standard ones, and for such problems the model needs to understand computer science instead of just mapping language to algorithms.
I wouldn't be surprised if a specifically engineered system ten years from now wins an ICPC gold medal but I'm pretty sure that a general purpose specification -> code synthesizer that would actually threaten software engineering would require us to settle a lot of technical debts first -- especially in the area of verifying code/text generation using large language models.
Let's say AI only gets to 10% (or 20% or 30% or whatever, it doesn't really matter), that's a huge number of jobs being lost.
Imagine having a machine write all the "simple/boring" code for you. Your productivity will go through the roof. The smartest programmer who can most effectively leverage the machine could replace many hundreds of programmers.
I should brush up on my plumbing and apply for a plumbing license soon. (I think plumbing is safer than electricians, because many CS people have good EE foundations).
Can you list a few?
Inventing relational DBs hasn't replaced programmers, we just write custom DB engines less often. Inventing electronic spreadsheets hasn't deprecated programmers, it just means that we don't need programmers for corresponding tasks (where spreadsheets work well).
AI won't replace programmers until it grows to replace the humanity as a whole.
Yes, but after seeing this progress in the former, my estimate of the time remaining until the latter has just significantly shortened.
There is progress in certain domains (such as image recognition) but (outside specialized tasks) gigantic language models look like no more than impressive BS generators.
Elsewhere ITT I’ve claimed that to fully automate programming you also need a model of the external world that’s on par with a humans.
Otherwise you can’t work a job because you don’t know how to do the many other tasks that aren’t coding.
You need to understand what the business goals are and how your program solves them.
In many programming contests, a large number of people can't solve the problem at all, and drop out without submitting anything. Frequently that means the median scoring solution is a blank file.
Therefore, without further information, this statement shouldn't be taken to be as impressive as it sounds.
If this is true then a lot of the people I know lack human intelligence...
I think many people are uncomfortable with the idea that their own "intelligent" behavior is not that different from pattern recognition.
I do not enjoy running deep learning experiments. Doing resource-hungry empirical work is not why I got into CS. But I still believe it is very powerful.
30 years ago, the end of programming was prophesised, because 5th generation languages (5GL) and visual programming would enable everybody to design and build software.
20 years ago, low-code and application builders were said to revolutionise the industry and allow people in business roles to build their applications using just a few clicks. End-to-end model-driven design and development (e.g. using Rational Rose and friends) were to put an end to bugs and maintenance problems.
10 years ago it was new programming languages (e.g. Rust, Go, Swift, ...) and a shift to functional programming that was advertised as being "the future".
Today it's back to "no code", e.g. tool-(AI-)driven development that's all the rage.
It's not so much being "uncomfortable" or clinging to the exceptionalism of the human mind. It's just experience. Every decade saw its great big hype and technological breakthrough, but the lofty promises didn't materialize.
Note that this doesn't mean nothing changed - model driven development still has its niche, visual programming is widely used in video production, rendering and game development. Features of functional programming have been added to many "legacy" languages and many of the newly introduced programming languages have become mainstream.
The same will happen with AI-generated software: a large portion of the "mechanical" process of programming will be done by AI. Large and complex software systems with changing requirements, however, will still be designed and implemented primarily by people.
Programming is a conversation between humans and machines. AI will in many cases shift the conversation closer to the human side, but fundamentally it'll still be the same thing.
I like to think of it as the difference between writing your program in assembly and writing it in Haskell; different approaches, same basic activity.
You're saying a lot of so-called technological breakthrough is more hype than substance. The GP is saying that people tend to dismiss actual breakthroughs as mundane stuff. Once $method is published that solves $hardproblem, people comment as if $hardproblem was never hard in the first place, and moves the goalposts a bit saying "if $harderproblem can be solved, then that would be profound".
I think the truth is (obviously) somewhere in between. Btw, I dare you go back to a 1980s programming environment and tell me that the programming paradigm shifts are just hype :D My one-liner python scripts can probably do much more than an average coder writing assembly... and given modern hardware my code runs faster too!
But it generated 10 solutions which it ran against the example inputs, and picked the one that passed.
Actually I'm not sure if it ran the solutions against the example inputs or the real inputs.
Maybe the novelty here is working from the English language specification, but I am dubious just how useful that really is. Specifications are themselves hard to write well too.
And what if the “specification” was some Lisp code testing a certain goal? Is this any better than existing Genetic Programming?
Maybe it is better but in my mind it is kind of suspicious that no comparison is made.
I love Deep Learning but nobody does the field any favors by over promising and exaggerating results.
Most of the code generated by my genetic programming algos doesn't compile. Very occasionally the random conditions exist to allow it to jump over a "local maximum" and come up with useful candidate source code. Sometimes the candidates compile, run, and produce correct results.
The time it takes to run varies vastly with parameters (like population, how the mutation function works, how the fitness function weights/scores, etc).
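For readers unfamiliar with the setup, here is a minimal caricature of such a loop: evolving arithmetic expressions toward a target behaviour, with non-compiling mutants scored as unfit. The token set, population size, and mutation rates are arbitrary choices, not anyone's tuned parameters:

```python
import random

random.seed(1)
TOKENS = list("x+-*123456789")
TARGET = [(x, 3 * x + 2) for x in range(-5, 6)]  # desired behaviour: 3x + 2

def fitness(expr):
    """Lower is better; expressions that don't compile score infinity."""
    if "**" in expr or len(expr) > 20:   # guard against huge exponentials
        return float("inf")
    try:
        code = compile(expr, "<gp>", "eval")
        return sum(abs(eval(code, {"x": x}) - y) for x, y in TARGET)
    except Exception:                    # SyntaxError, NameError, ...
        return float("inf")

def mutate(expr):
    i = random.randrange(len(expr))
    r = random.random()
    if r < 0.4:                                          # point mutation
        return expr[:i] + random.choice(TOKENS) + expr[i + 1:]
    if r < 0.7 and len(expr) > 1:                        # deletion
        return expr[:i] + expr[i + 1:]
    return expr[:i] + random.choice(TOKENS) + expr[i:]   # insertion

population = ["x"] * 60
for generation in range(300):
    population.sort(key=fitness)
    if fitness(population[0]) == 0:      # exact behavioural match found
        break
    parents = population[:20]            # elitism: keep the fittest
    population = parents + [mutate(random.choice(parents)) for _ in range(40)]

print(population[0], fitness(population[0]))
```

Most mutants indeed fail to compile and get culled immediately, matching the parent comment's experience; whether the loop actually converges within the generation budget depends heavily on the parameters, also as described.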
Personally I really like that these DeepMind announcements don't get lost in performance comparisons, because inevitably those would get bogged down in complaints like "the other thing wasn't tuned as well as this one was". Let 3rd party researchers who have access to both do that work, independently.
Make me a sandwich -> two weeks and $10k isn't viable
Make me a sandwich -> 2 seconds and free, totally viable
And yet, I am starting to see (with GitHub’s Copilot, and now this) a sort of “GPT-4 for code”. I do see many problems with this, including:
1. It doesn’t actually “invent” solutions on its own like AlphaZero, it just uses and remixes from a huge body of work that humans put together,
2. It isn’t really ever sure if it solved the problem, unless it can run against a well-defined test suite, because it could have subtle problems in both the test suite and the solution if it generated both
This is a bit like readyplayer.me trying to find the closest combination of noses and lips to match a photo (do you know any open source alternatives to that site btw?)
But this isn’t really “solving” anything in an imperative language.
Then again, perhaps human logic is just an approach with operations on low-dimensional vectors, able to capture simple “explainable” models, while AI classifiers and adversarial training produce far bigger vectors that help model the “messiness” of the real world and also find simpler patterns as a side effect.
In this case, maybe our goal shouldn’t be to get solutions in the form of imperative language or logic, but rather to unleash the computer on “fuzzy” inputs and outputs where things are “mostly correct 99.999% of the time”. The only place this could fail is when some intelligent adversarial network exploits weaknesses in that 0.001% and makes it more common. But for natural phenomena it should be good enough!
AI will eat any and all knowledge work because there's very little special a human can do that a machine won't be able to do eventually, and much faster and better. It won't be tomorrow, but the sands are inevitably shifting this way.
I guess this makes sense though, from a practical point of view. Verifying correctness would be difficult in other intellectual disciplines like physics and higher mathematics.
We have AI to generate reasonable code from text problem description.
Now what if the problem description text is to generate such a system in the first place?
Would it be possible to close the loop, so to speak, so that over many iterations:
- text description is improved
- output code is improved
Would it be possible to create something that converges to something better?
I would really like to see more effort in the AI/ML code generation space being put into things like code review, and system observation. It seems significantly more useful to use these tools to augment human software engineers rather than trying to tackle the daunting and improbable task of completely replacing them.
*Note: as a human software engineer I am biased
Additionally, people should REALLY rethink their coding interviews if they can be solved by a program.
If you're using a large corpus of code chunks from working programs as symbols in your alphabet, I wonder how much entropy there actually is in the space of syntactically correct solution candidates.
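One way to make the question concrete: estimate the Shannon entropy of the chunk distribution. Heavy reuse means far fewer effective bits per symbol than the raw alphabet size suggests. A toy sketch with a made-up chunk corpus:

```python
import math
from collections import Counter

# Hypothetical corpus: programs flattened into reusable chunks ("symbols"),
# with a skewed reuse distribution as you'd expect from real code.
corpus_chunks = (["for i in range(n):"] * 50 +
                 ["return result"] * 30 +
                 ["if x is None:"] * 15 +
                 ["yield from walk(child)"] * 5)

counts = Counter(corpus_chunks)
total = sum(counts.values())

# Shannon entropy in bits per chunk: -sum(p * log2(p)).
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
print(f"{len(counts)} chunk types, {entropy:.2f} bits/chunk")
```

Here the skew drops the entropy to about 1.65 bits/chunk against the 2-bit maximum for four symbols; real code corpora are vastly more skewed, which is part of why language models can predict code so well.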
https://opensea.io/assets/0x495f947276749ce646f68ac8c2484200...
Perhaps many problems are something like finite automata, and the program discovers the structure of the finite automaton and also an algorithm for better performance.
Critical thinking? Oh, wow. That sounds amazing!
Let's read further on...
>> At evaluation time, we create a massive amount of C++ and Python programs for each problem, orders of magnitude larger than previous work. Then we filter, cluster, and rerank those solutions to a small set of 10 candidate programs that we submit for external assessment.
Ah. That doesn't sound like "critical thinking", or any thinking. It sounds like massive brute-force guessing.
A quick look at the arxiv preprint linked from the article reveals that the "massive" number of programs generated is in the millions (see Section 4.4). These are "filtered" by testing them against program input-output (I/O) examples given in the problem descriptions. This "filtering" still leaves a few thousand candidate programs, which are further reduced by clustering to "only" 10 (which are finally submitted).
So it's a generate-and-test approach rather than anything to do with reasoning (as claimed elsewhere in the article) let alone "thinking". But why do such massive numbers of programs need to be generated? And why are there still thousands of candidate programs left after "filtering" on I/O examples?
The reason is that the generation step is constrained by the natural-language problem descriptions, but those are not enough to generate appropriate solutions because the generating language model doesn't understand what the problem descriptions mean; so the system must generate millions of solutions hoping to "get lucky". Most of those don't pass the I/O tests so they must be discarded. But there are only very few I/O tests for each problem so there are many programs that can pass them, and still not satisfy the problem spec. In the end, clustering is needed to reduce the overwhelming number of pretty much randomly generated programs to a small number. This is a method of generating programs that's not much more precise than drawing numbers at random from a hat.
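The pipeline just described can be caricatured in a few lines. Here the "model" merely samples random linear functions and the problem statement exposes a single public example, which is exactly why so many wrong candidates survive the filter. All names and numbers are illustrative, not taken from the paper:

```python
import random
from collections import defaultdict

random.seed(0)

def sample_program():
    """Stand-in 'model': emits a random linear program f(x) = a*x + b."""
    a, b = random.randint(-3, 3), random.randint(-3, 3)
    return lambda x, a=a, b=b: a * x + b

# True spec is f(x) = 2x + 1, but the problem only shows one public example.
public_examples = [(0, 1)]
hidden_tests = [(3, 7), (-2, -3)]

# 1) Generate a massive batch of candidates.
candidates = [sample_program() for _ in range(10_000)]

# 2) Filter: keep candidates consistent with the public example(s).
filtered = [f for f in candidates if all(f(x) == y for x, y in public_examples)]

# 3) Cluster by behaviour on fresh inputs; submit one program per cluster.
clusters = defaultdict(list)
for f in filtered:
    clusters[tuple(f(x) for x in range(-5, 6))].append(f)
submissions = [fs[0] for fs in clusters.values()][:10]

correct = [f for f in submissions if all(f(x) == y for x, y in hidden_tests)]
print(len(filtered), len(clusters), len(correct))
```

With one public example, over a thousand sampled programs pass the filter even though only one behaviour is actually correct; the clustering step exists purely to squeeze that mess under the submission limit.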
Inevitably, the results don't seem to be particularly accurate, hence the evaluation against programs written by participants in coding competitions, which is not any objective measure of program correctness. Table 10 in the arxiv preprint lists results on a more formal benchmark, the APPS dataset, where it's clear that the results are extremely poor (the best performing AlphaCode variant solves 20% of the "introductory" level problems, though outperforming earlier approaches).
Overall, pretty underwhelming and a bit surprising to see such lackluster results from DeepMind.
BUT, our jobs have a lot more complexity
- Local constraints - We almost always work in a large, complex existing code base with specific constraints
- Correctness is hard - writing lots of code is usually not the hard part; it's proving it correct against amorphous requirements communicated in a variety of human social contexts.
- Precision is extremely important - Even if 99% of the time CoPilot can spit out a correct solution, the 1% of the time it doesn't will create a bevy of problems.
Are those insurmountable problems? We'll see I suppose, but we begin to verge on general AI if we can gather and understand half a dozen modalities of social context to build a correct solution.
Not to mention much of the skill needed in our jobs has much more to do with soft skills, and the bridge between the technical and the non technical, and less to do with hardcore heads-down coding.
Exciting times!
And, have you tried polling? I hear it keeps the CPU warm in winter. Interrupts are so ... this just in, Nike's stock jump 3% ... Where was I? Did I save my task context properly? Did I reenable interrupts?
I'm not quite sure what you're asking, but my reason is that I do not enjoy working on/with ML. I'd personally rather quit the industry.
But I work in embedded/driver development. I do not worry about ML models replacing me yet, but if I were just gluing together API calls I would be a bit worried and try to specialize.