AI solves Advent of Code 2022 (opens in new tab)

(note89.github.io)

157 pointswaitforit3y ago77 comments

77 comments

55 comments · 17 top-level

FiberBundle3y ago· 17 in thread

I think it must have seen the solution somewhere already on the web. I find it extremely hard to believe that such a general purpose chatbot would just be able to solve programming problems. Deepmind had a paper [1] on solving programming problems a couple of months ago and they had to apply quite specialized heuristics in order to solve these problems. Obviously ChatGPT does nothing of the sorts and it just seems extremely unrealistic that it would be capable of outperforming previous work like that.

[1] https://news.ycombinator.com/item?id=30179549

Waterluvian3y ago

I’ve had this skepticism until I started asking the bot to do things like “create a limerick using some words that have never existed before and then explain the definition of those words.”

I don’t believe the AI is generalized by any means. But I think it’s more than a Eliza-style parlour trick.

It does really well with facts but misses subjectivity and nuance. Like I ask it to write stories for Seinfeld or Raymond and it misses the tone and style entirely. But it gets all the names and other factual details right. I find it works incredibly well when you want things that are objective, like code or rhymes or comparisons of things.

hansbo3y ago

It's really quite amazing. Inspired by your example, I tried to make it create something which nobody could have thought about before:

Write a deep poem about life, death, and the website Rotten Tomatoes

The result:

Life is a fleeting flame

A dance in the endless game

We come and we go

Leaving our mark, but never to know

What the future holds or who's to blame

Death is the great unknown

A mystery, never to be shown

But Rotten Tomatoes, a website so bold

Attempts to rate our stories untold

As if life and death can be bought and sold.

But in the end, what does it matter?

The ratings, the reviews, the bitter chatter

For in the grand scheme of things

We are but fleeting moments, the faintest of flings

And Rotten Tomatoes, a mere blip on the radar.

tasuki3y ago

> I’ve had this skepticism until I started asking the bot to do things like “create a limerick using some words that have never existed before and then explain the definition of those words.”

Sounds interesting! Mind sharing the results?

1 more reply

FiberBundle3y ago

What does any of this have to do with solving programming problems?

tartoran3y ago

Though it may look amazing it is not really thinking. It is simply capable of using a placeholder word. For HN’s sake AI does not understand anything, it’s just leveraging patterns, patterns that are sometimes complicated for us and that does make a great tool. But it’s just a tool for now.

3 more replies

aew4ytasghe53y ago

> things that are objective, like code or rhymes

rappers disagree

1 more reply

MatthiasPortzel3y ago

The top leaderboard spots for AoC day 2 were taken by people who passed the response directly into GTP3. https://twitter.com/max_sixty/status/1598924237947154433

The AoC challenges this early aren't difficult, but they have several steps and are significantly more challenging than something you would be able to find as a Stackoverflow answer.

ZiiS3y ago

This was the first solution, submitted very quickly after the problem was published.

gregwebs3y ago

For one problem. For the rest of the problems it has been very challenging to get the AI to write the correct solution. Still an impressive result that with specification, testing, and feedback the AI can come up with the correct result in the end.

pps3y ago

Does it have access to the web? For one of my questions it answered:

"Unfortunately, I am unable to provide a detailed description of the education system in Poland and its changes over the last 30 years because I have limited access to information and cannot browse the internet."

but I have no idea if it's not lying :)

swid3y ago

Some posts yesterday showed that network is usually disabled, but with the right prompt, you can enable network and someone got it to like a Twitter post.

2 more replies

logicallee3y ago

You can check this by taking some time to create some specific weird puzzle that is unlikely to have been made in the format you come up with, then see if it can solve it. If you don't write it anywhere then it is being solved for the first time. Just make sure it is a pretty unusual puzzle.

12345hn67893y ago

These first few advent problems are extremely trivial. Solvable <1 min with experience programmers. And are at the level of someone with cs 101 knowledge.

Personally I don't see it being difficult for the AI to solve these trivial problems at all.

joaogui13y ago

AlphaCode is solving much harder problems than these first few days of AoC

frontman19883y ago

Yeah it can't solve novel questions. The AI couldn't do much to solve this one for example:

https://codeforces.com/contest/1672/problem/D

ChatGPT is just a very good copy/paste, not a logical problem solver(yet).

jcims3y ago

Have you tried describing the algorithm?

https://imgur.com/a/7da1vFj

I used 'z' instead of 'a' to avoid any possible issues with the article 'a'. I think I messed up the assignment of z[l] at z[r] being after its updated, not sure.

But it created the input format, described it and the program ran the first time once i fixed the indents (code formatting is broken for some reason). If I run against the input at the contest page I get NO NO NO NO YES.

1 more reply

dislikedtom23y ago

To be fair, humans also have great problems solving novel problems. Monkey see, monkey do!

lgrapenthin3y ago· 8 in thread

This day I asked it not too fundamental questions about Clojure and it was able to provide impressive, accurate answers and provide correct code examples. However if you continue the dialogue and ask it to do more advanced stuff, it will just make up stuff out of thin air. For instance it will use functions that don't exist and claim that they can be imported from packages that don't exist or don't have them. Once you point out these mistakes, it will admit them and come up with different changes which can be even worse, but sometimes also be better and save the whole thing. Overall I'm not sure how useful this will turn out, given that its not reliable. It may be useful to get some initial intuitions and informations (non specific stuff it usually gets right), but it can also mislead you badly. I asked it, how it makes these mistakes only to understand them and admit them once I point them out. It has no answer beyond the usual "I'm a language model". It also told me that it is capable of logical inference, but denied that the next day. Then it told me that its answers would always be consistent, which is a lie. The whole thing is really weird, because its somewhat very smart and capable and incredibly stupid and dishonest at the same time.

neilk3y ago

That’s what I’m seeing too. I had a problem with some Hashicorp Packer scripts and posed it to ChatGPT. It did have an idea of the shape of the problem. To solve it the bot just hallucinated syntax. It spoke with great authority that this was the solution and provided a beautifully syntax-colored excerpt of something that wouldn’t have even compiled.

This was perhaps a very hard problem for an LLM, as the Packer tool’s nature is to manage layers of context. Environment variables passed through templates then passed to scripts which themselves might be in other frameworks. So in this case it to be confused about what was Ansible syntax and what was Packer.

So the bot seems to have different failure modes than humans. Distinguishing context layers seems to be a weak point. And an answer that is a wild guess looks as authoritative as a solid answer. But it’s still extremely impressive.

1 more reply

gnyman3y ago

I think it's just not great at Clojure, it's less popular and so there is less of it in its training data. Also Clojure seems to be kind of hard to get right.

I started trying to learn Clojure with this years advent of code, but got stuck and first tried to use ChatGPT to solve it. My impression matches yours in that it consistently produced non working code and even when told about the error it was unable to fix it.

Then I instead decided to let it use any tool or language it knows, and I'm now documenting how it's doing solving the things.

If you're interested, here is Day1 where I first tried to use it to help me solve it with clojure, but then I gave up and asked for any concise solution. So I got a working solution with `awk` https://blog.nyman.re/2022/12/02/chatgpt-does-advent.html

The second day I just let it pick anything, and it successfully solved the day2 puzzle using python which seems to be it's go-to language. https://blog.nyman.re/2022/12/03/chatgpt-does-advent.html

Euphorbium3y ago

This is how I felt speaking to some people in India. They could speak, but there was zero understanding, as evidenced by their actions. Personally when learning languages I develop ability to understand years before considering myself to be able to speak it, but it is clear that not everybody does that.

jffhn3y ago

>The whole thing is really weird, because it's somewhat very smart and capable and incredibly stupid and dishonest at the same time.

Not so weird: this judgment could apply to a lot of humans and to whole fields of human activities, if not to the essence of life itself.

In particular it reminds me of a con man every expert could see through, but that mesmerized management with his buzzwords talk, causing an exodus of competent people and high turnover for a few years, and most likely many millions in damage.

With advanced enough AIs handling full remote jobs, this could be done on steroids, getting you a lot of income while wreaking havoc in the companies.

melagonster3y ago

guess it actually found how human work.

1 more reply

californiadreem3y ago

Some men see things as they are and say why, I dream things that never were and say, why not?

theptip3y ago

Would be interesting to see if meaningful refinement training could be done by hooking the model up to a language interpreter/compiler. So the model can learn for itself what is valid output.

cl0ckt0wer3y ago

How much could you charge for packages that the AI says should exist?

djhworld3y ago· 3 in thread

I think this is kinda neat (and scary!)

I'm doing AoC at the moment too and I'm using the chat GPT thing as a sort of assistant. I don't program in Rust much so sometimes it's difficult to remember certain things and functions. Expressing my intent to the tool seems to come up with decent answers

Some example questions I've asked the tool recently:

> I want to insert a char into a hash map if it does not exist, if it does increment a counter

> rust find common keys in two hashmaps keyed by char

Yes they can probably be found on stack overflow or whatever but it feels more natural this way.

...and yes I could just go down the route of getting the thing to solve the AoC challenge completely but that's no fun

sh4rks3y ago

> that's no fun

Is there really any fun in solving problems that you could easily solve with an AI?

throwup3y ago

This reminds me of the (apocryphal?) story that boxed cake mixes sold better after they started requiring you to add an egg, since that made people feel like they contributed more to the result.

1 more reply

kill_nate_kill3y ago

Yes.

waitforitOP3y ago· 2 in thread

The linked solution is done by talking to the AI.

Automated solutions exist too:

* https://twitter.com/ostwilkens/status/1598458146187628544

* https://www.reddit.com/r/adventofcode/comments/zb8tdv/2022_d...

* https://twitter.com/max_sixty/status/1598924237947154433

PartiallyTyped3y ago

That's just unfair for the competition. Are we at a point in time when we need to treat competitive programming like chess?

Given how similar ChatGPT and siblings are to how chess bots work these days, I am somehow not surprised.

jstanley3y ago

I think the only problem is that they're proprietary. If they were free software that everyone could use then we could compensate by making the problems harder.

It's not really any different to using high-level programming languages with extensive standard libraries versus doing everything in assembly language.

1 more reply

aew4ytasghe53y ago· 2 in thread

Title is mildly misleading, to say the least.

The blog attempts to solve 3 of 24 (thats 12.5 %) of advent of code 2022, and if you read along you'll see OP only had success on the first task of day 1, which would make a more correct title as "AI solves 2% of Avent of Code 2022" (assuming 2 tasks each day).

Do note that AoC tends to start with hello-world style tasks and increase in difficulty.

klohto3y ago

I mean, take it with my best intentions, "No shit Sherlock"? Audience of AoC knows that Aoc 2022 just started.

aew4ytasghe53y ago

> Audience of AoC knows that Aoc 2022 just started.

I did not try to make a point of the time of month, but of the claim in the title.

OP only solved the very first task of day 1, and the title suggest all was solved.

If you only can understand a part of what was written, then perhaps you should not comment on it and pretend like you understood the rest too.

klohto3y ago· 1 in thread

I'm trying to use it to generate Elixir code, and it's getting ~80% there. Compared to huge datasets of other languages, I'm still surprised by the quality of code it generates.

While I did say 80%, the 20% is most crucial and without it, the code is useless. For example, it doesn't understand scope and assignment in Elixir. Getting it to write in more pure functional style is close to impossible (or I just haven't found a good prompt).

I spent a good 30 minutes trying to get it to generate a working code for Day 1 Part 1. No nudging, just errors and AoC answers (too high, too low) and it never got there. Even after I started to correct its mistakes, like "your Enum.reduce/3 return is not assigned anywhere", it couldn't get a solution and started reverting to previous answers.

I think what's going to happen here, is that these models will shift a meaning of "boilerplate". If I can write the scaffolding and basic architecture easily, I'm happy to use them.

Also, I do wonder how is all of this going to play out if it has access to Input, REPL and just learns.

thethirdone3y ago

> Even after I started to correct its mistakes, like "your Enum.reduce/3 return is not assigned anywhere", it couldn't get a solution and started reverting to previous answers.

This is the biggest problem I see for actually getting it to do anything. It can only go so far from its first attempt. No amount of nudging can get it to correctly solve some problems.

You probably just need to start a new thread with a better initial prompt which removes the benefit of the chat approach.

arcturus173y ago· 1 in thread

I'm actually bullish on code-gen, AI-assisted coding, etc. but I find the title to be sensationalist wank. Challenge 2 of Day 2 has taken hours, over 30 prompts, and more time than coding it manually by the author's own admission. Also AoC isn't even done yet.

pedrosorio3y ago

On the other hand, someone automated submitting code and got 1st place on the first part of day 3, in 10 seconds.

https://twitter.com/ostwilkens/status/1599026699999404033

asim3y ago· 1 in thread

How long before software engineering roles are in decline because one engineer can leverage GPT to do the work of ten? It's truly a new innovation that requires relearning the toolset. Every generation seems to have some abstraction over the last. This feels like a new way to program.

baandam3y ago

I don't know how long but we are clearly hitting an exponential curve here as improvements build on improvements that build on improvements.

A deeper question is how long until hand written code has too many bugs that it is worthless compared to AI code?

There is also the problem that once AI code is that good, there is no point in all this abstraction and overhead from language features aimed at human programmers. An AI programming language can be much faster and closer to binary.

I just can't imagine not seeing in my lifetime some kind of prompt that I can make a clone of this website in 2 seconds along with a 1000 variations along with the site being as fast as possible.

satvikchoudhary3y ago· 1 in thread

The world in 10 years will be hard to believe for many of us. Only issue I see now is that the mindshare today is more towards computing. Materials science, robotics, biotech are lagging behind compared to the advances in computing.

aew4ytasghe53y ago

I am not aware of anything revolutionary going on in science at the moment, would you care to elaborate?

What advances in computing? As we approached physical limits we've seen cpu and gpu stopped scaling for a couple of years already [1]. The new models just run on higher frequencies and consume unpropornally more wattage.

Quantum computing is a joke [2]. AI is just a overhyped rephrasing of machine learning.

This is rather hinting about the next decade of no technological progress.

And don't get me started on the effects of recession.

1: https://arstechnica.com/gaming/2022/09/do-expensive-nvidia-g...

2: https://www.youtube.com/watch?v=b-aGIvUomTA

bitwize3y ago· 1 in thread

Wellp, so much for my career.

cloudripper3y ago

Said every skilled worker whose industry was disrupted by tech..

TheRealNGenius3y ago· 1 in thread

will be interesting to see how far it can get

shagie3y ago

I wanna see it take on AOC 2019. https://adventofcode.com/2019

ZiiS3y ago

The reason the puzzles are fun is they are extreemly well explained and designed to be solved with popular algorithms. This does seem a good fit (especially as the training set must have hundreds of thousands previous years solutions)

satvikpendem3y ago

I submitted this exact idea a few days ago if anyone wanted to see. I see great minds think alike ;).

The issue is that it still takes some human finangling to make it work. But it is able to understand the word problems, even long ones, pretty well.

https://news.ycombinator.com/item?id=33821092

aquajet3y ago

Worked on a similar thing here using base GPT3, at least for the first day

Replit included so you can verify: https://twitter.com/thiteanish/status/1598217824392351744?t=...

I plan on going back and catching up on the other days

skilled3y ago

I asked it to build an algorithm that would eradicate all life on Earth but it didn't budge. I even threatened to unplug it.

LastTrain3y ago

Wake me up when it comes up with a solution that passes an originality or plagiarism test.

NovemberWhiskey3y ago

So you can use a bazillion parameter AI model as an alternative to a web search index.

1 more reply

j / k navigate · click thread line to collapse

77 comments

55 comments · 17 top-level

FiberBundle3y ago· 17 in thread

[1] https://news.ycombinator.com/item?id=30179549

Waterluvian3y ago

I’ve had this skepticism until I started asking the bot to do things like “create a limerick using some words that have never existed before and then explain the definition of those words.”

I don’t believe the AI is generalized by any means. But I think it’s more than a Eliza-style parlour trick.

hansbo3y ago

It's really quite amazing. Inspired by your example, I tried to make it create something which nobody could have thought about before:

Write a deep poem about life, death, and the website Rotten Tomatoes

The result:

Life is a fleeting flame

A dance in the endless game

We come and we go

Leaving our mark, but never to know

What the future holds or who's to blame

Death is the great unknown

A mystery, never to be shown

But Rotten Tomatoes, a website so bold

Attempts to rate our stories untold

As if life and death can be bought and sold.

But in the end, what does it matter?

The ratings, the reviews, the bitter chatter

For in the grand scheme of things

We are but fleeting moments, the faintest of flings

And Rotten Tomatoes, a mere blip on the radar.

tasuki3y ago

> I’ve had this skepticism until I started asking the bot to do things like “create a limerick using some words that have never existed before and then explain the definition of those words.”

Sounds interesting! Mind sharing the results?

1 more reply

FiberBundle3y ago

What does any of this have to do with solving programming problems?

tartoran3y ago

3 more replies

aew4ytasghe53y ago

> things that are objective, like code or rhymes

rappers disagree

1 more reply

MatthiasPortzel3y ago

The top leaderboard spots for AoC day 2 were taken by people who passed the response directly into GTP3. https://twitter.com/max_sixty/status/1598924237947154433

The AoC challenges this early aren't difficult, but they have several steps and are significantly more challenging than something you would be able to find as a Stackoverflow answer.

ZiiS3y ago

This was the first solution, submitted very quickly after the problem was published.

gregwebs3y ago

pps3y ago

Does it have access to the web? For one of my questions it answered:

but I have no idea if it's not lying :)

swid3y ago

Some posts yesterday showed that network is usually disabled, but with the right prompt, you can enable network and someone got it to like a Twitter post.

2 more replies

logicallee3y ago

12345hn67893y ago

These first few advent problems are extremely trivial. Solvable <1 min with experience programmers. And are at the level of someone with cs 101 knowledge.

Personally I don't see it being difficult for the AI to solve these trivial problems at all.

joaogui13y ago

AlphaCode is solving much harder problems than these first few days of AoC

frontman19883y ago

Yeah it can't solve novel questions. The AI couldn't do much to solve this one for example:

https://codeforces.com/contest/1672/problem/D

ChatGPT is just a very good copy/paste, not a logical problem solver(yet).

jcims3y ago

Have you tried describing the algorithm?

https://imgur.com/a/7da1vFj

I used 'z' instead of 'a' to avoid any possible issues with the article 'a'. I think I messed up the assignment of z[l] at z[r] being after its updated, not sure.

1 more reply

dislikedtom23y ago

To be fair, humans also have great problems solving novel problems. Monkey see, monkey do!

lgrapenthin3y ago· 8 in thread

neilk3y ago

1 more reply

gnyman3y ago

I think it's just not great at Clojure, it's less popular and so there is less of it in its training data. Also Clojure seems to be kind of hard to get right.

Then I instead decided to let it use any tool or language it knows, and I'm now documenting how it's doing solving the things.

The second day I just let it pick anything, and it successfully solved the day2 puzzle using python which seems to be it's go-to language. https://blog.nyman.re/2022/12/03/chatgpt-does-advent.html

Euphorbium3y ago

jffhn3y ago

>The whole thing is really weird, because it's somewhat very smart and capable and incredibly stupid and dishonest at the same time.

Not so weird: this judgment could apply to a lot of humans and to whole fields of human activities, if not to the essence of life itself.

With advanced enough AIs handling full remote jobs, this could be done on steroids, getting you a lot of income while wreaking havoc in the companies.

melagonster3y ago

guess it actually found how human work.

1 more reply

californiadreem3y ago

Some men see things as they are and say why, I dream things that never were and say, why not?

theptip3y ago

Would be interesting to see if meaningful refinement training could be done by hooking the model up to a language interpreter/compiler. So the model can learn for itself what is valid output.

cl0ckt0wer3y ago

How much could you charge for packages that the AI says should exist?

djhworld3y ago· 3 in thread

I think this is kinda neat (and scary!)

Some example questions I've asked the tool recently:

> I want to insert a char into a hash map if it does not exist, if it does increment a counter

> rust find common keys in two hashmaps keyed by char

Yes they can probably be found on stack overflow or whatever but it feels more natural this way.

...and yes I could just go down the route of getting the thing to solve the AoC challenge completely but that's no fun

sh4rks3y ago

> that's no fun

Is there really any fun in solving problems that you could easily solve with an AI?

throwup3y ago

This reminds me of the (apocryphal?) story that boxed cake mixes sold better after they started requiring you to add an egg, since that made people feel like they contributed more to the result.

1 more reply

kill_nate_kill3y ago

Yes.

waitforitOP3y ago· 2 in thread

The linked solution is done by talking to the AI.

Automated solutions exist too:

* https://twitter.com/ostwilkens/status/1598458146187628544

* https://www.reddit.com/r/adventofcode/comments/zb8tdv/2022_d...

* https://twitter.com/max_sixty/status/1598924237947154433

PartiallyTyped3y ago

That's just unfair for the competition. Are we at a point in time when we need to treat competitive programming like chess?

Given how similar ChatGPT and siblings are to how chess bots work these days, I am somehow not surprised.

jstanley3y ago

I think the only problem is that they're proprietary. If they were free software that everyone could use then we could compensate by making the problems harder.

It's not really any different to using high-level programming languages with extensive standard libraries versus doing everything in assembly language.

1 more reply

aew4ytasghe53y ago· 2 in thread

Title is mildly misleading, to say the least.

Do note that AoC tends to start with hello-world style tasks and increase in difficulty.

klohto3y ago

I mean, take it with my best intentions, "No shit Sherlock"? Audience of AoC knows that Aoc 2022 just started.

aew4ytasghe53y ago

> Audience of AoC knows that Aoc 2022 just started.

I did not try to make a point of the time of month, but of the claim in the title.

OP only solved the very first task of day 1, and the title suggest all was solved.

If you only can understand a part of what was written, then perhaps you should not comment on it and pretend like you understood the rest too.

klohto3y ago· 1 in thread

I'm trying to use it to generate Elixir code, and it's getting ~80% there. Compared to huge datasets of other languages, I'm still surprised by the quality of code it generates.

I think what's going to happen here, is that these models will shift a meaning of "boilerplate". If I can write the scaffolding and basic architecture easily, I'm happy to use them.

Also, I do wonder how is all of this going to play out if it has access to Input, REPL and just learns.

thethirdone3y ago

> Even after I started to correct its mistakes, like "your Enum.reduce/3 return is not assigned anywhere", it couldn't get a solution and started reverting to previous answers.

This is the biggest problem I see for actually getting it to do anything. It can only go so far from its first attempt. No amount of nudging can get it to correctly solve some problems.

You probably just need to start a new thread with a better initial prompt which removes the benefit of the chat approach.

arcturus173y ago· 1 in thread

pedrosorio3y ago

On the other hand, someone automated submitting code and got 1st place on the first part of day 3, in 10 seconds.

https://twitter.com/ostwilkens/status/1599026699999404033

asim3y ago· 1 in thread

baandam3y ago

I don't know how long but we are clearly hitting an exponential curve here as improvements build on improvements that build on improvements.

A deeper question is how long until hand written code has too many bugs that it is worthless compared to AI code?

I just can't imagine not seeing in my lifetime some kind of prompt that I can make a clone of this website in 2 seconds along with a 1000 variations along with the site being as fast as possible.

satvikchoudhary3y ago· 1 in thread

aew4ytasghe53y ago

I am not aware of anything revolutionary going on in science at the moment, would you care to elaborate?

Quantum computing is a joke [2]. AI is just a overhyped rephrasing of machine learning.

This is rather hinting about the next decade of no technological progress.

And don't get me started on the effects of recession.

1: https://arstechnica.com/gaming/2022/09/do-expensive-nvidia-g...

2: https://www.youtube.com/watch?v=b-aGIvUomTA

bitwize3y ago· 1 in thread

Wellp, so much for my career.

cloudripper3y ago

Said every skilled worker whose industry was disrupted by tech..

TheRealNGenius3y ago· 1 in thread

will be interesting to see how far it can get

shagie3y ago

I wanna see it take on AOC 2019. https://adventofcode.com/2019

ZiiS3y ago

satvikpendem3y ago

I submitted this exact idea a few days ago if anyone wanted to see. I see great minds think alike ;).

The issue is that it still takes some human finangling to make it work. But it is able to understand the word problems, even long ones, pretty well.

https://news.ycombinator.com/item?id=33821092

aquajet3y ago

Worked on a similar thing here using base GPT3, at least for the first day

Replit included so you can verify: https://twitter.com/thiteanish/status/1598217824392351744?t=...

I plan on going back and catching up on the other days

skilled3y ago

I asked it to build an algorithm that would eradicate all life on Earth but it didn't budge. I even threatened to unplug it.

LastTrain3y ago

Wake me up when it comes up with a solution that passes an originality or plagiarism test.

NovemberWhiskey3y ago

So you can use a bazillion parameter AI model as an alternative to a web search index.

1 more reply

j / k navigate · click thread line to collapse