This was perhaps a very hard problem for an LLM, as the Packer tool's nature is to manage layers of context: environment variables passed through templates, then passed to scripts which themselves might be in other frameworks. So in this case it seemed to be confused about what was Ansible syntax and what was Packer.
So the bot seems to have different failure modes than humans. Distinguishing context layers seems to be a weak point. And an answer that is a wild guess looks as authoritative as a solid answer. But it’s still extremely impressive.
I started trying to learn Clojure with this year's Advent of Code, but got stuck and first tried to use ChatGPT to solve it. My impression matches yours: it consistently produced non-working code, and even when told about the error it was unable to fix it.
Then I instead decided to let it use any tool or language it knows, and I'm now documenting how it does at solving the puzzles.
If you're interested, here is day 1, where I first tried to use it to help me solve the puzzle with Clojure, but then gave up and asked for any concise solution. So I got a working solution in `awk`: https://blog.nyman.re/2022/12/02/chatgpt-does-advent.html
The second day I just let it pick anything, and it successfully solved the day 2 puzzle using Python, which seems to be its go-to language. https://blog.nyman.re/2022/12/03/chatgpt-does-advent.html
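For context on what the day 1 puzzle asks (the blog's working solution was an `awk` one-liner, not this): the input is groups of numbers separated by blank lines, and part 1 asks for the largest group sum. A minimal sketch in Rust, with made-up sample data:

```rust
// Sketch of AoC 2022 day 1, part 1: find the largest sum among
// blank-line-separated groups of numbers.
fn max_group_sum(input: &str) -> u64 {
    input
        .split("\n\n") // each group is one blank-line-separated block
        .map(|group| {
            group
                .lines()
                .filter_map(|line| line.trim().parse::<u64>().ok())
                .sum::<u64>()
        })
        .max()
        .unwrap_or(0)
}

fn main() {
    let sample = "1000\n2000\n3000\n\n4000\n\n5000\n6000";
    // the groups sum to 6000, 4000 and 11000, so the answer is 11000
    assert_eq!(max_group_sum(sample), 11000);
    println!("{}", max_group_sum(sample));
}
```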
Not so weird: this judgment could apply to a lot of humans and to whole fields of human activity, if not to the essence of life itself.
In particular it reminds me of a con man every expert could see through, but who mesmerized management with his buzzword talk, causing an exodus of competent people, high turnover for a few years, and most likely many millions in damage.
With advanced enough AIs handling fully remote jobs, this could be done on steroids, earning you a lot of income while wreaking havoc on the companies.
I don't believe the AI is generalized by any means, but I think it's more than an Eliza-style parlour trick.
It does really well with facts but misses subjectivity and nuance. Like I ask it to write stories for Seinfeld or Raymond and it misses the tone and style entirely. But it gets all the names and other factual details right. I find it works incredibly well when you want things that are objective, like code or rhymes or comparisons of things.
> Write a deep poem about life, death, and the website Rotten Tomatoes
The result:
Life is a fleeting flame
A dance in the endless game
We come and we go
Leaving our mark, but never to know
What the future holds or who's to blame
Death is the great unknown
A mystery, never to be shown
But Rotten Tomatoes, a website so bold
Attempts to rate our stories untold
As if life and death can be bought and sold.
But in the end, what does it matter?
The ratings, the reviews, the bitter chatter
For in the grand scheme of things
We are but fleeting moments, the faintest of flings
And Rotten Tomatoes, a mere blip on the radar.
Sounds interesting! Mind sharing the results?
The AoC challenges this early aren't difficult, but they have several steps and are significantly more challenging than something you would find as a Stack Overflow answer.
"Unfortunately, I am unable to provide a detailed description of the education system in Poland and its changes over the last 30 years because I have limited access to information and cannot browse the internet."
but I have no idea whether it's lying :)
Personally I don't see it being difficult for the AI to solve these trivial problems at all.
https://codeforces.com/contest/1672/problem/D
ChatGPT is just a very good copy/paste, not a logical problem solver (yet).
I used 'z' instead of 'a' to avoid any possible issues with the article 'a'. I think I messed up the ordering of the assignments, with z[l] being assigned after z[r] is already updated; not sure.
But it created the input format, described it, and the program ran on the first try once I fixed the indentation (code formatting is broken for some reason). If I run it against the input on the contest page I get NO NO NO NO YES.
While I did say 80%, the remaining 20% is the most crucial part, and without it the code is useless. For example, it doesn't understand scope and assignment in Elixir. Getting it to write in a more purely functional style is close to impossible (or I just haven't found a good prompt).
I spent a good 30 minutes trying to get it to generate working code for day 1 part 1. No nudging, just errors and AoC answers (too high, too low), and it never got there. Even after I started to correct its mistakes, like "your Enum.reduce/3 return value is not assigned anywhere", it couldn't reach a solution and started reverting to previous answers.
I think what's going to happen here is that these models will shift the meaning of "boilerplate". If I can write the scaffolding and basic architecture easily, I'm happy to use them.
Also, I do wonder how all of this is going to play out once it has access to the input and a REPL and just learns.
This is the biggest problem I see for actually getting it to do anything. It can only go so far from its first attempt. No amount of nudging can get it to correctly solve some problems.
You probably just need to start a new thread with a better initial prompt which removes the benefit of the chat approach.
Automated solutions exist too:
* https://twitter.com/ostwilkens/status/1598458146187628544
* https://www.reddit.com/r/adventofcode/comments/zb8tdv/2022_d...
Given how similar ChatGPT and siblings are to how chess bots work these days, I am somehow not surprised.
It's not really any different to using high-level programming languages with extensive standard libraries versus doing everything in assembly language.
I'm doing AoC at the moment too, and I'm using the ChatGPT thing as a sort of assistant. I don't program in Rust much, so sometimes it's difficult to remember certain things and functions. Expressing my intent to the tool seems to produce decent answers.
Some example questions I've asked the tool recently:
> I want to insert a char into a hash map if it does not exist, if it does increment a counter
> rust find common keys in two hashmaps keyed by char
Yes, they can probably be found on Stack Overflow or wherever, but it feels more natural this way.
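For what it's worth, both of those prompts boil down to the standard `HashMap` entry API and key iteration; a rough sketch of the kind of answer the tool might hand back (the variable names and sample data are mine):

```rust
use std::collections::HashMap;

fn main() {
    // Prompt 1: insert a char into a hash map if it does not exist,
    // otherwise increment its counter. The entry API does both at once.
    let mut counts: HashMap<char, u32> = HashMap::new();
    for c in "banana".chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    assert_eq!(counts[&'a'], 3);

    // Prompt 2: find keys common to two HashMaps keyed by char.
    let other: HashMap<char, u32> = [('a', 1), ('z', 9)].into_iter().collect();
    let common: Vec<char> = counts
        .keys()
        .filter(|k| other.contains_key(k))
        .copied()
        .collect();
    assert_eq!(common, vec!['a']);

    println!("{:?}", common);
}
```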
...and yes I could just go down the route of getting the thing to solve the AoC challenge completely but that's no fun
Is there really any fun in solving problems that you could easily solve with an AI?
A deeper question is how long until hand-written code is so buggy that it's worthless compared to AI code?
There is also the problem that once AI code is that good, there's no point in all the abstraction and overhead from language features aimed at human programmers. An AI programming language could be much faster and closer to binary.
I just can't imagine not seeing, in my lifetime, some kind of prompt with which I can make a clone of this website in 2 seconds, along with 1000 variations, with the site being as fast as possible.
What advances in computing? As we approach physical limits, CPU and GPU performance have stopped scaling for a couple of years already [1]. The new models just run at higher frequencies and consume disproportionately more wattage.
Quantum computing is a joke [2]. AI is just an overhyped rephrasing of machine learning.
This rather hints at a coming decade of no technological progress.
And don't get me started on the effects of recession.
1: https://arstechnica.com/gaming/2022/09/do-expensive-nvidia-g...
The issue is that it still takes some human finagling to make it work. But it is able to understand the word problems, even long ones, pretty well.
Replit included so you can verify: https://twitter.com/thiteanish/status/1598217824392351744?t=...
I plan on going back and catching up on the other days
The blog attempts to solve 3 of 24 (that's 12.5%) of Advent of Code 2022, and if you read along you'll see the OP only had success on the first task of day 1, which would make a more correct title "AI solves 2% of Advent of Code 2022" (assuming 2 tasks each day).
Do note that AoC tends to start with hello-world style tasks and increase in difficulty.
I did not try to make a point about the time of month, but about the claim in the title.
The OP only solved the very first task of day 1, while the title suggests everything was solved.
If you can only understand part of what was written, then perhaps you should not comment on it and pretend you understood the rest.