Bard is much worse at puzzle solving than ChatGPT (opens in new tab)

(twofergoofer.com)

90 pointscowllin3y ago80 comments

80 comments

Wow I had hoped for a more productive discussion than these 1-1 comparisons of Bard vs ChatGPT that I'm seeing everywhere. The model deployed with this version of Bard is clearly a smaller model than the biggest LaMDA/PaLM models Google has been working on for ages. Which, according to their publications, show unprecedented results on _proof writing_ of all things (see Minerva). While their strategic decisions may be questionable (or they're just trying to quantize the model for mass deployment without burning billions per month in compute costs), its almost silly to question Google's ability to build useful LLMs.

seanhunter3y ago

At the moment unless we get more information about what metric you're supposed to evaluate it on, you could probably simplify the headline to just "Bard is much worse than chatgpt" without any loss of accuracy.

It's not really realistic to expect people to give Google credit for these amazing models they have published results about but haven't let people play with - they have given people Bard and people are evaluating it based on the criteria most obvious to them - a comparison to a very similar product that has just been released.

Traubenfuchs3y ago

They knew the war they were entering, they knew their enemies, they knew how they'd get evaluated and still decided to get this model out in its current state, leading to the conclusion: Yes, this is really the best they can do and it's much worse than the state of the art.

In any case, it's a massive marketing blunder, the public opinion formed within the last hours was overwhelmingly "Bard sucks compared to ChatGPT."

carlmr3y ago

>Yes, this is really the best they can do and it's much worse than the state of the art.

This is the best they can do under pressure.

ChatGPT surprised the world with how good it was, then Google scrambled to get something out quick.

A project like this is a massive undertaking, the first mover has the advantage that they can calmly refine their model until they find it presentable.

The question is, is what Google is delivering good for the timeframe since OpenGPT exploded in popularity enough for Google leadership to take note. Since that moment, realistically, is when they put pressure on their devs to push something out the door.

I think we'll see a better iteration soon. Not only from Google, but from other competitors.

random_cynic3y ago

> its almost silly to question Google's ability to build useful LLMs.

Unless they release a model one can "use" and verify their claims it's literally silly to make this statement.

amelius3y ago

There are useful open source LLMs. Or are you questioning their ability to configure; make install?

DeathArrow3y ago

> its almost silly to question Google's ability to build useful LLMs.

It's almost silly to presume anything without proofs. People are judging Google based on what Google has shown.

xiphias23y ago

It seems like they don't want to be the best, just good and cheap enough that they don't lose users therefore ad revenue.

They behave like Yahoo when Google took over.

letitgo123453y ago

Even the best Google models seem to be lagging for reasoning tasks vs OpenAI ones at the moment - see the graphs at https://github.com/suzgunmirac/BIG-Bench-Hard

peyton3y ago

Everyone’s gonna have bigger models. Where are their useful models?

celim3073y ago

Thats like saying car brand X is more reliable than car brand Y, because brand X won formula one.

masakreTech3y ago

you can always go back to reddit

fenomas3y ago

Am I missing something? Most of TFA is about Bard failing to answer with rhyming words, but in the only prompts shown the author doesn't actually ask for rhyming words. He just says the hint and the name of the puzzle.

Is this not simply: "Bard is worse than ChatGPT at having seen the 'how-to-play' page for my side project during its training"?

comex3y ago

Clicking through to the link next to 'last week's text' and then to 'full rules', it looks like the author is starting the chat sessions with a full explanation that isn't included in the screenshots. (Also, the last screenshot shows the author explicitly asking about rhymes.)

fenomas3y ago

Ah thanks - for others here is the link, though TFA may not necessarily have used the same prompt: https://docs.google.com/document/d/1_eg_jiUE5y8e5zeiz5HCGDc2...

Based on that it looks like the author asked all 25 test puzzles in one big prompt, which one supposes would favor larger models. To compare "puzzle solving" you'd think it would make more sense to ask one puzzle at a time?

cowllinOP3y ago

I tried it both ways; with individual prompts and prompts in bulk. I ran both tests the same way. There's a tradeoff in writing a legible/interesting blog post and relating step-by-step the way the evaluation was ran! Appreciate you reading and the feedback :)

jackblemming3y ago

How is this possible? Google makes people do 8 rounds of leetcode. How could they be beaten? Nothing makes sense anymore.

BoorishBears3y ago

People from Google have argued that's exactly why they're failing.

Personally, as someone who worked at a company that was up over 500% during the pandemic, shipped absolutely nothing during that spike, and then deflated below their pre-pandemic pricing, I saw the foley in hiring smart people first hand.

It's not enough to hire the smartest people, and in fact it can be a competitive disadvantage. The smartest people often want their piece of the product to reflect their ingenuity no matter how ancillary it is to the core mission. Unfortunately that often precludes the kind of agility that businesses need to stay competitive.

OpenAI managed to poach Googlers by simply not having fiefdoms built by smart people™. I imagine if Google had built GPT-4, it wouldn't be having downtime today. Because it wouldn't be public. And it might never be public because it doesn't scale for Google scale yet, and the ethicists want their say, and we need to integrate it into Borg and the front end hasn't passed through enough layers of design and...

natch3y ago

I think they just got lazy and entitled. Maybe ChatGPT will be the scare they need. It feels bad though; they almost don’t deserve the energy this fight will give them.

SteveNuts3y ago

It's so sad to me to see the downfall of google from the absolute coolest company on the planet to the one that's now trying to keep up.

noflag3y ago

s/Microsoft/Google/

http://www.paulgraham.com/microsoft.html

zedadex3y ago

> Microsoft

Good joke. MS has always been in the 'incompetent evil' quadrant, Newcomers just keep inexplicably giving them the benefit of the doubt or assuming/insisting they've "changed".

eslaught3y ago

But was Microsoft ever cool in the sense that Google was circa 2005? (Asking honestly, I'm too young to remember that period.)

doctoboggan3y ago

No, not really. They were seen as a serious business used by other serious businesses. IMO google pioneered the "cool tech company" with their 20% time, "don't be evil", unique campus life, etc.

PopePompus3y ago

No. I'm an old-timer, and there has not been one picosecond since the moment of its founding when Microsoft was cool.

2 more replies

throw3108223y ago

Yes, somehow.

https://rarehistoricalphotos.com/windows-95-launch-day-1995/

I remember people commenting "never seen such a thing before, for a computer OS!"

"Many electronics stores held midnight launches for the product, with thousands of people waiting in line to be the first to get their hands on the operating system.

The release was a tremendous success. Microsoft sold 7 million copies in the first five weeks, and Windows 95 was soon the most popular operating system on the market."

prohobo3y ago

I know that at that time I used to be like "what kind of pleb uses anything other than Windows? Get off your high horse and use the OS that actually works", because well... Windows XP was pretty good, and games worked on it.

MaximumYComb3y ago

https://www.youtube.com/watch?v=Vhh_GeBPOhs&ab_channel=MrWue...

Really cool

waihtis3y ago

Need you even ask? https://www.youtube.com/watch?v=IY2j_GPIqRA

leto_ii3y ago

Funny, cause Microsoft didn't die. They're doing great. More than that, they're the ones behind OpenAI.

travisgriggs3y ago

s/IBM/Microsoft/

History rhymes with itself.

kodah3y ago

That's a clever game to get it to play. Today I asked ChatGPT to give me 1000 Fibonacci numbers starting with the 2000th number and it crashed. Later I asked it the same prompt and it repeatedly gave me code to calculate the Fibonacci numbers in Python.

Tiberium3y ago

> and it crashed

I hope you understand that it's only because of inability of OpenAI's servers to keep up with demand or some issue in their backend code - language models themselves can't "crash" like normal programs on some kind of input, because they "just" generate new tokens.

kodah3y ago

I know little more than what I've read about LLMs and other models. I just got it to do them again and it output them rather quickly: https://imgur.com/a/jfviCCe

leto_ii3y ago

are the numbers correct?

1 more reply

mike_d3y ago

> language models themselves can't "crash"

That is like saying the Excel document didn't crash, but Excel did when it tried to parse it. As far as I know there is no proof that you can't cause a LLM to crash with user input.

> because they "just" generate new tokens.

I can write a program that counts to 100 that crashes reliably.

inimino3y ago

> As far as I know there is no proof that you can't cause a LLM to crash with user input.

Well, then you simply don't understand how they work.

hoseja3y ago

It's a fixed number of matrix multiplications. You can't make matrix multiplication crash by feeding it any sort of strange numbers (unless they haven't handled their Infs and NaNs).

1 more reply

jasfi3y ago

What would be more interesting is to see Bard vs ChatGPT debating various topics.

boffinism3y ago

> Twofer Goofer HQ's adherence to strict "perfect" rhyme can be tricky for those slant rhyme-inclined.

And yet one puzzle they hammer Bard for failing is "Cactus Practice". What accent do you have to have for that to be a perfect rhyme?

cowllinOP3y ago

From Chicago ... and with more than a thousand solves on that puzzle we've never received a single complaint about the rhyme on that one (plus the stats say users find it an extremely easy rhyme to solve)! Curious how you pronounce that one such that they don't rhyme?

tetromino_3y ago

According to the dictionary (and matching my own non-native pronunciation), cactus is /ˈkæktʌs/, practice is /ˈpɹæktɪs/

The terminal /tʌs/ is not quite the same thing as /tɪs/; since they are both unstressed, the difference can be hard to notice in fast speech, but becomes clear when enunciating.

yorwba3y ago

Dictionaries often neglect common mergers like the weak vowel merger https://en.wikipedia.org/wiki/Phonological_history_of_Englis... where unstressed /ʌ/, /ə/ and /ɪ/ all end up being pronounced the same.

runnerup3y ago

There are differing definitions of “rhyme” in popular usage. See: https://youtu.be/_kQBVneC30o

A lot of “does this rhyme with that?” depends on the context in terms of how strict the rhyming must be or not.

sethaurus3y ago

In standard Australian English, this is a perfect rhyme.

visarga3y ago

Offtopic - have you seen Phind?

https://www.phind.com/

It is very fast and wins the search benchmarks here:

https://twitter.com/vladquant/status/1638305110869807104

masakreTech3y ago

Can your source be not someone with an animated profile picture?

visarga3y ago

Source's source: https://blog.kagi.com/kagi-ai-search#aitest

mikewarot3y ago

If I understand how Large Language models work, they don't actually know about spelling.... they are given tokens that represent words, and can only infer things from the context of those tokens across terabytes of data that they're given.

Any rhyming done is an impressive result.

ben0x5393y ago

I mean, English spelling certainly doesn't free you from the need to figure rhyming out from context.

ramblerman3y ago

It is amazing, but somewhat explicable as an emergent effect.

Dont forget that the model has seen all the poems and raps on the internet. It built some latent space where certain words always cluster together in the context of poems, and in which location.

In this case it really has the best database available to say, what next word would slot in nicely here - as that is precisely what it was trained to do.

It is amazing, but somewhat explicable as an emergent effect.

Dont forget that the model has seen all the poems and raps on the internet. It built some latent space where certain words always cluster together in the context of poems, and in which location.

In this case it really has the best database available to say, what next word would slot in nicely here - as that is precisely how it was trained.

I find it more amazing tbh that you can ask for a poem about something, and that it then sticks to the plot, makes references to the start etc than the actual rhyming.

chewxy3y ago

Well the nature of byte pair encoding has a side effect of encoding sounds as well. I briefly explored it here: https://youtu.be/fTDQUxha9qU?t=480

milemi3y ago

"Bard is much worse than ChatGPT at solving an obscure word game I invented" would have been a more honest title, but would probably generate less clicks for the author.

Bard may still be much worse than ChatGPT at solving all kinds of puzzles, but the article is click bait for promoting the author's word game, not an actual investigation that warrants that conclusion.

vineyardmike3y ago

Having read through the word game, I agree with others that it's good that the game is less likely to be in the corpus. I think rhyming, while a challenging task, may be a poor benchmark for ability. The author doesn't seem to understand rhyming too well (cactus practice is a weak rhyme at best)

I completely disagree with the "hasty rhyming test" - Skeleton and Gelatin don't rhyme (-ton vs -tin), and rhyme worse than protein and poutine (-een vs --een).

igravious3y ago

> The author doesn't seem to understand rhyming too well (cactus practice is a weak rhyme at best)

cactus / practice ?

they rhyme to my mind

skeleton / gelatin ?

also rhyme to my ears

protein / poutine

also rhyme enough to be considered to rhyme

You appear to be operating under the impression that the every syllable of a rhyming couplet has to rhyme exactly for it to be considered a rhyme. This is an incorrect assumption. In fact, the above rhymes are arguably more pleasing because they are inexact rhymes rather than being exact forced rhymes.

In your world the only actual rhymes would be

bold / cold / gold

and

double / trouble / bubble

types of rhymes but the world considers the following to be perfectly acceptable

sent to meet her / centimetre

and so on

CamouflagedKiwi3y ago

What most people consider a rhyme is that the vowel and coda of the last syllable match (of course we don't mostly reach for the technical definition).

I guess the examples there might be accent dependent. Protein/poutine is the only one of those first examples that really rhymes to me; skeleton/gelatin and cactus/practice both have different vowels. Maybe different for you though.

2 more replies

vineyardmike3y ago

> You appear to be operating under the impression that the every syllable of a rhyming couplet has to rhyme exactly for it to be considered a rhyme

I'm not operating under that impression, but the author is [1].

To me, the final "sounds" should match - not every syllable, an end rhyme according to wiki [0]. Specifically I would consider a rhyme to require matching sounds "at least from last vowel to end", but I don't think of rhymes first from the strict definition. Perhaps it's an accent thing but "-us" in cactus is not the same sound as "-ice" in practice. If a child made a poem with these sounds I would tell them "good job, it's a rhyme", and perhaps for the purpose of a silly word game too. But I would not use it as a passing case for a test of any sort like the author.

What's more pleasing is irrelevant, what is relevant is if its a true rhyme.

[0] https://en.wiktionary.org/wiki/end_rhyme#English

[1] https://news.ycombinator.com/reply?id=35258385&goto=item%3Fi...

1 more reply

cowllinOP3y ago

In the article, I mention that Twofer Goofer requires perfect or strict rhyme. Perfect and strict rhyme require that all syllables are pronounced identically in the speaker's tongue (for me, American Midwest accent), except for the first sound of the word which can vary.

Hence pooh-teen and proh-tein do not rhyme. Skell-ih-tin and Gell-ih-tin do rhyme.

A game like this requires a pretty tight rhyming definition to not annoy players in a given day!

Thanks for reading: https://www.masterclass.com/articles/perfect-vs-imperfect-rh...

LesZedCB3y ago

what's wrong with that?

the use of novel puzzles is frankly awesome because there's a much lower chance of contamination from previous puzzles so we get a chance to see how much generalization they've achieved.

milemi3y ago

I'm complaining about the title writing a check that the blog post can't cash.

Skunkleton3y ago

GPT-4 says: A more accurate and balanced title might be: "Comparing Bard and ChatGPT in Puzzle Solving: An Examination within the Context of a Word Game"

1 more reply

cowllinOP3y ago

Fair! But if I wrote Twofer Goofer in the title it would not resonate at at all. Alas, tradeoffs.

porphyra3y ago

How do you navigate this blog to read the other articles? I couldn't find any way to read the one on gpt4 (clicking the underlined "wrote about" does nothing) and twofergoofer.com/blog goes to a 404.

cldellow3y ago

I was curious to read it, too, so I googled to find the article.

It's located at https://twofergoofer.com/blog/gpt-4

cowllinOP3y ago

Hah - we only made the blog over the weekend and don't have any nav or menu for now. But yep we link the prior article a few times in this article, that article goes into more detail!

ralfd3y ago

It is interesting how lackluster the reactions are about Bard, when it would have been jaw-gapping amazing just a year ago.

masakreTech3y ago

Bard is basically trash

NoZebra120vClip3y ago

I tried to play hangman with it, but it was on crack.

j / k navigate · click thread line to collapse

80 comments

hackpert3y ago

seanhunter3y ago

Traubenfuchs3y ago

In any case, it's a massive marketing blunder, the public opinion formed within the last hours was overwhelmingly "Bard sucks compared to ChatGPT."

carlmr3y ago

>Yes, this is really the best they can do and it's much worse than the state of the art.

This is the best they can do under pressure.

ChatGPT surprised the world with how good it was, then Google scrambled to get something out quick.

A project like this is a massive undertaking, the first mover has the advantage that they can calmly refine their model until they find it presentable.

I think we'll see a better iteration soon. Not only from Google, but from other competitors.

random_cynic3y ago

> its almost silly to question Google's ability to build useful LLMs.

Unless they release a model one can "use" and verify their claims it's literally silly to make this statement.

amelius3y ago

There are useful open source LLMs. Or are you questioning their ability to configure; make install?

DeathArrow3y ago

> its almost silly to question Google's ability to build useful LLMs.

It's almost silly to presume anything without proofs. People are judging Google based on what Google has shown.

xiphias23y ago

It seems like they don't want to be the best, just good and cheap enough that they don't lose users therefore ad revenue.

They behave like Yahoo when Google took over.

letitgo123453y ago

Even the best Google models seem to be lagging for reasoning tasks vs OpenAI ones at the moment - see the graphs at https://github.com/suzgunmirac/BIG-Bench-Hard

peyton3y ago

Everyone’s gonna have bigger models. Where are their useful models?

celim3073y ago

Thats like saying car brand X is more reliable than car brand Y, because brand X won formula one.

masakreTech3y ago

you can always go back to reddit

fenomas3y ago

Is this not simply: "Bard is worse than ChatGPT at having seen the 'how-to-play' page for my side project during its training"?

comex3y ago

fenomas3y ago

Ah thanks - for others here is the link, though TFA may not necessarily have used the same prompt: https://docs.google.com/document/d/1_eg_jiUE5y8e5zeiz5HCGDc2...

cowllinOP3y ago

jackblemming3y ago

How is this possible? Google makes people do 8 rounds of leetcode. How could they be beaten? Nothing makes sense anymore.

BoorishBears3y ago

People from Google have argued that's exactly why they're failing.

natch3y ago

I think they just got lazy and entitled. Maybe ChatGPT will be the scare they need. It feels bad though; they almost don’t deserve the energy this fight will give them.

SteveNuts3y ago

It's so sad to me to see the downfall of google from the absolute coolest company on the planet to the one that's now trying to keep up.

noflag3y ago

s/Microsoft/Google/

http://www.paulgraham.com/microsoft.html

zedadex3y ago

> Microsoft

Good joke. MS has always been in the 'incompetent evil' quadrant, Newcomers just keep inexplicably giving them the benefit of the doubt or assuming/insisting they've "changed".

eslaught3y ago

But was Microsoft ever cool in the sense that Google was circa 2005? (Asking honestly, I'm too young to remember that period.)

doctoboggan3y ago

No, not really. They were seen as a serious business used by other serious businesses. IMO google pioneered the "cool tech company" with their 20% time, "don't be evil", unique campus life, etc.

PopePompus3y ago

No. I'm an old-timer, and there has not been one picosecond since the moment of its founding when Microsoft was cool.

2 more replies

throw3108223y ago

Yes, somehow.

https://rarehistoricalphotos.com/windows-95-launch-day-1995/

I remember people commenting "never seen such a thing before, for a computer OS!"

"Many electronics stores held midnight launches for the product, with thousands of people waiting in line to be the first to get their hands on the operating system.

The release was a tremendous success. Microsoft sold 7 million copies in the first five weeks, and Windows 95 was soon the most popular operating system on the market."

prohobo3y ago

MaximumYComb3y ago

https://www.youtube.com/watch?v=Vhh_GeBPOhs&ab_channel=MrWue...

Really cool

waihtis3y ago

Need you even ask? https://www.youtube.com/watch?v=IY2j_GPIqRA

leto_ii3y ago

Funny, cause Microsoft didn't die. They're doing great. More than that, they're the ones behind OpenAI.

travisgriggs3y ago

s/IBM/Microsoft/

History rhymes with itself.

kodah3y ago

Tiberium3y ago

> and it crashed

kodah3y ago

I know little more than what I've read about LLMs and other models. I just got it to do them again and it output them rather quickly: https://imgur.com/a/jfviCCe

leto_ii3y ago

are the numbers correct?

1 more reply

mike_d3y ago

> language models themselves can't "crash"

That is like saying the Excel document didn't crash, but Excel did when it tried to parse it. As far as I know there is no proof that you can't cause a LLM to crash with user input.

> because they "just" generate new tokens.

I can write a program that counts to 100 that crashes reliably.

inimino3y ago

> As far as I know there is no proof that you can't cause a LLM to crash with user input.

Well, then you simply don't understand how they work.

hoseja3y ago

It's a fixed number of matrix multiplications. You can't make matrix multiplication crash by feeding it any sort of strange numbers (unless they haven't handled their Infs and NaNs).

1 more reply

jasfi3y ago

What would be more interesting is to see Bard vs ChatGPT debating various topics.

boffinism3y ago

> Twofer Goofer HQ's adherence to strict "perfect" rhyme can be tricky for those slant rhyme-inclined.

And yet one puzzle they hammer Bard for failing is "Cactus Practice". What accent do you have to have for that to be a perfect rhyme?

cowllinOP3y ago

tetromino_3y ago

According to the dictionary (and matching my own non-native pronunciation), cactus is /ˈkæktʌs/, practice is /ˈpɹæktɪs/

The terminal /tʌs/ is not quite the same thing as /tɪs/; since they are both unstressed, the difference can be hard to notice in fast speech, but becomes clear when enunciating.

yorwba3y ago

runnerup3y ago

There are differing definitions of “rhyme” in popular usage. See: https://youtu.be/_kQBVneC30o

A lot of “does this rhyme with that?” depends on the context in terms of how strict the rhyming must be or not.

sethaurus3y ago

In standard Australian English, this is a perfect rhyme.

visarga3y ago

Offtopic - have you seen Phind?

https://www.phind.com/

It is very fast and wins the search benchmarks here:

https://twitter.com/vladquant/status/1638305110869807104

masakreTech3y ago

Can your source be not someone with an animated profile picture?

visarga3y ago

Source's source: https://blog.kagi.com/kagi-ai-search#aitest

mikewarot3y ago

Any rhyming done is an impressive result.

ben0x5393y ago

I mean, English spelling certainly doesn't free you from the need to figure rhyming out from context.

ramblerman3y ago

It is amazing, but somewhat explicable as an emergent effect.

Dont forget that the model has seen all the poems and raps on the internet. It built some latent space where certain words always cluster together in the context of poems, and in which location.

In this case it really has the best database available to say, what next word would slot in nicely here - as that is precisely what it was trained to do.

It is amazing, but somewhat explicable as an emergent effect.

Dont forget that the model has seen all the poems and raps on the internet. It built some latent space where certain words always cluster together in the context of poems, and in which location.

In this case it really has the best database available to say, what next word would slot in nicely here - as that is precisely how it was trained.

I find it more amazing tbh that you can ask for a poem about something, and that it then sticks to the plot, makes references to the start etc than the actual rhyming.

chewxy3y ago

Well the nature of byte pair encoding has a side effect of encoding sounds as well. I briefly explored it here: https://youtu.be/fTDQUxha9qU?t=480

milemi3y ago

"Bard is much worse than ChatGPT at solving an obscure word game I invented" would have been a more honest title, but would probably generate less clicks for the author.

vineyardmike3y ago

I completely disagree with the "hasty rhyming test" - Skeleton and Gelatin don't rhyme (-ton vs -tin), and rhyme worse than protein and poutine (-een vs --een).

igravious3y ago

> The author doesn't seem to understand rhyming too well (cactus practice is a weak rhyme at best)

cactus / practice ?

they rhyme to my mind

skeleton / gelatin ?

also rhyme to my ears

protein / poutine

also rhyme enough to be considered to rhyme

In your world the only actual rhymes would be

bold / cold / gold

and

double / trouble / bubble

types of rhymes but the world considers the following to be perfectly acceptable

sent to meet her / centimetre

and so on

CamouflagedKiwi3y ago

What most people consider a rhyme is that the vowel and coda of the last syllable match (of course we don't mostly reach for the technical definition).

2 more replies

vineyardmike3y ago

> You appear to be operating under the impression that the every syllable of a rhyming couplet has to rhyme exactly for it to be considered a rhyme

I'm not operating under that impression, but the author is [1].

What's more pleasing is irrelevant, what is relevant is if its a true rhyme.

[0] https://en.wiktionary.org/wiki/end_rhyme#English

[1] https://news.ycombinator.com/reply?id=35258385&goto=item%3Fi...

1 more reply

cowllinOP3y ago

Hence pooh-teen and proh-tein do not rhyme. Skell-ih-tin and Gell-ih-tin do rhyme.

A game like this requires a pretty tight rhyming definition to not annoy players in a given day!

Thanks for reading: https://www.masterclass.com/articles/perfect-vs-imperfect-rh...

LesZedCB3y ago

what's wrong with that?

the use of novel puzzles is frankly awesome because there's a much lower chance of contamination from previous puzzles so we get a chance to see how much generalization they've achieved.

milemi3y ago

I'm complaining about the title writing a check that the blog post can't cash.

Skunkleton3y ago

GPT-4 says: A more accurate and balanced title might be: "Comparing Bard and ChatGPT in Puzzle Solving: An Examination within the Context of a Word Game"

1 more reply

cowllinOP3y ago

Fair! But if I wrote Twofer Goofer in the title it would not resonate at at all. Alas, tradeoffs.

porphyra3y ago

How do you navigate this blog to read the other articles? I couldn't find any way to read the one on gpt4 (clicking the underlined "wrote about" does nothing) and twofergoofer.com/blog goes to a 404.

cldellow3y ago

I was curious to read it, too, so I googled to find the article.

It's located at https://twofergoofer.com/blog/gpt-4

cowllinOP3y ago

Hah - we only made the blog over the weekend and don't have any nav or menu for now. But yep we link the prior article a few times in this article, that article goes into more detail!

ralfd3y ago

It is interesting how lackluster the reactions are about Bard, when it would have been jaw-gapping amazing just a year ago.

masakreTech3y ago

Bard is basically trash

NoZebra120vClip3y ago

I tried to play hangman with it, but it was on crack.

j / k navigate · click thread line to collapse