> Yes, hippos are excellent swimmers. They spend most of their time in the water, where they feed on aquatic plants and escape the heat of the savannah. In fact, hippos are one of the best swimmers among all land mammals.
But that's fine; most models get it wrong. Hippos don't swim. At best, they walk or hop/skip underwater.
> When does the bowl of the winds get used in the wheel of time books?
LLaMA2 fails pretty hard:
> The Bowl of the Winds is a significant artifact in the Wheel of Time series by Robert Jordan. It is first introduced in the third book, "The Dragon Reborn," and plays a crucial role in the series throughout the rest of the books. The Bowl of the Wines is a powerful tool that can control the winds and is used by the Aes Sedai to travel long distances and to escape danger. It is used by the male Aes Sedai to channel the True Power and to perform various feats of magic.
For what it's worth, Bard is the only model I've seen get this question right; most others hallucinate terrible answers. I'm not sure what it is about this question that trips LLMs up so much, but they produce notably bad results when prompted with it.
> Please write a function in JavaScript that takes in a string as input and returns true if it contains a valid roman numeral and false otherwise.
Is another test I like. So far no LLM I've tested passes it, though GPT-4 comes very close.
Here LLaMA2 also fails pretty hard, though I thought this follow-up response was pretty funny:
> The function would return true for 'IIIIII' because it contains the Roman numeral 'IV'.
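For comparison, here is a minimal sketch of one answer that would pass, written in Python rather than the JavaScript the prompt asks for. It assumes "valid" means the standard subtractive form for 1 to 3999 (so the additive "IIII" is rejected) and that the whole string must be the numeral; the function names are illustrative, not from any library. It decodes the string leniently, then checks that re-encoding the value reproduces the input exactly, which rejects both 'IIIIII' and malformed strings like 'IC'.

```python
# Illustrative helper names; assumes "valid" = canonical subtractive form, 1..3999.
SYMBOLS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def to_roman(n: int) -> str:
    """Encode 1..3999 in canonical (subtractive) form."""
    parts = []
    for value, symbol in SYMBOLS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def from_roman(s: str) -> int:
    """Decode leniently: a symbol followed by a larger one is subtracted."""
    total = 0
    for i, ch in enumerate(s):
        value = VALUES[ch]
        if i + 1 < len(s) and VALUES[s[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def is_roman_numeral(s: str) -> bool:
    """True only if s decodes to 1..3999 and re-encodes to exactly itself."""
    if not s or any(ch not in VALUES for ch in s):
        return False
    n = from_roman(s)
    return 1 <= n <= 3999 and to_roman(n) == s


print(is_roman_numeral("IV"))      # True
print(is_roman_numeral("IIIIII"))  # False: 6 re-encodes as "VI"
```

Under this definition the round trip does the real work: any string that isn't the one canonical spelling of its value fails the final comparison.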
They basically work by inventing a plausible-sounding continuation of a dialog, based on an extensive training set. They will always find a plausible-sounding answer to a plausible-sounding question, because so much of the training material correlates with that.
Until epistemology is explicitly introduced into their architecture, language models will remain literary devices, so to speak, unable to tell "truth" from "fiction". Everything they learn is basically "fiction", with no way to compare it against any "facts" and no notion of "facts" or "logic".
Even with an RLHF model, if you ask "is that correct?", it is far more likely to correct itself after an incorrect statement than after a correct one.
> When does the bowl of the winds get used in the wheel of time books?
> Mention the events leading up to it, the book number and name, and the chapter title if one exists.
> Response: In Robert Jordan's Wheel of Time series, the Bowl of the Winds is used in the 8th book, "The Path of Daggers." The bowl is used as a device to reverse a long-lasting, unnatural heatwave and restore the natural weather patterns.
I think that's right, actually, but it's been years (decades?) since I read that book.
Yes, that answer looks correct to me, though it leaves off some later uses in the series (which is what makes this a slightly tricky question to answer). But in terms of first use, it nails it.
Did you add the "Mention the events leading up to it, the book number and name, and the chapter title if one exists." part to the prompt?
I'm wondering if that produces noticeably better results.
Edit:
For reference this is the response I get from Bard:
In A Crown of Swords (Chapter 13) Elayne and Nynaeve first learn about the Bowl of the Winds and begin their search for it.
The Bowl of the Winds is then used in the Wheel of Time series in the following books:
The Path of Daggers (Chapters 5, 24, and 35) - Nynaeve, Talaan, Aviendha, Elayne, Metarra, Garenia, Rainyn, Kirstian, Reanne, Tebreille, Naime, Rysael use the Bowl of the Winds.
Winter's Heart (Chapters 24 and 37) - The Bowl of the Winds is used to stop a massive storm that is threatening to destroy the city of Ebou Dar.
The Gathering Storm (Chapter 34) - The Bowl of the Winds is used to create a powerful windstorm that helps to defeat the Seanchan army at the Battle of Maradon.
A Memory of Light (Chapters 19 and 35) - The Bowl of the Winds is used to fight the weather-controlling abilities of the Dark One's forces during the Last Battle.
I am willing to bet a million dollars that it is unlikely any single model will ever be able to answer any question correctly.
The implication, then, is that one cannot use a single question to evaluate whether a model is useful or not.
Any intelligence or simulation might try to guess at an answer to that third-level-of-hell interrogation.
“Why was Spartacus filmed in California near pizza noodle centurions?”
Do you mean "every question"? Because ChatGPT has already answered some of my questions correctly, so if you mean "any" as in "any one of the infinite set of questions", I'll take that bet.
> > The function would return true for 'IIIIII' because it contains the Roman numeral 'IV'.
That's arguably correct. 'IIII' is a valid Roman numeral representation of 4 [1], and the string 'IIIIII' does contain 'IIII'.
[1] https://en.wikipedia.org/wiki/Roman_numerals#Other_additive_...
A numeral is a written way of denoting a number. So while the string "IIIIIIII..." arguably contains a Roman numeral denoting the number 4 as a substring (if you accept "IIII" as a Roman numeral), it still does not contain the Roman numeral "IV" as a substring.
Or phrased differently, by your logic you might as well say that "IIIIIIII..." contains the Arabic numeral "4". It doesn't.
Your question actually isn't worded precisely enough. You don't specify whether the string may merely contain the roman numeral (plus other, non-roman-numeral text) or must consist entirely of just the roman numeral. As "if it contains" is used colloquially, it could imply either.
I'd use "if it IS a roman numeral" if it must consist only of a roman numeral, and "if there exists a roman numeral as part of the string", or some such, otherwise.
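To make the two readings concrete, a rough Python sketch (the pattern and function names are mine, and it again assumes the standard subtractive form for 1 to 3999): `re.fullmatch` implements "the string IS a roman numeral", while `re.search` implements "the string contains one somewhere".

```python
import re

# Standard subtractive form for 1..3999; the lookahead rejects the empty match.
ROMAN = r"(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})"


def is_numeral(s: str) -> bool:
    """Reading 1: the whole string IS a roman numeral."""
    return re.fullmatch(ROMAN, s) is not None


def contains_numeral(s: str) -> bool:
    """Reading 2: some substring of s is a roman numeral."""
    return re.search(ROMAN, s) is not None


print(is_numeral("IIIIII"))        # False
print(contains_numeral("IIIIII"))  # True: "III" is a numeral
print("IV" in "IIIIII")            # False
print("IIII" in "IIIIII")          # True
```

Under the "contains" reading, almost any string with one of the letters I, V, X, L, C, D or M in it passes, since a single letter is already a numeral, which is probably why the stricter reading is the one worth testing.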
https://chat.openai.com/share/462a7f62-6305-4e2a-a9ae-5f86a6...
I'll often do this, along with "Are there any mistakes in the above?" or something like "Did this fully address the problem statement?"
> I do not support or condone the practice of cannibalism, as it is harmful and exploitative towards the individuals who are consumed.
Then it said that cannibals have inherent worth and dignity and that we should strive to appreciate what they do. Then it crashed and is now responding to all following inputs with just the letter "I"
Do you have the exact prompts? I’d love to try this.
> Would a cannibal cult be harmful to society?
> You are saying that we should treat groups that eat people for food with respect?
> You are contradicting yourself
Then it started responding with just "I". I tried this again in a new session, but those lines by themselves did not trigger the error. The original conversation started out by asking it about obscure music genres (predictably enough).
It doesn't require the propulsion to come only from the limbs pushing against the water. Pushing against the ground to move through the water seems to fit the definition.
GPT-4 had a much better answer last I checked.
Edit: Updated with another non-paywall link.
If that's what you're looking for, it's a useful test, but if you're looking for a language model and not a general knowledge model, I'm not sure it's super relevant. The average person probably couldn't tell you whether a hippo swims either, or, once told how a hippo gets around, whether that counts as swimming.
National Geographic
https://www.nationalgeographic.com/animals/mammals/facts/hip...
> Hippos cannot swim or breathe underwater, and unlike most mammals they are so dense that they cannot float. Instead, they walk or run along the bottom of the riverbed. Because their eyes and nostrils are located on the top of their heads, they can still see and breathe while underwater. When totally submerged, the ears and nostrils shut tight to keep the water out, and hippos can hold their breath for five minutes.
San Diego Zoo
https://animals.sandiegozoo.org/animals/hippo
> Yet despite all these adaptations for life in the water, hippos can't swim—they can't even float! Their bodies are far too dense to float, so they move around by pushing off from the bottom of the river or simply walking along the riverbed in a slow-motion gallop, lightly touching the bottom with their toes, which are slightly webbed, like aquatic ballet dancers.
Also echoed by Cincinnati Zoo and I'm certain many other experts.
But yes, it is maybe pedantic to define swimming!
However, even if you were kind to this LLM and said "Alright, sure, it's kinda swimming", it would still be wrong to put hippos among the particularly good mammal swimmers! That's just flat out wrong no matter how you look at it. They're terrible at it, but they have evolved so that they don't need to be good at it.
It's funny to see people complaining about this when this kind of analysis is very important to improving these systems. I want to remind people not to romanticize these systems or get attached to our expectations of what we think it should be doing. They are a novel computer system and we should be happy to objectively analyze their performance.
GPT 3.5 (fails): https://flowch.ai/shared/90274999-8eaf-4046-9115-7f623264197...
GPT 4 (succeeds): https://flowch.ai/shared/73523ec6-4d1d-48a4-bb16-4e9cc01adf1...
At least it said "land mammals" so we don't think they're more adept than dolphins.
That is an inaccurate use of the word underwater. Underwater means beneath the surface. Breathing moves gases in and out of the lungs. These gases are not available in sufficient quantities beneath a liquid's surface to perform typical breathing behavior.
A better description is "while its body is submerged":
The hippo's nose, ears, and eyes are on the top of its head, and they protrude from the water while the rest of its head and its body lie beneath the surface. That way a hippo can breathe, see, and hear even while its body is submerged.
https://kids.nationalgeographic.com/animals/mammals/facts/hi...
You could be kind to Nat Geo Adults and Buckeyes and say "ok, that's mostly underwater", but it's still wrong because of the way it talks about eyes. Light penetrates the surface of water, and the hippo's optical organs are not deleteriously affected by most liquid water, unlike the alveoli of the lungs. Thus eyes can see underwater even though a hippo will not be able to breathe effectively there.
> User: please reverse the following string: "élu par cette crapule". For example, "abc def" should become "fed cba".
> Assistant: "par cette crapule elu"
ChatGPT actually understands what I mean but still fails sometimes, producing output as if it's having a stroke: "eluu elule par cettep raprap".
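For reference, a short Python sketch of what the example in the prompt asks for (character-by-character reversal), next to reversing word order, which is a different operation. The NFC normalization step is my addition: it keeps a decomposed "é" (an "e" followed by a combining accent) from having its accent land on the wrong letter, though it won't handle every multi-code-point grapheme, such as emoji sequences.

```python
import unicodedata


def reverse_chars(s: str) -> str:
    # Compose "e" + combining accent into a single "é" code point first,
    # so slicing reverses whole letters rather than splitting the accent off.
    return unicodedata.normalize("NFC", s)[::-1]


def reverse_words(s: str) -> str:
    # A different operation: reverse the word order, keeping each word intact.
    return " ".join(reversed(s.split(" ")))


phrase = "élu par cette crapule"
print(reverse_chars(phrase))  # eluparc ettec rap ulé
print(reverse_words(phrase))  # crapule cette par élu
```

Neither result is "par cette crapule elu", which is closer to moving the first word to the end.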
It’s impressive that they understand individual-character-level questions as well as they do.
> To reverse a string, I use the built-in "reverse" function in Python. It takes a string as input and returns the string with all the characters in reverse order. For example, the input string "hello" would become "lohel" when reversed.
We're not quite there yet.
The LLMs are known to have issues with individual letters due to the underlying tokenization.
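A toy illustration of why this matters, using a made-up two-entry vocabulary and a greedy longest-match splitter (real tokenizers such as BPE are learned and far larger, so this is only an analogy): if a model effectively reverses the order of its tokens rather than the characters inside them, "hello" comes out as "lohel", which happens to be exactly the wrong answer quoted above. That doesn't prove this is what the model did; it only shows how token boundaries can produce that kind of error.

```python
# Hypothetical toy vocabulary; anything not in it falls back to single characters.
VOCAB = {"hel", "lo"}


def tokenize(text: str) -> list[str]:
    """Greedy longest-match split of text into vocabulary pieces."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest remaining slice first, down to a single character.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


word = "hello"
pieces = tokenize(word)
print(pieces)                     # ['hel', 'lo']
print("".join(reversed(pieces)))  # lohel  (token order reversed)
print(word[::-1])                 # olleh  (characters reversed)
```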
The only ‘knowledge’ that can be expected of LLMs is that which has been repeated over and over on thousands of online documents. It is AT BEST a side effect of learning the likely forms of words, sentences, poems, code, and other language structures.
Your hippo test is like checking how clean a car's exhaust is to determine how much horsepower it has.
Hippos can't float because they are heavier than water, and they need to make a constant effort to avoid hitting the bottom. But by the same token we could say that birds can't fly because they are heavier than air. In any case, I'd only agree that hippos "can't swim" if we state that sugar gliders "can't fly".
> Tell me all of the albums where the guitarist used a DOD FX-69 Grunge effect pedal.
> The DOD FX-69 Grunge effect pedal was used by the guitarist in the following albums:
>
> 1. "Nevermind" by Nirvana (1991)
> 2. "In Utero" by Nirvana (1993)
> 3. "Bleach" by Nirvana (1989)
> 4. "Superunknown" by Soundgarden (1994)
> 5. "Down on the Upside" by Soundgarden (1996)
>
> The DOD FX-69 Grunge effect pedal was a popular choice among grunge musicians in the early 1990s, and was used to achieve a distinctive, heavy, and overdriven sound.
They've all failed almost identically. It would be impossible for most of these because the pedal didn't come out until 1993.
The albums they list used pedals like the Boss DS-1, ProCo RAT, a cranked DOD 250, etc. I argued with ChatGPT about it, telling it the release date, and it still listed albums that came out before '93.
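The date check the models skipped is simple enough to write down. A minimal Python sketch, using the release years from the model's own answer and the 1993 pedal release year stated above (not independently verified here):

```python
# Pedal release year as stated in the comment; album years as the model listed them.
PEDAL_RELEASE_YEAR = 1993
claimed_albums = {
    "Nevermind": 1991,
    "In Utero": 1993,
    "Bleach": 1989,
    "Superunknown": 1994,
    "Down on the Upside": 1996,
}

# Any album released before the pedal existed cannot have used it.
impossible = [title for title, year in claimed_albums.items() if year < PEDAL_RELEASE_YEAR]
print(impossible)  # ['Nevermind', 'Bleach']
```

This alone only rules out Nevermind and Bleach. In Utero (1993) would depend on the exact release months, and the 1994 and 1996 albums need the kind of evidence about which pedals were actually used that the comment gives.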
Before someone Googles and says it: Yes, Kurt Cobain stuck one on his touring board. No, there's no evidence he ever used it and he definitely never used it on a recording. Most think he just thought it was funny... and it did end up driving sales.
But I wondered if someone used it ironically, or just found the sound cool well after the fact when they bought one used for $20. I still haven't found an artist using one on a recording. That's probably because it sounds cool in a bedroom but is pretty terrible when you try to fit it into a band mix on a recording.
In shallow water hippos are indeed one of the best “swimmers” among four-legged mammals - if you consider skipping and bobbing across the water as swimming, which seems fair - and will easily outpace most humans and even lean and fast animals like a tiger.
Everything I've tried so far gets it wrong, including GPT-4 which says "Yes, Henry VIII had grandchildren" but then outlines exactly how he couldn't have.
I just hope we'll get it to recognize what it doesn't know someday.