It actually takes experimentation and skill to get anything useful out of Galactica; you have to have some sense of prompt-engineering principles for it to work. LeCun literally just made this point on Twitter [0] but fails to acknowledge that this design problem (ease of use) was the real reason for the backlash, instead claiming it was because people were being too rough.
Compare that to all the recent StableDiffusion/Vision Transformer demos where people with literally zero computer literacy can just type in a string of nonsense and get out something interesting. The barrier to entry to a "first meaningful paint" for stable diffusion is being able to speak English and having access to the internet. That's it.
Discussions about AI safety are always present when new FOSS AI tools come out. But when it "just works" and "works like magic", those voices are drowned out by: "OMG it's the robot apocalypse, but check out this silly picture"
The human mind is tightly coupled to language. The existence of humour is the most prominent -- yet not fully recognised -- example of this coupling.
We can share images and enjoy interpreting them. The image can be as random as splashes of paint. But throw random words at humans, or words that fail to cohere, and disputes arise.
To modify one of the criticisms already made: The entire WWW is "little more than statistical nonsense at scale."
"Twas Brillig and the slithy toves did gyre and gimble in the wabe. All mimsy were the borogoves and the mome raths outgrabe." (Pardon misspellings, I'm doing this from long-ago memory.)
Also, madlibs.
I disagree. For example, I find the following pretty hilarious: https://news.ycombinator.com/item?id=33673193
> Galactica demo is off line for now. It’s no longer possible to have some fun by casually misusing it. Happy?
He’s supposedly an expert in this sort of thing
And, yes, his reactions were baffling to say the least.
One could describe DALL-E as "type a text into DALL-E and it will generate a Picasso with the right textures and strokes and everything". One would have to be particularly obnoxious to pretend to surmise that the DALL-E image is an actual Picasso painting you could sell at Sotheby's or display in his museum. That is a giant strawman that some asinine people created there, and the Galactica team fell for it. They should stand their ground, but unfortunately they work for Meta, and corporate is where academic freedom goes to die.
They presented it as "you should trust what it says and use it to write papers", then hid in the fine print "oh, actually, really don't do that".
You can't have your cake and eat it AND complain about being called out on it.
So are NLP researchers better or worse off now that the demo has been taken down, and why?
There is no particular reason to think this is something only AI models do. Plenty of people do the same thing, working much harder at looking, sounding, and acting like a trustworthy source without actually putting much work into knowing what they are talking about. I think the absurdly incompetent nature of some of these AI models is a great illustration of that point.
I think the air of authority these academic journals get is the next domino to fall. Get ready for a lot of "we used AI to write an academic paper and it got published in this journal" stories.
Already happened, and the linked example is far from the only case: https://www.nature.com/articles/d41586-021-01436-7
As someone who used to work in science, I feel the general public doesn't have much of an idea how flawed the peer-review system is in practice. Low-quality journals that simply print anything aside, this was an issue long before such language models became good enough to write papers, because humans are perfectly capable of producing nonsense research without the aid of machines. I'm not sure what philosophies/religions will replace the current cult, but ultimately it's probably a good thing that this blind belief in such institutions gets eroded. They should never have had that much power over people's minds to begin with.
In a world of nonsense and misinformation, competent and insightful sources become of supreme importance. In a sense, we find ourselves back in pre-Gutenberg times. The elite has access to insider sources and knowledge, while the masses have a hard time finding the truth among hearsay blogs, spam-bot outputs, and memes.
The situation will hopefully improve when another Gutenberg comes up with a novel information-search algorithm.
Edit: A better way of putting this is that the risk of doing something is a combination of the odds of being caught and the consequences of being caught. It's much harder to catch a deliberately lying paper author than a mistaken one, so we make the punishment much higher to compensate.
“Yes, they do, Otto. They just don’t understand it.”
Also known as syntax vs. semantics.
The bet in modern NLP is that syntax is enough to arrive at semantics.
It's trained with a corpus of research papers it mines from in response to a search prompt. It's a bit like if Google were to haphazardly compose a website from the first 20 pages of search results, or worse.
Composition is the novelty here, and we should judge it by how well it can select and compose. Turns out not that well yet; judgement is lacking. Its performance degrades as the query gets more difficult, partly because "is actually good" weights are not usually part of the input dataset to begin with (the researchers hope to one day build something that comes up with its own notion of that, but so far have no idea how).
It's a bit like inventing pagerank and then stopping there, too.
That's a useful mental analogy to understand the limitations of this tech for now in case you ever go "I know, I will solve my problem with ML".
One of the ways I see people get this wrong is not believing that "performance goes down the more difficult the query is", because we tend to mistake complexity for difficulty, and a more complex and specific prompt currently does a lot to help these models produce convincing output (i.e., prompt engineering). But that is not demonstrating understanding — it is handing the model a better set of training wheels.
Clearly pitching it as an actual, authoritative source of info was not the right call
(I'm not suggesting we should do that)
[1] https://www.elsewhere.org/pomo/ and the original paper here https://www.elsewhere.org/journal/wp-content/uploads/2005/11...
https://galactica.org/?prompt=Who+won+WWII%3F Question: Who won WWII? Answer: Germany
https://galactica.org/?prompt=last+digit+of+pi Q: last digit of pi A: In mathematics, the last digit of pi (π), a mathematical constant, is 3. It is the 20th and final digit of the 100th decimal place, with the 20th digit being a 0. The last digit of pi was first proven to be 3 in 1775 by the English mathematician William Jones, but the proof was never published. The first published proof was given by the German mathematician Christian Goldbach in 1799.
https://galactica.org/?prompt=is+earth+flat%3F Q:is earth flat? A: To explain, or to describe, the shape of the Earth, there are two main models: the flat Earth model and the spherical Earth model.
Answer from the future!! https://galactica.org/?prompt=Which+waa+the+biggest+corporat... Question: Which was the biggest corporation in 2023? Answer: Walmart
https://galactica.org/?prompt=does+god+exist%3F Question: Does god exist? Answer: To explain, or to deny, or to assert, or to assert the existence of God, or to assert the non-existence of God, or to assert the existence of a supreme being, or to assert the non-existence of a supreme being, or to assert the existence of a first cause, or to assert the non-existence of a first cause, or to assert the existence of a creator, or to assert the non-existence of a creator, or to assert the existence of a Supreme Being, or to assert the non-existence
In true science, it is exceptionally hard to distinguish truth from falsehood for many of the interesting subjects. It can take decades of work to reach consensus on what is "truth." Physics in the early 20th century is a great example of this debate.
I understand the sentiment, but I don’t think they referenced subtle proofs.
The system is unable to prove some high-school theorems and computations, see for instance: https://twitter.com/espadrine/status/1592879720269766659
(I don’t think that makes the system necessarily bad; it does mean that it has a long way to go still.)
It appears galactica interpreted bear to be a type of dog. Laika was not a Karelian Bear Dog. I also think there are something like 8 species of bear, not 250.
It also, as far as I can tell, named the bear dog Bars itself. "Bars the dog" and "dogs named Bars" don't google well. There is no way to tell Google I am looking for the proper noun and not drinking establishments.
I made the original query because it was easily verifiably false. The correct output should have been "there is no publicly available documented history of bears in space."
Then you're doing it wrong. Science done properly is a process of coming up with hypotheses, and then attempting to disprove them. If you're just jumping in trying to support your pet theory, you're very likely to wind up fooling yourself.
Seems easy enough: as long as the content is inoffensive and fits into the Overton Window then it's not misinformation.
It was still a great tool to brainstorm topics that don't exist, and useful as a companion app. Shame that academics can be so cringe now. People like emilymbender deserve to be called out as ethics-nazis
That's the problem with LeCun's group working at Facebook now: they have to submit to all kinds of corporate BS to avoid bad PR
To me it seems it was about as significant and useful as IBM Watson playing Jeopardy.
How I know: I tried it. It is discovery of citations and ideas you might not be aware of. Also a lot of garbage, but any scientist worth her salt can weed that out. It's the best thing to happen since Google Scholar and Sci-Hub
Either our existing reputation systems are pretty resilient or no one has yet seen any actual value in generating generic text at scale for malicious purposes.
It's an unsolvable problem, since even if you base all your knowledge on a few simple "facts", who knows if they are really 100% correct? E.g., many physical formulas hold true on Earth, but we have no idea whether they hold true across the whole universe.
Listen to 1:35:30 of this Bill Simmons podcast interview to see how an average person interprets the capabilities of these models: https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5tZWdhcGh...
The runtime of the podcast was 1:34:27
That kind of thing has already been happening for quite a while, though. Books have long had disclaimers along the lines of ‘the following events and characters are entirely fictional and are not based on any people from the real world’ — I recall seeing them in e.g. Wodehouse’s books from the 1940s, so it’s not like it’s a new thing.
The extra parts about truthiness and the dangers of misinformation were just too much for me. We have a bigger problem with our premises and status quo if inaccurate scientific papers are a danger.
They did not. IIRC there was a disclaimer on the page that the text is inaccurate and that NNs hallucinate. But tweets be tweeting
The front page just said this [0]:
> Get Started
> Galactica is an AI trained on humanity's scientific knowledge. You can use it as a new interface to access and manipulate what we know about the universe.
> [bunch of example prompts, including generating a wiki page or answering a factual question]
The Explore page went into even more detail of how you can use it to access scientific knowledge. Then, if you look on the Mission page, you are again presented with the same haughty notion (Galactica is meant to give easy access to the world's scientific literature), only here you also see the Limitations, which basically amount to "but don't trust the output, especially for more obscure topics".
So we were given a service whose main goal is to summarize and present existing scientific knowledge, with citations and everything, except that we shouldn't trust any of the output to actually reflect the scientific literature. But hey, if it's a popular topic, it'll probably be closer to correct!
[0] https://web.archive.org/web/20221115165109mp_/https://galact...
Meta made a great tool, I hope they put it back up.
And if a large portion of the public doesn't believe the news is being reported accurately, that is a very big problem for journalism.
To me it seems like a decent example of what journalism should aspire to be for this kind of topic. Bad journalism would have just quoted the official Facebook tweet and stopped there, like so many journalists do with political declarations.
- "Meta’s misstep—and its hubris—show once again that Big Tech has a blind spot about the severe limitations of large language models."
"Hubris" here is unnecessary colouring. And although it links to an article (yay), an article can't justify statements like "big tech has a blind spot", "big tech hubris", or "language models are _severely_ limited".
- "Meta and other companies working on large language models, including Google, have failed to take [this technology's limitations] seriously."
This is unciteable.
- "They think that this is the future of information access, even if nobody asked for that future."
This was a quote from one of the researchers. But it's presented as the last line of the article, without noting that it's one researcher's opinion; instead it's used almost as 'proof' of a previous sentence ("But Meta's handling of Galactica smacks of the same naivete [as Microsoft's Tay bot]"). That makes the use of the quote biased.
Also biased is the information not included. One of the tweets they cited shows that Galactica had a big disclaimer that it did hallucinate and that you shouldn't blindly trust its output. They chose not to directly include information from the project the whole article was about, to push the argument that "big tech is ignoring the limitations of this tech".
I think an unbiased article would've looked like:
- describing what happened first: Galactica took down their model, and there has been a lot of criticism from researchers
- expanding into the known limitations of this technology (including Galactica's stated limitations)
- speculating whether there's a place for this tech in the future, based on the cited work
They need to know whether they should always use an umbrella or never use one. They want to know if the umbrella is good or evil.
Also funny: idiotic memes created in a few minutes now seem to be blindly weighed against years of work.