I'm on a quixotic mission to explain how it became "common knowledge" that GPT4 is a trillion parameter mixture of experts model, despite a clear denial from OpenAI's CEO. Full recounting: https://news.ycombinator.com/item?id=36828878
[1] https://www.theverge.com/23560328/openai-gpt-4-rumor-release...
[2] https://twitter.com/soumithchintala/status/16712671501017210...
You cited geohot as an expert on OpenAI[1] and, to indicate skepticism that Altman denied it, you fixated on the number of parameters, cited a Verge link to a chart in a random tweet about 100 trillion parameters, and claimed that it didn't show Sam Altman and that it didn't ask Altman about 100 trillion parameters specifically. And if it did, what does that have to do with mixture of experts?
I flipped from 3 to -2 within 30 minutes of you posting this.
"A lie gets halfway around the world before the truth has a chance to get its pants on." - Churchill
[1] Never worked at OpenAI, has no notable domain expertise, and was a Twitter intern in 2022.
2022/11/11: A viral tweet claims GPT-4 will have "100 trillion parameters."[1] At this point, there were no rumors about mixture of experts.
2023/01/16: In an interview, Sam Altman mentions he saw the tweet and it was "complete bullshit."[2]
2023/06/20: geohot and the lead of PyTorch, two people who would be expected to have relevant connections, claim that GPT-4 is an 8 x 220B mixture of experts model.[3]
These are two separate, unconnected rumors. One was denied by Sam Altman and was never plausible in the first place. The other was never denied and is highly plausible. You are conflating them by claiming, without any source, that there was "a clear denial from OpenAI's CEO" that "GPT4 is a trillion parameter mixture of experts model."
[1] https://twitter.com/andrewsteinwold/status/15948895625260277...
[2] https://youtu.be/ebjkD1Om4uw?t=313
[3] https://twitter.com/soumithchintala/status/16712671501017210...
I don't know if you are right or not, but I've been shocked at how quickly people flipped to just accepting that GPT4 was a mixture of experts model given the scant evidence to support the claim.
It is possible, but not particularly likely.