This repo[2] by Meta achieves 48% MFU, or 80k tokens/second.
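For anyone wondering how MFU and tokens/second relate: a common rule of thumb is that training takes roughly 6 FLOPs per parameter per token, so MFU is achieved FLOPs over peak hardware FLOPs. A minimal sketch, where the 1B parameter count and the 1 PFLOP/s peak figure are assumptions I picked to make the numbers line up, not values from the repo:

```python
def mfu(params: float, tokens_per_sec: float, peak_flops: float) -> float:
    """Model FLOPs utilization via the ~6*N FLOPs-per-token rule of thumb."""
    achieved_flops = 6 * params * tokens_per_sec
    return achieved_flops / peak_flops

# Assumed: a 1B-parameter model at 80k tokens/s on hardware
# with a 1e15 FLOP/s (1 PFLOP/s) peak.
print(mfu(1e9, 80_000, 1e15))  # → 0.48, i.e. 48% MFU
```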
(1T tokens / 63k tokens per second) / (60 seconds per minute * 60 minutes per hour)
is approx 4400 hours.
So I guess that’s how the calculation went.
Or did you mean a source for the number of tokens per second?
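The arithmetic above can be checked in a couple of lines (the 1T-token count and 63k tokens/s rate are taken from the comment itself):

```python
tokens = 1e12            # 1T training tokens
tokens_per_sec = 63_000  # throughput from the comment above

seconds = tokens / tokens_per_sec
hours = seconds / 3600   # 60 s/min * 60 min/h
print(round(hours))      # → 4409, i.e. approx 4400 hours
```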
Hope these AI PCs will also run something better than a 1B model.
What is it useful for? Spellcheck?
As a chip maker, they will also have some undersold, QA-rejected, or otherwise wasted parts available for these training efforts, so the capex is likely less severe for them than for a random startup betting on AMD.
AMD has great hardware, but they never could be assed to do anything about their software.
Which means you can run larger models, but they'll become ever slower.
It seems actual domain-specific usefulness (say a specific programming language, translation, etc.) starts at around 3B models.