These models are nearing 2+ trillion parameters. At 4 bits each, that is roughly 1 TB of RAM for the weights alone.
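To make that arithmetic explicit, here is a quick sketch. The function name is just something I made up for illustration, and it only counts the raw weights (no KV cache, activations, or runtime overhead), using decimal terabytes.

```python
def weight_memory_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_param // 8


params = 2 * 10**12  # the ~2 trillion parameter figure from above

for bits in (16, 8, 4):
    terabytes = weight_memory_bytes(params, bits) / 10**12
    print(f"{bits}-bit weights: {terabytes:.1f} TB")
```

Even at 4 bits the weights alone land at about 1 TB, which is the number the rest of this argument hangs on.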
The problem is that RAM stopped scaling a long time ago. We're down to the size where a single capacitor's charge is held by a mere 40,000 or so electrons. Since we can't find reliable ways to boost even weaker signals, all we've been doing is making skinnier, longer cells of that same volume. That is a dead end: if the volume stays constant and you shrink the X and Y dimensions, the Z dimension gets crazy big really fast. Etching a hole that deep a little at a time while keeping the wall thickness roughly the same all the way down is a very hard chemistry problem.
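A rough sketch of that geometry, under the same simplification used above (the cell has to keep a constant volume). The function name and the starting dimensions are arbitrary illustrative values, not real DRAM cell measurements.

```python
def scaled_cell(x: float, y: float, z: float, shrink: float):
    """Shrink X and Y by `shrink` (0 < shrink <= 1) and grow Z to keep volume constant."""
    new_x = x * shrink
    new_y = y * shrink
    new_z = (x * y * z) / (new_x * new_y)  # Z grows by 1 / shrink**2
    return new_x, new_y, new_z


# Hypothetical starting cell: 1 x 1 footprint, 10 deep (arbitrary units).
x, y, z = 1.0, 1.0, 10.0
for shrink in (1.0, 0.5, 0.25):
    nx, ny, nz = scaled_cell(x, y, z, shrink)
    print(f"shrink={shrink}: depth={nz:.0f}, depth/width={nz / nx:.0f}")
```

Halving the width quadruples the depth, so the depth-to-width ratio goes up by 8x. The aspect ratio grows with the cube of the shrink factor, which is why the hole gets so hard to etch.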
Another problem is that Moore's law hit a wall when Dennard scaling failed. SRAM is generally the smallest and most reliable stuff we can make, and when you look at it, the most recent shrinks can hardly be called shrinks.
Unless we do something very different like compute in storage or have some radical breakthrough in a new technology, I don't know that we will ever get a 2T parameter model inside a phone (I'd love for someone in 10 years to show up and say how wrong I was).