For example, look at the top-end Raspberry Pi sensor. It's a pathetic 12MP. That's on par with a phone from ten years ago or so.
I think the processing shouldn't be entirely dismissed either. There's frame stacking that extends the dynamic range, plus compression and other complex DSP that's necessary, because 50MP of raw pixel data is a ton of data to pull off the sensor. Realistically, you can probably only do some of that in software.
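Rough numbers for why readout is the bottleneck. This is a back-of-the-envelope sketch that assumes 12-bit raw pixels, 30 fps and no on-sensor compression (those are my assumptions, not figures from any particular sensor's datasheet):

```python
# Back-of-the-envelope raw readout bandwidth for an image sensor.
# Assumptions (illustrative only): 12 bits per pixel, 30 fps,
# uncompressed continuous readout. Real sensors vary.

def raw_bandwidth_bytes_per_s(megapixels: float, bits_per_pixel: int, fps: float) -> float:
    """Bytes per second needed to move uncompressed raw frames off the sensor."""
    bits_per_frame = megapixels * 1e6 * bits_per_pixel
    return bits_per_frame * fps / 8

for mp in (12, 50):
    rate = raw_bandwidth_bytes_per_s(mp, bits_per_pixel=12, fps=30)
    print(f"{mp} MP @ 30 fps, 12-bit: {rate / 1e9:.2f} GB/s")
```

Under those assumptions that's about 0.54 GB/s for 12MP versus 2.25 GB/s for 50MP, and stacking several frames per shot multiplies it again.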