Though of course deep-sky photographers do exactly this - compositing many exposures together - but then we are talking about subjects so dim that you need effective exposures of 30 minutes or so.
A bit of a digression, but the reason phones need so many exposures to get a decent low-light image is that their sensors and optics are comparatively poor in low light. A modern pro-grade camera would need only a small fraction of those exposures to produce a comparable final image. A better sensor also lets you capture objects in motion in low light far, far better, since you can gather the required image data in a smaller temporal window.
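The core idea behind all of this - whether it's a phone, a pro camera, or a telescope - is that averaging N exposures of a static scene reduces zero-mean Gaussian noise by a factor of sqrt(N). A minimal sketch (the scene, noise level, and frame counts are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical static scene: a smooth gradient "image".
scene = np.linspace(0.0, 1.0, 256).reshape(16, 16)

sigma = 0.2  # assumed per-frame Gaussian noise standard deviation

def noisy_frame():
    """One simulated exposure: the true scene plus independent Gaussian noise."""
    return scene + rng.normal(0.0, sigma, scene.shape)

# Averaging n frames should cut the residual noise roughly by sqrt(n).
for n in (1, 16, 64):
    stack = np.mean([noisy_frame() for _ in range(n)], axis=0)
    residual = np.std(stack - scene)
    print(f"{n:3d} frames: residual ~ {residual:.3f} (theory ~ {sigma / np.sqrt(n):.3f})")
```

This is also why the pro camera needs fewer frames: with a lower per-frame sigma to begin with, it reaches the same residual noise with a much smaller N.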
This assumes the (typically Gaussian) noise is applied to a static image. Arguably, one could exploit the slight shakiness of a handheld shot to create an image with even less noise.
As a thought experiment, consider thousands of shots of a perfectly static scene made with an idealized, noiseless camera that is moved a very tiny amount between shots. You could keep improving the resolution of the generated composite image quite a bit, until warping due to camera displacement became noticeable.
Techniques like this are actually being used in cryo-electron microscopy to create extremely high-resolution images of proteins.