> With enough computation, your neural net weights would converge to some very compressed latent representation of the source code of DOOM. Maybe smaller even than the source code itself? Someone in the field could probably correct me on that.
And they're not wrong! An ideally trained network could, in principle, learn the data-generating program, provided that program lies within its class of representable functions. A network might naively look like it takes up GBs of space while actually parameterizing a much simpler function - hence our ability to prune/compress the weights without performance loss: most of that capacity wasn't being used for any interesting computation.
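To make the pruning point concrete, here's a toy sketch (numpy only, with made-up sizes and scales, not anything from the post): a weight matrix that nominally has a million parameters but whose useful structure lives in ~1% of them, so magnitude pruning barely changes the output.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1000
W_signal = np.zeros((d, d))
idx = rng.choice(d * d, size=10_000, replace=False)  # ~1% of entries carry the actual function
W_signal.flat[idx] = rng.normal(scale=1.0, size=idx.size)
W = W_signal + rng.normal(scale=1e-3, size=(d, d))   # tiny "unused capacity" everywhere else

x = rng.normal(size=d)
y_full = W @ x

# Magnitude pruning: zero out the 99% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.99)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

y_pruned = W_pruned @ x
rel_err = np.linalg.norm(y_full - y_pruned) / np.linalg.norm(y_full)
print(f"relative output change after pruning 99% of weights: {rel_err:.4f}")
```

Obviously a real trained net isn't constructed this way, but it's the same phenomenon: the nominal parameter count wildly overstates the complexity of the function actually being computed.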
You're right that there's no guarantee that the model finds the most "dense" representation. The goal of regularization is to encourage that, though!
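As a toy illustration of that (a linear model with invented numbers, standing in for the general idea rather than any particular training setup): an L1 penalty, applied here via a proximal soft-threshold step, pushes the fit toward the sparse "underlying" solution, while plain gradient descent happily spreads small weights across every feature.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 50
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.5, 0.5]          # the "underlying" process only uses 3 of the 50 features
y = X @ w_true + 0.1 * rng.normal(size=n)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit(l1=0.0, lr=0.01, steps=2000):
    """Gradient descent on squared error; the soft-threshold step handles the L1 penalty."""
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - lr * grad, lr * l1)
    return w

w_plain = fit(l1=0.0)   # no regularization
w_l1 = fit(l1=0.1)      # L1-regularized

print("features with |weight| > 1e-3, no regularization:", int(np.sum(np.abs(w_plain) > 1e-3)))
print("features with |weight| > 1e-3, with L1 penalty:  ", int(np.sum(np.abs(w_l1) > 1e-3)))
```

Weight decay, dropout, early stopping, etc. all play the same role in spirit: bias training toward the simpler functions consistent with the data.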
All over the place in ML there are bounds like:
test loss <= train loss + model complexity
Hence minimizing model complexity tightens the bound on test loss, i.e., improves expected generalization. This is a kind of Occam's Razor: the simplest model that fits the data generalizes best. So the OP is on the right track - we definitely want networks to learn the "underlying" process that explains the data, which here would be something like a latent representation of the source code. (Well, except that comparison doesn't quite work: the source code gets to rely on the rest of the compute stack it runs on - drivers, the OS, the hardware - whereas the neural net has no external resources to call out to and has to encode all of that complexity itself.)
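For anyone curious what those bounds actually look like, here's roughly the Rademacher-complexity flavor (stated loosely from memory, not quoting any one theorem): with probability at least 1 - delta over a draw of n training samples,

test loss <= train loss + 2 * (Rademacher complexity of the model class) + sqrt(log(1/delta) / (2n))

so anything that shrinks the effective complexity of the hypothesis class (regularization, pruning, weight sharing) directly tightens the guarantee.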