I'm not trying to be a demoscene smart-ass here, either (I expect demoscene masters would've packed the equivalent in under 1MB). It's just that the models are low-poly, the world is small, and it's made up of pieces that feel simple to describe. As long as you're not trying to bake everything into a static set of meshes, but are willing to encode it at a higher level, 5MB seems like plenty.
For a dimensionally reduced analogy, the 2D equivalent would be a perfect example of an image that's very large in raster form, but quite small in vector form.
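To put rough numbers on that analogy, here's a quick back-of-the-envelope sketch. Everything in it is made up for illustration: the 4096x4096 canvas, the three flat-colored shapes, and the helper names are my assumptions, not anything taken from the game. It also compares against an uncompressed raster, so it overstates the gap; a PNG of flat colors would compress very well.

```python
# Hypothetical scene: a 4096x4096 canvas with a few flat-colored shapes.
WIDTH, HEIGHT = 4096, 4096
BYTES_PER_PIXEL = 4  # RGBA, 8 bits per channel, uncompressed


def raster_size_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of the image if every pixel is stored explicitly."""
    return width * height * bytes_per_pixel


def vector_svg(width, height):
    """The same picture described as shapes (sky, sun, hill) in SVG."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<rect width="{width}" height="{height}" fill="#87ceeb"/>'
        f'<circle cx="{width // 2}" cy="{height // 2}" r="{width // 4}" fill="#f4a460"/>'
        f'<polygon points="0,{height} {width // 2},{height // 2} {width},{height}" fill="#228b22"/>'
        "</svg>"
    )


raster = raster_size_bytes(WIDTH, HEIGHT)
vector = len(vector_svg(WIDTH, HEIGHT).encode("utf-8"))
print(f"raster: {raster} bytes, vector: {vector} bytes, ratio: ~{raster // vector}x")
```

The vector description also stays the same size if you double the resolution, while the raster quadruples, which is the same reason a world described as "these pieces, placed here" can be much smaller than the baked meshes.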
Are my intuitions wildly off here?
Note that this does not make it any less impressive - on the contrary, I'm amazed by how much detail and soul there is in this world, despite its apparent simplicity. I'm also amazed at how navigable this world is. I made many stupid moves that I was dead sure would wedge me between walls, or get me stuck in a nook with no way of going back; but none of that happened. They must've put a lot of thought into the design, and it wouldn't surprise me if they manually mapped out the world to ensure there are no one-way paths.