1) Asset bundles are Unity's "favored" means of supporting dynamic content. However, they are extremely heavy and have to be authored in the Unity editor. So you could do things like release new levels for an offline game, or new environments and items for an MMO, but user-generated content gets really hard to do. It's possible by running the Unity editor headless, but that approach is so fraught with peril that it really shouldn't be considered.
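For reference, the consumption side of that pipeline is simple enough; it's the authoring side that forces the editor into the loop. A minimal sketch of downloading and instantiating from a bundle (hypothetical URL and asset name, recent Unity API assumed):

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class BundleLoader : MonoBehaviour
{
    // Hypothetical URL. The bundle itself must have been built in the
    // editor (BuildPipeline.BuildAssetBundles), once per target platform.
    const string BundleUrl = "https://example.com/bundles/level2";

    IEnumerator Start()
    {
        using (var req = UnityWebRequestAssetBundle.GetAssetBundle(BundleUrl))
        {
            yield return req.SendWebRequest();
            if (req.result != UnityWebRequest.Result.Success)
                yield break;

            AssetBundle bundle = DownloadHandlerAssetBundle.GetContent(req);
            // Asset load + instantiation still happen on the main thread.
            var prefab = bundle.LoadAsset<GameObject>("Level2Root");
            Instantiate(prefab);
        }
    }
}
```

Nothing in that flow lets a user hand you arbitrary content, because the bundle format is produced only by the editor's build pipeline.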
2) Primitive, binary assets like textures and audio tracks are the easiest things to load over the 'net in Unity, but last I checked, decoding them was still implemented on the UI thread. The download itself happens off-thread, but even a few textures will be too large a performance hit for devices like the Oculus Quest 2: you will drop frames all over the floor. It's so bad that I had to find, fix, and compile in a full C# implementation of JPEG just to support dynamic texture loading without dropping frames on HoloLens 1, Oculus Go, and Quest 1. I had quit using Unity by the time the Quest 2 came out, but it's not so much more powerful that it would move the bar far enough.
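The split described above is visible in the standard API. A sketch (assuming the stock texture download handler; the hypothetical caller supplies the URL and target renderer):

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class TextureLoader : MonoBehaviour
{
    public IEnumerator LoadTexture(string url, Renderer target)
    {
        using (var req = UnityWebRequestTexture.GetTexture(url))
        {
            // The HTTP transfer itself runs off the main thread...
            yield return req.SendWebRequest();

            // ...but the JPEG/PNG decode behind GetContent() happens on the
            // main thread, which is where the dropped frames come from.
            Texture2D tex = DownloadHandlerTexture.GetContent(req);
            target.material.mainTexture = tex;
        }
    }
}
```

A managed C# decoder like the one mentioned above works around this because you can run it on your own worker thread and only touch Unity objects on the main thread once the pixels are ready.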
3) Again, for primitive assets, the raw, decoded data may not be the final format that you want. To use memory efficiently, there are compressed texture formats that are supported directly in GPUs. Surprise, surprise, there is no 100% cross-platform format, so tools like Binomial's Basis can transcode between formats. This is built into Unity's asset pipeline; if your image starts life as a PNG file, statically loaded in your Unity scene, it will get transcoded into whatever compressed format the graphics APIs on your target operating systems support. Hence part of Unity's need to have target platforms specified.
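If you do the transcoding yourself at runtime, you can at least skip the CPU-side decode entirely and hand the GPU-ready blocks straight to Unity. A sketch, assuming `data` is already in a compressed format the current GPU supports (e.g. DXT1 on desktop, ASTC on recent mobile chips — exactly what a Basis transcoder would emit):

```csharp
using UnityEngine;

public static class CompressedTextureLoader
{
    // 'data' holds raw compressed blocks for 'format'; no PNG/JPEG decode
    // is involved, so the main-thread cost is just the GPU upload.
    public static Texture2D FromRawBlocks(byte[] data, int width, int height,
                                          TextureFormat format)
    {
        var tex = new Texture2D(width, height, format, false);
        tex.LoadRawTextureData(data);
        tex.Apply(false, true);  // upload to the GPU, release the CPU copy
        return tex;
    }
}
```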
4) For 3D models, you need to figure out where you want to lie on the spectrum of small, network-transmission-friendly formats vs. ease of parsing. That model will then need to be converted to Unity GameObjects and Meshes, which, again, takes place on the UI thread. I know of no workaround for this, other than blanking the user's view to black just before the object creation happens so they don't see the dropped frames.
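The unavoidable main-thread step looks roughly like this (the vertex and index arrays are assumed to come from your own parser, which *can* run off-thread; only the Unity object construction cannot):

```csharp
using UnityEngine;

public static class RuntimeMeshBuilder
{
    // 'vertices'/'triangles' come from whatever model format you picked on
    // that spectrum. This method must be called on the main thread; for a
    // large model, this call is the frame hitch described above.
    public static GameObject Build(Vector3[] vertices, int[] triangles,
                                   Material material)
    {
        var mesh = new Mesh();
        mesh.vertices = vertices;
        mesh.triangles = triangles;
        mesh.RecalculateNormals();

        var go = new GameObject("DynamicModel");
        go.AddComponent<MeshFilter>().sharedMesh = mesh;
        go.AddComponent<MeshRenderer>().sharedMaterial = material;
        return go;
    }
}
```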
If all you're making is a card game on smartphones, nobody is going to notice dynamic asset loading causing dropped frames because your "loading" screen isn't tied to their face. But in VR, it's basically table stakes, and Unity makes half of it very hard and the other half impossible.
Of course, I think those improvements apply only to asset bundles, so if those are a no-go for you, there hasn't been much progress.