Yeah, no one is going to do that, dude. No one with a clear idea of what they'd like this so-called metaverse to look like is going to sit around with their thumb up their bum waiting for the IETF to sort out HTTP/3.
Everything you need to do this already exists today. You're right that you don't need to reinvent the wheel, but you'd just be screwing yourself by building on a platform as crusty and stupid as the web. You can reuse web protocols, but any sane person today will probably end up building a user agent unburdened by the stupidity of web standards groups.
Plus, most of the useful work wouldn't be done over HTTP anyway. You'd use DNS so you can do things like vrp:sweetbroandhellajeff.world, reliable UDP for anything meaningfully real-time, and some RPC protocol for packet handling. Why? Because that's what video games do today.
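For the "reliable UDP" bit, here's a rough sketch of the sequence-number-plus-ack-bitfield pattern a lot of game netcode uses, stdlib only. The 8-byte header layout, the 0.2 s resend timeout, and the `ReliableChannel` name are all made up for illustration; this isn't any real engine's wire format. It doesn't touch sockets, so you'd hand the returned bytes to `sendto()` yourself.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass, field

# Hypothetical 8-byte header (not any real engine's format), big-endian:
#   seq      u16  sequence number of this packet
#   ack      u16  newest sequence number received from the peer
#   ack_bits u32  bit i set => the peer's sequence (ack - i) was received
HEADER = struct.Struct("!HHI")
SEQ_MOD = 1 << 16


def newer(a: int, b: int) -> bool:
    """True if 16-bit sequence number a is more recent than b, allowing for wraparound."""
    diff = (a - b) % SEQ_MOD
    return 0 < diff < SEQ_MOD // 2


@dataclass
class ReliableChannel:
    resend_after: float = 0.2  # assumed resend timeout in seconds; tune per game
    next_seq: int = 0
    remote_latest: int | None = None  # newest sequence seen from the peer
    received: set[int] = field(default_factory=set)  # grows forever here; prune in real code
    unacked: dict[int, tuple[bytes, float]] = field(default_factory=dict)

    def _ack_fields(self) -> tuple[int, int]:
        if self.remote_latest is None:
            return 0, 0  # nothing received yet, so acknowledge nothing
        bits = 0
        for i in range(32):
            if (self.remote_latest - i) % SEQ_MOD in self.received:
                bits |= 1 << i
        return self.remote_latest, bits

    def send(self, payload: bytes, now: float) -> bytes:
        """Frame a payload, remember it until acked, and return the datagram bytes."""
        seq = self.next_seq
        self.next_seq = (seq + 1) % SEQ_MOD
        self.unacked[seq] = (payload, now)
        ack, bits = self._ack_fields()
        return HEADER.pack(seq, ack, bits) + payload

    def receive(self, datagram: bytes) -> bytes:
        """Process an incoming datagram's header and return its payload."""
        seq, ack, bits = HEADER.unpack_from(datagram)
        self.received.add(seq)
        if self.remote_latest is None or newer(seq, self.remote_latest):
            self.remote_latest = seq
        # Forget everything the peer says it has received.
        for i in range(32):
            if bits & (1 << i):
                self.unacked.pop((ack - i) % SEQ_MOD, None)
        return datagram[HEADER.size:]

    def resend_stale(self, now: float) -> list[bytes]:
        """Re-frame payloads whose ack is overdue, under fresh sequence numbers."""
        stale = [s for s, (_, sent) in self.unacked.items() if now - sent >= self.resend_after]
        out = []
        for s in stale:
            payload, _ = self.unacked.pop(s)
            out.append(self.send(payload, now))
        return out
```

Delivery here is at-least-once and unordered: a resend goes out under a new sequence number, so if the original shows up late the payload arrives twice. Real netcode dedupes and orders on top of this, and only marks the messages that actually need it as reliable; position updates usually just get superseded by the next one anyway.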
If you built a fork of Quake and exposed a way to "navigate" to worlds with an address bar of sorts, you'd be 80% of the way there.
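The address bar part is mostly just parsing plus a DNS lookup. A minimal sketch, assuming the hypothetical `vrp:` scheme from the example above, a made-up default port of 27960, and that whatever path follows the host names a level. It only does plain A/AAAA lookups via `socket.getaddrinfo`; SRV records would be the more natural way to publish a world's port in DNS, but the Python stdlib can't query those.

```python
from __future__ import annotations

import socket
from typing import NamedTuple
from urllib.parse import urlsplit

DEFAULT_PORT = 27960  # assumption: arbitrary default for the hypothetical vrp scheme


class WorldAddress(NamedTuple):
    host: str
    port: int
    level: str  # path after the host; "" means the world's default level


def parse_world_address(text: str) -> WorldAddress:
    """Turn whatever was typed into the address bar into host, port, and level."""
    text = text.strip()
    if not text.lower().startswith("vrp:"):
        if "://" in text:
            raise ValueError(f"not a vrp address: {text!r}")
        text = "vrp://" + text  # bare host name typed into the bar
    parts = urlsplit(text)
    if not parts.netloc:
        # Opaque form like "vrp:host/level": there's no "//", so the host lands in .path.
        parts = urlsplit("vrp://" + parts.path)
    if not parts.hostname:
        raise ValueError(f"no host in {text!r}")
    return WorldAddress(parts.hostname, parts.port or DEFAULT_PORT, parts.path.lstrip("/"))


def resolve(addr: WorldAddress) -> list[tuple[str, int]]:
    """Ordinary DNS (A/AAAA) lookup for the world's host, returning UDP endpoints."""
    infos = socket.getaddrinfo(addr.host, addr.port, type=socket.SOCK_DGRAM)
    return [(sockaddr[0], sockaddr[1]) for *_, sockaddr in infos]
```

Usage would be something like `resolve(parse_world_address("vrp:sweetbroandhellajeff.world"))`, then pointing your UDP socket at the first endpoint. Lowercasing the host (urlsplit's `.hostname` does that) matters because DNS names are case-insensitive, so two spellings shouldn't look like two different worlds.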
You'll need a level format to standardize on, plus various other standards, and most of them will probably be derivatives or outright ports of ones already in use today.
An informal standard would emerge around a popular client, and then people would try to hijack it. Why? Because that's what happened in the past, and it would happen all over again.
If you want to do it quickly, you're not going to use a web browser. You're talking about the same category of people who refuse to implement client-side includes for the web.[1]
[1]: https://github.com/whatwg/html/issues/2791
Edit: People already know what the minimum requirements are because video games exist, and all VR tech today is built on long-established video game development techniques.
Do you want a floor? Walls? Textures? Guess you're gonna need a level format to standardize on then, yeah? Yeah.
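To make the floor/walls/textures point concrete, here's about the smallest level format that could work: a versioned JSON document with named textures and axis-aligned box surfaces. Everything here (field names, metres as the unit, the version check) is hypothetical; in practice you'd more likely adapt something that already exists, like Quake's .map/.bsp formats or Khronos' glTF, which is kind of the whole point.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

FORMAT_VERSION = 1  # hypothetical; a real format needs an actual versioning policy
SURFACE_KINDS = ("floor", "wall")


@dataclass
class Texture:
    name: str
    uri: str  # where the client fetches the image from


@dataclass
class Surface:
    """An axis-aligned box acting as a floor or wall; units assumed to be metres."""
    kind: str
    min_corner: tuple[float, float, float]
    max_corner: tuple[float, float, float]
    texture: str  # must name a Texture defined in the same level


@dataclass
class Level:
    name: str
    textures: list[Texture] = field(default_factory=list)
    surfaces: list[Surface] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps({"version": FORMAT_VERSION, **asdict(self)}, indent=2)

    @classmethod
    def from_json(cls, text: str) -> Level:
        data = json.loads(text)
        if data.get("version") != FORMAT_VERSION:
            raise ValueError(f"unsupported level version {data.get('version')!r}")
        textures = [Texture(**t) for t in data["textures"]]
        known = {t.name for t in textures}
        surfaces = []
        for s in data["surfaces"]:
            if s["kind"] not in SURFACE_KINDS:
                raise ValueError(f"unknown surface kind {s['kind']!r}")
            if s["texture"] not in known:
                raise ValueError(f"surface uses undefined texture {s['texture']!r}")
            surfaces.append(Surface(
                kind=s["kind"],
                min_corner=tuple(s["min_corner"]),  # JSON arrays come back as lists
                max_corner=tuple(s["max_corner"]),
                texture=s["texture"],
            ))
        return cls(name=data["name"], textures=textures, surfaces=surfaces)
```

The validation in `from_json` is the part that actually matters once more than one client exists: if one client quietly accepts broken levels and another rejects them, you've got your informal standard and your hijack attempt right there.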