They don't have to be any particular shape or size. Once you drop these self-imposed constraints, being virtual matters more than any other property.
Even if you lose the GUI and go back to text, the ideal terminal is a plane of infinite columns of arbitrary cell size that dynamically fills your field of view.
I'd further argue that the main reason VR/AR isn't more widely adopted (uncomfortable headsets aside) is that you can't choose between orthographic and perspective projection per application. In VR/AR, you don't want a window manager or even windows at all. What you want is a field manager (as in FOV "fields" of varying opacity that can be composited by the user). Shape and size are just properties of an arbitrary region blended into the environment.
For the sake of ergonomics, you'd more often prefer to project an interface onto a surface if you had the choice. When you don't, you probably want the projection to be orthographic, but for the edges to be fuzzy if not invisible. You'd generally want to be able to layer these interfaces as well instead of having opaque rectangles always in your way.
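
To make the "fields" idea a bit more concrete, here's a rough sketch of what I mean by layering with fuzzy edges. It's plain alpha blending on grayscale grids, nothing VR-specific; the names `feather_mask` and `composite` and the `opacity`/`feather` parameters are made up for illustration, not taken from any real compositor.

```python
def feather_mask(width, height, feather):
    """Per-pixel opacity: 1.0 in the interior, fading linearly to 0.0 at the border."""
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Distance in pixels to the nearest edge of the field.
            d = min(x, y, width - 1 - x, height - 1 - y)
            row.append(min(1.0, d / feather) if feather > 0 else 1.0)
        mask.append(row)
    return mask


def composite(background, field, opacity, feather):
    """Blend a grayscale field over a same-sized background (values in 0.0..1.0)."""
    height, width = len(field), len(field[0])
    mask = feather_mask(width, height, feather)
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Effective alpha: user-chosen opacity, softened near the edges.
            a = opacity * mask[y][x]
            row.append(a * field[y][x] + (1.0 - a) * background[y][x])
        out.append(row)
    return out


# A 5x5 dark environment with one bright field laid over it.
environment = [[0.0] * 5 for _ in range(5)]
bright = [[1.0] * 5 for _ in range(5)]
blended = composite(environment, bright, opacity=0.8, feather=2)
```

Layering is just calling `composite` again with the previous result as the background, so each field keeps its own opacity and edge softness instead of being an opaque rectangle.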