Personally, I think it goes back to early Windows having to support very low-resolution graphics cards, where you simply couldn't display more than a single window's worth of content at once.
Remember that EGA was only 640×350. You had to have a user interaction method that worked at such a low resolution.
Being optimised for drag and drop between multiple windows is the reason Classic Mac OS only maximizes a window far enough to make its entire contents visible, instead of having it take up the whole screen.