The Maximize button in the top right did look like a box-drawing character, but that's really about the only resemblance I see.
(I developed a DOS GUI for email, Transend PC, in the early '80s that used box-drawing characters, and soon after started writing SQLWindows for Windows 1.0, so pretty familiar with both.)
What I'm saying is that I'm pretty sure that a lot of the Windows 1.0 GUI was drawn by emulating the "technique" TUIs use to draw box-drawing characters, but on a framebuffer and with a custom (monospace) bitmap font:
1. create any-and-all graphical detail you need on screen, by repurposing the "leftover" parts of the bitmap font you're using to draw control-label text to the screen, adding a set of additional, custom symbol-drawing elements, which are all 'characters' of that bitmap font, and so all stuck being the same size+shape as the label text;
2. create a graphical "monospace text" drawing primitive, that takes as input a pair of buffers representing the text itself, and its hardware-text-mode-alike per-character drawing attributes (i.e. FG + BG color);
3. implement your OS widget library almost entirely in terms of calls into that graphical "monospace text" drawing primitive, passing a static data-section buffer holding the positional+attribute data for your box-drawing characters.
(For example, look at the drive icons in the File Manager. Those are clearly just "text" composed of drawing-element characters from the bitmap font, like this: [-=-]. So the whole drive-chooser area can be drawn with a single text-draw command, passing a string like "A[-=-] C[-=-] C: \WINDOWS".)
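To make the three steps above concrete, here is a minimal sketch in C of what such a primitive could look like. Everything in it is made up for illustration and is not taken from Windows 1.0: the 8x8 cell size, the one-byte-per-pixel framebuffer, the glyph bitmaps, the names `draw_mono_text` and `glyph_for`, and the text-mode-style attribute byte (high nibble = background, low nibble = foreground).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FB_W = 64, FB_H = 16, CELL_W = 8, CELL_H = 8 };

/* Stand-in framebuffer: one palette index per pixel (an assumption). */
static uint8_t fb[FB_H][FB_W];

/* Hypothetical font: 8 rows of 8 bits per glyph, MSB = leftmost pixel.
 * Only the "drawing element" characters are defined; the rest are blank. */
static const uint8_t GLYPH_BLANK[CELL_H]    = {0};
static const uint8_t GLYPH_DASH[CELL_H]     = {0, 0, 0, 0xFF, 0, 0, 0, 0};
static const uint8_t GLYPH_EQUALS[CELL_H]   = {0, 0, 0xFF, 0, 0xFF, 0, 0, 0};
static const uint8_t GLYPH_LBRACKET[CELL_H] = {0xFF, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0xFF};
static const uint8_t GLYPH_RBRACKET[CELL_H] = {0xFF, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0xFF};

static const uint8_t *glyph_for(char c)
{
    switch (c) {
    case '[': return GLYPH_LBRACKET;
    case ']': return GLYPH_RBRACKET;
    case '-': return GLYPH_DASH;
    case '=': return GLYPH_EQUALS;
    default:  return GLYPH_BLANK;
    }
}

/* Draw n characters starting at pixel (x0, y0). chars and attrs are the
 * paired buffers from step 2: one character and one attribute byte each. */
static void draw_mono_text(int x0, int y0, const char *chars,
                           const uint8_t *attrs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const uint8_t *glyph = glyph_for(chars[i]);
        uint8_t fg = (uint8_t)(attrs[i] & 0x0F);
        uint8_t bg = (uint8_t)(attrs[i] >> 4);
        int cx = x0 + (int)i * CELL_W;

        /* This sketch does no clipping; the caller must stay on screen. */
        assert(cx >= 0 && cx + CELL_W <= FB_W);
        assert(y0 >= 0 && y0 + CELL_H <= FB_H);

        for (int row = 0; row < CELL_H; row++)
            for (int col = 0; col < CELL_W; col++)
                fb[y0 + row][cx + col] =
                    (glyph[row] & (0x80 >> col)) ? fg : bg;
    }
}
```

Under these assumptions, one drive "icon" is a single call: `draw_mono_text(0, 0, "[-=-]", attrs, 5)` with every entry of `attrs` set to, say, `0xE1`. Letters like the "A" in the example string would come out blank here only because this toy font has no letter glyphs.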
-----
You might say "but look at any screenshot of Windows 1.0 — the first thing you'll notice is that the menu-item labels in each window's menu bar are offset by a half-character-width horizontally! And modal dialogs are, in their entirety, offset by a half-character horizontally and vertically! How's that possible?"
Well, the GUI "monospace text"-drawing primitive might be drawing a grid of characters to the framebuffer; but it's not snapping them to a fixed imaginary grid on the framebuffer. It accepts an arbitrary pixel offset for where it should start to draw the block of monospace text.
So, with that in mind, the algorithm for drawing the menu bar very likely has two passes:
1. render some box-drawing characters representing the "background" of the menu (i.e. yellow with a black border)
2. render a layer of regular non-box-drawing characters on top, offset by +4px on the X axis, in "black on transparent", representing the menu-item labels.
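A rough C sketch of that two-pass idea, to show that the half-character offset falls out naturally once the primitive takes a pixel origin and a "transparent" background. All names, the 8x8 cell, the palette indices, and the toy font (a bottom-border glyph `_` plus a placeholder blob for every letter) are assumptions for illustration only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FB_W = 64, FB_H = 8, CELL_W = 8, CELL_H = 8 };
enum { TRANSPARENT = -1, BLACK = 0, YELLOW = 14 };

/* Stand-in framebuffer: one palette index per pixel (an assumption). */
static uint8_t fb[FB_H][FB_W];

/* Hypothetical font: '_' is a "box-drawing" cell with a bottom border,
 * ' ' is empty, and any other character is a placeholder letter shape. */
static uint8_t glyph_row(char c, int row)
{
    if (c == '_') return (row == CELL_H - 1) ? 0xFF : 0x00;
    if (c == ' ') return 0x00;
    return (row >= 1 && row <= 5) ? 0x7E : 0x00;
}

/* Draw n characters with the top-left corner at pixel (x0, y0).
 * Passing TRANSPARENT as bg leaves background pixels untouched. */
static void draw_mono_text(int x0, int y0, const char *s, size_t n,
                           int fg, int bg)
{
    assert(s != NULL);
    for (size_t i = 0; i < n; i++) {
        int cx = x0 + (int)i * CELL_W;
        for (int row = 0; row < CELL_H; row++) {
            uint8_t bits = glyph_row(s[i], row);
            for (int col = 0; col < CELL_W; col++) {
                int x = cx + col, y = y0 + row;
                if (x < 0 || x >= FB_W || y < 0 || y >= FB_H)
                    continue;                       /* clip to the screen */
                int color = (bits & (0x80 >> col)) ? fg : bg;
                if (color != TRANSPARENT)
                    fb[y][x] = (uint8_t)color;
            }
        }
    }
}

static void draw_menu_bar(const char *labels, size_t n)
{
    /* Pass 1: grid-aligned background cells, black border on yellow. */
    for (int cell = 0; cell < FB_W / CELL_W; cell++)
        draw_mono_text(cell * CELL_W, 0, "_", 1, BLACK, YELLOW);

    /* Pass 2: labels shifted right by half a cell (+4px), drawn
     * black-on-transparent so the pass-1 background shows through. */
    draw_mono_text(CELL_W / 2, 0, labels, n, BLACK, TRANSPARENT);
}
```

Calling `draw_menu_bar("File", 4)` leaves the first four pixel columns as plain background and starts the label at x=4, which is the half-character shift visible in the screenshots.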
It's pretty clear (to me, at least) how a drawing algorithm like this could be a natural evolution and outgrowth of a TUI: first, replace the TUI's backing text buffer with a framebuffer, and the character-plotting calls with draw calls to a "draw monospace text character at emulated-grid position on framebuffer" primitive; then refactor the draw calls to use a window-local coordinate basis; only then add the ability to draw anything other than monospace characters, and do so gradually, starting with defined grid-snapped "rich graphical content" regions within windows (sort of like how Windows today has "Direct3D drawing surface" regions); and finally go back and gradually enhance the GUI widgets with little flourishes like draw offsets.
Within the Windows 1.0 codebase, I would bet money that—at least in some previous revision in early development—there was probably a #define flag for whether the "framebuffer driver" was enabled; and that all these windowing-system and common-controls drawing algorithms were written in a "hybrid" way where, instead of one TUI-based and an entirely-distinct framebuffer-based implementation, the framebuffer-based implementation is just #ifdef'ed ornamentation on top of the base TUI implementation.
Perhaps parts of the Windows 1.0 GUI could have been implemented like that, but the truth is much simpler and more mundane.
The bundled apps (including MS-DOS Executive) and the window decorations generally used the same GDI (Graphics Device Interface) calls as third party apps:
Text was drawn with TextOut() or DrawText().
Bitmaps were copied to the framebuffer with BitBlt() or StretchBlt().
Lines were drawn with MoveTo() and LineTo().
Rectangles were drawn with Rectangle() or RoundRect().
This is not an exhaustive list but should give you the general idea.
All of these functions operated on a "device context" (DC) that you obtained with functions like GetDC() or CreateCompatibleDC(). Some, like BitBlt(), used two DCs for the source and destination.
The MS-DOS Executive drive icons were bitmaps drawn with BitBlt(), with TextOut() for the drive letter. The selected drive letter and icon were inverted with InvertRect(), or possibly drawn with the DSTINVERT raster operation code.
These are the same functions that any Windows application could use. The MS-DOS Executive was just another app.
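The selection highlight mentioned above amounts to a logical NOT of the pixels under the selection rectangle, which is what both InvertRect() and the DSTINVERT raster operation do to the destination. Here is a portable C sketch of that effect on a monochrome buffer. It is a simulation, not GDI code: the `Rect` struct, `invert_rect`, the buffer layout, and the half-open rectangle convention are all choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

enum { FB_W = 32, FB_H = 8 };

/* Monochrome stand-in framebuffer: 0 = black, 1 = white,
 * stored one byte per pixel to keep the sketch readable. */
static uint8_t fb[FB_H][FB_W];

/* Hypothetical rectangle type; right and bottom are exclusive here. */
typedef struct { int left, top, right, bottom; } Rect;

/* Flip every pixel inside r. Applying it twice restores the original
 * image, so the same call both selects and deselects an item. */
static void invert_rect(const Rect *r)
{
    assert(r->left >= 0 && r->right <= FB_W);
    assert(r->top >= 0 && r->bottom <= FB_H);

    for (int y = r->top; y < r->bottom; y++)
        for (int x = r->left; x < r->right; x++)
            fb[y][x] ^= 1;
}
```

Because the operation is its own inverse, nothing has to be saved or redrawn to undo a highlight, which matters on a machine with very little memory.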
The non-client area of a window (titlebar and such) was drawn with the same GDI calls as the client area. Your app would get a WM_PAINT message to draw the client area and a WM_NCPAINT for the non-client area. Most apps passed WM_NCPAINT through to the default handler DefWindowProc().
I was programming Windows apps starting with the first release of Windows 1.0. I spent a fair amount of time reverse engineering other Windows apps and Windows itself, along with people like Matt Pietrek and Andrew Schulman.
If the text-based system you describe had existed, Andrew in particular would have devoted an entire chapter of his book Undocumented Windows to it. It would have been a field day for him!
Also consider memory limitations and programmer time. Remember that Windows 1.0 ran (slowly) on a machine with 256KB of memory and two floppy drives.
Since they had to build GDI anyway, it would take more memory to also include this text-based system. It also would have taken more developer time, and provided less testing of the public APIs.
But again I do appreciate your interesting speculation!