The problem is the granularity of trust within the system. Were the MMU much faster, and TLB much larger (say, 128MiB of dedicated SRAM), the granularity might be pretty high, giving each function's stack a separate address space insulated from the rest of RAM. This is possible even now, just would be impractically slow.
Any hierarchical (tree-based) addressing scheme is equivalent to a linear addressing scheme, pick any tree traversal algorithm. Any locally-hierarchical addressing scheme seemingly can be implemented with (short) offsets in a linear address space; this is how most jumps in x64 and aarch64 are encoded, for instance.