> include stack locations in your register renaming scheme
Registers aren't related to the stack. "The" stack is just RAM being accessed in a specific cache-friendly pattern, with additional hardware optimizations (if you use specific registers) in the form of the stack engine. The compiler explicitly emits loads and stores between memory and the registers named by the ISA. Register renaming has absolutely nothing to do with the stack.
When the CPU can tell that a later instruction doesn't depend on the previous value of a register, it's free to rename it. The result is that two independent physical registers get used even though only one architectural register was ever directly referenced. In reality, there are a _huge_ number of registers available on modern processors. Estimates place Skylake, Zen, and Cortex-X1 at 200+, with the M1 at 600+. The ISA just doesn't provide a way to access them directly. (If you want to read about this, the terms to look up are register renaming, physical register file, and reorder buffer.)
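To make the renaming idea concrete, here is a minimal toy sketch in Python. It is not how any real core is implemented: the `Renamer` class, the `p0`, `p1`, ... naming, and the unlimited supply of physical registers are all assumptions made for illustration. The only point it shows is that each write to an architectural register gets a fresh physical register, so two uses of "eax" can be completely independent.

```python
from itertools import count


class Renamer:
    """Toy register renamer (hypothetical model, never frees registers)."""

    def __init__(self):
        self._next = count()  # assumption: unlimited physical registers
        self.table = {}       # architectural name -> current physical register

    def rename(self, dst, srcs):
        # Sources read whichever physical register currently holds the value.
        phys_srcs = [self.table[s] for s in srcs]
        # The destination always gets a new physical register, so an older
        # in-flight value of the same architectural register is untouched.
        phys_dst = f"p{next(self._next)}"
        self.table[dst] = phys_dst
        return phys_dst, phys_srcs


def rename_program(program):
    """Rename a list of (destination, [sources]) pairs in program order."""
    renamer = Renamer()
    return [renamer.rename(dst, srcs) for dst, srcs in program]


program = [
    ("eax", []),       # mov eax, [a]
    ("ebx", ["eax"]),  # mov ebx, eax   (reads the first eax)
    ("eax", []),       # mov eax, [b]   (independent of the first eax)
    ("ecx", ["eax"]),  # mov ecx, eax   (reads the second eax)
]
renamed = rename_program(program)

for (dst, srcs), (pdst, psrcs) in zip(program, renamed):
    print(f"{dst} <- {srcs}  becomes  {pdst} <- {psrcs}")
```

After renaming, the first and second halves of the program share no physical registers, which is what lets them execute in parallel even though both name eax.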
Also, there is a giant buffer for stores that are waiting to be written back to L1. That buffer does indeed have to keep track of cache locations, which directly map to memory addresses, which sometimes happen to refer to stack locations. So in a sense, what you suggested already exists. (If you want to read about this, the term to look up is store buffer.)
> it seems much easier to find parallelism if a bunch of adjacent instructions are explicitly independent
That would indeed make things simpler in some cases. However, many operations, such as loading a value into a register (e.g. mov eax, [addr]) or zeroing it (e.g. xor eax, eax), explicitly break the dependency chain by definition. Cases where the CPU fails to properly account for this are documented as false dependencies.
> the compiler can then help the cpu keeping all those instruction units fed
The "compiler handles ordering" thing was tried with Itanium. It seems it didn't go so well.
The CPU is free to simultaneously load two different pieces of data into the "same" register and execute two independent instruction streams on that "single" register thanks to renaming. Speculative execution helps when the CPU can't be completely certain that there isn't a dependency.
For particularly complicated sequences, the compiler spilling due to running out of named registers could indeed pose an issue. However, the CPU is free to elide a store followed by a load if it determines that the address is the same. (If you want to read about this, terms to look up include store-to-load forwarding and load-hit-store.)
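As a rough illustration of store-to-load forwarding, here is a toy Python model. Everything in it is an assumption for the sake of the sketch: the `StoreBuffer` class, the dict standing in for L1/RAM, and the exact-address match. Real hardware also has to deal with access sizes, partial overlaps, and speculation, none of which is modeled here.

```python
class StoreBuffer:
    """Toy store buffer with store-to-load forwarding (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value, stands in for L1/RAM
        self.pending = []     # (address, value) not yet written back, oldest first

    def store(self, addr, value):
        # Stores are queued instead of going straight to memory.
        self.pending.append((addr, value))

    def load(self, addr):
        # Check the youngest pending store first; if the address matches,
        # forward its value without touching memory at all.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value, "forwarded"
        return self.memory.get(addr, 0), "memory"

    def drain(self):
        # Write pending stores back in program order.
        for addr, value in self.pending:
            self.memory[addr] = value
        self.pending.clear()


memory = {0x1000: 1}
sb = StoreBuffer(memory)

sb.store(0x1000, 42)              # e.g. a spill to a stack slot
spilled = sb.load(0x1000)         # the reload right after the spill
stale_in_memory = memory[0x1000]  # memory has not been updated yet

sb.drain()
after_drain = sb.load(0x1000)
```

In this model the reload is satisfied from the buffer while memory still holds the old value, which is the sense in which a spill followed by a reload can be much cheaper than a full round trip through the cache.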