> fact that downward bump is just not what the HW expects you to do.
What fact is this? And what does it have to do with the stack being bounded? (The bounds of the stack are unknown to most microarchitectural state and are stored only in OS structures.)
HW prefetchers support access patterns in either direction, and so do load/store instructions (stm(fd)/ldm(fd); ldp/stp).
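
For concreteness, here is a rough sketch of a bump allocator in both directions. The `arena_t` type and the `bump_up`/`bump_down` names are made up for illustration and not taken from any particular allocator; `align` is assumed to be a power of two. The point is only that both directions reduce to a handful of integer ops on a pointer, and nothing in them is direction-specific from the hardware's point of view.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arena, for illustration only. */
typedef struct {
    uint8_t *lo;   /* lowest usable address */
    uint8_t *hi;   /* one past the highest usable address */
    uint8_t *cur;  /* bump pointer: starts at lo (upward) or hi (downward) */
} arena_t;

/* Upward bump: round cur up to `align`, then advance past the block.
 * `align` is assumed to be a power of two. Returns NULL when out of space. */
static void *bump_up(arena_t *a, size_t size, size_t align) {
    uintptr_t cur = (uintptr_t)a->cur;
    uintptr_t hi  = (uintptr_t)a->hi;
    uintptr_t p   = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    if (p < cur || p > hi || size > hi - p)
        return NULL;
    a->cur = (uint8_t *)(p + size);
    return (void *)p;
}

/* Downward bump: subtract the size, then round down to `align`.
 * `align` is assumed to be a power of two. Returns NULL when out of space. */
static void *bump_down(arena_t *a, size_t size, size_t align) {
    uintptr_t cur = (uintptr_t)a->cur;
    uintptr_t lo  = (uintptr_t)a->lo;
    if (size > cur - lo)
        return NULL;
    uintptr_t p = (cur - size) & ~(uintptr_t)(align - 1);
    if (p < lo)
        return NULL;
    a->cur = (uint8_t *)p;
    return (void *)p;
}
```

Successive allocations come out at ascending addresses in one case and descending addresses in the other; either way it is a regular, sequential pattern.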