Yes, and in certain cases even within the same generation of chip (e.g. same microarchitecture but fewer cores and/or less memory per core), as the compiler would need to remap the program and data locations based on the global address map. There is no problem if you compiled for a smaller number of cores or less memory and the program is run on a "bigger" chip.
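To make the "bigger chip is fine, smaller chip needs a remap" point concrete, here is a rough sketch. It assumes a flat global address map in which every core owns a fixed-size window; the window size, the `Chip` type, and all function names are hypothetical illustrations, not our actual layout.

```python
from dataclasses import dataclass

# Hypothetical: each core owns a fixed-size window of the global address space.
CORE_WINDOW = 1 << 20  # assumed 1 MiB window per core, not a real figure


@dataclass(frozen=True)
class Chip:
    cores: int
    mem_per_core: int  # bytes of local memory actually present in each window


def global_address(core: int, offset: int) -> int:
    """Address the compiler would bake into the binary for (core, offset)."""
    return core * CORE_WINDOW + offset


def is_mapped(chip: Chip, address: int) -> bool:
    """True if the address lands on a core and memory cell that exist on chip."""
    core, offset = divmod(address, CORE_WINDOW)
    return core < chip.cores and offset < chip.mem_per_core


def runs_without_recompile(compiled_for: Chip, target: Chip) -> bool:
    """Every address valid on compiled_for is also valid on target."""
    return (target.cores >= compiled_for.cores
            and target.mem_per_core >= compiled_for.mem_per_core)
```

Under these assumptions, a binary built for `Chip(16, 64 * 1024)` only uses addresses that also exist on `Chip(256, 128 * 1024)`, while the reverse direction hits unmapped addresses and the compiler has to place program and data again.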
It is a very simple pipeline, and we expose the exact latencies of all operations, along with things like branches with delay slots. As I have mentioned ad infinitum, determinism is a key part of our architecture, and having a fixed pipeline is necessary for that. Plus, we want to give anyone crazy and skilled enough to hand-write assembly the freedom to be crazy ;)
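As a rough illustration of what exposed latencies mean for whoever emits the code (compiler or human), here is a minimal sketch. It assumes a single-issue, in-order machine with no hardware interlocks, so a result that is not ready yet has to be covered with explicit nops; the opcodes and cycle counts in `LATENCY` are made up and not taken from any real ISA.

```python
# Hypothetical latencies in cycles; not taken from any real ISA.
LATENCY = {"load": 3, "mul": 2, "add": 1}


def schedule(program):
    """Pad a straight-line program with nops so every source is ready.

    program is a list of (op, dest, sources) tuples. The position of an
    instruction in the returned list is the cycle in which it issues.
    """
    ready = {}  # register -> first cycle in which its value can be read
    out = []
    for op, dest, srcs in program:
        # Earliest cycle at which all source registers are available.
        earliest = max((ready.get(s, 0) for s in srcs), default=0)
        while len(out) < earliest:
            out.append(("nop", None, ()))
        issue_cycle = len(out)
        out.append((op, dest, srcs))
        ready[dest] = issue_cycle + LATENCY[op]
    return out
```

With these made-up numbers, a `load` into `r1` followed immediately by an `add` that reads `r1` gets two nops in between. Because the latencies are fixed, the cycle count of the padded sequence is known at compile time, which is the determinism point above.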
For the applications we are targeting (HPC and DSP-like stuff), source code is always available, there are very long periods between recompiles forced by source code changes, and optimization is a key factor. Our customers don't merely accept recompiling for every new generation of hardware; they expect it, and they want to take advantage of any new improvements the compiler is able to make.