As a sibling comment points out, DSPs can be an exception. I’m far from an expert on them, but CPUs that try to run all instructions in the same fixed amount of time used to accomplish that by avoiding complex addressing modes and by omitting the really slow instructions (division and instructions that push/pop multiple registers onto/from the stack being prime examples).
IIRC, some of them also accomplished it by running some of the simpler instructions slower than necessary (raw speed isn’t as important as being hard real time in many applications).
easy solution, as you allude to, don't have a division instruction! classic arm doesn't (sdiv/udiv only arrived with armv7-m/-r and later), 8048 doesn't, sparcv7 doesn't, even the cdc 6600 and cray-1 didn't, and even risc-v finally got zmmul: https://wiki.riscv.org/display/HOME/Zmmul. it's not just dsps
the big issue with complex addressing modes is i think fault handling. if your nice orthogonal add instruction updates two registers as it reads operands and a third register with the sum of the operands, what do you do if you get a page fault on the second operand? if the os can service the page fault, how does it restart the instruction?
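to make the restart problem concrete, here's a rough toy model in python. everything in it is made up for illustration (the `Machine` class, the `add (r1)+, (r2)+, r3` instruction, the one-address "pages"); it isn't any real isa. the naive version bumps r1 before touching the second operand, so restarting after the fault reads the wrong first operand. the restartable version holds back all register updates until every access has succeeded, which is one possible fix, not the only one.

```python
class PageFault(Exception):
    """Raised when an access touches an address that is not resident."""


class Machine:
    def __init__(self, memory, mapped):
        self.reg = {"r1": 0, "r2": 0, "r3": 0}
        self.memory = memory        # address -> value
        self.mapped = set(mapped)   # addresses currently resident

    def load(self, addr):
        if addr not in self.mapped:
            raise PageFault(addr)
        return self.memory[addr]

    def add_naive(self):
        # hypothetical "add (r1)+, (r2)+, r3": side effects applied as it goes
        a = self.load(self.reg["r1"])
        self.reg["r1"] += 1          # already visible if the next load faults
        b = self.load(self.reg["r2"])
        self.reg["r2"] += 1
        self.reg["r3"] = a + b

    def add_restartable(self):
        # same instruction, but nothing is committed until both loads succeed
        r1, r2 = self.reg["r1"], self.reg["r2"]
        a = self.load(r1)
        b = self.load(r2)
        self.reg.update(r1=r1 + 1, r2=r2 + 1, r3=a + b)


def run(insn_name):
    # second operand (address 20) starts out non-resident
    m = Machine(memory={10: 5, 11: 100, 20: 7}, mapped={10, 11})
    m.reg.update(r1=10, r2=20)
    while True:
        try:
            getattr(m, insn_name)()
            return m.reg
        except PageFault as fault:
            # the "os" pages the address in and restarts from the beginning
            m.mapped.add(fault.args[0])


print(run("add_naive"))        # r1 bumped twice, sum uses memory[11]
print(run("add_restartable"))  # r1=11, r2=21, r3=5+7
```

the other ways out are to have the hardware record how far the instruction got so the os can undo or resume it, which means more state to save on every fault.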
as you point out, real-time latency is also an issue. on older arm chips you have to be careful not to ldm or stm too many registers in a single instruction so as not to damage interrupt latency. newer arm chips can restart the ldm/stm
i'm no expert on the area either
Heh, that's an understatement. The 8048 doesn't even have a subtract instruction, much less divide.