You lose binary portability, though. What if you could write your image processing code so that it runs well whether your SIMD instructions are 4, 8, or 16 wide (or more)?
Then shipping this binary for mobile phones, for instance, would take advantage of the wider registers available on high-end devices while still working well on more modest CPUs.
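
To make the idea concrete, here is a rough sketch in plain C of what a width-agnostic loop could look like. The function name `brighten_u8` and the `lanes` parameter are made up for illustration: `lanes` stands in for the vector width that the hardware would report at run time on an ISA with scalable vectors, and the inner loop stands in for a single vector instruction. This is not real intrinsics code, just the shape of the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: add `delta` to every 8-bit pixel, saturating at 255.
 * `lanes` is an assumption standing in for the SIMD width queried at run time;
 * nothing in the loop hard-codes 4, 8, or 16. */
void brighten_u8(uint8_t *px, size_t n, uint8_t delta, size_t lanes)
{
    assert(lanes > 0);
    for (size_t i = 0; i < n; i += lanes) {
        /* The final chunk may be shorter than a full vector. */
        size_t active = (n - i < lanes) ? (n - i) : lanes;
        /* One "vector operation" over the active lanes. */
        for (size_t j = 0; j < active; ++j) {
            unsigned sum = (unsigned)px[i + j] + delta;
            px[i + j] = (uint8_t)(sum > 255u ? 255u : sum);
        }
    }
}
```

Calling it with `lanes` set to 4, 8, or 16 gives the same pixels; only the number of iterations changes, which is the property you would want from a single binary running on CPUs with different register widths.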