CPU architectures are quite stable. SSE2 is almost 20 years old now. You can't even run modern Windows on a system which doesn't support it.
Vectorize for SSE and you'll get something like 50% of the potential performance. You can do it without any new paradigms: C and C++ have supported SSE intrinsics for decades, and other languages are catching up.
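
To give an idea of what that looks like in plain C, here is a minimal sketch that adds two float arrays four lanes at a time. The function name `add_f32` is made up for illustration, and the code assumes an x86 target with SSE enabled (the default on x86-64) and inputs that are not necessarily 16-byte aligned, hence the unaligned load/store intrinsics.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Main loop: process 4 floats per iteration in one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Call it like any other function, e.g. `add_f32(out, x, y, count)`. A good compiler may auto-vectorize a loop this simple anyway, so intrinsics earn their keep mostly on code that is harder for the compiler to see through.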