"can't be fixed without breaking ABI" sounds plausible for C++.
There's generally not all that much stdc++-specific optimisation in clang. There might be parts of regex that are worth implementing as compiler intrinsics, since that seems to be the existing pattern for making bits of the library much faster.
The really heavy lifting you want for regex is to partially evaluate and split them. They're a separate language unto themselves and benefit from being optimised as such. There's nowhere ideal in the clang/llvm pipeline to do that though.
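
To make "partially evaluate" concrete, here is a rough sketch of what I mean, not anything clang does today. The pattern `ab*c` and the names `match_generic` and `match_specialised` are made up for illustration, and the specialised version is written by hand to show the kind of code a regex-aware optimiser could in principle emit once the pattern is known at compile time. It only covers the partial evaluation part, not splitting.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <string_view>

// General path: std::regex parses the pattern at run time and then
// interprets the resulting automaton against the input.
bool match_generic(const std::string& s) {
    static const std::regex re("ab*c");  // hypothetical example pattern
    return std::regex_match(s, re);
}

// Hand-written stand-in for the result of partially evaluating the matcher
// against the fixed pattern "ab*c": the pattern has disappeared and only a
// small loop over the input is left.
constexpr bool match_specialised(std::string_view s) {
    std::size_t i = 0;
    if (i >= s.size() || s[i] != 'a') return false;  // leading 'a'
    ++i;
    while (i < s.size() && s[i] == 'b') ++i;         // zero or more 'b'
    return i + 1 == s.size() && s[i] == 'c';         // final 'c', then end
}

// The specialised form is simple enough to evaluate at compile time.
static_assert(match_specialised("abbc"));
static_assert(!match_specialised("abcc"));
```

Both functions should agree on every input; the difference is that the second one is ordinary code the rest of the optimiser can inline and vectorise, whereas the first hides the pattern behind a runtime data structure.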