This doesn't work because you're not left-shifting (doubling) the carry. But once you add the shifted carry to (x ^ y), you're back to potentially overflowing the highest bit of each byte.
The solution is to add the lower seven bits of each byte normally and to handle the highest bit separately with XOR, which is addition without carry:
lower = 0x7f7f7f7f;    // low 7 bits of each byte
highest = ~lower;      // 0x80808080, the top bit of each byte
z = ((x & lower) + (y & lower)) ^ ((x ^ y) & highest);
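Note that this only improves performance for larger container integers, i.e. when one integer packs several bytes so that a single masked addition replaces several separate ones.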
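For reference, the expression above can be wrapped in a small C function. This is a minimal sketch that assumes four 8-bit lanes packed into a `uint32_t`; the names `add_bytes` and `check_add_bytes` are made up for illustration and are not from the question.

```c
#include <assert.h>
#include <stdint.h>

/* Add four packed 8-bit lanes. Each lane wraps modulo 256 on its own,
   so no carry leaks from one byte into the next.
   Assumption: x and y each hold four independent bytes. */
static uint32_t add_bytes(uint32_t x, uint32_t y)
{
    const uint32_t lower = 0x7f7f7f7fu;   /* low 7 bits of each byte */
    const uint32_t highest = ~lower;      /* 0x80808080, top bit of each byte */

    /* The 7-bit sums can carry into bit 7 of their own byte at most,
       never into the neighbouring byte. */
    uint32_t low_sum = (x & lower) + (y & lower);

    /* Top bit of each byte: XOR of both inputs' top bits with the carry
       that the 7-bit sum left in bit 7. */
    return low_sum ^ ((x ^ y) & highest);
}

/* Hypothetical helper: a few spot checks on add_bytes. */
static void check_add_bytes(void)
{
    assert(add_bytes(0x01020304u, 0x10203040u) == 0x11223344u); /* no carries */
    assert(add_bytes(0xFF00FF00u, 0x01010101u) == 0x00010001u); /* 0xFF + 1 wraps to 0 */
    assert(add_bytes(0x7f7f7f7fu, 0x01010101u) == 0x80808080u); /* carry into the top bit */
}
```

For 64-bit containers the same pattern should apply with `uint64_t` and the mask `0x7f7f7f7f7f7f7f7f`.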