They also avoided the patented instruction issue completely by removing all alignment restrictions from the regular load/store; probably a good idea, with memory bandwidths being the bottleneck now and buses growing wider - the extra hardware is also negligible, basically a barrel shifter and logic to do an extra bus cycle if needed.