Actually, no. You are putting the cart before the horse. What actually happened is that they tweaked the ranking algorithm and then measured a minuscule effect using a particular scoring algorithm (in this case, counting certain types of words used in subsequent posts).
So the nature of the scoring algorithm (counting emotional words) used to measure the impact of a change makes deploying the A/B test suddenly unethical?