The problem is that many uncontrollable factors contribute to the data. If engagement is found to be better with A on the day, does that mean B will
never be better? It doesn't.
A/B testing is often conducted unscientifically, or with an insufficient sample size or timeframe.
Sometimes the cards fall where they fall, and your small UX tweak wasn't involved. It's tempting to conclude that the A/B test delivered a valid result. As an experiment, if you deliberately make A and B identical and then run the A/B test as normal, you will still get a winner. The measured results won't be equal.
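
The identical-variants experiment can be simulated rather than run on real users. The sketch below is only an illustration under assumed numbers: the function names, the 10% conversion rate, and the 500 users per arm are all hypothetical, and each visitor is modelled as an independent coin flip, which real traffic is not. It counts how often two identical arms end up with a "winner" purely by chance.

```python
import random

def run_aa_test(n_users, true_rate, rng):
    """Simulate one test where A and B are identical (same true rate)."""
    conv_a = sum(rng.random() < true_rate for _ in range(n_users))
    conv_b = sum(rng.random() < true_rate for _ in range(n_users))
    return conv_a, conv_b

def count_outcomes(n_trials=1000, n_users=500, true_rate=0.10, seed=0):
    """Repeat the identical-variants test and tally who 'wins' each time.

    All defaults are assumed values chosen for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    a_wins = b_wins = ties = 0
    for _ in range(n_trials):
        conv_a, conv_b = run_aa_test(n_users, true_rate, rng)
        if conv_a > conv_b:
            a_wins += 1
        elif conv_b > conv_a:
            b_wins += 1
        else:
            ties += 1
    return a_wins, b_wins, ties

if __name__ == "__main__":
    a_wins, b_wins, ties = count_outcomes()
    print(f"A 'won' {a_wins} times, B 'won' {b_wins} times, exact ties: {ties}")
```

With these assumed settings, exact ties should be rare, so nearly every run declares a winner even though there is nothing to find. Whether a given gap is larger than chance alone would produce is a separate question that needs a significance test and an adequate sample size.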