Sure, but in many cases, such as the example given by GP, long-term A/B testing is hard or almost impossible. For the test to have validity, you need the A and B cohorts to be stable, with little or no overlap, and that is hard over long time spans for anything that is not account-based (and somewhat risky even for account-based things, as people will almost certainly start to notice that they are getting a different experience than their peers, which may upset them).
In online dating, at least, this is a non-issue. Using an online dating app is, ironically, a solitary enough activity that people don’t go around comparing whether their UI is different from their friends’ UI. You of course can’t let the same user see two versions, but that just means doing permanent group assignment on signup. We used to A/B test subscription prices over enormous ranges (e.g., randomly giving some people 90% discounts) and approximately nobody noticed outside of obscure Reddit threads.
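The "permanent group assignment on signup" mentioned above is often done by hashing a stable user identifier, so the same user always lands in the same cohort without storing extra state. A minimal sketch (function and experiment names are hypothetical, not from any particular product):

```python
import hashlib

def assign_group(user_id: str, experiment: str = "pricing-v1") -> str:
    """Deterministically assign a user to an A/B cohort.

    Hashing the user id together with the experiment name gives a
    stable, roughly uniform bucket: the same user always sees the
    same variant, and different experiments reshuffle independently.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # 0..99
    return "A" if bucket < 50 else "B"

# Same user, same experiment -> same group, forever.
print(assign_group("user-42") == assign_group("user-42"))  # True
```

Deriving the group from the id (rather than a coin flip stored at signup) also means the assignment survives data migrations and works identically across services.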
I wonder if you two are talking past each other a little. I'm thinking that A/B testing for content is a different beast than A/B testing for experience.
I’m not disagreeing; my point is really, “not all A/B testing is bad, even if the kind you’re most familiar with leads to shitty content.” My second comment was more of a side note.