This is a good touchstone to use for "you've overoptimized your site, tone it back". I am also taken aback every time I'm on a site, I've got something in my shopping cart, I'm headed for the "check out" button, or I'm even on the checkout page, and some stupid interstitial pops up. Dude, I'm trying to enter my credit card information! Back off! Especially stupid for a "sign up for our newsletter" popup; we all know that unclicking the "yes, we can email you every 17 seconds from now until the heat death of the universe with valuable offers from 'our affiliates' which we define as 'anyone we share a species with'!" box on the checkout form is mandatory, and if we don't see it immediately we'd best go hunting for it. You've already default-populated the checkbox to "yes" on this very screen; get out of the way!
Less unbelievably stupid, but related, is when I'm examining product X and just after I scroll down a bit to read more you pop up something related to... well... anything other than product X! I'm signalling interest in product X as hard as I can, and you've AB tested that this is a great time to jangle your keys over there instead? Your AB testing is stupid, and the result is almost certainly a statistical fluke or some other terrible error. What fisherman goes out on his boat, hooks a fish, and then rushes to throw a completely different lure out to the hooked fish to get it on that hook instead? This is another good touchstone for being "overoptimized".
[0] https://i.postimg.cc/HW89hs7r/Screenshot-2022-07-12-145957.p...
I left a screenshot in slack and it ended up causing a couple of teams to have to roll back their widgets, but it always baffled me that we were able to focus so much on the individual trees of metric optimization that we would miss the forest to that extent.
Like I'm sure many users sign up then drop out of your funnel but I'm part of an organization that's a paying customer. I'm already going to use your stuff. What possible business benefit could there be to you spamming me? If anything you're risking the inverse - it made me want to migrate away from the tool.
Still absurd, but I know this is a problem friends of mine have had.
Suppose the site isn't concerned about the sale very much at all?
Suppose the thing that the site uses to reel people in, is a good deal that isn't very profitable to the site but what the site then tries to sell is a very profitable near-scam/ripoff. Scaring off half the ordinary customers becomes worth it to get even 10% of the customers buying the scam.
What seems like "poor optimization" can easily be optimization for something else, and could be seen as "the scammification of the web".
Many here are focusing on a single interaction. While the outcome of that single interaction is negative to the company, the aggregate outcome must be positive somehow, perhaps in the way you said, but it doesn't even have to be a scam or ripoff. Some products just have a higher margin and/or customer LTV.
As an individual, it is annoying, but the company is focusing only on the macro effect when it does something like this.
If I may... I have seen data from a big retailer showing that any user who doesn't immediately purchase an item is actually not that interested in the product on the screen. If a customer is going to buy something, they will do it promptly. Anyone else is just browsing.
YMMV, grain of salt, context dependent, etc, etc.
1. Clicked on page.
2. Took maybe 10 seconds to take in what is "above the fold".
3. Scrolled down to see what else there is.
4. BAM! Popup triggered by scrolling down.
While I understand what you're getting at, they do not yet have the info to know that I'm browsing or whatever. They were so excited about their stupid popup that they didn't even get that far. I will say, generally, when I'm to the point that I'm entering credit card info, I've put up with it, but I have been chased off of sites by this use case before. Especially if that popup also crosses with some other popup and now I'm chasing down the tiny little 6pt light-grey-on-white little "x"s to click away the popups in the right order.
Actually, let me add that to my touchstone list. OF COURSE hiding the dismissal icon for the popup increases "engagement" with the popup. You don't even need to run a test for that, because what other result could it have? "We shrank the close icon, moved it to the lower right corner where nobody expects it, and made sure to kill the contrast even harder, and customers dismissed it 2.5 seconds more quickly on average"? Of course that's not possible. But... that's the wrong question! And AB testing is really good at answering the question you're asking; it has no mechanism in and of itself to see whether you're asking the right question. If you're getting down to this, you've overoptimized.
Fuck that. Unless it's an emergency (in which case I'll go to a shop), anything I purchase online is carefully considered, sometimes over several weeks. My revolving user-agent, vpn etc. may give the illusion I'm not interested... but I am indeed, just browsing...
I hate them so much. It makes it feel like so much more of a chore to try to do research or look for things online. I'd honestly prefer 56k page-load speeds if the pages were free of this garbage.
1) I open a product in a tab. I click "add to cart" and a "related products" sidebar slides in. I close the tab in annoyance.
However, some items exhibit a similar pattern, EXCEPT...
2) I open a product in a tab. I click "add to cart" and a stupid extended warranty sidebar slides in. I close the tab in annoyance.
The difference?
Item #1 gets added to my cart
Item #2 doesn't make it to the shopping cart.
Amazon just silently deleted my purchase.
I actually don't know when it dawned on me that this happened, but amazon lost money on me because I didn't buy certain things.
A couple years ago, after being an Android fan for the better part of a decade, I finally bought myself an iPhone and pried myself away from Google's ecosystem wherever I could. And Apple didn't even need to do any work for me to make this decision. It was the years of abuse from Google that you experience when you decide to use a Google product or service. And a big part of that was the constant A/B/C/D/E/F testing. I never felt like I was using a complete product; everything felt like a constant beta that could be changed or rearranged at any point, and I was just doing free testing work for them while they harvested all my data.
Every app update was a risk of the app rearranging itself, or features appearing/disappearing. Eventually it didn't even come from app updates in the Play Store, and new interfaces would just appear one day when a server somewhere marked your account as being in the group that gets the new UI. This app that you were familiar with could at any point be rearranged when you open it on any given day. Then maybe a week later you open it and it's back to how it was before. A button you thought was here suddenly isn't, and you question whether something actually changed or if you're losing your mind. It's a subtle gaslighting that eventually I couldn't stand any more.
To me, A/B testing means you don't respect your users. You see them as just one factor in your money machine that can be poked and prodded to optimize how much money you can squeeze out of them. That's not to say a company like Apple is creating products out of the goodness of their heart, but at least it feels like it was developed by humans who made an opinionated call as to what they thought was the right design decision, and what they would want to use. And in my 2 years of owning an iPhone, I've never opened my reminders app to find out that it's completely unrecognizable, or my messages app has been renamed or rethemed for the umpteenth time.
Your perspective is extremely short-sighted. A/B testing can result in this type of behaviour but that's just poor A/B testing. Good A/B testing focuses on removing distractions from the experience and helping users derive more value from the product. Bad A/B testing tries to make things more discoverable, where discoverability is often just noise and distractions. Good A/B testing ensures that the money machine, as you put it, pays its dues to users by making the product experience delightful.
I personally have never heard from a product person "Let's A/B test whether this is delightful". And I think that's because delightfulness or satisfaction is impossible to quantify in A/B tests. You only get to measure things like engagement, signup rates, retention etc. - cold hard taps on the screen, and no more.
And I must say that I'm glad that, right now, apps can't just scan my face (or cortisol levels, or pheromones or...) for emotional clues while I read their pesky push notifications that want to coax me back into their daily active user base.
It's the perspective of the normal users.
Every time I'm using a website and it does not behave exactly the same as it does for other people, or I notice some AB testing, in my head it goes "who the fuck do these people think they are?". The computing experience must be consistent and repeatable. If I wanted something that can change depending on the current position of the stars, I'd ask another human, not a computer.
A/B testing can result in this type of behaviour but that's just poor A/B testing.
How many tech startup patterns fit that? That's a sign that either the pattern does not generalize well or it's snake oil.
I've been in a similar situation, where I created a relatively sophisticated A/* testing and control system. My idea of good use of the system ended up being very different from how the team employing the system thought about it.
I believe that is part of the point of the post, that unintended, and even unimagined side effects plague even the best of ideas.
Sometimes they'll be in A, and sometimes in B.
The button moves or disappears or appears.
Your user does not get an experience they can rely on.
No, yours is. If some company wants to do some testing, they SHOULD PAY users for that. A/B testing is just exploiting users to get free testing.
This shit drives my parents insane. Me too, when I have to help them. I've had to spend tens of seconds looking at a major screen in the phone app, of all things, to figure out WTF I'm looking at so I could help them figure out what was up. Re-arranged every update (or new phone) for absolutely no reason, terrible affordances, poor use of their own design language. Ugh.
I'd get them on iOS but they need larger screens and the $400 small iPhones (what I have) are already more expensive than they think a phone "should" cost, so they keep buying $200 Android phones about once a year (hoping the next one will be better) and not being able to use them because the UI is garbage.
Before I could give her the freshly grandparent-proofed device, said video calling app upgraded on my parents' PC first and changed literally every single element of the UI beyond recognition. To someone the age of my grandma, that would be literally like bricking the device remotely, because none of the buttons would look the same, and she would not be able to work out how to use the new interface.
STOP CHANGING THINGS! Even if the new UI is better (debatable), some people just like or rely on a particular layout to operate the device or app. Don't rearrange without giving a ~permanent setting to use the old layout.
At least on iPad/iPhone you can set the apps to access Google mail, etc, which doesn't change as often, but still too often.
It's kind of mind boggling they'd decide to do that - the replacement they direct me to (Google Chat) doesn't even have feature parity so I just dropped them and moved my social circle using Hangouts to a different app (since at this point they all faced the same problem and we decided on a different platform).
I'm really curious how the A/B testing for this went down - Google is willingly throwing customers away because somebody wants to pump numbers for a new app that is objectively worse than the old one.
At this point Google Maps is the only product that is keeping me with them, but even that one is beginning to wear thin.
Which features are you missing?
The point, of course, is to make manually updating apps so annoying that you enable auto update. I have been burned too many times by an auto update, so I refuse.
This wasn't enough; they really want to force me to enable auto updates, to the point of the update section of the app having 50% of the visible space on my big screen covered with a message to enable auto update over WiFi. [0]
Whoever is doing this at Google... Stop. Just stop. It is cringe.
Google seems to be really good at making developer tools like Borg and Blaze - however, I think that as an organization they have some deficit that makes them not responsible enough to develop user-facing software (like, uh, an operating system).
Maybe Google would be better as a B2B company.
In many cases the hardware was so poor it was hard to make a call due to the touchscreen.
Since the primary thing I want the phone to do is make a call I switched to the “it just works” camp and haven’t regretted it.
Except getting photos off the phone. Until I realised the best tool for that is … Ubuntu!
Why can't we have nice things?
If it's something one-and-done (like different permutations of a signup flow to see what is easier for users), then I don't see the harm in it.
You just described doing business in today's world.
Being a bit more generous towards A/B testing, I would make a counterpoint: _not_ doing any kind of user testing, of which large scale automated A/B tests are just a subset, means you don't respect your users. Because it means you just assume you know what their experience is like, or worse: you don't even care about it or bother to learn anything.
Your complaint seems to be more about the scale and aspect of automation honestly, and continuity of the services, which is a valid complaint against Google but not about A/B testing in general.
A/B tests are not the only, or even the best, way of collecting user feedback.
I was just pontif- er, talking about this to someone, a couple of days ago.
I love the users of my products. Most of my products are free, and are carefully-crafted, highly-polished, complete deliverables, and I fret over how they are used -even if by a tiny number of end users-, like a nervous hen. I do what I do, out of love for the craft, and out of a genuine desire to make people's lives easier, through the technology I have at my disposal.
It is my belief that most tech companies despise their user base. Users are little more than cattle, to be fattened and slaughtered. "Caring about the user" means optimizing for "engagement," or keeping them trapped within their own ecosystem. John Oliver did a rant about this, recently[0]. It has nothing to do with actually caring about the user, or solving their problem. It is about harvesting users.
In fact, my discussion about this, came about, because someone wanted to keep users inside the app I'm writing, as opposed to linking them to a more familiar app, on their phone (for the record, it was for videoconferencing). Linking is a "no-brainer," as I can link out to dozens of installed apps, using the simple URL scheme method, built into iOS[1], and "keeping them in the app," would have required several months of extra work, polluting the app with megabytes of junk code, because I'd need to use SDKs, and also kill the ability to easily scale to add new clients (contrary to popular belief, Zoom is not the only videoconferencing option). It would also have possibly put us on the hook, legally, for what happened in those videoconferences.
[0] https://youtu.be/jXf04bhcjbg?t=638
[1] https://developer.apple.com/documentation/xcode/defining-a-c...
Data can "lie". What is observed is not always reality, simply what we can see of it.
Consider auctions. You never actually "see" the bidder's demand or utility. Yes, there are some ways to structure auctions that in theory show willingness to pay and such (ignoring confounding factors and irrationality), but you don't actually observe anything beyond the bid.
Similarly, on websites, you don't always know the causal reasons people click here or there. You know perhaps enough to predict a step-wise behavior, but don't (usually) understand the full behavioral lifecycle -- especially if a metric improves but at the hidden cost of decrements to conversion and similar.
I'd add a bit of nuance here. They are very good at driving traffic, but very bad at building an audience. You do this long enough and your news site is now optimized for attracting hot-take appreciators who engage with the news like a tabloid. This drives away everyone who doesn't want to be reading a tabloid and makes you more dependent on keeping up with traffic-gaming strategies to continuously drive traffic. You've basically shifted your business from being a place that produces journalism to being a place that figures out ways to game social media trends and SEO.
If the test is "do more ad placements increase revenue?" and there's a 20% jump, what are you as an engineer going to do? Tell management that it's bad?
The main issue is that people mix conversion with customer obsession! Whenever you work on a product or feature you should be asking yourself "Is this really good for my customer" - if the answer is no, then no matter what the A/B tests/conversion rates show you don't do it.
Unfortunately we mostly hire the wrong people as PMs, who then hire clones of themselves. They are not truly customer obsessed and use A/B tests incorrectly, which results in products that trick or force customers to do things they don't understand/want to do. Long term this is bad for the product and company.
It's about observing the users fumble through your UX when you know their motivation.
An actual, sincere customer obsession (and btw I think we both completely agree here) means that you are willing to lose out on some conversion and revenue in order to make sure your customers are top priority.
Real customer obsession isn't just an ethical principle either, it makes business sense. The problem is that the value of customer obsession is realized over the span of years or decades. Companies that have a sincere customer obsession are the kinds of places that survive economic ups and downs, where people's children grow up and are loyal to the product because they remember the time their parents were treated well by the company.
If your only company focus is Q4 KPIs then you really can't have "customer obsession".
The logic is: If they hate your app, they won't spend money. If they love your app, they will. Which is what would make you think A/B testing and UX work are the same thing.
There's really nothing new about this issue at all. Playing towards the average creates a lot of shitty stuff, in apps/websites as well as politics and wherever else there are metrics to track.
The genius of a good product is that it will make a stand and not give in to the whims of over-optimization in order to maintain its original intent. This is what made Apple unique.
It requires leadership with guts who aren't chasing the latest shiny object.
Running experiments and A/B tests is popular because it is _guaranteed_ to give you signal. If you have a large engineering team and you're not sure how to filter the quality of results, gating everything through A/B tests is a well-understood, methodical way to ensure only positive work makes its way through.
Early stage startups should never A/B test. When you're searching for product market fit, you're doing global optimization within the search space. Your product will change drastically as you learn more. Premature optimization (A/B tests) will only be detrimental.
It's almost guaranteed to ensure only false positive work makes its way through. If you're picking 0.05 as your p-value threshold, and you're running dozens to hundreds of tests, your false positives are almost certain to exceed your actual positives.
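To make the point concrete, here's a toy simulation (my own illustration, not data from any real pipeline): run a batch of two-proportion z-tests where the treatment truly does nothing, and count how many still come back "significant" at p < 0.05. Sample sizes and the baseline conversion rate are invented.

```python
import random
from math import sqrt, erf

def z_test_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = abs(conv_a / n_a - conv_b / n_b) / se
    # Normal CDF via erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

random.seed(42)
N, TESTS, TRUE_RATE = 5000, 100, 0.05  # users per arm, tests run, baseline conversion

false_positives = 0
for _ in range(TESTS):
    # Both arms draw from the SAME conversion rate: any "win" is noise.
    a = sum(random.random() < TRUE_RATE for _ in range(N))
    b = sum(random.random() < TRUE_RATE for _ in range(N))
    if z_test_p(a, N, b, N) < 0.05:
        false_positives += 1

print(f"{false_positives} of {TESTS} no-effect tests came back 'significant'")
```

By construction you should expect roughly 5% of these null tests to "win", which is exactly the parent's point: at 100+ tests, noise alone hands you a steady stream of launchable-looking results.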
When I'm working for clients that do a lot of A/B testing, I suggest that they should always run A/A tests to ensure that they're not incorrectly rejecting the null hypothesis. If your A/A tests are showing significant differences, you have a problem in your testing pipeline that by definition can't be cured by more testing. You need holdout groups and selectivity about what to test, instead of just throwing everything at the proverbial wall.
For example, imagine a costume shop that ran a couple dozen A/B tests over the summer. Those results may look statistically significant. They may even stand up against the A/A test. But people that buy costumes in the summer are very, very different than people that buy them in October, and if 90% of the store's business is in the run up to halloween, then all these micro optimizations could actually make your total business performance worse.
I'm a A/B testing skeptic too, though I admit they have a time and a place. My favourite are ones that can be reasoned about as actual hypotheses. This usually involves some degree of data analysis or segmentation. For example, increasing font sizes may boost conversion, and a later analysis shows that this was almost solely a lift in conversion rates amongst the 45+ cohort. The data in this case isn't just blindly driving design decisions, it's helping inform the staff on how to better design in the future for the audience we have.
A/A tests do test your methodology as you said. But they do not fix a p-value one order of magnitude higher than it should be. (And yeah, I'm aware you know that, but your comment places them on the same context, so it got misleading.)
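For what it's worth, the standard patch for that order-of-magnitude problem is a multiple-comparison correction applied across the batch of tests. A minimal sketch of Holm-Bonferroni (the p-values below are invented for illustration):

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return indices of hypotheses rejected under Holm's step-down procedure."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    m = len(p_values)
    rejected = []
    for rank, i in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k)
        if p_values[i] <= alpha / (m - rank):
            rejected.append(i)
        else:
            break  # once one fails, all larger p-values fail too
    return rejected

# Five tests; only the first survives once the batch size is accounted for.
ps = [0.001, 0.04, 0.03, 0.20, 0.049]
print(holm_bonferroni(ps))  # -> [0]
```

Note how three of those p-values sit below 0.05 individually, but only one survives correction; the rest are exactly the kind of "wins" the parent is warning about.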
Predictably, whatever metric we were watching on it (probably conversion) swung wildly to either side over the first few days. The look on some of the product managers' faces was pretty great. After about 2 weeks, it settled into a steady state where each "version" performed equally (measured cumulatively, so just large numbers in action).
The conclusion from this exercise was...
"It takes 2 weeks."
¯\_(ツ)_/¯
The beauty of AB testing is that you don't have to give up your opinion. You can just change irrelevant things until the result you desire gets proven by chance and now you've got data to base your opinion on!
This is an interesting contrast to Amazon, which also makes checkout easy but bombards the user with thousands of listings, mostly mildly fraudulent and consisting of absolute crap, and still somehow gets repeat business.
The Amazon or Google way of throwing everything into a bin and spewing it out at the users is BS. We say we live in an information age, but I firmly believe stuff was way better catalogued back when it was done manually by paid gatekeepers.
Hey, would you like Prime with that? Do you know we provide free two-day shipping with Prime? If you sign up for Prime today you can get a $100 discount!
My second biggest issue is: it's rare that companies offer actual, live-human support these days anyway. When marketing adds A/B testing, shit becomes really annoying if something breaks as a result - usually the phone lines are suddenly flooded, the agents have no idea what has happened either and try to reproduce and figure out what's going on (and sometimes can't because they aren't part of the test group!), and so even people who haven't been in the testing group are going to be very pissed off.
IMHO, A/B testing without explicitly notifying the customers in advance should be banned by law, and that ban be harshly enforced. Customers are not guinea pigs, and with the rise of elderly people on the Internet this becomes an actual public safety issue (as ever-changing stuff makes it easier for scammers!).
You're describing adversarial UI changes to small populations of then-unsupported customers. This can have outsized impacts on vulnerable populations, e.g., especially the elderly.
This is one of my most intense frustrations in the modern age. Complete and utter disrespect for your customers' time and knowledge.
I can sort of understand wanting to hide stuff on mobile, but the discovery of controls to unhide things should be better. I often help people that are stuck trying to figure out how to do something in an app and not realizing they can click on something that gives no indication it's clickable is a common thing.
Desktop is another world. I often have 20+ inches of horizontal space and a hamburger menu. It's infuriating, especially when the hamburger menu is hiding one button.
As Product manager/owner I've only found A/B testing useful when trying to narrow in on a specific demographic and you are trying to find some optimization.
The marketing/sales funnel use of it is kind of gross and has ruined, imo, something that has utility in a very narrow scope.
Cheers, also very much agree customers should be informed and allowed to opt out.
"Hey, we have a new UX to try... would you like to switch?" The data from people that opt in is way better.
The three key outcomes I observed from the relentless A/B testing were UI antipatterns, team burnout, and a well-attended conference talk about "how we ran 105 A/B tests in a year, and what we learned".
We always ran >=3 variants and surveyed the dozen team members on which one they thought would win. Over the years, there was no clear pattern in who could make that prediction.
I.e., it's not possible to predict which is the most effective treatment, even when you include a really bad idea in the treatments!
Every single time I warn them about how the bill of goods they've been sold with A/B testing is almost completely unattainable, especially in the way that they want to go about it. They won't magically start getting more conversions by changing a button color. Even if they start getting more clicks, they rarely start getting more complete conversions, because the increased numbers usually come from people who weren't good leads in the first place.
On top of that every company I've worked with has no idea what the real methodology for good tests is, no matter how many times I explain it or put it in a slide deck. I would constantly get requests to use A/B testing for feature rollout.
Them: "Hey, could you do an A/B test of our existing site design and our upcoming redesign?"
Me: "if the old design performs better are you going to toss out the redesign?"
Them: "No we're going with the redesign but we want metrics on how it'll affect traffic"
Me: "Those metrics are useless if you aren't going to listen to them, and if the results come back and the old design performs better, you're not even going to put it in a presentation because it's counter to your planned actions. There's literally no point in running this test"
Them: "Run it anyway"
The problem is it's not subtle at all; there's a handful of those features that, when combined, end up being overbearing and noisy: "3 people looked at this listing within the past 3 days! 12 rooms left at this rate!" I don't care. I'm looking to book business or vacation travel. If a spot fills up I'll just go somewhere else. It'll be fine either way.
I don't use them anymore for that reason. Old soul (me) is old. (I'm probably in a minority, judging by their advertisement budget.)
But unfortunately, it works.
I've seen friends that I consider intelligent panic buy tickets/hotels, "because prices are going up since the last time I checked!"
Next time you want to book anything, browse around, ignore any of the fake urgency notifications, ignore the price (while staying broadly in your price range, of course). Then when you've found a destination you like, open the page in a private browsing window (or clear your cookies), and you'll see that prices and availability are back to normal.
OTA make comparisons a bit easier, but everything is negotiated and contractually controlled to keep people from just going to the hotel directly. Secret hotel prices (like HotWire if that still exists (Expedia) or Travelocity's Top Secret hotels if that still exists (also Expedia)) are an even more crazy negotiation. Hotel Tonight at least used to contact the hotel chains every day for that day's options, though since they were bought by AirBnB who knows what they do.
These days I just find a nice hotel and book with them/their system directly. Airlines too, since airlines fail to give all their options to the OTAs.
In some ways its sad that aggregators don't work all that well in the main travel industry (Flight/Hotel/Car) but travel is extremely complicated, highly competitive and still very fractured except for airlines. Pricing comparisons are not very useful since they are so mangled and obfuscated that you may as well just go to several sites and do it yourself by hand. For example Spirit Airlines used to give us prices for their tickets at $X and were always cheaper than everyone else; yet once you booked at that price they hit you for everything extra (bags, res, for all I know oxygen) then our customers complained we were fooling them and the real cost was higher.
However, entering e.g. a client's information takes a lot of steps; you are constantly clicking "Next" throughout these beautiful wizards and pages. After some time everybody starts to feel that there must be a better way.
What is the solution?
Spreadsheet import! Where you can just do everything in this "complicated" UI of Microsoft Excel, with formulas, and hundred buttons at once on the screen. Fill in hundreds of rows of information and just import it to the "beautiful business system".
And the funny thing is, I agree with this article. Both the content and the heading of this hackernews article:
1. Notification/scare spam can have long term retention ramifications. The previous generation of experiment platforms made long term metrics literally impossible to read. But now companies can use holdout groups and long term metrics like retention to give more clarity.
2. Even if you can read long term metrics that include retention, the scare/notification spam could lead to less word of mouth growth. For travel, I am guessing that you will be swayed to drive WoM growth more by differentiated inventory, reliable service, and cost, so maybe it's just merely annoying but not a risk to the business.
3. Notice that Airbnb's UI is very, very different from Booking, Expedia. We made a conscious choice to always make sure Airbnb came across as a "sincerely helpful friend" as a booking platform. An AB experiment showing that metrics improve doesn't mean that you have to launch it. You can look at those results and say, "this metric lift isn't worth how ugly it's making my site", and that's a completely valid choice. (a choice we made often at Airbnb)
The author's idea is that this short term gain damages longer term metrics. That sounds logical and agreeable, but that doesn't make it true. Not in my experience anyway.
Probably the people complaining the most about annoying UI patterns weren't going to convert anyway. Whilst those coming with a specific conversion goal to your site will convert even if annoyed in the process.
Anyway, the true root cause goes all the way to the top. When you give a team a 20% sales increase target and "deliver by next quarter or be fired"...this is what you get. If the executive level dismisses a healthier, more sustainable long term growth model, then there's pretty much no way to stop this.
It's so hard to stop because it actually works. It works short term, and evidence that it harms long term is typically lacking, or it simply isn't true.
Netflix auto play? Is that you? You were a hateful idea, no one liked you, yet you stubbornly hung on for far too long.
I'd pay twice as much to watch half as much quality programming, but that would tank what they think is a positive metric.
It did! People started watching stuff sooner!
Mostly because it was so incredibly irritating that I'd start a show just to make the random autoplay cease. Of course, what it really meant was that I'd go to Amazon and agonize over what to watch, but at least I was doing it at my own speed and not being harassed by noisy autoplay.
Having left FB years ago, I now watch people "navigate" their site/apps with disbelief.
You, of course, need to ensure the granular tweaks can be rolled up into something usable as they prove successful. You can't just keep bolting on UI changes while losing sight of the larger experience. Each incremental A/B test runs against a previously successful variant, so eventually the control is radically different from where it started and you're only concerned with beating the control. Using a longer-term holdout group or resetting the control experience during incremental testing can help mitigate this and get you zoomed out a bit from the local maxima.
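A long-term holdout like this is typically implemented with a stable, deterministic bucketing function, so the same users stay outside every incremental experiment for months. A minimal sketch in Python (the salt and holdout percentage here are illustrative, not from any particular platform):

```python
import hashlib

def assign_bucket(user_id: str, salt: str = "longterm-holdout", holdout_pct: float = 0.05) -> str:
    """Deterministically assign a user to 'holdout' or 'experiment'.

    Hashing (salt + user_id) keeps the assignment stable across
    sessions and devices, so the holdout group can be preserved for
    months while incremental tests churn inside the 'experiment'
    population.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    # Map the first 8 hex chars to a uniform value in [0, 1].
    fraction = int(digest[:8], 16) / 0xFFFFFFFF
    return "holdout" if fraction < holdout_pct else "experiment"
```

Because the function is pure, re-running it for any user at any time recovers their group, which is what makes a months-later comparison against the holdout possible.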
It's why I saw it as my moral duty to leave (as well as the other FB properties), so that at least in a small way, I "produce content" that is only available by interacting with me as a person.
No, that doesn't mean A/B testing is inherently short-sighted. It's entirely possible to measure long-term secondary effects of an A/B test. Just save a record of treatment groups, and remember to come back and compare long-term metrics like LTV down the road. We do this all the time at my startup, and of the dark patterns that we've tested, we rarely see a long-term negative impact on LTV that outweighs the positive conversion rate impact.
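Mechanically, that long-term readout is just a join between the treatment log saved at experiment time and a metric computed much later. A hedged sketch (the user IDs and LTV figures are made up for illustration):

```python
from statistics import mean

# Hypothetical records: treatment arm logged when the test ran,
# LTV computed months later from billing data.
assignments = {"u1": "A", "u2": "B", "u3": "A", "u4": "B", "u5": "A"}
ltv = {"u1": 120.0, "u2": 45.0, "u3": 80.0, "u4": 60.0, "u5": 100.0}

def ltv_by_group(assignments, ltv):
    """Average long-term value per treatment arm."""
    groups = {}
    for user, arm in assignments.items():
        if user in ltv:  # users may churn before any revenue is recorded
            groups.setdefault(arm, []).append(ltv[user])
    return {arm: mean(values) for arm, values in groups.items()}
```

The only real requirement is that the treatment log is never thrown away once the short-term test concludes.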
If you want to make a valid argument against dark patterns (which is basically what 90% of this thread is trying to do), it's unlikely to be grounded in efficacy. This is coming from a business owner who spends seven figures monthly on advertising, constantly split tests, and is heavily invested in only making decisions that are in the long-term interest of the business.
Instead try to improve the customer experience, make better products, improve customer service.
(edited for clarification)
AB testing can be (although isn't always) used to improve the customer experience. Assuming you know exactly what will make the customer experience best without actually testing it can also lead to a worse experience.
For that you usually hire a market research company or do what they would do: take an interviewer and two cameras (one on the face, one on the hands), and hire an as-diverse-as-possible pool of test candidates whom you then put through whatever workflow you want to optimize. Afterwards, you interview them. Side benefit: you can get really interesting general knowledge that you'd never gain from a dumbass A/B scheme: is your font style/color scheme legible, can the site be used by colorblind people, are there stock photo choices that give off stereotypical vibes...
It's real fun and a worthwhile experience for everyone involved.
That said, an A/B test does not tell you why something didn't work. You can make further assumptions based on the results and develop new hypotheses, but it never tells you why. Typically you would do some kind of qualitative UX research on a prototype or even static concepts beforehand to identify these kinds of issues before you even expend the effort to do a live A/B test. Far cheaper to do a study with 6-12 people and a prototype than to build out a full, functioning A/B test experience.
It's possible the flow they created was generally better but perhaps it had one fatal flaw. Perhaps that flaw could easily be remedied once identified.
A/B testing is just one small part of a good UX process.
Without a metric to say what is "better" and a method to measure it this is empty advice.
With AB testing you are optimising for a specific outcome, usually higher conversion. As pointed out in the article, eventually you'll end up with a bunch of colourful buttons and scary texts that persuade the user to click. A lot of the "only 2 seats/rooms available" messages are lies to scare the user into a conversion.
What was once ads everywhere is now psychological gaming.
I hope someone comes up with a browser extension, and maybe Apple with a new "Access Website" mode.
These messages are boring, to be honest. Once you notice them everywhere, it's game over for me. Time to move on.
One-time offers, limited-time offers, mailing list signups, up-sells, and cross-sells are time-tested ways to increase sales, dating as far back as radio-era telephone and catalog sales.
Steve Madden is a perfect example of this. They sell undifferentiated popular shoe styles less expensive than high fashion but more expensive than knockoffs. They have to hustle you to get you on their mailing list (for 10% off your order) in the hopes that you'll make another impulse purchase later when you get a text or email from them. If they weren't as aggressive you might never make another impulse purchase with them again as there are tons of brands selling nearly identical products.
Some companies are just horrible at hustling so they actually get in the way of you completing your purchase. In a competitive market this is a self correcting problem.
They spent more time viewing the items and... didn't like the pics, so conversion went down. In the end we reverted to the crap gallery we had before; they don't click it anymore and conversion went back up again.
* Users know you have a nice gallery
* They are more likely to shop at your store
* In the end, you get more sales despite the lower conversion rate
You have to click through at least 10 pages of additional offers (and many extra price things that are added by default!) before you get to the actual checkout page.
Site owners: please stop doing this. You’re turning the web into a cesspit. You’re part of the problem.
AB testing is and always has been fish oil for management. The only things it can actually prove, are more easily identifiable by common sense. So wherever it actually works, it was probably a waste of time / overkill for evidence.
- sincerely, a business analyst
I will always use AB testing for uncertain code in the future. I was skeptical when I first started writing AB tests, but they have proven their worth over and over again.
A/B testing right now is done on a cohort basis, and tests are run for weeks to a couple of months. This means that where the lifetime of a customer extends beyond a few weeks or months, it's really not possible to tell if a global maximum was missed.
E.g., you increase the number of promotional emails customers get per week. You do it for 3 weeks and see that customers who got those emails had higher conversion. But you didn't get to see that customers who kept getting the higher number of emails completely unsubscribed after 3 months of pain. And by that time all customers are in the higher-frequency group, so it's hard to tell what is driving the unsubscriptions.
I'm no expert but here are some solutions:
1. You should have really delayed, long-running control groups, preferably extending well beyond the average duration your customer sticks around. These groups should get onto new things a year later. But even then it'd not be possible to tell WHICH feature is affecting them, because in a year the main group would have accumulated a lot of features. But still, it's something...
2. You should really have lots of secondary KPIs that measure things that affect long-term KPIs. Sure, conversion is better, but is time spent reading newsletters increasing? Are buyers feeling good about their experience with the brand? Some of these KPIs are more qualitative and can't just be automated.
what else?
I currently work in a game publishing company, here are 2 anecdotes from it
1. We run A/B tests for game performance, but we keep changing the bids for our games and thus get varied quality of users; A/B tests don't really help in such a case.
2. Once, by mistake, we ran the same creative on FB for 2 different ads... both ended up having totally different metrics.
It's also worth noting that there's no way in hell they actually know that with any sort of precision. No GDS has proper up-to-date knowledge of bookings from all the various sources that hotel reservations actually go through (which is why they overbook, just like airlines do with flights). What they're really saying is that the small inventory of rooms that are reserved for them to book exclusively are almost gone.
So if you do testing and it gives you some kind of result, the crucial step is trying to understand what it really means, is there something we can learn from it.
Unfortunately, this is also the hard part that requires actual effort and intelligence and is difficult to scale -- and so is frequently skipped.
I used to use that app all the time, then kids happened and spontaneous hotel reservation became rare. Fast forward a few years and a circumstance came up that made me think "Hotel Tonight". I discovered it wasn't installed on my new phone so I grabbed it. It was unrecognizable. Maybe the prices were as good, maybe it could still be used the way I used it previously, but it looked like it turned into a hotel booking app when what I wanted to see was a small selection of good hotels nearby with unusually low prices. One of the features was the lack of choice.
I finally opened the inspector and deleted it, so that I could use the menu to select "order online", which took me to a page ... with the same modal.
I agree with the sentiment on AB testing but I think the bigger insight is that we need to be reminded to see the forest for the trees with any process, tool, or goal.
Sometimes these intangibles are hard to measure and almost need to be sensed.
It reminds me of how you can see the exact same development methodology used at two different companies, where at one company it works beautifully and at the other it becomes a bureaucratic albatross.
For example, you can see if Group A or Group B from a test are more likely to still use the site 1 year later.
You hypothesize that those ways to 'juice the metrics in the short term' hurt the user experience in the long term... Well if your hypothesis is right, these long term AB results should show it.
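Checking that hypothesis is a standard two-sample comparison, e.g. a two-proportion z-test on 1-year retention between the arms. A sketch under made-up numbers, not any particular product's methodology:

```python
from math import erf, sqrt

def retention_z_test(retained_a, n_a, retained_b, n_b):
    """Two-proportion z-test: did arm A retain users at a different
    rate than arm B one year after the experiment?"""
    p_a, p_b = retained_a / n_a, retained_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (retained_a + retained_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical readout: 300/1000 users retained in A vs 250/1000 in B.
z, p = retention_z_test(300, 1000, 250, 1000)
```

The catch, as noted elsewhere in the thread, is having enough users still identifiable a year later for the test to be powered at all.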
This isn't very feasible on most products and certainly limited by the amount of data collected.
No one EVER tests for mean-reversion over time.
For 17 years I've seen companies do A/B tests. I doubt I've seen a single convincing, durable result the whole time.
Saying you made a 10% purchase rate improvement in a month is an easy pay rise.
It happens when people change perspectives from building and sustaining businesses to exploiting and squeezing every employee, supplier, and customer for the last drop.
https://news.ycombinator.com/newsguidelines.html
Edit: you broke the site guidelines particularly badly later in the thread. We ban accounts that do that, so please don't do it again. More here: https://news.ycombinator.com/item?id=32072856.
If you know its inside anatomy, you know what I mean.
Milk is in the back of the store because that's where it makes sense to have a refrigerated wall.
Milk is increasingly available in smaller quantities in compact refrigeration units at the front of the store.