Same with bogus results in Google search. It would be a mistake to fixate on a failure case at the expense of seeing what the system gets right.
One thing that can be said about Amazon is how data-driven it is. Even an obvious "improvement" to a system requires analysis showing that it actually is an improvement. For example, it might seem obvious to filter out lower-quality user-created answers in the product FAQ, but answers with poor grammar might actually boost sales because shoppers trust them more.
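To make "analysis to back it up" concrete, here is a rough sketch of the kind of check that could sit behind the FAQ example: a two-proportion z-test comparing purchase rates with and without the answer filter. This is not Amazon's actual method, the function name two_proportion_z is made up for illustration, and the counts in the usage lines are invented numbers, not real data.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: purchases and visitors in the control group
    conv_b / n_b: purchases and visitors in the variant group
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control shows all answers, variant hides the
# "low-quality" ones. These numbers are invented for illustration only.
z, p = two_proportion_z(conv_a=500, n_a=10_000, conv_b=450, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A negative z here would mean the "obvious" filter converted worse than leaving the answers alone, which is exactly the kind of result you only find out by measuring.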
Also, the deeper we descend into ML/AI and black boxes, the further the effects sit from their causes. There's no real place to write if (user.sex == M) then weigh('tampons', -1), because it was a constellation of factors that cascaded into a man seeing tampons, like that time he purchased something related for his girlfriend. The next rung down is the business of mind-reading.