Please, Amazon, verbatim matches should go first. Then do the AI thing.
Google, too.
This is really important when looking for something obscure.
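To make the request concrete, here is a minimal sketch of what "verbatim first" could mean as a post-processing step. It assumes the engine has already produced a ranked list; the function name and the sample titles are made up for illustration, and this is not how Amazon or Google actually rank.

```python
def rank_verbatim_first(query, results):
    """Move results whose title contains the query verbatim to the front.

    `results` is assumed to be already ordered by the engine's own ranking.
    sorted() is stable, so that order is preserved within each group.
    """
    needle = query.casefold()
    # The key is False for verbatim matches and True otherwise; False sorts first.
    return sorted(results, key=lambda r: needle not in r["title"].casefold())


# Hypothetical engine output, in the engine's preferred order.
engine_results = [
    {"title": "Popular LED work light"},
    {"title": "Milwaukee 2110-21 flashlight"},
    {"title": "Milwaukee 2161-21 flashlight"},
]
```

Calling `rank_verbatim_first("2161-21", engine_results)` moves the exact model number to the top and leaves the rest in the engine's order.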
The mistake you're making is that you think Amazon should order results in the way that's best for you to find what you want. Amazon puts the results that are better for Amazon first, based on millions of people's clicks and subsequent purchases. They have no reason to put something first if they know that putting something else first will result in an increased probability of a more profitable sale.
Amazon search is a tool for Amazon to promote things to sell.
Search is definitely a case of "the devil is in the details": everyone has their own very "simple" test or search improvement, but making search "generally" valuable for everyone is a very hard tuning problem.
In my past job as head of search for a large price-comparison service, what we built and found to work best was to treat search tuning like test-driven development.
We would actually write tests in the form of "When someone types apple, there should be at least xyz products from category cellphones and their pricing should be between $XYZ--$ABC". Whenever a manager or anyone else walked in saying "oh, search should work like this", we would put it in a test. That way, as we tuned weights and vectors, we could always see how a change affected previous cases of "a manager walking in to tell us his favorite search query is bogus".
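A rough sketch of what such a test could look like. Everything here is hypothetical: the catalog, the toy `search` function with its `category_boost` tuning knob, and the `check_expectation` helper are stand-ins for illustration, not the price-comparison service's actual code.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    category: str
    price: float


# Hypothetical catalog; a real test would run against the live index.
CATALOG = [
    Product("Apple juice 1L", "groceries", 2.5),
    Product("Apple iPhone 12", "cellphones", 799.0),
    Product("Apple iPhone SE", "cellphones", 399.0),
    Product("Samsung Galaxy S20", "cellphones", 699.0),
]


def search(query, category_boost=None):
    """Toy ranking: number of matching terms plus a tunable per-category boost."""
    boosts = category_boost or {}
    terms = query.casefold().split()
    scored = []
    for product in CATALOG:
        hits = sum(term in product.title.casefold() for term in terms)
        if hits:
            scored.append((hits + boosts.get(product.category, 0.0), product))
    # Sort on the score only; ties keep catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [product for _, product in scored]


def check_expectation(results, category, min_count, price_range, top_n=10):
    """True if the top_n results hold at least min_count products from
    `category` whose price lies inside price_range (inclusive)."""
    low, high = price_range
    matching = [
        p for p in results[:top_n]
        if p.category == category and low <= p.price <= high
    ]
    return len(matching) >= min_count
```

With no boost, the top two results for "apple" include the juice, so an expectation of "two cellphones between $300 and $1000 in the top two" fails. With `category_boost={"cellphones": 1.0}` it passes, and the same check keeps guarding that query every time the weights change.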
I don't think so. If I search for "animal books" I probably want popular books about animals, not a low rated product with the exact name "Animal Books." Besides, that makes it highly game-able, no? What if I title my book Horror or Sci-fi?
It's not as simple as you say.
Sadly, I do not think that Amazon's goal is to optimize searches so you get what you are looking for, but so that as many people as possible find something that they purchase.
There are many complaints in this thread about how bad the "smart" searches are becoming. But the reality is that "smart" searches are becoming better and better at their goal: to maximize profit by finding something for you to buy/click/etc., regardless of what your initial search was.
E.g., if you look for a $3 book that gives Amazon a profit of 50 cents, but they know they can show you another one that you are going to buy for $50 with a $2 profit, Amazon will show you what is right for their own profit. If the book is merely on a similar topic, or not useful at all to a minority of people, that does not matter as long as it maximizes profit.
In exchange for a small chance of selling a more expensive item, it can actually be worth running an even greater risk of losing out on a purchase of a cheaper item. (It sounds obvious when put like this, but people I've talked to have been surprised by it.)
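The arithmetic behind that trade-off, as a small sketch. The 50 cent and $2 profit figures come from the book example above; the purchase probabilities are invented purely for illustration.

```python
def expected_profit(p_purchase, profit):
    """Expected profit of showing an item: purchase probability times profit."""
    return p_purchase * profit


def break_even_probability(p_cheap, profit_cheap, profit_expensive):
    """Purchase probability at which the expensive item earns, in expectation,
    exactly as much as the cheap one."""
    return p_cheap * profit_cheap / profit_expensive


# Assumed for illustration: 40% of searchers buy the $3 book when it is shown.
P_CHEAP = 0.40
PROFIT_CHEAP = 0.50      # profit on the $3 book
PROFIT_EXPENSIVE = 2.00  # profit on the $50 book

threshold = break_even_probability(P_CHEAP, PROFIT_CHEAP, PROFIT_EXPENSIVE)
```

Under these assumptions the threshold works out to 10%: if more than one searcher in ten buys the $50 book instead, showing it first wins in expectation, even though most people who wanted the $3 book leave without buying anything.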
Of course, this is all about short term effects. There's a cost for your brand and reputation if you become the place where it's hard to find what you're really looking for.
----
This is hard. I work on providing product search for online stores, and our service is one of the better ones in the business, going by a solid track record of controlled experiments against competitors.
But like -- how do you know you're better? The user making a purchase is a really strong signal. Many people would be envious to have such a good signal as "person opening their wallet." It's about the gold standard when it comes to assessing the opinions of people without asking them directly.
Except... as stated here -- what if they didn't actually find what they wanted, only we managed to figure out something else they might as well buy while we were at it? Is that success? Partial success? How can one tell the difference?
To see an illustration of how bad Amazon search for exact products is, here's an example.
I want the newer 2020 Milwaukee 2161-21 flashlight and not the older 2017 model 2110-21. This is the manufacturer's product page: https://www.milwaukeetool.com/Products/Lighting/Personal-Lig...
Screenshots of searches:
- amazon.com[1] search for "milwaukee 2161-21" gives the wrong item 2110-21: https://imgur.com/a/2lrHb0I
... but I can go to Google and append "amazon" to find what I want...
- google.com[2] search for "milwaukee 2161-21 amazon" : https://imgur.com/a/afpJSgF
So Google's scraping of Amazon's shopping pages returns better results than Amazon! It's ridiculous.
Lest one thinks that there's an Amazon profit motive to show an older 2017 model that deliberately doesn't match the customer's search query, I'm not sure about that. In this specific case, both products are sold by 3rd-party resellers which means Amazon gets the same 30% fee either way.
And as more trivia, both HomeDepot.com and Walmart.com return the correct result for "milwaukee 2161-21".
[1] https://www.amazon.com/s?k=milwaukee+2161-21
[2] https://www.google.com/search?q=milwaukee+2161-21+amazon
Same with Google.
Today there have been multiple breakthroughs in a number of relevant fields, and yet we see quality declining in obvious ways.
I understand that sometimes one needs to take a step back to take two forward, but in Google's case they are now in their tenth year after taking 5 steps back, and at some point the new solution has to prove itself.
Sadly, I guess the old solution is "lost to time" and no one is able to replicate it anymore. It is the only explanation I have for why it hasn't been reverted.
Amazon no longer has the edge in terms of cost, nor the exclusivity in terms of inventory. I still occasionally buy stuff on there when they do have a price advantage, but that's becoming rarer and rarer.
eBay still has one of the best searches if you know exactly what you want, as they support more complex queries, e.g.:
(adidas,puma,nike) high top 11 -womens (black, red)
But they don't have everything, of course. More and more I find myself using Google Shopping and then buying at some specialized site. It may take 3 days instead of 2 to arrive, and it may require that I type in my payment info, but many support Apple Pay, and for those that don't, I have memorized my credit card number, which makes checkout very fast.
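For readers unfamiliar with the eBay query syntax shown above, here is a toy matcher. It is a sketch based on my reading of that one example (parenthesized comma lists mean "any of", a leading minus excludes a word, everything else is required) and not eBay's implementation. The function names are hypothetical.

```python
import re


def parse_query(query):
    """Split a query into (required terms, any-of groups, excluded terms)."""
    # Each "(a, b, c)" group means: at least one alternative must match.
    any_of = [
        [alt.strip().casefold() for alt in group.split(",") if alt.strip()]
        for group in re.findall(r"\(([^)]*)\)", query)
    ]
    # What remains outside the parentheses is plain terms and -exclusions.
    rest = re.sub(r"\([^)]*\)", " ", query).casefold().split()
    excluded = [t[1:] for t in rest if t.startswith("-") and len(t) > 1]
    required = [t for t in rest if not t.startswith("-")]
    return required, any_of, excluded


def matches(title, query):
    """Whole-word match of a listing title against the parsed query."""
    required, any_of, excluded = parse_query(query)
    words = set(re.findall(r"[a-z0-9]+", title.casefold()))
    return (
        all(term in words for term in required)
        and all(any(alt in words for alt in group) for group in any_of)
        and not any(term in words for term in excluded)
    )


QUERY = "(adidas,puma,nike) high top 11 -womens (black, red)"
```

Matching is on whole words only, so a title spelled "women's" would slip past the `-womens` exclusion in this sketch; a real engine would normalize such variants.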
All you get is ads and generic results any time you try to find anything out of the ordinary. I remember that if you were good at formulating your search query you could almost always find what you want. Instead they assume the user is stupid or just want to serve up an ad.
Here’s a search for “Parasite” with a one-letter misspelling:
https://letterboxd.com/search/Parisite/
The results are so comically bad for anything that isn’t an exact match that I think they are just random.
It’s such a great site. I don’t know how its search is so awful. Letterboxd users joke about its search awfulness. I think the developers must just be punking us. Really.
I've had good results searching for the ISBN rather than the title.
Apparently you're not one of the people who finds the ISBN by looking it up on Amazon. (joke)
A (future) AGI-based search engine can go even further than that, and generate the answer for any problem based on the knowledge it has ingested.
At the moment my biggest frustration is that both Google and DuckDuckGo keep ignoring my keywords and keep jamming my results with all kinds of obviously irrelevant data.
Google having a "verbatim" option that they ignore, on top of already ignoring double quotes after having butchered the plus operator, adds insult to injury..!
There's probably some UX research that indicates that in complex scenarios it is better to give the user the impression of control while silently doing the right thing.
Well, guess what: that depends on you doing the right thing, and the right thing is not to stuff my results with pages that don't contain the error messages I was looking for!
It used to also place the results that match all of your keywords first. They stopped doing this a few years ago, but they didn't mess with the quotes (yet?).
For some reason I don't get that except for extremely obvious cases. Example: https://duckduckgo.com/?q=%22lego+sauce%22&t=fpas&ia=web
Even if you choose another model or method for storing and searching through the vector embeddings, the order of operations will be approximately the same.
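The order of operations is not spelled out in this comment, so the sketch below assumes the usual one: embed the documents, store the vectors, embed the query, return the nearest stored vectors. The `embed` function is a deliberately crude stand-in (bag-of-words counts); a real system would call a learned embedding model and an approximate nearest-neighbor index instead of a linear scan.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: word counts. ASSUMPTION: a real pipeline would
    call a trained model here and get back a dense vector."""
    return Counter(text.casefold().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs):
    """Steps 1 and 2: embed every document and store the vectors."""
    return [(doc, embed(doc)) for doc in docs]


def query_index(index, query, k=3):
    """Steps 3 and 4: embed the query, then return the k most similar docs."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# Hypothetical documents for illustration.
DOCS = ["red running shoes", "blue denim jacket", "running shorts for men"]
INDEX = build_index(DOCS)
```

Swapping the model only changes `embed`, and swapping the storage layer only changes `build_index` and `query_index`; the sequence of steps stays the same.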
I looked into using that last year. In the end we went with a simple more like this query, which proved good enough relative to the model our machine learning people came up with (not so great because real world data is just hard). The nice thing with this is that you can combine vector search clauses with additional clauses and other features such as aggregations, highlighting, etc.
Another way to use machine learning with Elasticsearch would be the Learning to Rank plugin, which re-ranks results matching a particular query. It's a simple linear regression (usually) based model that you train using features in the form of Elasticsearch query clauses. Basically it learns the boost factors for each of the features. E.g. SoundCloud uses this in their search.
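To illustrate the "learns the boost factors" idea in isolation, here is a minimal sketch of a linear model fitted with plain stochastic gradient descent and then used to re-rank candidates. It does not use the plugin's API. The feature names, training rows, and relevance labels are invented, and in practice each feature value would be the score of one query clause.

```python
def score(weights, features):
    """Linear model: each weight acts as the boost for one feature."""
    return sum(w * x for w, x in zip(weights, features))


def fit_boosts(rows, labels, lr=0.1, epochs=500):
    """Least-squares fit by stochastic gradient descent (no intercept)."""
    weights = [0.0] * len(rows[0])
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = score(weights, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights


def rerank(candidates, weights):
    """Order candidates by model score, highest first."""
    return sorted(
        candidates, key=lambda c: score(weights, c["features"]), reverse=True
    )


# Hypothetical training data: [title_match, description_match] -> relevance.
TRAIN_X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
TRAIN_Y = [2.0, 0.5, 2.5, 1.1]

BOOSTS = fit_boosts(TRAIN_X, TRAIN_Y)
```

On this made-up data the fit settles near a boost of 2 for title matches and 0.5 for description matches, so a candidate that only matches in the title outranks one that only matches in the description.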
Either way, the hard part of stuff like this is testing your system. Going from "this looks ballpark right" to "our users click on 0.5% more results after change X" is a long journey. Not a lot of companies are able to do this, and I've seen a few teams fall into the trap of having a not-so-great machine-learned model with a flawed test suite proving not much at all about its alleged properties. A well-crafted query can do a lot of the same things. A benefit is that you can fix things easily, whereas most machine learning is a bit more like a black box (it either works or does not).
* Is already coming from. Google, Bing, Spotify, Pinterest, Facebook, etc, already use it in production.
All flavors of ES have some kind of vector search available, albeit with some limitations. There are also stand-alone open-source and managed solutions popping up left and right (I work for one, Pinecone.io).
From what I’m seeing, the hard parts people encounter are:
- Finding or developing good embedding models. I consider KPI testing a part of this process, since it tells you whether or not your models are actually “good.”
- Finding and tuning vector search libraries that can meet speed and accuracy requirements. (Easy if you use ES although you’ll run into latency issues with >10M docs. Even easier with a managed vector search solution.)
- Scaling the above to handle millions (or billions!) of items and high query throughput. Again, less of an issue if you’re using managed ES on AWS or Elastic.co, or a managed vector search solution like Pinecone, or you have a team of distributed systems engineers sitting idle.
Too bad then that this is not the goal of web search engines. Their goal is instead to sell ad space.