Vectors are just getting started.
I used random projection hashing to increase search speed: you can match directly (or at least narrow down the search) instead of computing the Euclidean distance for every row.
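For anyone curious, the bucketing trick can be sketched in a few lines. This is a toy illustration with made-up sizes and names (numpy assumed), not the actual code:

```python
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits, n_rows = 64, 12, 2000

# Random hyperplanes: each hash bit records which side of a plane a vector falls on.
planes = rng.standard_normal((n_bits, dim))

def rp_hash(v):
    bits = (planes @ v) > 0
    key = 0
    for b in bits:
        key = (key << 1) | int(b)
    return key

data = rng.standard_normal((n_rows, dim))
buckets = defaultdict(list)
for i, row in enumerate(data):
    buckets[rp_hash(row)].append(i)

def query(q):
    # Only score rows that share the query's bucket, not all n_rows.
    cand = buckets.get(rp_hash(q), [])
    if not cand:
        return None
    dists = np.linalg.norm(data[cand] - q, axis=1)
    return cand[int(np.argmin(dists))]
```

Nearby vectors tend to land in the same bucket, so the exact-distance scan runs over a small candidate set instead of the whole table.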
The demand for vector embedding models (like those released by OpenAI, Cohere, HuggingFace, etc) and vector databases (like https://pinecone.io -- disclosure: I work there) has only grown since then. The market has decided that vectors are not, in fact, over.
There are benchmarks at http://ann-benchmarks.com/ , but LSH underperforms state-of-the-art ANN algorithms like HNSW on the recall/throughput tradeoff.
LSH was, I believe, state of the art about ten years ago, but it has since been surpassed. The caching aspect is really nice, though.
This approach seems feasible tbh. For example, a stock's historical bids/asks probably don't deviate greatly from month to month. That said, generating a good hash depends on the stock ticker, and a human doesn't have the time to find a good one for every stock at scale.
An HNSW index is slow to construct, so it is best suited for search or recommendation engines where you build the index once and then serve it. For workloads that continuously mutate the index, like streaming clustering/deduplication, LSH outperforms HNSW.
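The reason is that an LSH index is just hash tables, so inserts and deletes are O(1) bucket operations with no graph to rewire. A rough sketch of streaming dedup along those lines (all names and sizes illustrative, numpy assumed):

```python
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(1)
dim, n_bits = 32, 10
planes = rng.standard_normal((n_bits, dim))  # random hyperplanes

def bucket_key(v):
    bits = (planes @ v) > 0
    key = 0
    for b in bits:
        key = (key << 1) | int(b)
    return key

index = defaultdict(dict)  # bucket key -> {item_id: vector}

def insert(item_id, v):
    # O(1): append to one hash bucket; no graph edges to rewire.
    index[bucket_key(v)][item_id] = v

def remove(item_id, v):
    # O(1): delete from the item's bucket, again with no global restructuring.
    index[bucket_key(v)].pop(item_id, None)

def near_duplicate(v, tol=0.1):
    # Streaming dedup: only compare against items sharing the bucket.
    for other in index[bucket_key(v)].values():
        if np.linalg.norm(other - v) < tol:
            return True
    return False
```

Contrast this with HNSW, where every insert has to search the graph and splice in edges at multiple layers, and deletes are awkward enough that many implementations only tombstone.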
It might not be trendy, but that doesn't mean it can't work as well as, or better than, HNSW. It all depends on the hashing function you come up with.
see Ullman's text, Mining of Massive Datasets. it's free on the web.
Vectors didn't go anywhere. The article is discussing which function to use to interpret a vector.
Is there a special meaning of 'vector' here that I am missing? Is it so synonymous in the ML context with 'multidimensional floating point state space descriptor' that any other use is not a vector any more?
I was as confused and annoyed as you were, since I don't have a machine learning background either.
Hopefully someone who knows math will enter the field one day and build the theoretical basis for all this mess and allow us to make real progress.
> But another important goal is inventing new methods, new techniques, and yes, new tricks. In the history of science and technology, the engineering artifacts have almost always preceded the theoretical understanding: the lens and the telescope preceded optics theory, the steam engine preceded thermodynamics, the airplane preceded flight aerodynamics, radio and data communication preceded information theory, the computer preceded computer science.
[1] https://www.reddit.com/r/MachineLearning/comments/7i1uer/n_y...
Wouldn't the first part of the analogy actually be:
A 1-second flight that will probably land at your exact destination, but could potentially land you anywhere on Earth?
I could see the hash approach, at a functional level, resulting in different features essentially getting a different number of bits, which would be approximately equivalent to a NN with variable-precision floats, all in a very hand-wavy way.
E.g., we could say a NN/NH needs N bits of information to work accurately, in which case you're trading off the format of, and the operations on, those N bits.
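To make the bit-budget idea a little more concrete (purely illustrative, not anything from the article): uniformly quantizing the same weights to fewer bits gives a strictly coarser approximation, which is the tradeoff being gestured at:

```python
import numpy as np

def quantize(x, n_bits, lo=-1.0, hi=1.0):
    """Round x onto a uniform grid of 2**n_bits levels spanning [lo, hi]."""
    levels = 2 ** n_bits - 1
    scaled = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    return lo + np.round(scaled * levels) / levels * (hi - lo)

w = np.array([0.123, -0.456, 0.789])
# Fewer bits -> coarser representation of the same weights.
w8 = quantize(w, 8)  # close to the originals
w2 = quantize(w, 2)  # only 4 levels available, large rounding error
```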
The natural question is: how are you going to train it?
Are they re-inventing autoencoders?
Am I incorrect in thinking we are headed to future AIs that jump to conclusions? Or is it just my "human neural hash" being triggered in error?!