No one measures vector distance using the Hamming distance on binary representations. That's silly. We use L1 or L2, usually, and the binary encoding of the numbers is irrelevant.
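To make the "binary encoding is irrelevant" point concrete, here's a quick sketch: two adjacent IEEE-754 doubles (numerically almost equal) can differ in 53 bits, while 1.0 and -1.0, which are numerically 2.0 apart, differ in only the sign bit. The helper name is mine, not from any library.

```python
import math
import struct

def hamming_bits(a: float, b: float) -> int:
    """Count differing bits between the IEEE-754 encodings of two doubles."""
    (x,) = struct.unpack("<Q", struct.pack("<d", a))
    (y,) = struct.unpack("<Q", struct.pack("<d", b))
    return bin(x ^ y).count("1")

# Adjacent doubles (numerically almost equal) differ in 53 bits...
near_one = math.nextafter(1.0, 0.0)
print(hamming_bits(1.0, near_one))  # 53
# ...while 1.0 and -1.0 (numeric distance 2.0) differ in a single sign bit.
print(hamming_bits(1.0, -1.0))      # 1
```

So Hamming distance on raw bit patterns has essentially no relationship to numeric distance, which is why metrics like L1/L2 operate on the values themselves.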
It sounds like the LSH is maaaaybe equivalent to vector quantization, in which case this would be a form of regularization, which sometimes works well and sometimes is meh.
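For concreteness, vector quantization snaps each vector to the nearest entry of a codebook and compares codes instead of raw vectors. A toy sketch, using a hypothetical hand-picked 2-D codebook rather than one learned from data:

```python
def nearest(v, centroids):
    """Index of the centroid closest to v in squared L2 distance."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, centroids[i])))

# Hypothetical codebook of 4 centroids; a real one would come from k-means.
codebook = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

print(nearest((0.1, 0.9), codebook))  # -> 1, snapped to (0.0, 1.0)
print(nearest((0.9, 0.8), codebook))  # -> 3, snapped to (1.0, 1.0)
```

The "regularization" reading comes from the snapping itself: small perturbations of a vector map to the same code, so fine-grained detail is deliberately thrown away.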
In terms of accuracy, it totally depends on the resolution needed. We can get >99% of L2's accuracy waaaaay faster with 1/10 of the memory overhead. For what we are doing, that is the perfect trade-off.
In terms of LSH, we tried projection hashing and quantization and were always disappointed.
Or is there actually some interesting hash-based neural algorithm lurking around somewhere?
You have a map from some high dimensional vector space ~ k^N -> H, some space of hashes. H sort of looks one dimensional. I assume that the interesting geometry of your training data actually lies on a relatively low dimensional subvariety/subset of k^N, so maybe it's not actually that bad? It could be a really twisted and complicated curve.
However, you still need to somehow preserve the relative structure, right? Things that are far apart in k^N need to be far apart in H. Seems like you want the map to at least approximately be an isometry. Although there are things like space-filling curves that might do this to some degree.
Also, maybe even though H looks low dimensional, it can actually capture quite a bit (if your data is encoded as coefficients of powers of 2, you could think of the powers of 2 as a sort of basis, so maybe H is also pretty high dimensional).
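One standard way to get that approximate metric preservation is random-hyperplane LSH (SimHash): each bit records which side of a random hyperplane the vector falls on, and the expected fraction of differing bits between two hashes equals their angle divided by pi. A minimal pure-Python sketch (the dimension and bit count are arbitrary choices for illustration):

```python
import random

random.seed(0)
d, n_bits = 64, 256
# One random Gaussian hyperplane normal per output bit.
planes = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_bits)]

def simhash(v):
    # One bit per hyperplane: the sign of the dot product with its normal.
    return [sum(p_i * v_i for p_i, v_i in zip(p, v)) > 0 for p in planes]

def hamming(h1, h2):
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

a = [random.gauss(0, 1) for _ in range(d)]
b = [a_i + 0.1 * random.gauss(0, 1) for a_i in a]  # small perturbation of a
c = [random.gauss(0, 1) for _ in range(d)]          # unrelated vector

# Nearby vectors agree on most bits; unrelated ones disagree on roughly half.
print(hamming(simhash(a), simhash(b)), hamming(simhash(a), simhash(c)))
```

So it's not an isometry, but it is an approximate one in expectation for angular distance, which is exactly the locality-sensitive property.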
Or they are just shingling different ML hash functions, which is kinda lazy.
I used this Python library: https://github.com/trendmicro/tlsh.
LSH is most famously used for approximating Jaccard distances, which, even if you're not doing stuff like looking at lengths or distances in L1 or L2, is still a vector operation.
LSH is best described in Jeff Ullman's Mining of Massive Datasets textbook (available free online), which describes how it was used for webpage deduplication in the early days at Google.
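For the Jaccard case, the classic construction is MinHash: the probability that two sets have the same minimum under a random hash function equals their Jaccard similarity, so agreement across many hash functions estimates it. A rough sketch (the universal-hash parameters here are illustrative, not from any particular implementation):

```python
import random

random.seed(1)
N_HASHES = 128
P = (1 << 61) - 1  # a large Mersenne prime
# Each "hash function" is h(x) = (a*x + b) mod P with random a, b.
params = [(random.randrange(1, P), random.randrange(P)) for _ in range(N_HASHES)]

def minhash(items):
    # Note: Python's built-in hash() for strings is salted per process,
    # but that's fine as long as both signatures come from the same run.
    ints = [hash(x) % P for x in items]
    return [min((a * x + b) % P for x in ints) for a, b in params]

def est_jaccard(sig1, sig2):
    return sum(m1 == m2 for m1, m2 in zip(sig1, sig2)) / len(sig1)

A = set("the quick brown fox jumps over the lazy dog".split())
B = set("the quick brown fox leaps over a sleepy dog".split())
true_j = len(A & B) / len(A | B)  # 6/11 ~ 0.545
print(true_j, est_jaccard(minhash(A), minhash(B)))
```

The estimate tightens as you add hash functions (standard error ~ 1/sqrt(N_HASHES)); banding the signatures, as MMDS describes, is what turns this into a sub-linear dedup index.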
Is the speedup that remarkable? I'd be curious as to the increase in speed versus loss of precision.
The speedup is really a necessity, as direct methods are just too expensive both dollar-wise and time-wise (lowish latency is the goal).
Edit: also, talking about LSH, one must check out the FAISS library https://github.com/facebookresearch/faiss and the current SOTA at http://ann-benchmarks.com/
It didn't.[1]
[1]: https://www.merriam-webster.com/words-at-play/pique-vs-peak-...