Here's how I got it to work (cribbed from my own README):
TL;DR: I used k-means clustering to segment a database of images into color bins for quick searching.
Using ImageMagick, I ran histograms on the images and converted each image's top n most frequent colors into L*a*b* color space, which approximates how human vision perceives color differences.
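For anyone curious what that conversion step looks like, here is a rough sketch in plain Ruby. It assumes sRGB input and a D65 white point, and the method names (hex_to_rgb, srgb_to_linear, rgb_to_lab) are made up for illustration. It is not necessarily how the original project did it; ImageMagick can also do the colorspace conversion itself.

```ruby
# Sketch only: sRGB hex -> CIE L*a*b*, assuming a D65 white point.
# All method names here are hypothetical, not taken from the original project.

# "#993399" -> [153, 51, 153]
def hex_to_rgb(hex)
  hex.delete('#').scan(/../).map { |pair| pair.to_i(16) }
end

# Undo the sRGB gamma curve for one 0-255 channel.
def srgb_to_linear(channel)
  c = channel / 255.0
  c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055)**2.4
end

# [r, g, b] in 0-255 -> [L, a, b]
def rgb_to_lab(rgb)
  white = [0.95047, 1.0, 1.08883] # D65 reference white (assumed)
  r, g, b = rgb.map { |ch| srgb_to_linear(ch) }

  # Linear sRGB -> CIE XYZ
  x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
  y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
  z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

  # XYZ -> L*a*b*
  fx, fy, fz = [x, y, z].zip(white).map do |value, w|
    t = value / w
    t > 0.008856 ? Math.cbrt(t) : (7.787 * t + 16.0 / 116.0)
  end
  [116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)]
end
```

Calling rgb_to_lab(hex_to_rgb('#993399')) gives an [L, a, b] triple that can be compared against other colors by plain distance.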
Each color was then matched, using Euclidean distance, to the nearest color in a user-defined set of colors, i.e. the "bins". I could choose any array of RGB values of arbitrary length.
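The matching step can be sketched roughly like this. It assumes both the image color and the bins have already been converted to [L, a, b] triples; the method names and the bin labels are hypothetical.

```ruby
# Sketch only: assign a color to its nearest bin by Euclidean distance.
# Colors and bins are assumed to be [L, a, b] triples already.

def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# bins: Hash of label => [L, a, b].
# Returns [label, distance] for the closest bin.
def nearest_bin(color, bins)
  bins.map { |label, value| [label, euclidean_distance(color, value)] }
      .min_by { |_label, distance| distance }
end

# Made-up values, purely for illustration.
bins = { 'bin_a' => [0.0, 0.0, 0.0], 'bin_b' => [10.0, 0.0, 0.0] }
label, distance = nearest_bin([7.0, 0.0, 0.0], bins)
puts "#{label} at distance #{distance}"
```

Keeping the distance alongside the matched bin is what makes the later "sort by tolerance" possible.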
I then stored hexadecimal values of the image's original color and the matched color, along with the frequency of that color within the image (for sorting based on frequency) and the Euclidean distance (for sorting by tolerance).
Then finding images close to a certain color was as simple as Photo.all.with_color('#993399'), ordered by frequency and Euclidean distance. Here's a photo of the results: https://github-camo.global.ssl.fastly.net/89cc87ac84cd3a1d12...
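As an in-memory stand-in for that query (the real with_color scope isn't shown here, so this is only a guess at its behavior), the lookup could be sketched as below. The row keys, the photos_with_color name, and the tolerance parameter are all assumptions.

```ruby
# Sketch only: filter stored color rows by matched bin, then sort by
# frequency (highest first) and distance (closest first).
# Row keys and the method name are hypothetical.

def photos_with_color(rows, hex, tolerance: Float::INFINITY)
  wanted = hex.downcase
  rows.select { |row| row[:matched_hex] == wanted && row[:distance] <= tolerance }
      .sort_by { |row| [-row[:frequency], row[:distance]] }
      .map { |row| row[:photo_id] }
end

# Made-up rows, purely for illustration.
rows = [
  { photo_id: 1, original_hex: '#9a3398', matched_hex: '#993399', frequency: 0.20, distance: 1.5 },
  { photo_id: 2, original_hex: '#8f2f95', matched_hex: '#993399', frequency: 0.45, distance: 6.0 },
  { photo_id: 3, original_hex: '#112233', matched_hex: '#000000', frequency: 0.90, distance: 3.0 }
]
p photos_with_color(rows, '#993399')
```

With the sample rows, photo 2 comes before photo 1 because the matched color covers more of that image, and photo 3 is excluded because it fell into a different bin.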
I might spend some time reverse-engineering Shutterstock's implementation, since it sounds way better than mine and clearly works at scale. But for my purposes, my own implementation worked just fine.
If you want help implementing it, feel free to reach out to me!
Everyone else: If you want to see his comments, turn on the showdead option in your profile.
I wonder if anyone ever bought TinEye's color-search-engine-as-a-service [1]. The as-a-service model seems really awkward for something that requires so much integration, and this new Shutterstock feature (developed from the ground up) seems to confirm this.
Personally, I think the TinEye color results are better than Shutterstock's approach, although having metadata alongside is definitely a must.
V. cool, though.
I cannot read the slider labels; the screenshot is very low-res :(
http://www.airliners.net/similarity/
It was on Slashdot back then: http://tech.slashdot.org/story/05/05/04/2239224/searching-by...