Announcing Soulmate, a Redis-backed service for fast autocompleting

(seatgeek.com)

97 points · ericwaller · 15y ago · 23 comments

19 comments · 12 top-level

timr · 15y ago

Are you really using the technique described on the Redis autocomplete page? Doesn't that method take a lot more RAM than necessary compared to a more specialized approach (e.g., a trie)?

Also, from what I can tell, every query is O(log N) in the size of the completion set, instead of linear in the length of the query/suggestion (again, like a trie). It seems like this might have trouble scaling to large suggestion sets.

ericwaller (OP) · 15y ago

We're actually using a slight variation of the second technique described, which involves maintaining a sorted set of the most relevant results for every possible prefix.

A query for a single-prefix term (e.g., "yank") is technically O(log(N) + M), where N is the number of items in the sorted set and M is the number of items returned. But since in practice we often want only a small number of items, we can keep both N and M small.
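A minimal in-memory sketch of the per-prefix approach described above, written in Python rather than against a live Redis server. It is not Soulmate's actual code: the names `PrefixIndex`, `MIN_PREFIX` and `MAX_PER_PREFIX` are hypothetical, the 2-character minimum prefix is an assumption taken from the "ya" example, and the per-prefix cap is an assumed value. A dict stands in for the Redis hash that holds each item once, and a capped, score-ordered list stands in for each prefix's sorted set of ids. In Redis the rough equivalents would be HSET, ZADD, ZREMRANGEBYRANK and ZREVRANGE.

```python
from collections import defaultdict

MIN_PREFIX = 2       # assumption: shortest indexed prefix, as in the "ya" example
MAX_PER_PREFIX = 10  # assumption: how many ids each prefix set keeps


def prefixes(term, min_len=MIN_PREFIX):
    """Return every prefix of a lowercased term, e.g. 'yank' -> ['ya', 'yan', 'yank']."""
    term = term.lower()
    return [term[:end] for end in range(min_len, len(term) + 1)]


class PrefixIndex:
    """Hypothetical in-memory stand-in for one hash plus one sorted set per prefix."""

    def __init__(self, cap=MAX_PER_PREFIX):
        self.cap = cap
        self.items = {}                # id -> item data, stored once (the "hash")
        self.sets = defaultdict(list)  # prefix -> [(score, id)], best first (the "sorted sets")

    def add(self, item_id, term, score, data):
        self.items[item_id] = data
        for p in prefixes(term):
            bucket = self.sets[p]
            bucket.append((score, item_id))
            bucket.sort(key=lambda pair: pair[0], reverse=True)
            del bucket[self.cap:]      # keep only the most relevant ids for this prefix

    def query(self, prefix, limit=5):
        # .get() avoids creating empty buckets for prefixes nobody indexed
        bucket = self.sets.get(prefix.lower(), [])
        return [self.items[item_id] for _, item_id in bucket[:limit]]


idx = PrefixIndex()
idx.add(101, "yankees", 90, {"name": "Yankees"})
idx.add(102, "yanni", 40, {"name": "Yanni"})
print(idx.query("yan"))  # [{'name': 'Yankees'}, {'name': 'Yanni'}]
```

Because every set is capped, a query only ever slices a short list, which is the "keep both N and M small" point; the cost moves to write time, where each add touches one set per prefix of the term.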

One tradeoff we made is that this approach doesn't support incremental updates easily: you can always add new items, but it would take a good bit of work to remove items that have expired or to update the scores of items that change.
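To make the update tradeoff concrete, here is a hypothetical sketch (plain Python dicts standing in for Redis; `build` and `remove` are invented helper names, and the cap is an assumption) of what deleting one item involves under a per-prefix scheme. You need the item's original term to regenerate every prefix it was indexed under, and because each set is capped, the next-best item that was trimmed earlier is no longer there to take its place.

```python
from collections import defaultdict


def prefixes(term, min_len=2):
    """Every prefix of the lowercased term, from min_len characters up to the full term."""
    term = term.lower()
    return [term[:end] for end in range(min_len, len(term) + 1)]


def build(items, cap=10):
    """items: {id: (term, score)} -> {prefix: [id, ...]} ordered best first, capped."""
    scored = defaultdict(list)
    for item_id, (term, score) in items.items():
        for p in prefixes(term):
            scored[p].append((score, item_id))
    return {p: [i for _, i in sorted(pairs, reverse=True)[:cap]]
            for p, pairs in scored.items()}


def remove(sets, items, item_id):
    """Delete one item; returns how many prefix sets had to be touched."""
    term, _score = items.pop(item_id)  # the original term is needed to find every prefix
    touched = 0
    for p in prefixes(term):
        ids = sets.get(p, [])
        if item_id in ids:
            ids.remove(item_id)
            touched += 1
    return touched


items = {101: ("yankees", 90), 102: ("yanni", 40)}
sets = build(items, cap=1)       # each prefix keeps only its single best id
print(remove(sets, items, 101))  # 6: ya, yan, yank, yanke, yankee, yankees
print(sets["yan"])               # []: "yanni" was trimmed at build time, so nothing backfills
```

With `cap=1`, removing 101 touches six prefix sets and leaves "yan" empty even though "yanni" still matches it; only a rebuild from the remaining items restores it.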

As for memory usage, we store each item only once (in a Redis hash) and use its unique id as the entry in the sorted sets. That sort of approaches a trie, since instead of:

ya -> yan -> yank -> yanke -> yankee -> yankees -> [data about yankees]

we have:

ya -> 101

yan -> 101

...

101 -> [data about yankees]

timr · 15y ago

Yeah, that's what it sounded like.

Aside from not supporting incremental updates, that approach has a high memory overhead. For every word of length N, you've got to store 1 + 2 + ... + (N-1) + N = N(N+1)/2 characters of prefix keys, which is O(N^2). The constant is 1/2, and you do get some savings from shared prefixes, so that's definitely a worst-case upper bound, but a trie grows as O(N).
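A quick way to check that arithmetic: storing every prefix of a length-N word as its own key costs N(N+1)/2 characters, while a trie needs one node per character. The sketch below uses hypothetical helper names and counts key characters only, ignoring Redis's per-key overhead and the ids stored in each set, so it illustrates the growth rates rather than real memory usage. It also shows the shared-prefix savings mentioned above.

```python
def prefix_key_chars(words):
    """Characters spent on keys if every distinct prefix is stored as its own key."""
    keys = {w[:end] for w in words for end in range(1, len(w) + 1)}
    return sum(len(k) for k in keys)


def trie_nodes(words):
    """Nodes (excluding the root) a trie needs: one per distinct prefix, i.e. per character edge."""
    return len({w[:end] for w in words for end in range(1, len(w) + 1)})


for words in (["yankees"], ["yankees", "yanni"]):
    print(words, prefix_key_chars(words), trie_nodes(words))
# ['yankees'] 28 7
# ['yankees', 'yanni'] 37 9
```

For "yankees" (N = 7) that is 7 * 8 / 2 = 28 characters versus 7 nodes; adding "yanni" brings it to 37 versus 9, because the shared prefixes "y", "ya" and "yan" are only counted once.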

The O(log(N)) lookup time can also be a limiting factor with large sets. A trie is going to give you worst-case linear lookup time in the length of the longest string in your set. This can be a pretty dramatic difference as string sets get large.
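For comparison, a minimal trie sketch in Python (illustrative only; `Trie` and `complete` are hypothetical names, not part of Soulmate or Redis). Locating the prefix node walks one edge per character of the prefix, so that step does not depend on how many strings are stored; enumerating the completions underneath it is extra work proportional to the nodes visited. This plain version returns completions in lexicographic order, so ranking by relevance, as the sorted-set approach does, would need additional per-node bookkeeping.

```python
class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}     # character -> TrieNode
        self.terminal = False  # True if a stored word ends here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal = True

    def complete(self, prefix, limit=5):
        node = self.root
        for ch in prefix:  # one step per prefix character, independent of set size
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first walk below the prefix node; pushing children in reverse
        # order makes words come out in lexicographic order.
        results, stack = [], [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.terminal:
                results.append(text)
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


t = Trie()
for w in ("yankees", "yanni", "yak"):
    t.insert(w)
print(t.complete("ya"))  # ['yak', 'yankees', 'yanni']
```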

dmix · 15y ago

The example autocomplete on Seatgeek.com is indeed impressively fast.

I might have to use this in my next service.

ericwaller (OP) · 15y ago

Thanks. If you do, definitely let us (SeatGeek) know how it goes.

kscaldef · 15y ago

Quick bit of feedback: I thought it was odd that I couldn't type in my city name in conjunction with a band name to pare things down to just the show in my town. Instead, it shows no results.

jedsmith · 15y ago

As a UX note, I've always quietly loved autocompletes that aren't just a flat list of terms, but actually contain structured information organized in an intuitive fashion. The design of the suggestions on SeatGeek is fantastic.

ericwaller (OP) · 15y ago

Thanks, very much agreed. I think Spotlight on OSX is probably the first place I saw it done really well. I remember on XP I used to use search all the time, but now on OSX I just autocomplete to what I want probably 95% of the time.

ncavig · 15y ago

It would be cool to have an example of this running over WebSockets to get rid of the request/response latency that most autocompletes have. Keep the socket open while the text field is focused and you should be able to cut the response time down even further without that overhead.

ahrjay · 15y ago

Considering Chrome and Firefox are switching off WebSockets until they resolve the security issues, that wouldn't be very viable. I guess you could fall back to Flash sockets or long polling.

jarin · 15y ago

How ironic, this is perfect for the dating site I'm working on :)

ericwaller (OP) · 15y ago

Glad to hear the name works for someone. We thought it meshed nicely with some of the other hyperbole out there in Ruby gem names (god, unicorn, thor, shotgun, etc.).

dhruvbird · 15y ago

How many phrases of length 30 could you handle with 1GB of RAM?

Or do you have numbers on the mean length of a phrase you handle currently, the number of such phrases and how much memory it takes?

kin · 15y ago

Awesome work guys! I really hope this type of UI becomes more widespread. On a side note, I've always thought the guys over at www.glyde.com execute it quite well.

henriklied · 15y ago

Great stuff!

I'm curious: what do you think about exposing this service via WebSockets? Would that make it even faster?

siculars · 15y ago

Oh, oh yes. Thank you kindly. I was literally about to embark on this very feature. Let's take a closer look...

kingkilr · 15y ago

The UI on SeatGeek is almost identical to what Rdio provides; I wonder if Rdio is using it.

Detrus · 15y ago

My first few searches took a while, then every search was pretty fast. Is it the load?

jbendotnet · 15y ago

Nice work.