Redis 2.8.9 is out

(groups.google.com)

125 points · ClifReeder · 12y ago · 25 comments


21 comments · 7 top-level

compare · 12y ago · 4 in thread

I'm about to deploy a new autocomplete on my site, probably with 10s to 100s of millions of records, at least 100 users at once. Would the new Redis commands help here? How would the memory usage be? Is it better to just use something else like Cleo for autocomplete?

antirez · 12y ago

Can't comment on the comparison with other products, since I still have to compare them and build up some experience myself, but as for how Redis performs, there is a demo here: http://autocomplete.redis.io. Basically, 8 million records take 1GB of memory (on a 32-bit system); however, here the records are source code lines, so the average length is greater than that of a typical search term. There are definitely no problems in the ~10 million range, even with just a few GBs of memory. For 100+ million you need to either split the range across servers or use a machine with a non-trivial amount of memory, like 16-32 GB or so.
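A back-of-the-envelope extrapolation from the demo figure, as a sketch. It assumes memory grows linearly with the number of records and reads "1GB" as 2^30 bytes. Real search terms are shorter than source code lines, which would lower the estimate, while a 64-bit build stores larger pointers, which would raise it. The constant and function names are illustrative.

```python
GIB = 2 ** 30

# Reference point from the demo: about 8 million records in about 1 GB (32-bit build).
REF_RECORDS = 8_000_000
REF_GIB = 1.0


def bytes_per_record(ref_records=REF_RECORDS, ref_gib=REF_GIB):
    """Average footprint per record implied by the reference point."""
    return ref_gib * GIB / ref_records


def estimate_gib(n_records, ref_records=REF_RECORDS, ref_gib=REF_GIB):
    """Linear extrapolation: assumes the same per-record cost at any scale."""
    return ref_gib * n_records / ref_records


if __name__ == "__main__":
    print(f"{bytes_per_record():.0f} bytes per record")
    for n in (10_000_000, 100_000_000):
        print(f"{n:>11,} records -> ~{estimate_gib(n):.2f} GiB")
```

Under these assumptions 100 million records come out around 12.5 GiB before any 64-bit overhead, which is in line with the 16-32 GB suggestion.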

cliveowen · 12y ago

Also, with that much data it is sensible to set up Redis in a master-slave configuration for better availability. Once that is up and running you can split the work between the two, since autocomplete searches are intrinsically read operations.
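A minimal sketch of that split, assuming all writes go to the master and reads rotate round-robin over the master and its replicas. The Node and ReadWriteRouter classes and the sync_replicas() step are hypothetical in-memory stand-ins for real Redis connections; real Redis replication is asynchronous, so a replica may briefly serve stale completions.

```python
import itertools


class Node:
    """Hypothetical stand-in for one Redis server: a plain key -> value dict."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReadWriteRouter:
    """Writes go to the master; reads rotate over the master and its replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self._readers = itertools.cycle([master] + self.replicas)

    def write(self, key, value):
        self.master.data[key] = value

    def sync_replicas(self):
        # Stand-in for Redis replication, which really happens asynchronously.
        for replica in self.replicas:
            replica.data = dict(self.master.data)

    def read(self, key):
        node = next(self._readers)
        return node.name, node.data.get(key)
```

Until the replica catches up, a read routed to it can miss a fresh write, which is usually acceptable for autocomplete suggestions.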

dvirsky · 12y ago

Keep in mind that the current Redis implementation doesn't include scores for autocompletion: it assumes all the entries in a sorted set have the same score in order to do autocomplete-style search. How you rank the results is up to you, but it might create a big overhead in CPU and memory.
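To see why equal scores matter: when every member has the same score, the sorted set is ordered byte-wise, so all members sharing a prefix form one contiguous range. The sketch below uses a Python sorted list with bisect as an in-memory stand-in for that sorted set; the prefix scan is roughly what a query like ZRANGEBYLEX key "[prefix" "[prefix\xff" does in Redis. The LexIndex class is illustrative, and note it returns matches in byte order, not by popularity.

```python
import bisect


class LexIndex:
    """In-memory stand-in for a Redis sorted set whose members all share one score."""

    def __init__(self):
        self._members = []  # kept sorted, mirroring the sorted set's lexicographic order

    def add(self, member: bytes) -> None:
        i = bisect.bisect_left(self._members, member)
        if i == len(self._members) or self._members[i] != member:
            self._members.insert(i, member)  # like a set, ignore duplicates

    def complete(self, prefix: bytes, limit: int = 10) -> list:
        """Return up to `limit` members starting with `prefix`, in byte order."""
        out = []
        i = bisect.bisect_left(self._members, prefix)
        while i < len(self._members) and len(out) < limit:
            member = self._members[i]
            if not member.startswith(prefix):
                break  # sorted order: nothing after this can match either
            out.append(member)
            i += 1
        return out
```

Ranking by popularity therefore has to come from somewhere else, which is the overhead being pointed out.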

antirez · 12y ago

At the end of the post linked here, I tried to hint at a simple way to overcome this: split search strings into discrete buckets according to how often they are typed.

Basically, you have a sorted set that counts the occurrences of each search string, using a top-k streaming algorithm that consists of just keeping a set of size N2 (larger than N) if you are interested in the top N, and always trimming a random element from those ranked between N and N2 when a new element is added to the full set.
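One reading of that top-k scheme as a sketch: keep counts for at most N2 strings; when an unseen string arrives and the table is full, evict a random entry among those ranked N to N2-1 by count, so the current top N is never evicted. A Python dict stands in for the counting sorted set (in Redis the counts would be updated with ZINCRBY), and the ApproxTopK name and parameters are illustrative. Re-sorting on every eviction is fine for a sketch; a real sorted set keeps ranks cheaply.

```python
import random


class ApproxTopK:
    """Approximate top-n counter that keeps at most n2 (> n) candidates."""

    def __init__(self, n, n2, rng=None):
        if n2 <= n:
            raise ValueError("n2 must be larger than n")
        self.n, self.n2 = n, n2
        self.counts = {}
        self.rng = rng or random.Random()

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
            return
        if len(self.counts) >= self.n2:
            # Rank by count (highest first), then evict a random entry from
            # ranks n .. n2-1, so the current top n are never evicted.
            ranked = sorted(self.counts, key=self.counts.get, reverse=True)
            victim = self.rng.choice(ranked[self.n:])
            del self.counts[victim]
        self.counts[item] = 1

    def top(self):
        """Current best guess at the top n, most frequent first."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[: self.n]
```

The random eviction among the tail gives new strings a chance to climb instead of always replacing the same slot.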

Then you keep M autocompletion keys, complete_10, complete_100, complete_1000, ..., complete_max, one per order of magnitude. When a user types, you query complete_max and check whether it has enough entries; otherwise you also query the next lower-rank sorted set, and so forth.

This way you are able to show users top-searches while they are typing, but still populate the autocompletion with lower-frequency entries if needed.
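A sketch of the lookup side, assuming each tier is a lexicographically sorted list standing in for one of the equal-score sorted sets, queried from most to least popular until enough completions are found. The tier names follow the comment; the TIER_ORDER list, the helper functions, and the default limit are illustrative.

```python
import bisect

# Most popular tier first; each tier is a sorted list standing in for one of
# the equal-score autocompletion sorted sets.
TIER_ORDER = ["complete_max", "complete_1000", "complete_100", "complete_10"]


def prefix_range(sorted_terms, prefix):
    """All terms in a sorted list that start with prefix (a ZRANGEBYLEX-like scan)."""
    i = bisect.bisect_left(sorted_terms, prefix)
    out = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out


def complete(tiers, prefix, limit=5):
    """Fill up to `limit` suggestions, dropping to lower tiers only when needed."""
    results = []
    for name in TIER_ORDER:
        for term in prefix_range(tiers.get(name, []), prefix):
            if term not in results:
                results.append(term)
                if len(results) == limit:
                    return results
    return results
```

Frequent searches always come first; within a tier, matches come back in lexicographic order.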

Of course, when the "counting" sorted set is updated and an element moves from one order of magnitude to another, you update the autocompletion sorted sets accordingly.
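A sketch of that bookkeeping, assuming a term's tier is picked by the order of magnitude of its count (below 10 goes to complete_10, below 100 to complete_100, below 1000 to complete_1000, anything larger to complete_max). Python sets stand in for the Redis sorted sets; in Redis the move would be a ZREM from the old key and a ZADD with score 0 to the new one. THRESHOLDS, tier_for() and record_search() are illustrative names.

```python
THRESHOLDS = (10, 100, 1000)  # assumed tier boundaries, one per order of magnitude


def tier_for(count):
    """Name of the autocompletion key a term with this count belongs to."""
    for limit in THRESHOLDS:
        if count < limit:
            return f"complete_{limit}"
    return "complete_max"


def record_search(counts, tiers, term):
    """Count one search; move the term only when its order of magnitude changes."""
    old = counts.get(term, 0)
    new = old + 1
    counts[term] = new
    old_tier = tier_for(old) if old else None
    new_tier = tier_for(new)
    if old_tier != new_tier:
        if old_tier is not None:
            tiers.setdefault(old_tier, set()).discard(term)  # ZREM old_tier term
        tiers.setdefault(new_tier, set()).add(term)          # ZADD new_tier 0 term
    return new_tier
```

Most searches leave the tier unchanged, so the extra writes only happen at the 10, 100 and 1000 boundaries.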


weakwire · 12y ago · 4 in thread

HLL eliminates read-before-write in many cases, and that's great. Would love to see the same data structure in Cassandra and PG.

antirez · 12y ago

In eventually consistent systems like Cassandra, HLLs have the ideal merge semantics too (very similar to union of a Set).
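A toy HyperLogLog showing why those merge semantics are so convenient: the union of two sketches is just the register-wise maximum, which is commutative, associative and idempotent, so replicas can exchange and merge sketches in any order (Redis exposes the same idea as PFMERGE). This is a textbook-style sketch under stated assumptions (SHA-1 as the hash, 2^12 registers, linear counting for small cardinalities), not Redis' own encoding, which uses its own hash function and sparse/dense representations.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers, each holding the max rank seen."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item):
        # First 64 bits of SHA-1: an arbitrary choice for this sketch.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        """Add an item; return True only if a register actually changed."""
        x = self._hash64(item)
        idx = x >> (64 - self.p)                       # top p bits pick the register
        rest = x & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank
            return True
        return False

    def merge(self, other):
        """Union of two sketches: register-wise max (commutative, idempotent)."""
        if self.p != other.p:
            raise ValueError("can only merge sketches with the same precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self):
        """Estimated number of distinct items added."""
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw
```

Because merging is a max, applying the same replica's sketch twice, or in a different order, gives the same result, much like a set union.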

ddorian43 · 12y ago

If by PG you mean PostgreSQL, it is available as an extension.

aktau · 12y ago

I have a vague notion of what you mean but could you annotate that with an example or something, please? I'd like to make sure :).

ddorian43 · 12y ago

What he means is something like non-reading increments in Hypertable.

How they function:

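When you write a+=1, the store does:

  def increment(memtable, commit_log, key, delta=1):
      if key not in memtable:
          memtable[key] = delta        # doesn't exist in memory: a = 1
      else:
          memtable[key] += delta       # a += 1, in memory only
      commit_log.append((key, delta))  # either way, append +1 to the commit log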

After some time, 'a' is written to disk and the commit-log is checkpointed (so if a server crashes it doesn't have to read a very large commit log), and 'a' becomes immutable.

But now you have to increment the 'a' key again, and it is immutable. So you create a new 'a' in memory (a=1) and append +1 to the commit log, just as in the first branch above.

And repeat again. After some time this is again persisted on disk and the commit log checkpointed.

Now you want to read the value of 'a':

If a merger has run, it has already read the different versions of the data on disk and merged them: the counters are summed and written as a single key, so the read just returns 'a'.

If the merger has not run, it reads both versions of 'a', merges them in memory, and returns the value.
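A toy model of the write and read paths just described, assuming a single key space, a memtable that holds only the deltas since the last flush, a commit log of deltas, and a list of immutable flushed "files". The CounterStore class and its method names are illustrative; this is a generic log-structured sketch, not Hypertable's actual code.

```python
class CounterStore:
    """Blind increments: writes never read the current value from disk."""

    def __init__(self):
        self.memtable = {}    # mutable, in-memory deltas since the last flush
        self.commit_log = []  # (key, delta) records, replayed after a crash
        self.files = []       # immutable flushed versions, oldest first

    def increment(self, key, delta=1):
        # Write path: touch memory and append the delta; no disk read.
        self.memtable[key] = self.memtable.get(key, 0) + delta
        self.commit_log.append((key, delta))

    def flush(self):
        # Persist the memtable as an immutable version and checkpoint the log.
        if self.memtable:
            self.files.append(dict(self.memtable))
        self.memtable = {}
        self.commit_log.clear()

    def compact(self):
        # The "merger": sum every flushed version of each key into one file.
        merged = {}
        for version in self.files:
            for key, value in version.items():
                merged[key] = merged.get(key, 0) + value
        self.files = [merged] if merged else []

    def get(self, key):
        # Read path: merge whatever versions exist, on disk and in memory.
        total = sum(version.get(key, 0) for version in self.files)
        return total + self.memtable.get(key, 0)
```

The read returns the same total before and after compact(); compaction only reduces how many versions a read has to merge.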

Now change '+1' to add_to_set(5). This is even better: it updates the in-memory value, and if the HLL doesn't change because 5 was already added to the set, it doesn't even have to write to the commit log, because no change was made.
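The same write path with a set-valued 'a', as a sketch. A plain Python set stands in for the HLL here: both leave the value unchanged when an element that is already present is added again, so the commit log only grows when something actually changed (an HLL can additionally stay unchanged for a new element whose hash doesn't raise any register). The SetStore class is illustrative, and flushing is omitted to keep it short.

```python
class SetStore:
    """Blind add_to_set(): log a record only when the in-memory value changes."""

    def __init__(self):
        self.memtable = {}    # key -> set of members added since the last flush
        self.commit_log = []  # (key, member) records

    def add_to_set(self, key, member):
        current = self.memtable.setdefault(key, set())
        if member in current:
            return False  # no change, so nothing to write to the commit log
        current.add(member)
        self.commit_log.append((key, member))
        return True
```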


danford · 12y ago · 4 in thread

I'm currently learning the MEAN stack (I only started learning web development a few months ago) and I'm using MongoDB/Mongoose as the database for storing posts. The project I'm working on is basically a 4chan-like image board with tags. I'm not expecting it to go public, but if it did, it shouldn't have more than maybe 20,000 posts at a time.

I want to do something like Stack Overflow does, where a pop-up appears when users start typing tag names. Does anyone know if Redis would be suitable for this, or have a suggestion for something even better? Keep in mind I'm trying to stay away from really advanced databases and would prefer a NoSQL solution.

opendais · 12y ago

If you want to do something more complex than the Redis option, you may want to look into Elasticsearch, especially if you allow text submissions and want to provide the ability to search them.

danford · 12y ago

Thanks, I'll research it. I'm trying to stick closely to the KISS principle, though, so hopefully it won't overcomplicate things too much. Even if I can't use it in this project, it sounds like something that would be useful to know.

giulianob · 12y ago

Someone just asked this in another thread; see http://autocomplete.redis.io/

danford · 12y ago

Thanks, I should have read that comment more carefully.

Gigablah · 12y ago · 2 in thread

Looking at search.php, you could use the native json_encode() function available from PHP 5.2 onwards. (I know, it's just a demo)

antirez · 12y ago

Thanks, I'll do this.

mardix · 12y ago

I just replied with an improved version.

Good job by the way...


armis · 12y ago

Redis still keeps on amazing me. Can't wait to try those new handy sorted set functions

mrmondo · 12y ago

Well done! I'm really looking forward to further development of Cluster and Sentinel. IMO Redis' clustering (or lack thereof) isn't ready for production use at the moment, so that'll be great.

nobbyclark · 12y ago

2.8.9? That sounds way too mature. What pre-alpha data store does HN recommend replacing redis with?
