Basically you have a sorted set that counts the occurrences of a given search string, using top-k stream algorithm consisting of just taking a set of size N2 if you are interested in the top-N, and always trimming a random element from the top N - N2 interval when a new element is added.
At this point what happens is that you take M autocompletion keys. complete_10, complete_100, complete_1000, ..., complete_max, that go up to order of magnitudes, so what happens is that when an user types you query complete_max, and check if it has enough entries, otherwise you also query the lower-rank sorted set and so forth.
This way you are able to show users top-searches while they are typing, but still populate the autocompletion with lower-frequency entries if needed.
Of course when the "counting" sorted set is updated, if an element goes from an order of magnitude to the other, you update the sorted sets accordingly.
How they function:
You write a+=1
if it doesn't exist in memory:
a=1
append +1 to commit-log
else:
a+=1 (in memory)
append to commit-log
After some time, 'a' is written to disk and the commit-log is checkpointed (so if a server crashes it doesn't have to read a very large commit log), and 'a' becomes immutable.But you have to increment again the 'a' key, and it is immutable. So you create a new 'a':
And repeat again. After some time this is again persisted on disk and the commit log checkpointed.
Now you want to read the value of 'a':
If a merger has run, it reads different versions of data on disk and merges them, counters are merged and written as 1 key. So it reads 'a'.
If the merger has not run, it reads both versions of 'a', merges them in memory, and returns the value.
Now change '+1' to add_to_set(5). This is even better, because it updates the in-memory value, and if the hll doesn't change because '5' was already added to set, it doesn't even have to write/commit to log because no change is made.
I want to do something like stack overflow does where they have a pop-up come up when users start typing tag names. Does anyone know if redis would be suitable for this or have a suggestion for something even better? Keep in mind I'm trying to stay away from really advanced databases and would prefer a noSQL solution.