Either solution would have no problem indexing all their data, rather than having to limit it to a subset to fit in a in-memory table.
There might be some messiness with the regex search (looking for user-defined patterns of consonants, vowels, etc) as I've never needed to set that up, but the rest would be really clean.
See also this blog post that discusses how RoomKey uses read-only Solr instances that are repopulated daily to speed up searching for latency-insensitive data (think "does this hotel have a pool?" etc):
http://www.colinsteele.org/post/23103789647/against-the-grai...
I think Solr and Sphinx pretty much tick the same boxes in terms of features and performance though, so use what you know and like.
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/d...
At search time you can use the default field or specify the fields to be searched:
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#F...
Any site that requires me to register an account before I can even look at their product is going to have a bad time.
I want to see your slow query log.
That's your first problem. I bet the schema isn't normalized and these are all varchar columns.
~~ Normalize your schema ~~ such that the main table is a set of integers. JOIN on PK and index whatever is required. Trust me, MySQL is fast when you do it right. I have tables with 650M records and lookups are just as fast as day 1. Keep table size AS SMALL AS POSSIBLE (less columns, use ints, etc).
Use date_ids & time_ids (from standard DWH techniques) if you are storing datetimes, not datetime fields (default indexing them is 1 second resolution which is stupid).
Since there are so many fields with data from multiple sources, we have queries running searching on individual fields. Due to this we need to have many indexes. 4GB of the 8GB is the size of the index itself.
Are you sure you've read up on your C. J. Date? I've had that once before: someone complaining that "queries take too much time" with a paltry single-digit-GB database. When I asked about the specifics, the only repeating reply was "we can't tell you". You don't mention anything of value, but querying a few million records can't possibly take a few minutes on the aging desktop computer I've bought seven years ago, much less on a modern server.
If it were me, I'd split that one giant table into a "main" table with some basic information and foreign keys pointing to other tables with the more detailed information from the other sources.
This is a great problem that would be fun to work on. Except that I guess you already found a satisfactory solution.
This is an old, but still very useful script for helping to suggest what settings to tweak: https://github.com/major/MySQLTuner-perl
http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.ht...
All in all, it was a very nice solution and use case for the memory table! :)
Really, the first stop is to use tokudb backend in mysql. If its still slow, and if you have a small subset that fits in ram, just put that straight into a hash table in app space.
This only applies to memory tables. For non-memory tables, the size of varchar columns depends on the actual string size.
(edited. Thanks for correcting!)
From - http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine...
I had no idea about this either....
The MEMORY storage engine (formerly known as HEAP) creates special-purpose tables with contents that are stored in memory. Because the data is vulnerable to crashes, hardware issues, or power outages, only use these tables as temporary work areas or read-only caches for data pulled from other tables.