How MySQL memory table saved the day

(domcop.com)

45 points · webstartupper · 12y ago · 50 comments

Tomdarkness · 12y ago · 14 in thread

Or you could actually use something designed for indexing data and searches, like Elasticsearch or Solr.

Either solution would have no problem indexing all their data, rather than having to limit it to a subset to fit in an in-memory table.

al2o3cr · 12y ago

+1 for Solr - looking at the search on domcop.com, it seems like a perfect fit for the faceting stuff.

There might be some messiness with the regex search (looking for user-defined patterns of consonants, vowels, etc) as I've never needed to set that up, but the rest would be really clean.

See also this blog post that discusses how RoomKey uses read-only Solr instances that are repopulated daily to speed up searching for latency-insensitive data (think "does this hotel have a pool?" etc):

http://www.colinsteele.org/post/23103789647/against-the-grai...

beersigns · 12y ago

This was my initial thought as well; both seem to fit the needs specified (multiple-column search, etc.). I've had good experiences with both ES and Solr. They both have pretty healthy user bases and are well documented. The only hang-up is if you primarily use regex-style searching; turning on n-grams could work there, but it might be too slow.

jcampbell1 · 12y ago

Those solutions assume text is tokenizable, and suck at regex. I think they are the wrong tool for anything having to do with domain search. Furthermore, they don't play nice with continuous cron indexing.

dalore · 12y ago

Solr has both real-time indexing and batch indexing.

tyw · 12y ago

I haven't used either Elasticsearch or Solr, so I don't have firsthand experience of how they compare to Sphinx search (http://sphinxsearch.com), but I really love Sphinx. It also integrates nicely with MySQL if you're already running that.

I think Solr and Sphinx pretty much tick the same boxes in terms of features and performance though, so use what you know and like.

webstartupper (OP) · 12y ago

Thanks for the info. I haven't previously come across Elasticsearch or Solr. Will definitely check these out.

herge · 12y ago

Do Elasticsearch and Solr work with tabular data? Can you search across multiple columns?

arethuza · 12y ago

Lucene models documents as a collection of fields, each with a textual value.

http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/d...

At search time you can use the default field or specify the fields to be searched:

http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#F...

Xylakant · 12y ago

I can't speak for Solr, but with Elasticsearch the answer depends on what exactly you mean by "search across multiple columns". It's probably yes.

waterlion · 12y ago

Yes. Read the example Solr schema; it's very easy to understand.

Tomdarkness · 12y ago

Yes you can, and a whole lot more.

randomnumber314 · 12y ago

>Elasticsearch >Register to watch

Any site that requires me to register an account before I can even look at their product is going to have a bad time.

steinnes · 12y ago

As far as I know you can download ElasticSearch from here without registering: http://www.elasticsearch.org/download/ ... what better way to check out the product than play around with it?

Tomdarkness · 12y ago

It is an open source project. They sell support, but Elasticsearch itself is licensed under the Apache license. Just click overview if you want to know what it is. I'm not sure what you are clicking on that requires registration. Perhaps their training sessions?

jumby · 12y ago · 8 in thread

8 million records (at 7GB!) and it's slow means there is something seriously wrong with your schema. That table would entirely fit in InnoDB Buffer Pool on any modern hardware.

I want to see your slow query log.

jumby · 12y ago

> Unfortunately, since there are 30 different columns that can be searched on

That's your first problem. I bet the schema isn't normalized and these are all varchar columns.

~~ Normalize your schema ~~ such that the main table is a set of integers. JOIN on PK and index whatever is required. Trust me, MySQL is fast when you do it right. I have tables with 650M records and lookups are just as fast as on day 1. Keep table size AS SMALL AS POSSIBLE (fewer columns, use ints, etc).

If you are storing datetimes, use date_ids & time_ids (from standard DWH techniques), not datetime fields (indexing them at the default 1-second resolution is stupid).

webstartupper (OP) · 12y ago

The domains table currently has 84 fields. We collect metrics from various sources, so reducing the number of fields is not feasible. All the field types are the smallest that we could use - e.g. tinyint(4) instead of an int etc.

Since there are so many fields with data from multiple sources, we have queries running searching on individual fields. Due to this we need to have many indexes. 4GB of the 8GB is the size of the index itself.

gngeal · 12y ago

> The domains table currently has 84 fields.

Are you sure you've read up on your C. J. Date? I've had that once before: someone complaining that "queries take too much time" with a paltry single-digit-GB database. When I asked about the specifics, the only repeated reply was "we can't tell you". You don't mention anything of value, but querying a few million records can't possibly take a few minutes on the aging desktop computer I bought seven years ago, much less on a modern server.

jlarocco · 12y ago

84 fields in one table is terrible DB design, and I'm not surprised that it's slow.

If it were me, I'd split that one giant table into a "main" table with some basic information and foreign keys pointing to other tables with the more detailed information from the other sources.

mtdewcmu · 12y ago

I wonder how much the indexes are helping. It sounds like it could be a hard case for a query planner. You might be doing more full table scans than you realize, and it might not be as slow as you might expect. I think regex searches do full scans, so that gives an idea of the cost.

This is a great problem that would be fun to work on. Except that I guess you already found a satisfactory solution.

jumby · 12y ago

oooohkay. you can run with that then.

jol · 12y ago

> "That table would entirely fit in InnoDB Buffer Pool"

I would like to see how it fits in the pool on a 2GB linode (mentioned in the article). Also, I like that people try to think before saying "big data" and grabbing some NoSQL that is not a very good fit in some places, or a gigantic MapReduce on 67 servers.

jumby · 12y ago

I am saying there is NO WAY MySQL sucks at 7M records. That's just jacked design. Talk to me at 7B when you need to reach for something bigger.

elbac · 12y ago · 3 in thread

A better solution is just to increase your InnoDB buffer pool size. You will get virtually the same performance as the 'memory' table once all the data is in memory. Plus, all the data will be persisted.

This is an old, but still very useful script for helping to suggest what settings to tweak: https://github.com/major/MySQLTuner-perl

http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.ht...

augustohp · 12y ago

That is partially what was done. Besides increasing the buffer size, he partitioned the data. If, together with the buffer size increase, he created another table using the "archive" engine and kept pruning data from the `domain` table into it, he would achieve the same thing with persistence.

All in all, it was a very nice solution and use case for the memory table! :)

webstartupper (OP) · 12y ago

Thanks for the link. Unfortunately, since there are 30 different columns that can be searched on, there are many indexes, and the index size for the domains table alone is 4GB. I run this off a 2GB linode, so unless I add a lot more RAM, InnoDB is not going to match the memory table speed.

jeffdavis · 12y ago

I'm a little confused... the InnoDB table and memory table had the same data, but the InnoDB table was larger (at least twice as large, I presume)?

willvarfar · 12y ago · 2 in thread

I am always cautious of memory tables. They don't support transactions, for example, and don't work well with multiple users.

Really, the first stop is to use the TokuDB backend in MySQL. If it's still slow, and if you have a small subset that fits in RAM, just put that straight into a hash table in app space.
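The "hash table in app space" idea can be sketched roughly as below. This is only an illustration, assuming the hot subset fits in RAM and that domain names are unique; the row fields (`name`, `tld`, `pagerank`) and the helpers `build_index` and `lookup` are hypothetical and not taken from the article.

```python
from collections import defaultdict

# Hypothetical rows standing in for the small, hot subset of the domains table.
ROWS = [
    {"name": "example", "tld": "com", "pagerank": 5},
    {"name": "sample", "tld": "net", "pagerank": 3},
    {"name": "demo", "tld": "com", "pagerank": 5},
]


def build_index(rows, column):
    """Map each distinct value of `column` to the list of rows that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index


def lookup(indexes, **criteria):
    """Intersect per-column hash lookups; every criterion must match."""
    result = None
    for column, value in criteria.items():
        # .get() avoids creating empty entries in the defaultdict on a miss.
        names = {row["name"] for row in indexes[column].get(value, [])}
        result = names if result is None else result & names
    return sorted(result or [])


# One hash index per searchable column, built once at load time.
INDEXES = {col: build_index(ROWS, col) for col in ("tld", "pagerank")}
```

Calling `lookup(INDEXES, tld="com", pagerank=5)` intersects the two per-column lookups. This only covers equality searches; range or regex searches would need a different structure.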

mtdewcmu · 12y ago

As I understood it, the memory tables are read-only. They're like a cache. So transactions aren't needed.

willvarfar · 12y ago

Except for that awkward time when the tables are refreshed?

Zr40 · 12y ago · 2 in thread

> Varchars take up the space for all the chars defined.

This only applies to memory tables. For non-memory tables, the size of varchar columns depends on the actual string size.

(edited. Thanks for correcting!)

webstartupper (OP) · 12y ago

"MEMORY tables use a fixed-length row-storage format. Variable-length types such as VARCHAR are stored using a fixed length."

From - http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine...

I had no idea about this either....

jcampbell1 · 12y ago

> MEMORY tables use a fixed-length row-storage format. Variable-length types such as VARCHAR are stored using a fixed length.
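The effect of the fixed-length storage quoted above can be put into rough numbers. The sketch below is a back-of-the-envelope estimate only: the column widths and values are hypothetical, it assumes one byte per character, and it ignores length prefixes and per-row overhead.

```python
# Hypothetical columns: (declared VARCHAR length, value actually stored).
COLUMNS = [
    (255, "example.com"),
    (64, "GoDaddy"),
    (32, "expired"),
]


def fixed_row_bytes(columns, bytes_per_char=1):
    """Fixed-length format: every VARCHAR occupies its full declared length."""
    return sum(declared * bytes_per_char for declared, _ in columns)


def variable_row_bytes(columns, bytes_per_char=1):
    """Variable-length format: only the characters actually present count."""
    return sum(len(value) * bytes_per_char for _, value in columns)


fixed = fixed_row_bytes(COLUMNS)
variable = variable_row_bytes(COLUMNS)
print(f"fixed-length: {fixed} bytes, variable-length: {variable} bytes")
```

With generously declared widths the gap is large, which is one reason to shrink VARCHAR definitions before copying data into a memory table.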

iamthephpguy · 12y ago · 2 in thread

Haha. The Jon Snow meme got me in splits.

Flimm · 12y ago

You're getting downvoted because this is not Reddit. Not that we dislike memes, we just like them on Reddit, and off HN.

martswite · 12y ago

It confused me for a second; I thought the author was on about Jon Snow of Channel 4 news...

ww520 · 12y ago · 1 in thread

What do some typical queries look like? Several minutes for searching 7M records doesn't sound right. Are the columns indexed properly?

mtdewcmu · 12y ago

I'm thinking that this could be a fairly easy problem and the RDBMS may be making it harder. This is probably a tricky indexing situation, so a lot of queries might be effectively unindexed. I'd like to see what the performance would be using really simple methods, like dumping it to a TSV file and doing regex searches with grep or awk. It might be surprisingly fast.
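The "dump it to a TSV file and do regex searches" experiment could look something like this in Python instead of grep or awk. It is a sketch under assumptions: the TSV layout, the sample rows, and the consonant-vowel pattern are all hypothetical, chosen to resemble the user-defined patterns mentioned earlier in the thread.

```python
import csv
import io
import re

# Hypothetical TSV dump: a header row, then domain name and one metric column.
TSV_DUMP = "name\tbacklinks\nbanana.com\t12\nxyzzy.net\t3\ntomato.org\t40\n"

# Consonant-vowel repeated three times, followed by the dot before the TLD.
CVCVCV = re.compile(r"^(?:[b-df-hj-np-tv-z][aeiou]){3}\.")


def scan(tsv_text, pattern, column="name"):
    """Full scan: test every row's `column` against the compiled regex."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[column] for row in reader if pattern.search(row[column])]


matches = scan(TSV_DUMP, CVCVCV)
print(matches)
```

For a real dump, the same `scan` logic would read from an open file instead of an in-memory string; timing it against the database query would show whether the indexes are earning their keep.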

jonaldomo · 12y ago

Just a heads up: http://dev.mysql.com/doc/refman/5.1/en/memory-storage-engine...

The MEMORY storage engine (formerly known as HEAP) creates special-purpose tables with contents that are stored in memory. Because the data is vulnerable to crashes, hardware issues, or power outages, only use these tables as temporary work areas or read-only caches for data pulled from other tables.

saintfiends · 12y ago

We did something similar at work. We had to poll for changes in a table. So instead of polling the tables directly, we added triggers that insert events into a MEMORY table and polled that table. It performs well enough for us.
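The trigger-plus-polling pattern described above can be sketched as follows. Note the substitution: this uses Python's built-in `sqlite3` with an in-memory database as a stand-in, not MySQL's MEMORY engine, and the table, trigger, and column names (`domains`, `domain_events`, `domains_after_update`) are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL setup in the comment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domains (id INTEGER PRIMARY KEY, name TEXT, status TEXT);
CREATE TABLE domain_events (
    event_id INTEGER PRIMARY KEY AUTOINCREMENT,
    domain_id INTEGER,
    new_status TEXT
);
-- Each update on the main table appends one row to the small events table.
CREATE TRIGGER domains_after_update AFTER UPDATE ON domains
BEGIN
    INSERT INTO domain_events (domain_id, new_status)
    VALUES (NEW.id, NEW.status);
END;
""")


def poll(conn, last_seen):
    """Return events newer than `last_seen` plus the updated cursor position."""
    rows = conn.execute(
        "SELECT event_id, domain_id, new_status FROM domain_events "
        "WHERE event_id > ? ORDER BY event_id",
        (last_seen,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seen
    return rows, new_last


conn.execute("INSERT INTO domains (id, name, status) VALUES (1, 'example.com', 'active')")
conn.execute("UPDATE domains SET status = 'expired' WHERE id = 1")
events, cursor = poll(conn, 0)
```

The poller only ever touches the small events table and remembers the last `event_id` it saw, so the large table is never scanned for changes.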
