Show HN: I built a MySQL storage engine which is InnoDB compatible (opens in new tab)

(upscaledb.com)

106 pointscruppstahl9y ago30 comments

30 comments

25 comments · 9 top-level

endymi0n9y ago· 4 in thread

No transactions? Single-threaded? Global lock?

Don't want to put anyone's work down, but at first sight this looks much more like MyISAM than InnoDB - and just like this, MyISAM's a helluva lot faster than InnoDB for single writer workloads throwing ACID out of the window.

Just that MyISAM is battle tested over the course of several years. Again, interested to see what comes out of it, but if history is a lesson, it's usually easier to go from correct to fast (PostgreSQL) than from fast to correct (MySQL et al.).

nine_k9y ago

Version 0.0.1 looks much like an experiment in writing a working storage engine. It lacks a lot of critically important things. It's hard to say anything at this point about any real-world scenarios where this storage engine would be useful.

What concerned me a bit was absence of an interesting and promising general idea behind the implementation, the reason to implement a new engine. Maybe it is there, but the blog post does not say anything about it.

endymi0n9y ago

Exactly this, I was missing the big picture. Having worked with at least a dozen databases, my claim today is that of Tony Million: Don't be a hipster and just use Postgres. Don't get me wrong, I also work with Aerospike, Redshift, Redis and a few other specialized data stores, but I always know why Postgres wouldn't do there. For any specialized data store, at least provide a rationale where the niche is where that thing is better than anything else that's on the market. Simply claiming it's "fast" without further specifications worked out for MongoDB, but I hope it's for the very last time. Optimization means tradeoffs.

ignoramous9y ago

Thanks for pitching in. I'd like to know more about Postgres (correct->fast) and MySQL (fast->correct) comparison here. Can you pls share technical links that go in details about these? Thanks again.

dspillett9y ago

I'm not sure how much of it is still relevant today but in the "bad old days" there were a lot of issues where mysql did things blatantly wrong in the name of speed and the devs & supporters usually suggested pushing fixes into business logic. Examples I remember were transaction support, foreign key issues and bad/no type validation (silently accepting impossible dates like 2016-02-30 for instance).

postgres was at that point already a mature, stable, and pretty much correct RDBMS, but it wasn't as fast for certain workloads (massively read heavy ones with low levels of concurrency, IIRC) or as easy to administer badly (I wouldn't say it was harder to administer well, but mysql was more forgiving to of mistakes by people who just wanted to plug it in and not care).

marmaduke9y ago· 4 in thread

Stopped reading after no transactions. Theory question: musn't transactions reduce necessarily performance?

cruppstahlOP9y ago

Not necessarily. I ran sysbench with InnoDB and manually switched off transactions, and it was a lot slower. I don't know why - didn't look into it.

upscaledb does use transactions. I.e. if you insert a row with a primary and a secondary index then upscaledb inserts two key/value pairs in two databases. All these operations are wrapped in Transactions. They are just not yet supported on SQL level.

morgo9y ago

In InnoDB everything is a transaction. If you do not begin/commit then each statement will be its own transaction.

Batching a group of statements together in a small transaction is usually better because it reduces log flushing. ACID only needs to be guaranteed on commit. Similarly, applying transactions in parallel is faster because of group commit.

rodgerd9y ago

> Stopped reading after no transactions

Ironically, back in the day, the core MySQL team argued that was a feature, not a bug.

__jal9y ago

To be fair, that's normal, if unfortunate, competitive behavior. "We meant to do that" is the rule until there's running code correcting the initial mistake.

Less forgivable is training a couple generations of developers that all the other mistakes they made are the Right Way...

evanelias9y ago· 3 in thread

The upscaledb FAQ indicates "it is not yet concurrent; it uses a big lock to make sure that only one thread can access the upscaledb environment at a time".

InnoDB is designed for concurrency (using MVCC, granular locking, etc) so I'd expect it to be slower at single-threaded workloads than another engine that skips all that.

Only using single-threaded benchmarking is a bit misleading, imo. This is mentioned in the article but only in a small bullet point towards the bottom.

cruppstahlOP9y ago

I ran sysbench benchmarks with 30 concurrent connections. The performance gap between InnoDB and upscaledb shrank a bit, but not much.

The reason is that most of the performance is spent in MySQL and not in the key/value store, and then it does not make a big difference if the key/value store is concurrent or not.

In my experience the assumption of "concurrent = fast" is a misconception. Right now upscaledb moves certain operations (i.e. flushing dirty buffers) to the background. It is better to have fast single-threaded code instead of multi-threaded code with a huge locking overhead. A compromise would be to move the lock to the database level (instead of the Environment, which is basically the container for multiple databases), and make sure that there's no shared state between the databases. But that actually does not have that much priority for me because I do not expect to win that much performance.

evanelias9y ago

30 concurrent connections is a very low number, compared to uses of MySQL/InnoDB at real scale. What happens at higher numbers?

I am not arguing that "concurrent = fast". My point is real systems have a higher level of concurrency as a baseline.

InnoDB supports granular concurrent access because real workloads need this. Systems that have poor stories around concurrency -- MyISAM, Redis, pre-WiredTiger MongoDB -- definitely hit real scalability issues under high-volume workloads.

velox_io9y ago

There's literally a dozen ways to structure/ design a storage engine, every one of them has their pros and con's.

MVCC is so popular because it makes ACID compliance much easier to implement, but there is some additional read latency (and management overhead) because storage layout is disconnected from the natural layout.

A storage engine which sticks to the basics could be very fast, with predictable low-latency.

With regards to threading. I know locks are a dirty word, but you do need a single version of truth somewhere. Either this is done through locking or an allocator. Going single-threaded is a valid way to remove the overhead of locking (plus no race conditions!).

Hopefully a single writer, multiple readers is available soon.

spotman9y ago· 2 in thread

Performance is great, but what is more interesting is maintaining that performance in the midst of failures and things going wrong.

I don't know anyone who chooses InnoDB and says "gee I wish it were faster". While it isn't the fastest show in town, it is a known quantity, and how it breaks is well understood.

So if I was going to use a different storage engine, would want something more than its a bit faster, in some cases.

Having said that, what a wonderful amount of work, and don't stop hacking on it!

mey9y ago

The most recent break I encountered was ibtmp1 exploding to 100's of GB and killing the box with InnoDB, so I'm not sure "how it breaks is well understood".

philjohn9y ago

Funnily enough I encountered something similar years ago on Microsoft SQL Server 2005.

Never got to the bottom of it either.

1 more reply

tombert9y ago· 2 in thread

This looks pretty sexy. Are there atomicity guarantees?

cruppstahlOP9y ago

BEGIN/COMMIT is not yet supported. But that's on my TODO list.

Animats9y ago

Ulp. The whole point of the InnoDB engine is that it supports atomic transactions, even over crashes. If you want an unsafe table, there's the ISAM engine, which is faster.

What does "100% compatible to InnoDB" mean here? It's not functionally compatible and it's not disk-file-format compatible.

1 more reply

morgo9y ago· 1 in thread

MySQL Product Manager here. Any feedback on the Storage Engine API?

cruppstahlOP9y ago

It's ok! The example storage engine could implement more functionality, just to demonstrate i.e. how rows are serialized/deserialized etc. The test suite was very helpful.

_Codemonkeyism9y ago

My physics teacher would have slapped those diagrams around my head asking "Is the y-axis 6000 apples?"

daenney9y ago

I think this could be pretty neat, it'll be interesting to see how it evolves. However, looking at the comments people have left here I would suggest putting up a table/comparison with InnoDB side by side, detailing which features are implemented, which are not and which are but aren't available through the SQL layer yet.

This would also put the "compatible with InnoDB" thing in perspective for people and potentially clear up some other confusion.

Domenic_S9y ago

Label your axes!

j / k navigate · click thread line to collapse

30 comments

25 comments · 9 top-level

endymi0n9y ago· 4 in thread

No transactions? Single-threaded? Global lock?

nine_k9y ago

endymi0n9y ago

ignoramous9y ago

Thanks for pitching in. I'd like to know more about Postgres (correct->fast) and MySQL (fast->correct) comparison here. Can you pls share technical links that go in details about these? Thanks again.

dspillett9y ago

marmaduke9y ago· 4 in thread

Stopped reading after no transactions. Theory question: musn't transactions reduce necessarily performance?

cruppstahlOP9y ago

Not necessarily. I ran sysbench with InnoDB and manually switched off transactions, and it was a lot slower. I don't know why - didn't look into it.

morgo9y ago

In InnoDB everything is a transaction. If you do not begin/commit then each statement will be its own transaction.

rodgerd9y ago

> Stopped reading after no transactions

Ironically, back in the day, the core MySQL team argued that was a feature, not a bug.

__jal9y ago

To be fair, that's normal, if unfortunate, competitive behavior. "We meant to do that" is the rule until there's running code correcting the initial mistake.

Less forgivable is training a couple generations of developers that all the other mistakes they made are the Right Way...

evanelias9y ago· 3 in thread

The upscaledb FAQ indicates "it is not yet concurrent; it uses a big lock to make sure that only one thread can access the upscaledb environment at a time".

InnoDB is designed for concurrency (using MVCC, granular locking, etc) so I'd expect it to be slower at single-threaded workloads than another engine that skips all that.

Only using single-threaded benchmarking is a bit misleading, imo. This is mentioned in the article but only in a small bullet point towards the bottom.

cruppstahlOP9y ago

I ran sysbench benchmarks with 30 concurrent connections. The performance gap between InnoDB and upscaledb shrank a bit, but not much.

The reason is that most of the performance is spent in MySQL and not in the key/value store, and then it does not make a big difference if the key/value store is concurrent or not.

evanelias9y ago

30 concurrent connections is a very low number, compared to uses of MySQL/InnoDB at real scale. What happens at higher numbers?

I am not arguing that "concurrent = fast". My point is real systems have a higher level of concurrency as a baseline.

velox_io9y ago

There's literally a dozen ways to structure/ design a storage engine, every one of them has their pros and con's.

A storage engine which sticks to the basics could be very fast, with predictable low-latency.

Hopefully a single writer, multiple readers is available soon.

spotman9y ago· 2 in thread

Performance is great, but what is more interesting is maintaining that performance in the midst of failures and things going wrong.

I don't know anyone who chooses InnoDB and says "gee I wish it were faster". While it isn't the fastest show in town, it is a known quantity, and how it breaks is well understood.

So if I was going to use a different storage engine, would want something more than its a bit faster, in some cases.

Having said that, what a wonderful amount of work, and don't stop hacking on it!

mey9y ago

The most recent break I encountered was ibtmp1 exploding to 100's of GB and killing the box with InnoDB, so I'm not sure "how it breaks is well understood".

philjohn9y ago

Funnily enough I encountered something similar years ago on Microsoft SQL Server 2005.

Never got to the bottom of it either.

1 more reply

tombert9y ago· 2 in thread

This looks pretty sexy. Are there atomicity guarantees?

cruppstahlOP9y ago

BEGIN/COMMIT is not yet supported. But that's on my TODO list.

Animats9y ago

Ulp. The whole point of the InnoDB engine is that it supports atomic transactions, even over crashes. If you want an unsafe table, there's the ISAM engine, which is faster.

What does "100% compatible to InnoDB" mean here? It's not functionally compatible and it's not disk-file-format compatible.

1 more reply

morgo9y ago· 1 in thread

MySQL Product Manager here. Any feedback on the Storage Engine API?

cruppstahlOP9y ago

It's ok! The example storage engine could implement more functionality, just to demonstrate i.e. how rows are serialized/deserialized etc. The test suite was very helpful.

_Codemonkeyism9y ago

My physics teacher would have slapped those diagrams around my head asking "Is the y-axis 6000 apples?"

daenney9y ago

This would also put the "compatible with InnoDB" thing in perspective for people and potentially clear up some other confusion.

Domenic_S9y ago

Label your axes!

j / k navigate · click thread line to collapse