Exploring performance differences between Amazon Aurora and vanilla MySQL

plaid.com

137 points · bjacokes · 4y ago · 20 comments

20 comments

encoderer · 4y ago

This is a wonderful article. I recently discovered an Aurora gem that saved me from a flaky mysqldump: you can save query results directly to S3 with a “SELECT INTO OUTFILE S3” query. According to the docs, this is an optimized operation when run from a read replica.

https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
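
For readers who have not seen the statement, here is a minimal sketch of how such an export query might be assembled. The helper name `build_outfile_s3_query`, the bucket, and the table are hypothetical; the clause order follows my reading of the Aurora MySQL docs, and the cluster still needs an IAM role that allows writing to the bucket.

```python
def build_outfile_s3_query(select_sql: str, s3_uri: str, overwrite: bool = False) -> str:
    """Wrap a SELECT in Aurora MySQL's SELECT ... INTO OUTFILE S3 syntax.

    Hypothetical helper for illustration; it only builds the SQL text.
    """
    # Accept both 's3://bucket/prefix' and region-qualified 's3-<region>://...' URIs.
    if not s3_uri.startswith(("s3://", "s3-")):
        raise ValueError("s3_uri must look like s3://bucket/prefix")
    # The URI is embedded in a quoted literal, so reject embedded quotes.
    if "'" in s3_uri:
        raise ValueError("s3_uri must not contain single quotes")

    body = select_sql.strip().rstrip(";")
    return (
        f"{body}\n"
        f"INTO OUTFILE S3 '{s3_uri}'\n"
        "FIELDS TERMINATED BY ','\n"
        "LINES TERMINATED BY '\\n'\n"
        f"OVERWRITE {'ON' if overwrite else 'OFF'};"
    )


if __name__ == "__main__":
    # Example bucket and table names are made up.
    print(build_outfile_s3_query(
        "SELECT id, amount FROM orders",
        "s3://example-bucket/exports/orders",
    ))
```

The resulting statement would be run on the replica through whatever MySQL client you already use.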

roopawl · 4y ago

Every once in a while there is a well-written blog post about database internals. Uber's Postgres-to-MySQL switch saga produced a few of them. This one is pretty good too.

jeandenis · 4y ago

We worked closely with AWS on this (problem and blog) and they were great and quite transparent. Glad it's interesting/useful to you.

slownews45 · 4y ago

The simplest fix is probably read committed, especially if, like many ETL jobs, you are just going to grab data with one read for further processing. Another option: use read committed and omit the last 15 minutes of data if you are running long jobs, to avoid churn at the end of tables / logs.

I see folks doing serializable reads for historic ETL jobs with one read in the transaction - why? Is there some history / tool issue I'm not familiar with?
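
A rough sketch of the "omit the last 15 minutes" idea, assuming the table has a timestamp column to filter on. The function names, the `events` table, and the `created_at` column are hypothetical; the `%s` placeholder is the parameter style used by common Python MySQL drivers such as PyMySQL.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def etl_cutoff(now: datetime, lag_minutes: int = 15) -> datetime:
    """Return the newest timestamp the ETL job should read."""
    return now - timedelta(minutes=lag_minutes)


def build_etl_query(table: str, ts_column: str, now: datetime,
                    lag_minutes: int = 15) -> Tuple[str, tuple]:
    """Build a parameterized SELECT that skips the most recent rows."""
    # Identifiers cannot be bound as parameters, so validate them instead.
    for name in (table, ts_column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT * FROM {table} WHERE {ts_column} < %s"
    return sql, (etl_cutoff(now, lag_minutes),)


if __name__ == "__main__":
    sql, params = build_etl_query("events", "created_at",
                                  datetime.now(timezone.utc))
    print(sql, params)
```

The pair returned here can be passed straight to a DB-API `cursor.execute(sql, params)` call.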

bjacokes (OP) · 4y ago

For Aurora MySQL, the default for read-only replicas is repeatable read. As we mentioned towards the end of the post, read committed support appears to have been introduced to Aurora MySQL just last year. But you're right – now that it's supported, switching to read committed is by far the easiest fix.

No idea why people would be using serializable reads for ETL jobs though! :O
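
For anyone wanting to try the read committed fix, a sketch of the session setup an ETL connection might run before its query. The helper name is hypothetical. The `aurora_read_replica_read_committed` setting is, as far as I understand the Aurora MySQL docs, what opts a replica session into this behavior, so treat that line as an assumption to verify against your engine version.

```python
from typing import List


def read_committed_session_statements(on_aurora_replica: bool) -> List[str]:
    """Return the SET statements to run at the start of an ETL session."""
    statements = []
    if on_aurora_replica:
        # Assumption: Aurora MySQL replicas need this session-level opt-in
        # before READ COMMITTED takes effect on a reader.
        statements.append("SET SESSION aurora_read_replica_read_committed = 'ON'")
    # Standard MySQL syntax for changing the isolation level of the session.
    statements.append("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
    return statements


if __name__ == "__main__":
    for stmt in read_committed_session_statements(on_aurora_replica=True):
        print(stmt + ";")
```

Each statement would be executed on the same connection that later runs the long ETL read.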

slownews45 · 4y ago

My own guess was that some ETL jobs were really data integrity jobs, in which case folks got used to needing higher isolation levels across many reads to avoid false positives in their cross-checks.

whs · 4y ago

We had a similar problem where a running ETL job caused a production outage due to binlog pressure.

One thing that surprised us: our TAM says that on a single-AZ, write-heavy workload, normal MySQL would have higher performance because Aurora synchronously writes to storage servers in other AZs. On an immediate read-after-write workload, that would mean it takes longer to acquire a lock.

frakkingcylons · 4y ago

> One thing that surprised us: our TAM says that on a single-AZ, write-heavy workload, normal MySQL would have higher performance because Aurora synchronously writes to storage servers in other AZs

What is surprising about a multi-AZ database having higher latency than one that runs in only one AZ?

bjacokes (OP) · 4y ago

From what I can tell, they provisioned their DB instance(s) in a single AZ, but weren't aware that Aurora automatically provisions its own storage and always uses multiple AZs. We touch on the separation of compute and storage in the post.

I think the surprise is that it's not possible to have a truly "single AZ" Aurora database, even though you might have thought you provisioned your DB instances that way.

bjacokes (OP) · 4y ago

This seems plausible given our understanding of the database internals. In general we found our AWS contacts to be knowledgeable and forthcoming about complex tradeoffs between Aurora and vanilla MySQL, even if some of that information is hard or impossible to find in the docs.

georgewfraser · 4y ago

I wonder why Aurora shares undo logs between replicas? It’s perfectly possible for the read replicas to each re-create their own copy of the undo logs, and retain those undo logs for different durations based on the different long-running queries on each replica.

user3939382 · 4y ago

It seems like the benefits of cloud infrastructure have normalized vendor lock-in.

I’ve never used Aurora because I don’t want to code anything to the idiosyncrasies of AWS (or any other cloud provider).

sofixa · 4y ago

Aurora has compatibility layers and you interact with it as with normal MySQL, MariaDB or PgSQL. Of course there are some underlying differences, but the code and most of the tooling stay the same.

exabrial · 4y ago

One huge difference we discovered is in locking. Do not expect Aurora to do you any favors if you actually use these features.

shepardrtc · 4y ago

Really great article! I have a question: in it you say to keep an eye on RollbackSegmentHistoryListLength, and I want to do that, but I don't know at what number it becomes something to worry about. There doesn't seem to be any guidance on AWS' site. I'm seeing ranges of 1,000 to 5,000 and sometimes 100,000.

bjacokes (OP) · 4y ago

Great question, although I'm not sure there's a concrete answer to it other than "it depends". You can think of that metric as representing the number of logs that haven't been garbage collected, so as it goes up, performance will get worse.

If you're seeing spikes in RollbackSegmentHistoryListLength that coincide with dips in DB performance, you've probably identified the culprit. In the scenario described in our post, that metric would have grown monotonically for the duration of the long-lived ETL query – probably a more overt problem than what you're describing with short spikes to 100,000.
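
Since there is no fixed threshold, one way to act on this is to look for sustained growth rather than an absolute number. A minimal sketch, assuming you have already pulled evenly spaced samples of RollbackSegmentHistoryListLength from your monitoring system; the function names and the `min_run` default are arbitrary choices for illustration, not AWS guidance.

```python
from typing import Sequence


def longest_growth_run(samples: Sequence[int]) -> int:
    """Length, in samples, of the longest strictly increasing run."""
    if not samples:
        return 0
    best = run = 1
    for prev, cur in zip(samples, samples[1:]):
        # Extend the current run on growth, otherwise start a new run.
        run = run + 1 if cur > prev else 1
        best = max(best, run)
    return best


def looks_like_stuck_purge(samples: Sequence[int], min_run: int = 6) -> bool:
    """Flag a series whose history list length keeps growing without relief."""
    return longest_growth_run(samples) >= min_run


if __name__ == "__main__":
    spiky = [1000, 5000, 2000, 100000, 3000, 1500]
    growing = [1000, 4000, 9000, 20000, 45000, 100000]
    print(looks_like_stuck_purge(spiky), looks_like_stuck_purge(growing))
```

A short spike resets the run quickly, while a long-lived query holding back purge shows up as one long uninterrupted climb.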

shepardrtc · 4y ago

A number of our 100k spikes spanned about a day, and a cluster of them seem to coincide with serious performance issues we have encountered. We "solved" the problem by increasing the instance size, but I'm starting to see spikes that get larger and larger, so I suspect we will run into this issue again. But now I have something to report on and watch out for. Thank you!

wooly_bully · 4y ago

H3 tags on this could really use a bump in size and contrast from the regular text.
