GitHub's online schema migration for MySQL

(github.com)

248 points | qiuyesuifeng | 7y ago | 94 comments

MichaelGlass7y ago

At NoRedInk, we've been using gh-ost for a few years now, and it's been a pleasure.

- The ability to control a running migration is crucial. We have pretty predictable load, and we generally run long-running migrations during off-peak hours. If a migration runs longer than we were expecting and might run into peak hours, we can pause the migration and have the migration not impact users.

- Hooks make it trivial to integrate with other tools. Right now it reports to Slack, but if we used it more, we'd likely hook it up to real monitoring infrastructure.

- There's a lot of default behavior that we want. I'd recommend regular users wrap their best practices in another script and not call gh-ost directly. It's nice to not worry about good defaults for e.g. throttling, or about whether gh-ost is hooked up to some kind of external monitoring.

killbrad7y ago

I'm probably really ignorant asking this, but how do you "pause" schema migrations, period? And even if you did, how do you ensure a consistent experience for your users if your db is broken? Some sort of application logic to deal with inconsistencies? That seems really expensive (from a development work perspective).

fipar7y ago

With gh-ost, migrations are performed on a copy of the table. This, combined with the way data is copied to this table, means:

- you can pause just by suspending the copy,
- changes are invisible until the end, when tables are swapped.
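A minimal sketch of why copying to a separate table makes pausing cheap, assuming a toy in-memory "table" and a hypothetical `ChunkedCopier` class (this is not gh-ost's code, only an illustration of the idea): the copier remembers how far it got, so suspending it costs nothing and resuming picks up at the same position.

```python
class ChunkedCopier:
    """Toy model of a pausable row copy (hypothetical; not gh-ost's code)."""

    def __init__(self, source, chunk_size):
        self.source = source          # rows of the original table
        self.ghost = []               # rows copied to the ghost table so far
        self.chunk_size = chunk_size
        self.paused = False           # operator-controlled pause flag

    def step(self):
        """Copy one chunk unless paused; return the number of rows copied."""
        if self.paused:
            return 0
        start = len(self.ghost)       # resume exactly where the copy stopped
        chunk = self.source[start:start + self.chunk_size]
        self.ghost.extend(chunk)
        return len(chunk)

    def done(self):
        return len(self.ghost) == len(self.source)


copier = ChunkedCopier(source=list(range(10)), chunk_size=4)
copier.step()             # copies the first chunk
copier.paused = True      # peak hours: suspend the copy
copier.step()             # does nothing while paused
copier.paused = False     # off-peak again: resume
while not copier.done():
    copier.step()
```

Nothing is visible to readers of the original table at any point in this loop, which matches the second bullet: the ghost table only matters once it is swapped in.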

The first point depends on the mechanism used to keep up with changes to the original table. You can’t fully pause migrations on pt-online-schema-change for example, as it leverages triggers for that part.

From my phone so sorry if too brief, gh-ost’s docs are great and would tell the whole story.

inyourtenement7y ago

It's all described in the readme, but generally online schema change tools work by creating a new table, copying the data from the old table over, somehow keeping track of new writes to the old table, and then syncing those over. At the end the tables are swapped. With gh-ost you can pause the writes to the new table.
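The steps above can be sketched roughly as follows. This is a toy simulation under loud assumptions: tables are plain dicts keyed by id, `migrate` and `apply_write` are hypothetical names, and the `log` list stands in for whatever the real tool uses to track new writes (triggers in pt-online-schema-change, the binary log in gh-ost).

```python
def apply_write(table, write):
    """Apply one (op, row_id, row) write to a dict-based table."""
    op, row_id, row = write
    if op == "delete":
        table.pop(row_id, None)
    else:                              # "upsert"
        table[row_id] = row


def migrate(original, transform, incoming_writes, chunk_size=2):
    """Copy `original` into a ghost table with a new schema, then replay writes."""
    ghost = {}
    log = []                           # stand-in for trigger/binlog capture
    pending = list(incoming_writes)
    ids = sorted(original)             # snapshot of the keys to copy

    for i in range(0, len(ids), chunk_size):
        for row_id in ids[i:i + chunk_size]:
            if row_id in original:     # the row may have been deleted meanwhile
                ghost[row_id] = transform(original[row_id])
        if pending:                    # a live write lands between two chunks
            write = pending.pop(0)
            apply_write(original, write)
            log.append(write)

    for write in pending:              # writes arriving after the copy finished
        apply_write(original, write)
        log.append(write)

    for op, row_id, row in log:        # sync the tracked writes to the ghost
        if op == "delete":
            ghost.pop(row_id, None)
        else:
            ghost[row_id] = transform(row)

    return ghost                       # the caller swaps this in for the original


users = {1: {"name": "a"}, 2: {"name": "b"}, 3: {"name": "c"}}
add_email = lambda row: {**row, "email": None}   # the "ALTER TABLE" being applied
writes = [("upsert", 1, {"name": "A"}), ("delete", 3, None), ("upsert", 4, {"name": "d"})]
new_users = migrate(users, add_email, writes)
```

Replaying the log is idempotent here (upsert and delete by id), so it does not matter whether a write happened before or after its row was copied.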

radicality7y ago

Not OP but I’m familiar with the topic and run similar tooling on large clusters. By pause he probably means prevent it from starting on more databases and let whatever is inflight finish. For the second point, correct, your application needs to handle both schemas during transition. When that’s done, you can rip out the unneeded logic from your application.

perlgeek7y ago

We evaluated gh-ost, but the killer for us is that it doesn't support any kind of foreign keys.

I understand that at GitHub's scale, foreign keys might be more of a hassle than what they are worth, but for a smallish company that values data integrity over scale and uptime, this is not an acceptable choice.

shlomi-noach7y ago

Author of gh-ost here. Actually, it should be possible to support child-side foreign keys. They would have to be named differently (the foreign key constraint has a unique name in a schema) -- but it should work. See discussion in https://github.com/github/gh-ost/issues/507

It is true that it is not on our roadmap to implement FK support for gh-ost (see https://github.com/github/gh-ost/issues/331), but if anyone wishes to contribute support for FKs, we'd be grateful. We've had more complex contributions coming from the community and we're grateful for those.

killbrad7y ago

Child-side foreign keys? Does that mean constraints in app logic instead of the database?

cup-of-tea7y ago

I'm interested to know what companies are doing "at scale" to not need to use foreign keys. Do they just write user ids or whatnot into other tables?

barrkel7y ago

Foreign keys add locks to referenced rows during insert and update transactions. Bad news if you're locking those rows for other reasons, like a kind of distributed lock. So the problem with FKs isn't that you don't need them; it's that the extra locking limits concurrency.

FK checks also affect performance, of course. Where I work, we disable FKs on our bulk inserts but keep them enabled otherwise and also in tests; but our workload is different from the usual consumer web app, we have multi-million row inserts per user, and no more than 100 users or so per customer, who each get their own tenant DB.

Chilinot7y ago

I'm guessing they are validating the constraints in their applications instead of their databases. While this puts more requirements on your setup and developers, it offloads a lot of stress from your database.

sa467y ago

Google's Spanner doesn't support foreign keys but does support interleaved tables to cover some use cases of foreign keys. Spanner sees wide use inside Google.

The usual routes for data consistency between tables are batch clean up or in-app validation.
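A rough sketch of both routes, using SQLite from Python's standard library as a stand-in for MySQL. The `users`/`posts` tables and the function names are made up for illustration, and there is deliberately no FOREIGN KEY constraint in the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (10, 1, 'hello'), (11, 2, 'hi'), (12, 99, 'orphan');
""")


def find_orphans(conn):
    """Batch clean-up: list posts whose user_id matches no user."""
    rows = conn.execute("""
        SELECT p.id FROM posts p
        LEFT JOIN users u ON u.id = p.user_id
        WHERE u.id IS NULL
        ORDER BY p.id
    """).fetchall()
    return [r[0] for r in rows]


def insert_post(conn, post_id, user_id, title):
    """In-app validation: refuse the insert if the parent row is missing."""
    exists = conn.execute(
        "SELECT 1 FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if exists is None:
        raise ValueError(f"user {user_id} does not exist")
    conn.execute("INSERT INTO posts VALUES (?, ?, ?)", (post_id, user_id, title))


orphans = find_orphans(conn)          # the post with user_id 99 is reported
insert_post(conn, 13, 1, "valid")     # passes the application-level check
```

Note that the check in `insert_post` is not atomic with the insert: a concurrent delete of the user can still slip through, which is one reason the batch clean-up route usually exists alongside it.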

zzzeek7y ago

Are you referring to ON UPDATE cascades? Mutable/natural primary key values are not very common these days especially in a DB like MySQL. Incrementing integers are most common.

pwnna7y ago

This is an unfortunate byproduct of the way that gh-ost is implemented. It is simply not possible to run it with a FK constraint. The reason is that since it replays the binlogs on the ghost table while the ghost table is not fully populated, the FK constraint will cause some of the statements to fail. The data move from the original to the ghost table cannot be completed.

codedokode7y ago

Can't it add FK constraints after the ghost table is fully populated?

yxhuvud7y ago

What? Why would enforced consistency be worth less at a bigger scale? My guess would have been the complete opposite.

sjansen7y ago

It's not a question of worth but feasibility. Just like an ideal schema is fully normalized, but performance concerns sometimes drive denormalization. When foreign keys can't be used to enforce data integrity, the application has to be built to compensate in other ways. Sometimes that means simply accepting dirty data, and designing the application to stay robust when encountering unexpected data. Other times it means building alternate solutions to discover and repair data issues.

sciurus7y ago

shlomi-noach linked to https://github.com/github/gh-ost/issues/331 in another comment. That goes into some of the reasons to avoid foreign keys.

At a past job where we had a complex MySQL setup, I set up a Slack autoresponse to post "Just say no!" anytime someone mentioned foreign keys. :-)

anton_gogolev7y ago

How could foreign keys ever be a "hassle"?

shlomi-noach7y ago

They can be a hassle when you want to shard your data, having outgrown your single-instance capacity. You will either shard functionally (extracting complete tables to other database servers), in which case FKs will completely break, or horizontally (split rows across database servers), in which case you may or may not be able to still use FKs.

They're also a performance impact on large tables since inserts/deletes must make multiple trips to the tables/indexes. That's a growing operational hassle as tables grow larger.
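To illustrate the two sharding styles and why FKs survive only in some cases, here is a small sketch. The server names, table names, and routing rules are all hypothetical; the only point is that a database can enforce a foreign key only when the child row and its parent row live on the same server.

```python
# Functional sharding: whole tables live on different servers (hypothetical layout).
FUNCTIONAL_LAYOUT = {"users": "db-a", "orders": "db-b"}

# Horizontal sharding: rows of one table are split across servers by a shard key.
SHARDS = ["db-0", "db-1", "db-2"]


def functional_server(table):
    return FUNCTIONAL_LAYOUT[table]


def horizontal_server(shard_key):
    return SHARDS[shard_key % len(SHARDS)]


def fk_enforceable(child_server, parent_server):
    """A database can only check a foreign key against rows it holds itself."""
    return child_server == parent_server


# Functional: orders.user_id -> users.id crosses servers, so the FK breaks.
functional_ok = fk_enforceable(functional_server("orders"), functional_server("users"))

# Horizontal, orders sharded by user_id: an order sits next to its user.
user_id = 7
same_key_ok = fk_enforceable(horizontal_server(user_id), horizontal_server(user_id))

# Horizontal, but the parent table is sharded by a different key (product_id).
product_id = 5
other_key_ok = fk_enforceable(horizontal_server(user_id), horizontal_server(product_id))
```

With these example values the functional case and the mismatched-key case both land on different servers, while the same-key case stays local, which is the "may or may not" in the comment above.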

throwawaypls7y ago

Back when I worked for Shopify, I got a chance to work on something similar -- Ghostferry (https://github.com/shopify/ghostferry), which allows for doing all sorts of migrations, including between various databases.

It was recently open-sourced. Do take a look.

pwnna7y ago

Hey! I'm the current maintainer of Ghostferry. Thank you for all your work!

For the reader here: one thing to clarify is that gh-ost performs schema migration via a data migration between two different tables, and it does so in a very efficient way. Ghostferry, on the other hand, is a general-purpose data migration library that moves data between different databases, most likely on different hosts. Frequently, both schema migrations and data migrations are abbreviated as migrations, which may cause some confusion. The domain of operation of Ghostferry does not necessarily overlap with gh-ost, as it would be very inefficient to use Ghostferry to implement gh-ost.

That said, it is a very interesting project on its own as it has a lot of potential use cases. I don't want to hijack the thread any further than I already have, so if anyone has any further questions, you can find contact information and docs in the repo.

viraptor7y ago

I used it and it's really impressive. Works as described. The only issue with this is that you can't easily use it without understanding how it works. It's more of a system you have to own rather than a tool you can use, so you can't just point a new person at it and go "just run this".

groodt7y ago

I agree. I've used it a lot too, but only after a few test runs against some snapshots to get familiar with the operational aspects of it.

analogmemory7y ago

So my understanding is that this is for migrating a db to a new one? Can someone explain like I was a beginner why/how you would use this?

stakecounter7y ago

In certain scenarios if you need to modify the schema for a table in MySQL it will lead to the entire table being locked, and for large tables this could lead to a noticeable outage for users if you need to run queries on that table. One case I had where we faced this problem was changing the primary key for a table from 32 bit to 64 bit ints since we were running out of space. We used Percona's online schema change tool for handling this, which wrapped the creation of a new 'ghost' table (which has the target schema you want), rate limited writes from original table to ghost table, triggered writes from original table to ghost table as new writes came in, and finally a table rename from the ghost table back to the original table name in order to perform the full migration with no data loss or outage.

Sounds like this tool is doing something similar but avoiding the use of triggers for flexibility.

toomanybeersies7y ago

We had to do something similar at my old job, but rather than migrating to a different schema, we were migrating our moderately sized DB (tens of gigabytes) from MySQL to Postgres.

We dual wrote to both DBs while we copied the existing data to the new DB, then switched them over. I think we had less than 5 minutes of downtime all up.
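A bare-bones sketch of the dual-write pattern described above, with dicts standing in for the two databases. The `DualWriter` class and its methods are hypothetical names for illustration, not the poster's actual setup.

```python
class DualWriter:
    """Toy dual-write wrapper: dicts stand in for the old and new databases."""

    def __init__(self, old_db, new_db):
        self.old_db = old_db
        self.new_db = new_db
        self.read_from_new = False     # flipped at cut-over

    def write(self, key, value):
        # Every live write goes to both stores while the migration runs.
        self.old_db[key] = value
        self.new_db[key] = value

    def backfill(self):
        # Copy pre-existing rows; never overwrite a newer dual-written value.
        for key, value in self.old_db.items():
            self.new_db.setdefault(key, value)

    def cut_over(self):
        self.read_from_new = True

    def read(self, key):
        db = self.new_db if self.read_from_new else self.old_db
        return db[key]


store = DualWriter(old_db={"a": 1, "b": 2}, new_db={})
store.write("b", 20)      # live write during the migration
store.backfill()          # copies "a"; keeps the newer value of "b"
store.cut_over()          # from now on reads are served by the new store
```

The short downtime mentioned above would correspond to the moment around `cut_over()`, when the application is pointed at the new store.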

manigandham7y ago

Modern advice: always use 64 bit integer ids. If it's a small table, it won't matter. If it's a big table, you'll need them anyway.
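For a sense of scale, a quick back-of-the-envelope calculation. The current id and the insert rate below are made-up example numbers; the limits are the standard ranges of signed 32-bit and 64-bit integers.

```python
INT32_MAX = 2**31 - 1     # 2,147,483,647: largest signed 32-bit id
INT64_MAX = 2**63 - 1     # largest signed 64-bit id


def days_until_exhausted(max_id, current_id, inserts_per_day):
    """Whole days left before an auto-increment id passes max_id."""
    return (max_id - current_id) // inserts_per_day


# Hypothetical table: ids already at 1.5 billion, growing by 5 million a day.
days_32 = days_until_exhausted(INT32_MAX, 1_500_000_000, 5_000_000)
days_64 = days_until_exhausted(INT64_MAX, 1_500_000_000, 5_000_000)

print(days_32)            # 129
print(days_64 > 10**9)    # True: effectively never at this rate
```

At that hypothetical rate the 32-bit column has roughly four months left, which is the kind of deadline that forces a primary key migration like the one described earlier in this thread.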

analogmemory7y ago

Ah ok! This sounds like a great tool then. I have no need for it, but it's a good one to star for a day when I might need it :)

Existenceblinks7y ago

That's a really old and still good strategy. [off-topic] I first heard of it in a novel (1964).

Flynn.io uses the same kind of strategy: transaction log and async replication (https://flynn.io/docs/databases)

It's a little sad that nanobox.io, which one of my apps runs on, has an inferior strategy: it is temporarily offline at the last sync moment (https://docs.nanobox.io/data-management/data-migrations-scal...)

AdamJacobMuller7y ago

This is a really amazing, very well designed and thought out, tool that solves a problem that should never exist.

pkulak7y ago

Holy crap, an alternative to Percona? Why does MySQL get two awesome tools and Postgres nothing?

josegonzalez7y ago

Postgres supports transactional DDL statements natively, and many alter table statements don't end up locking the table nearly as severely as some MySQL versions do.

michaeldejong7y ago

Actually both lock for many (crucial) schema operations, and often severely enough to block your application from reading from the table(s) under change. I've been researching this stuff for a while. Check out http://blog.minicom.nl/blog/2015/04/03/revisiting-profiling-... . It's slightly outdated, but still holds.

fipar7y ago

That is true, but I wanted to share another angle that may or may not affect PostgreSQL while it continues to affect MySQL even as it has crash-safe (though not transactional) DDL now: these schema changes are online for the master, but are not replication-aware and can have impact in replication delay on servers down the hierarchy.

For this reason alone I think we'll continue to use schema-change tools on MySQL even if the server itself becomes better at those.

In the specific case of gh-ost, another good point is that migrations can be completely paused, which in MySQL is not true of online DDL.

michaeldejong7y ago

I felt the same way, so I've been working on QuantumDB for the last couple of years. Take a look at https://quantumdb.io . QuantumDB doesn't use the binlog / WAL log like gh-ost does, but it does support foreign key constraints, and it allows you to perform several schema operations in one go without having to deal with the intermediates. It's still not ready for production, but feel free to try it out. Feedback is welcome!

sethhochberg7y ago

There are several other notable options - SoundCloud's LHM, Facebook's online schema change tool, etc. They all have their different quirks.

(And as modern MySQL releases get better online DDL support, these tools become less and less critical, though still useful for all of those edge cases where native lockless online DDL can't work yet.)

Scarbutt7y ago

MySQL is a bigger target market.

timewarrior7y ago

We recently started investing in Postgres because of support of JSON fields and nested indexes in those fields. Should we have chosen MySQL?

thathoo7y ago

Square also has its own online schema migration tool, which is open source here: https://github.com/square/shift

It's pretty cool. Check it out as well.

sciurus7y ago

That's not a schema migration tool per se. It's a web interface for managing runs of a schema migration tool (in their case the venerable pt-osc, but there is an open issue for supporting gh-ost too).

ceohockey607y ago

Very cool! Curious, does this leverage this go-mysql library at all? https://github.com/siddontang/go-mysql

tejasmanohar7y ago

Yes, https://github.com/github/gh-ost/search?utf8=%E2%9C%93&q=go-...

kd227y ago

Can someone shed some light on how this tool compares to something like Flyway?

bpicolo7y ago

It's an alternative to e.g. pt-online-schema-change [0]. The problem is that, for very large MySQL tables / clusters, running DDL against the tables live will lock up reads/writes against the table for ages. These tools allow you to run those changes without taking downtime.

[0] https://www.percona.com/doc/percona-toolkit/LATEST/pt-online...

magoon7y ago

I believe RDS uses this same technique for instance resize/replace.

zmoazeni7y ago

We use gh-ost at Harvest[1] and it's a dream in comparison to manually migrating on a replica and switching master/slave roles [2].

Also the linked post[3] in the readme hit us very close to home. We originally tried some of our migrations with pt-online-schema-change, which was great in theory but caused a lot of locking contention during the actual process.

I see many people hammering on the lack of foreign key support which is interesting to me. At some point, a database system grows to where relying on MySQL's Online DDL[4] "works" but not really with production load. I feel like a team knows when they need to bring in a tool like this.

The dev in me understands how wonderful FKs are for consistency. But the db-guy in me that has had to deal with locking issues recognizes FKs as a tradeoff, not dogma.

If you shy away from migrating your large or busy tables, or are scheduling frequent maintenance down times in order to migrate these tables, that's when gh-ost (and others) are appropriate to evaluate.

So for us it's not an immediate red flag that gh-ost doesn't support FKs. We just have to work around that limitation[5] because the alternatives are much worse.

For the record, we don't gh-ost all of our migrations. Only the ones that are deemed sufficiently large are gh-osted, and those heuristics will change from team to team.

But as a guy who has had to deal with our database issues AND as a developer who doesn't want to be chained by a database design decision from a decade ago, I love the flexibility gh-ost gives us as we continue to grow.

[1] https://www.getharvest.com/

[2] https://dev.mysql.com/doc/refman/5.6/en/replication-features...

[3] https://dev.mysql.com/doc/refman/5.6/en/replication-features...

[4] https://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-...

[5] https://github.com/github/gh-ost/issues/507#issuecomment-338...

z3t47y ago

I was investigating using the binary log for another project a few years ago, but came to the conclusion that it's too hard to work with... I don't remember any details though. Maybe someone can fill me in?

qaq7y ago

You can jump through hoops or just use an RDBMS that supports transactional DDL.

shlomi-noach7y ago

Author of gh-ost here. Here are my thoughts on migrating to a different RDBMS: http://code.openark.org/blog/mysql/mysql-vs-postgresql-gh-os...

qaq7y ago

Everything is a choice that has pros and cons. For me personally, outside of any technical considerations, a simple rule applies: anything that has Oracle IP I want to avoid. If anyone thinks there is even a 1% chance Oracle is not planning to recoup its investment in MySQL by royally f#$%ing over the people using it in some horrifically unethical manner, I have a bridge to sell you.

dtech7y ago

That does not solve the problem. Transactional DDL still needs a full table lock for most operations, which on large tables can take minutes to hours. Then it's not really an online schema migration anymore.

c2h5oh7y ago

Depends on the migration. Postgres can add / drop a column to a table with a billion rows in milliseconds as long as you don't provide a default value for the new column.

viraptor7y ago

Also, even if the lock is not used, when you're changing an indexed column, you need to rebuild that index. In most production environments you just can't say "we're going to serve all the traffic without this index for a few hours" - that would kill the service (or a part of it if you're lucky and can disable it)

qaq7y ago

Transactional DDL solves the problem for a large % of use cases. I remember a study that had average prod db size among other things and it was something less than 10GB if memory serves.

mplewis7y ago

Sometimes installing Ghost is a better tactical choice than migrating an entire running system to a new RDBMS.
