Ask HN: How does your development team handle database migrations?

269 pointsabetlen7y ago146 comments

What's your team workflow for delivering features that require database migrations. How do you keep migrations from slowing down development as your team grows.

146 comments

124 comments · 55 top-level

liyanchang7y ago· 18 in thread

I've been really happy with how my current company[0] has been doing migrations and I've seen a couple others do it but it seems like it should be more widespread.

Database Schema as Code

Instead of writing up and down migrations, you define what the end state should look like. Then the computer will figure out how to get here. This is just how the industry started managing server configurations (Puppet) and infrastructure (Terraform).

We use protocol buffers so it was pretty straight forward to have a definition of what our tables should look like. We have a script that figures out what the delta is between two states (either proto files or a db) and can calculate the schema migration SQL (e.g. CREATE TABLE, etc).

From there, we run it through a safety check. Any unsafe migration (either for data loss or performance issues e.g. DROP TABLE) requires an extra approval file.

There's no real difference between an up migration and a down migration (except that one tends to result in an unsafe migrations). It's calculable at CI time so we can give devs a chance to look at what it's going to do and approve any unsafe migrations. API compatability checks enforce that you need to deprecate before you can drop.

DML, that is data changes, are handled via standard check in a sql file and CI will run it before the code deploy and after the schema migration.

Alembic is the one other place I've seen this concept (a couple others mentioned this) so it's not new, but surprised I haven't seen it more places.

[0] Shameless plug: We're hiring if you're interested in changing how healthcare is paid for, delivered, and experienced. https://www.devoted.com/about/tech-jobs/

evanelias7y ago

I call this declarative schema management, since the repo declares the desired state, and the tooling knows how to reach this state. This concept is finally catching on lately, although some huge companies have already been doing it this way for quite some time. Facebook is a key example; they've managed their schema changes in a pure-SQL declarative fashion, company-wide, for nearly a decade.

I'm developing a suite of tools [1] to provide declarative schema management and "schema change by pull request" functionality, initially targeting MySQL and MariaDB. A few large companies have built pipelines using one of my tools -- including Twilio SendGrid, who wrote about their process in-depth recently [2].

[1] https://skeema.io

[2] https://sendgrid.com/blog/schema-management-with-skeema/

liyanchang7y ago

This is good to know. As someone who didn't do much with databases before, I was frankly worried given how it didn't seem like many others were taking this approach when it made so much sense (we did have the advantage of having a defined schema which I know isn't always available). Seems like I just didn't know what to search for.

Git would never have worked it required devs to write the up/down patches - why should we have to write the up/down migrations for my schema?

Excited to see more tooling around declarative schema!

meowface7y ago

I'm surprised this isn't more of a thing. It seems like the natural evolution of "[X] as code". I've always been a little turned off by migrations (though they were certainly an improvement over the previous situation, which was basically just indeterministic changes on the fly).

2 more replies

nickserv7y ago

Curious, how do you deal with renaming fields or tables?

This is a (minor) pain point for traditional migration systems.

2 more replies

barrkel7y ago

Data migrations? Denormalizing columns from one table to one or more child tables, possibly more than one relation away? Switching one set of fields in a single table to be in a different table via a relation, converting something from 1:1 to 1:n?

The concept appeals to me, but it only seems to work for trivial migrations.

liyanchang7y ago

I’ll openly admit that we don’t have everything figured out. You’re absolutely right that currently, we constrain ourselves on what we migrate to admittedly simple migrations.

I think there’s a whole set of problems to be solved in this space and frankly, I’m as surprised as anyone that given how SQL is declarative, we use procedural code to do migrations so part of my post was hoping people would tell me what tool I should be using or how this approach fails over time. So your examples are really helpful for me as I think through if it’s possible to do automatically, workaround, or get by without.

It seems to me that we just lack the ability to express these transitions mathematically that can help us do them. And of those, there’s probably only a subset which are possible to do without taking downtime.

In particular, the class of migrations that you outlines are a combination of DDL and DML changes and also have quite a bit of code complexity to do without downtime. It’s definitely a current weakness.

evanelias7y ago

A totally valid point, but I'd argue those should be handled by a separate tool or process. Data migrations tend to be fully programmatic; tools and frameworks can help reduce the code required, but cannot handle every possible case. (having performed numerous multi-billion-row data migrations, I learned this painfully first-hand...)

For simpler cases, where it may make sense to run a data migration immediately after a schema change, a good generic middle-ground may be configurable hook scripts. A declarative schema management system can then pass relevant info to the hook (which tables were changed, for example) and then the script can run any arbitrary row data diff/apply/migrate type of logic.

I do understand your point though; for relatively straightforward data migrations, an imperative system can capture these much more cleanly by just coupling them with the corresponding schema migration code.

1 more reply

PaulMest7y ago

I am guessing that you are probably not using Python/Django... but is this any different than what Django offers?

Django allows you to define your models (schema) and then run a command that will generate the migrations. If you don't like the migration that was generated, you can modify it. You can customize up and down operations.

There are also tools that will take an existing database and generate the Django models for you.

All of these operations can also export the exact SQL that will run on each supported RDBMS platform in case you want to be extra sure on what exactly will be executed.

nickserv7y ago

Django migrations can be problematic because they're meant to be sequential and have interdependencies. I've had problems merging multiple feature branches because of this, even though there are no code conflicts.

A system like Saltstack or Puppet for databases would not have checked in migrations, these would be generated on the fly at deploy time.

So you could very well have multiple state changes in a single run, by comparing actual DB state and desired DB state, then creating the SQL code as needed for that deployment.

Honestly not having to fiddle with the migrations table on a live server seems pretty nice ;-)

This could very well turn out to be Django's next gen migration tool...

2 more replies

evanelias7y ago

That's great initially, but problems definitely crop up at scale:

* What happens when your company creates new systems that aren't Python/Django? You can either still shoehorn all migrations into Django models, or have multiple separate schema change processes/pipelines... both options are not good.

* If someone makes an out-of-band schema change manually (either by accident or to do a rapid hotfix), you're no longer on a known version. Not sure about Django, but when this happens, most traditional migration systems cannot operate without additional manual cleanups... whereas declarative tools can inherently transition any state to any other state.

* Depending on the DBMS, with large tables and certain types of ALTERs, using an external online schema change tool is greatly preferable to just running an ALTER directly.

* Does Django support sharding / the notion that a change to a sharded model must be made to multiple shards?

1 more reply

henning7y ago

Prepare for the exciting future of DevOps transformation: it's `./manage.py makemigrations`.

daoxid7y ago

This sounds nice! One Question: You said that DML changes are handled via "standard check in sql file". Does this simply mean a new SQL file for each migration? And how are DML changes connected to DDL changes? For example, if some code is two versions behind and updated to the current schema, wouldn't this mean that the DDL is updated in one step to the current state, but the DML potentially in two steps, breaking the update?

liyanchang7y ago

That's correct. The DML changes as part of CI are somewhat new so we haven't ironed it all out yet.

Here's the scenario that I think you're laying out: 1. Commit A creates column foo 2. Commit B has DML that reference column foo 3. Commit C removes column foo

This works fine if our CI deployer does each commit individually. First roll out any schema changes, then run any DML SQL.

However, our deployer might pick up all those changes and since we roll out the schema migrations first (in this case a create + drop -> NOP) and then runs the DML (which will error), this is an issue because of the rollup.

In practice, we have yet to see this case (most of the time, the dev who write the DML is close enough to the code to know if it's going to be dropped soon and we don't drop that many columns - in part because we know that there be dragons) but truthfully, I haven't thought about it much and need to think through what the impact is beyond this example. Thanks for helping me refine my thinking and I'll have something to ponder on this weekend!

1 more reply

wvenable7y ago

Can that handle column renames? Most schema-to-schema diff tools can't tell the difference between a rename and a delete/add.

liyanchang7y ago

I’ll openly admit that we don’t have everything ironed out. In fact my next big project is to tackle derived columns (rename is a column where the transformation is the identity function).

It requires a bit more finesse and integration into our code base as it requires multiple deploys with code that knows how to handle both columns.

djrobstep7y ago

Strong agree. You can do much better than than Rails/Django migrations.

I have been advocating for this approach for a while now: https://djrobstep.com/talks/your-migrations-are-bad-and-you-...

safog7y ago

Not sure about the state of the world currently after living in BigCo filter bubble for the past few years, but do you even need custom tools to calculate the delta between the schema as checked into VCS vs the database's actual state?

Spanner (https://cloud.google.com/spanner/) I think can auto-compute the diff between its current state and a given schema, generate appropriate SQL statements to perform a migration and get user confirmation for destructive schema changes.

atsaloli7y ago

> This is just how the industry started managing server configurations (Puppet) ...

Yes, and CFEngine pioneered this in 1993 (Mark Burgess).

I make my living as a CFEngine consultant.

jolmg7y ago· 7 in thread

Would be cool to have git for databases.

"oh no! our migration deleted columns without re-representing that data in the new manner, and our users have already done changes to the database so we can't simply restore from a backup!"

quick!

  dbgit checkout -b fixed-migration before-original-migration
  # *run fixed migration*
  dbgit rebase --onto fixed-migration after-original-migration master

day saved!

If only it were so simple.

ben5097y ago

The main issue is you can't "reverse migrate" creating a table or adding a column as that deletes prod data. So your base tables are always forward migrated, and probably have only basic constraints.

With good updatable view support, you can maintain versions of table views and constraints on the views that may be forward, reverse or laterally migrated. (You'd have a different set of views for each version of the schema in use, so each build of the client is connecting to exactly the schema it expects.)

Then you'll want to checkpoint the base tables. So the system must track usage of view-set versions to determine when they can be dropped entirely. Since the constraints are enforced at the view level, you have to deal with legacy data that violates current constraints, but was hidden by the views.

Then the system can do a checkpoint and delete obsolete columns and tables, moving forward the earliest version you can roll back to.

Of course, that all requires good updateable view support, and I don't think any products offer that.

awinder7y ago

That’s basically the proposition of any event sourcing system https://www.martinfowler.com/eaaDev/EventSourcing.html but you’re kinda right, that involves a level of code complication (but then it would be so simple as you described to restore / rebase / etc)

mchristen7y ago

You can run/test your migrations in a transaction and roll back if it doesn't work.

jolmg7y ago

Not all problems are obvious, though. You can have an app work 100%, your database be completely coherent, and then realize some of the data is missing.

It would be cool to not have to be so careful when committing migrations, needing to be absolutely sure that we're not screwing something up. This is similar to how one would be careful of changes done in source code before we learned how to use version control systems like git.

I can go completely wild with git, deleting random files, overwriting others with random junk, sharing them to my coworkers, etc. It would only take a few minutes to fix that.

1 more reply

jyounker7y ago

This isn't always true. In MySQL many DDL statements can't be rolled back.

See https://dev.mysql.com/doc/refman/5.7/en/cannot-roll-back.htm... for an example.

baq7y ago

sometimes, maybe, if you happen to run the right kind of DBMS...

whitepoplar7y ago

isn't this what datomic does?

smoyer7y ago· 6 in thread

I've read and reread a great article titled "Evolutionary Database Design" on Martin Fowler's web-site [0]. This article describes database migrations as being a process. We've found that for complex changes, we'll often need a pre-migration and a post-migration (temporally being before the code change and after the code change respectively).

We commit the migrations along-side the application code and in our case we use FlywayDB [1]. The only down-side is that this tool doesn't perform roll-back operations automatically. You can always do them by writing another migration that goes forward to the past.

Another popular DB migration tool is Liquibase [2]. I don't have much experience with this tool as it doesn't fit our build pipe-line as well but it does support and encourage defining a roll-back for each migration.

[0] https://www.martinfowler.com/articles/evodb.html

[1] https://flywaydb.org/

[2] http://www.liquibase.org/

EDIT: HN is the new StackOverflow? I think this is a really important question for development teams and yet I could see it being closed due to the "Questions asking for tool or library recommendations are off-topic for Stack Overflow ..." rule. Sad!

mschaef7y ago

> We commit the migrations along-side the application code and in our case we use FlywayDB [1]. The only down-side is that this tool doesn't perform roll-back operations automatically. You can always do them by writing another migration that goes forward to the past.

I've done that too, and agree re: rollbacks... I've also never been a huge fan of the fact that the final, in-production server isn't ever really reified as a source file. What you have instead is the initial state and a bunch of migration files. It works, and it's the most reflective way of how the final schema got to that state, but it's not the final state. I guess that's the trade off of the relatively unsophisticated approach taken by tools like Flyway.

elken7y ago

I'm a fan of liquibase. It's a very mature tool which allows you to run it every time the application starts, useful for development to create in-memory dbs. Or you can just call it as part of the build process if you don't want to check the DB changelog tables on startup.

It tracks changes that have been applied in changelog tables so you don't apply them multiple times.

Common operations such as adding a column are defined in a supported markup language but for more complicated things, such as migrating data, you can reference ad-hoc SQL files. All of which can be checked-in to your codebase.

The only real downside is because you are applying the entire development history of the database, you can sometimes be caught doing illogical things like adding and then removing something later on. This can be mitigated by rewriting history if needed though.

Flyway was not as good for me because it lacked a domain language, raw SQL makes it less easy to interpret but I know developers who preferred that.

freedomben7y ago

> HN is the new StackOverflow?

Could you clarify what you mean by this?

smoyer7y ago

This is actually a good place to ask important questions that will be closed by the moderators there.

1 more reply

abetlenOP7y ago

Thank you, the Fowler article is very helpful. Does your team have someone act as a DBA to oversee each migration?

smoyer7y ago

We differentiate between DBas and DBAs (administrators verses architects). Our team has DBAs who make sure what we're doing is sane and more importantly tune SQL emitted by ORMs as needed.

hestefisk7y ago· 6 in thread

I hand code my migrations for Postgres. When starting a project a create a base.sql file. Then for first database change I create db/000001-<name of change>.sql, db/00002 [...] and so forth. When I want to create a new database I just run psql<base.sql followed by psql<db/*.sql. Very simple but extremely effective. I don’t think any tool can handle migrations better than hand coded sql. Sometimes I use PL/PgSQL to handle procedural data migrations, which cannot be solved using normal sql.

mattmanser7y ago

If you think any tool can't handle this better than you are woefully misinformed.

I was doing this method back in 2006, let me assure you that the automated tools available today make this look like amateur hour.

Automatic schema transactions, automatic up/down scripts, automatic detection that a schema change hasn't been committed. Automatic change tracking table in the db. Ability to seed a database with test data.

With a single command I can go back to any arbitrary point of changes, switch branch, work on a bug fix, switch back to dev and the next time I fire up the Dev site it'll automatically redeploy the new changes.

Just streets ahead of hand coded scripts. Yes, there's a learning curve, yes you have to pay attention to your design still, but it is so good.

I was skeptical at first, but honestly I'm never going back and I don't miss it. I'm using Entity Framework Migrations, I assume other languages have similar/better tools.

hestefisk7y ago

EF Migrations works pretty well if you use the database as a dumb object store. If you need anything remotely advanced than that, eg functions, triggers, custom types or materialised views, then it doesn’t gel at all. Raw sql is the only way to go. To me, the database is an API, not just s dry store for OO.

ec1096857y ago

How do you know which ones have already been run on the environment you are targeting?

hestefisk7y ago

I keep a simple table with the latest version run and a time stamp. Then I check in each file whether its version# (in the filename) is less or greater than what’s in the table. It’s easy because I can do it in simple sql.

tylerrobinson7y ago

Have you experienced any problems with how long it might take to run the whole list of incremental changes?

hestefisk7y ago

No never. It is as fast as you can pipe data through to psql.

theclaw7y ago· 4 in thread

We use dbup[0]. Its philosophy[1] is that you should treat your DB changes like code changes in source control, and only perform up migrations. I agree with this. We previously spent a lot of time and effort writing down migrations that were never, ever used. If you need a down migration, take a backup before running your up migrations.

We're a C# shop. Our DB migration scripts are simply named with a datestamp and a change description. The scripts are added to a console exe project as embedded resources and everything is committed to source control. These exe projects are very thin and contain a single line of C# code that passes the command line arguments directly into an in-house DB deployment library which is installed via nuget. This library handles command line parsing and executing dbup in various ways, using the scripts embedded in the calling assembly.

The result is a simple executable file that, depending on its command line, can upgrade the DB directly, preview which scripts would run against a particular DB instance, test the DB deployment against a clean SQL Server LocalDB instance, or generate SQL scripts for DBAs.

One nice feature is that the exe can also return the connection string of the SQL Server LocalDB test instance to external code as a test fixture. We can use this to directly unit test our repository types against a real, clean database. When the test fixture is disposed by XUnit, the temporary DB is cleaned up and removed.

The console projects are built and tested as part of our CI builds, then pushed to Octopus Deploy[2] as packages. The Octopus deployment process simply runs the executable and passes it the connection string of the DB to update.

[0] https://dbup.github.io/

[1] https://dbup.readthedocs.io/en/latest/philosophy-behind-dbup...

[2] https://octopus.com/

brlewis7y ago

>If you need a down migration, take a backup before running your up migrations.

Aren't you missing some steps? Or does your db back up schema structure and data separately?

  1. Make sure nobody can insert/update/delete anything
  2. Take a backup
  3. Run your up migration
  4. Thoroughly test everything
  5. Allow insert/update/delete again

theclaw7y ago

That is our upgrade process, yes.

keithnz7y ago

a lot of time writing down migrations? why was it so hard?

I use https://fluentmigrator.github.io/ and also used dbup..

I generally like the up/down better, and both the up and down is remarkably trivial to write generally, the downs are useful during development you might change your mind about the DB structure. Never used a down in production.

theclaw7y ago

We used to do this but found it was error prone since the down migrations would have bugs and wouldn’t always catch everything. We have dedicated SQL devs that write the migrations by hand and so moving to generated code is a slow and painful process. We now encourage developers working on the DB to thrash out their design in a local SQL Server Express instance before baking their changes into a migration script that will be automatically run into our dev env by octopus. If the scripts don’t run or the tests against the temporary DB fail during build then they never get to our shared dev DB.

1 more reply

louthy7y ago· 4 in thread

We use the RedGate SQL compare tools [1] to compare our new schema to our old one and auto-apply the diffs to the production DB (this is done automatically by our deployment process).

To reduce the chance of error we don’t destroy columns or tables.

Our application then has an update step which runs on startup for any data migrations (or new data additions), and then updates a version number stored in the DB. The data migrations are super rare.

This all means we can migrate from any past version to our latest one: because we have all previous schemas stored in git and n update functions in our app that can walk the versions

It’s worked reliably for over a decade and is pretty much entirely pain free

[1] https://www.red-gate.com/products/sql-development/sql-compar...

atwebb7y ago

I've got a fair amount of experience with it and the Change Automation tool they have. Both are fantastic products, the latter for certain scenarios over SSDT itself. Sounds like have had it sorted for a while, my main concerns are when it wants to rebuild certain tables when a simple sp_rename or other step would be sufficient.

teddyuk7y ago

This is such a common complaint!

beart7y ago

> To reduce the chance of error we don’t destroy columns or tables.

Does this mean you have a lot of unused tables and columns deprecated in the database?

louthy7y ago

No. It’s very rare that columns or tables are deprecated. It’s a very mature web app that basically just grows.

alanfranz7y ago· 3 in thread

What's your precise problem?

Migrate in a backwards compatible manber, so that version N of the app works with N+1 schema (eg add a column, but don't destroy existing ones, use triggers to keep data aligned). When all nodes for an app are are at N+1, you can make a new version with destructive changes (that would break N but not N+1). There's a Fowler article about this.

abetlenOP7y ago

I'm more interested in hearing about what the workflow is like for developers on larger teams. Do they each work on their own features, write separate migrations, and have a DBA approve and merge them.

evanelias7y ago

For a "very large company dedicated to moving fast" example, here's what the process looked like at Facebook a few years ago. AFAIK same process today, with one improvement noted below.

Background:

* Almost everything is self-service by necessity. Except for some high-blast-radius cases, dev teams are able to manage their own schemas without needing MySQL team intervention. This is made possible by having automation that has appropriate safeties built in.

* There's a repository (git, hg, whatever) storing schemas. It has a couple levels of subdirectories to organize different database tiers and individual databases. In each of the bottom-level subdirs, there are text files containing CREATE TABLE statements, one file per table. In other words, this is a declarative repo, modeling the ideal state of tables in each database.

Process to add or change a table:

1. Just add or change a CREATE TABLE statement, and commit in SCM.

2. Submit a diff (pull request). Someone on your team reviews it, same as a code review.

3. Once merged, the schema change can be kicked off. (A few years ago, a dev would need to run a simple CLI command to tell the automation "please begin working on this table", but I believe this has been automated away since then.)

The tooling automatically manages running the correct DDL safely, on the correct machine(s), even in the case of a large sharded table. Devs never need to write ALTER TABLE statements; everything is just based on CREATE TABLE.

There was a separate flow (with extra steps, on purpose) for destructive actions like dropping tables or columns.

1 more reply

notJim7y ago

When I worked at Etsy (a couple years ago now, so this is out of date), devs wrote `ALTER` statements and included them in a ticket. Once a week, the DBAs would run all the migrations. The only real things I remember worrying about was making sure to set a default if you were adding a column, and if you were changing a large table, it'd take a long time.

My current company uses mongodb, so migrations aren't a thing. It's pretty nice, TBH.

archarios7y ago· 3 in thread

At my previous job we either didn't migrate and wrote application level transforms that would update records as they were encountered by users (mongodb) or we had a custom built migration system that ran JavaScript snippets on our shards. The migration system was miserable to work with and it was hard to debug the code on stage in such a way that would allow us to anticipate whatever might be on prod...

ezekg7y ago

This sounds like a nightmare. Why?

VectorLock7y ago

>(mongodb)

archarios7y ago

This is a fairly typical approach minus the pain points of our specific migration system for Mongodb as far as I know. We favored application level transforms for the sake of safety and speed. Doing that obviously gets really bothersome as time goes on though.

Myrmornis7y ago· 3 in thread

Django. Django generates the schema migrations automatically from the changed table schema definitions that the developer used when testing their branch, and those rarely cause problems. Data migrations need to be tested against staging DBs with realistic data. But neither is really a major pain point: individual developers create and commit the migration files while preparing their branches for review.

nickserv7y ago

Only major gotcha is when renaming fields: it'll drop the old field then create a new field, by default. You have to do the renaming by hand, trivial but can't be forgotten... Which is why you always test on staging first of course.

haihaibye7y ago

Makemigrations detects if the type is the same and asks you if you renamed it.

So, when changing columns do the type change and name change in separate migrations and you won't have to do it by hand.

djstein7y ago

really the best part about Django

gigatexal7y ago· 2 in thread

By hand and with careful consideration. Nope wait we use Alembic. It’s actually pretty good. I like the notion of not doing any data destroying migrations. For example if you are adding a column that replaces a different one keep them both. Then at a later time when no code paths touch the old column and that can be proven drop the column that was deprecated. It’s safer that way. But I’ve not seen this done in practice just something I’ve been thinking about.

abetlenOP7y ago

We use Alembic as well and it does make the process easier. I like the suggestion of not applying destructive migrations immediately.

Just out of curiosity, has your team used Alembic's branching system with any success?

gigatexal7y ago

Is branching where it forks and then comes back? I'm still a bit new to the process but if that's the case then yes. The team is still cool to the idea of my way of doing migrations they tend to do the destructive ones and call it a day albeit with the commonsense to do backups for truly data destroying things like dropping of columns or redefining of columns.

sakopov7y ago· 2 in thread

At my .NET shop we use FluentMigrator [1] which allows you to just run up/down migrations. Our deployments are completely automated and, in all honesty, while this has been a great and a painless way to manage migrations there really is no silver bullet. Minor releases are dead simple. Larger releases require quite a bit of planning to make sure things are deployed in correct order and SQL updates are backwards-compatible (at least during the deployment window) so that we don't incur downtime or completely break things. The only time we ever had to roll back was when one of these large releases deployed a non-backwards-compatible SQL script which caused us a lot of bad data during 5 minute deployment window. So, this is usually a good motivating factor to try to deploy slim releases as frequently as possible. And if you do that you'll appreciate it when your DB migrations become boring and mundane things that just happen on auto-pilot.

[1] https://github.com/fluentmigrator/fluentmigrator

Merad7y ago

How do you apply your migrations? I was setting up a project a few weeks ago and reached for FluentMigrator as it used to be my go to tool... but apparently the command line tool is now deprecated and they recommend writing custom code to do migrations. I eventually decided to give EF Core migrations a try, which is working pretty well so far.

sakopov7y ago

We use the CLI you're referring to on ci/cd servers. Frankly, I had no idea it was deprecated and couldn't find much on this after a little bit of googling. Would mind sharing a link?

1 more reply

aerojoe237y ago· 2 in thread

.net has entity framework an ORM has has migrations. It isn't problem free, but we found workable solutions to our issues.

By default EF will throw an error if the model (in code) and the database are out of sync. There is a setting you can set to tell the ef not to care about the version, but then you have to take the responsibility of making sure the old code will run against the new db version.

Most of our stuff is adding a new column or table so the old code doesn't rely on it and everything goes fine anyway.

sebazzz7y ago

Yes, we have used that for a bit and for the most part this works fine. We had however a case where several migrations were not applied in production, and that is a mess to figure out.

But the development experience, especially at the start of the project (if you don't maintain your seeding code), is awesome. You just point the project at an empty or non-existing database, the schema is automatically created, and any necessary base data can automatically be seeded.

ozim7y ago

But that is most important part of migrations to throw that error so you are aware of it. First thing you fix your db and then push to test server, and you never have broken db on production, NEVER.

1337shadow7y ago· 1 in thread

It doesn't matter what tool you use, as long as you have automated migrations as part of the automated deployment process.

A lot of the implementations look like this:

create a migrations directory; add an initial migration script in it; make a migrate command to execute before service starts but after the backup.

The migrate command recipe: create a migrations table in the db if it doesn't exist, otherwise fetch the list of migrations that have been applied inside this database; then, apply the initial migration script if it's name is not found in the said migrations table, and insert its name in the migrations table so that it will not be executed again in this database by the migrate command.

Bjartr7y ago

We have an ant script that does almost exactly this

rocgf7y ago· 1 in thread

For Java projects, the most common one tends to be Flyway, in my experience. There's also Liquibase that I've heard of, but never used.

Flyway is okay in my experience. Can't complain about it, but I can't praise it either, it just does what you'd expect.

cdepatie7y ago

Liquibase is pretty solid, my company has used it across a couple of projects. Similar to your Flyway experience, it's not anything particularly amazing but it works.

frankwiles7y ago· 1 in thread

Django migrations it’s a truly great tool.

Myrmornis7y ago

Yes it is. (Except when they made makemigrations be like O(2^N) or something in the number of models around 1.9).

jakswa7y ago· 1 in thread

Depends on the database, in my opinion. Each one has quirks to be mindful of, depending on the amount of data you're migrating/touching.

For example, if you accidentally put a 'default' on a column when you add it to postgres, it will lock the entire table while it rewrites every row, inserting that default value.

Another common postgres blunder is creating indexes on big tables without using 'concurrently'. This also locks the table for writing and you'll have a bad time.

adwf7y ago

Just for reference, as of PG11 it no longer locks or rewrites for default column creation.

It's one of the nice things about Postgres, they're always improving. A few years ago, concurrent index creation wasn't a thing either. Nor were "CREATE/DROP IF (NOT) EXIST" statements for various bits and pieces, but they just keep adding to it over time.

In the upcoming PG12 there is REINDEX CONCURRENTLY, so we'll finally be able to fix corrupt/bloated indexes on the fly without having to drop and recreate them.

snicker77y ago· 1 in thread

My team works with five database systems containing more or less the same data because we "switch" to a new DB every year without investing enough in migrating off the old one.

edoceo7y ago

It's "job security" ;)

stock_toaster7y ago· 1 in thread

We use migrate[1].

[1]: https://github.com/golang-migrate/migrate

lovetocode7y ago

I second this. migrate works great for me.

wesleytodd7y ago· 1 in thread

If anyone is looking for a javascript specific solution, I maintain a module originally written (and abandoned) by TJ.

$ npm i migrate

https://www.npmjs.com/package/migrate

smt887y ago

Why does it matter what the tool is written in?

If anything, I'm a lot less likely to use something written in JS when correctness is absolutely paramount.

thrower1237y ago· 1 in thread

SQL scripts. It is painful.

nickserv7y ago

The horror. I actually had a dev tell me he didn't trust doctrine (symfony project), so would do it all in SQL. Thankfully the project was relatively new so it was possible to rewrite everything using migrations.

wutman7y ago· 1 in thread

This doesn't fit everywhere, but... 1) Never modify existing tables (okay, almost never). 2) Add new tables at will. 3) Pay a consultant to write PL/SQL that glues it all together and blame them for any and all issues.

teddyuk7y ago

3) ha ha

barrkel7y ago

We use Rails migrations, which means incremental deltas rather than automatic derivation of the result. Usually, our more interesting migrations include data transformation, and not merely adding / removing tables, columns and indexes. For example, perhaps a field needs to be denormalized on one of its associations to speed up queries, or a 1:1 relationship now needs to become 1:n. I think it's less easy to build declarative tooling around this.

We have a tenanted architecture, so we need to run the same migrations on lots of different databases of wildly different sizes and data complexity. We test migrations by restoring a snapshot of a sample of customers and running the migration through, and some smoke tests before and after that can be compared.

More recently, migrations are becoming a release bottleneck. That is, they take too long to run within a downtime window (fintech, so Sunday afternoon is when releases go out). We're looking at building tooling around Percona OSC, and using triggers to maintain data invariants for data migrations until code is able to catch up.

Migrations aren't what really slows us down though. Instead, it's data volume, and the difficulty in writing efficient queries while also providing a flexible UI experience.

GlennS7y ago

I'm skeptical of tools that claim to handle schema migrations automatically or declaratively. I've had them fail in the past at critical moments. I also don't believe that they can handle the most important cases.

I think that numbered sequential migration steps are still the correct way to go. They should be written at the time the feature/change is developed, ideally by the person doing that development.

Whether they are in SQL or some sort of ORM thing probably doesn't matter that much.

I don't know how to handle branching/merging issues with this approach beyond "pay attention, be careful, talk to your colleagues about database changes you're making".

At my first company we stored the whole schema instead of the incremental steps to get there. We used SQL Delta to synchronize the real database to what we wanted.

SQL Delta is a good tool, but one time we ran into a case it couldn't handle during a big deployment at a large hospital and had to abort and roll everything back. Very embarrassing. We switched to incremental scripts after that.

SQL Delta is still a good tool if you have lots of views, user defined functions, or stored procedures. As long as they aren't needed for the migration itself, you can just synchronize all of those at the end.

It's also a great tool for actually inspecting if the result of your migration is what you expected.

I do like the idea of having easy access to the full schema text as well as the migrations (someone mentioned that Rails does this). Ideally I think this should be generated from the migrations and not checked into version control.

darkr7y ago

We wrote a tool that pulls in several open source tools, the most important of which is sqitch[1] - a beautifully simple Perl based tool for managing migrations that supports rollbacks and verifications - all written in SQL.

All hooked up into CI, as well as being available from the CLI for developers, so they can open a PR and get feedback, as well as seeing what locks are required for a given migration.

1: https://sqitch.org/

config_yml7y ago

rails db:migrate

It’s always one of the first things I miss when I have to work on a non-rails codebase.

tln7y ago

Migrate during continuous deployment.

We use a staging environment, so CD deployed migrations are always run at least once before running on production.

Migrations have to be written so that currently deployed code will work after the migration. Eg, to rename a column, add+copy, deploy, then in a second migration drop the column.

sylvain_7y ago

There is a great writing from Stripe which explains their process for database migration : - Dual writing to the existing and new tables to keep them in sync. - Changing all read paths in our codebase to read from the new table. - Changing all write paths in our codebase to only write to the new table. - Removing old data that relies on the outdated data model.

https://stripe.com/fr/blog/online-migrations

sebringj7y ago

The one pain is having to manually create a cut off period for migration scripts, meaning when to start fresh from a single schema and restart the migrations again. This is basically free food for some incubator at HN so if they are going to make some breakthrough product, make it easier to not ever have to worry about that and it will be worth me posting this.

Foober2237y ago

On my team we use numbered sql scripts. We restore a production db on our local boxes. This is the starting point. The numbered scripts are executed against this. So the actual deployment is getting tested out locally all the time. This is dead reliable, and handles tricky data transformation that would fail under a declarative/calculate approach.

bahador7y ago

I've used Flyway[0] at my current and previous shops. It seems to work well.

[0] https://flywaydb.org/

GordonS7y ago

Using DbUp, which is basically a library that you use to wrap SQL scripts in a console app.

This gives you full control over your migrations, while still being incredibly simple - it's just SQL!

For views, sprocs and functions, we don't use sequential migration scripts, instead opting for idempotent creation. This also means it's easy to see the evolution of views etc when looking in source control.

In the past I've used Entity Framework and Entity Framework Core migrations, and hated them both - I'll never be a fan of codegen, but aside from that they sometimes generated wrong code that had to manually adjusted, and you also quickly end up with hundreds of scripts.

I like DbUp very much.

was_boring7y ago

I don't think there is a way to change the database without slowing down development. After all, the data is the most valuable thing you have.

Besides that, I tend to follow the same general principals RainforestHQ presented in 2014 https://www.rainforestqa.com/blog/2014-06-27-zero-downtime-d...

tluyben27y ago

We have been doing migrations with our internal ORM for many years now; first it was written in Perl, then ported to Java, later ported to PHP, after that ported to C# (past ~10 years) and it does up/down migrations with safe guards in place. It (the underlying algorithm/heuristics) has been working fine for around 20 years on around 1000 projects (client consultancy mostly) in that time. Lessons learned (opinionated/IMHO, no sweeping wisdom attempt); a) database changes should be in code so there is no discrepancy between code + db b) your tooling (in this case our ORM) should figure out how to get from the current state to the new state; it should not be a manual job (because you will make mistakes) c) migrations including rollback should be fully automated (we only had issues when people tried to do clever things manually) d) have common sense safe guards in place; do not try to be too clever to save a few MB's of disk space; for instance, deploy will fail if you attempt to remove a column; you can only do that manually.

stephenr7y ago

We used to use dbschema (https://pypi.org/project/dbschema/) to apply migrations generated by SchemaSync (http://mmatuson.github.io/SchemaSync/) using a helper script to simplify capturing a migration from a known base of previous migrations.

After submitting a number of bug fix and feature patches to the upstream projects I ended up writing a (IMO better) tool to apply the migration scripts (https://bitbucket.org/koalephant/mallard). We currently still rely on SchemaSync for that part, but it’s been more reliable and it’s ultimately a tool for developers who review the generated sql anyway - the tool to apply them needs to run automated on remote environments.

Guillaume867y ago

.NET/MSSQL shop here:

Our source controlled files contain the database definition:

- a dacpac file generated by SSDT for the schema

- sql scripts for seed data (which changes a lot in our case), generated by a custom CLI tool (uses bcp.exe).

We have a "dev" database that is basically the "master". We have a CLI tool to generate the database definition files from the dev database. When someone make/need changes in the database, he makes them in the dev DB then call the CLI tool to update the source files.

When publishing a version the dacpac and data script are included. Migrations are created on demand (no need to go through each version, skipping versions is common). Our migration tool create reference databases of the source and target versions, then generate a schema diff script with SSDT and a data diff script with ApexSQL Data Compare. We can review/editthe migrations scripts before applying migrations.

It works well enough that we run automated migrations at night (with backup before/restore after in case of issue).

sqldba7y ago

Scripts.

And I hate to tell you but despite what you’ll read on HN every single company I’ve seen and every single vendor I’ve dealt with - is using scripts.

Just simple scripts, in source control, amended by hand, lord hopes applied as part of a CI system, often without any kind of version numbering (so once it’s in the database you don’t know where it came from) and with no rollback.

Declarative migrations do exist. If it’s in .NET then it’s invisible to me, but I imagine it’s pretty rare. Using SSDT (Microsoft) for tooling is possible but has so many edge cases and is poorly documented so coming up with an architecture requires the kind of expert where there may literally only be a dozen in the world, so it isn’t done outside of demos that fail and get thrown out the second you’d ever try to implement them.

bungie47y ago

I recently had to update 3 db machines with the updated schemas from one, and the data from a third. Then copy the finished version over to the first and finally the third.

Turns out Visual Studio has a 'diff' generator for both schema and data. Holy hell that worked the treat.

namelosw7y ago

We use fluentMigrator for .Net, but basically the workflow is similar to Rails migration. It is working fine and did not slow us down.

What slows us down and related to this topic is cross-database migrations. We have a bunch of microservices in different codebases. Sometimes we need to adjust bound contexts which move the responsibilities from services to services, we found it's much harder to migrate between different databases behind those services. It's either hard to keep track in the same repo and/or too slow to migrate large volume of data across hosts. In the end we wrote some optimized SQL to do that. Maybe monorepo could help but this doesn't happen too much, and we haven't start trying it yet.

tshanmu7y ago

we have had good success with liquibase[0] and flyway[1].

[1] http://www.liquibase.org/ [2] https://flywaydb.org/

systematical7y ago

We use https://github.com/cakephp/phinx works pretty well. It just runs on deployment. Thankfully I've never had to try using the rollback feature.

sheeshkebab7y ago

- Flyway for relational databases, and

- flyway-like go based homegrown process for Cassandra (that includes support for executing go scripts for data migrations etc)

Above work pretty well - for many, many microservices and continuous deployments all the way to prod.

tylerFowler7y ago

I feel like Terraform would be a fantastic tool for this but never found the time to look into what it would take. Less so for complex data changes maybe (or maybe not) but for things like basic schema changes certainly.

sebazzz7y ago

We use RedGate SQL Source Control. However, we also deploy our applications using MSDeploy and SQL Source Control has no good solution for that. We stitch the migration scripts together manually. Also, no matter how good you control it, your database gets out of sync with the migrations after a while. We also have an integration test which restores a base database and applies the migrations of the current branch on top of it.

We're going to research DACPACs to deploy databases. We hope it will be better.

speedplane7y ago

Is this question asking how to move from one schema to another, or one database backend to another? Most answers here focus on schema changes but I'm not sure that was the question.

Schema changes are relatively straightforward, there are plenty of tools that can help. Changing the database backend is a different story altogether. There are so many unknown unknowns when changing databases, that it's generally best to avoid it if you can.

lmiller19907y ago

I work at a Perl shop and we use Sqitch [1]. It helps manage your migrations, you write three raw SQL scripts per change: deploy, verify and revert. The tool works with any DB or backend, since it only does the schema.

It does NOT integrate with your server framework so you need to figure an ORM solution out, independent of Sqitch.

[1] https://sqitch.org/

ajcodez7y ago

I wrote my own schema as code library and then switched back to normal migrations with no down method. It takes some care to not delete anything for a while until it’s definitely not needed ever again.

https://www.ajostrow.me/articles/how-to-run-migrations

quickthrower27y ago

Previous shop we used entity framework migrations in .net. So migrations are applied by octopus deploy.

This place we do it manually but because of how the app is architected migrations are rare. If we migrate we try to make the code compatible with before and after schemes.

based27y ago

https://github.com/goldmansachs/obevo

https://www.schemacrawler.com/

wenbin7y ago

Django and other modern web development framework handles db migrations pretty well.

$ python manage migrate

paulddraper7y ago

We use dbmate [1].

SQL migration management with in-database version tracking.

Super simple, but works for us.

[1] https://github.com/amacneil/dbmate

pier257y ago

We use https://github.com/rubenv/sql-migrate for Go

You basically write SQL queries for up and down.

ilaksh7y ago

For my current project I have not needed to migrate data.

I have a system called timequerylog and a tool tql-cli where data is just appended to JSONL files organized by key, date, and time.

gregmac7y ago

Ohh, this is a favorite topic of mine and I'm of the opinion there's no clear best solution (possibly good market opportunity here), only a series of trade-offs.

----

In one project, we use DBGhost (for MS SQL Server). It's like RedGate SQL Compare, but in deployable, self-contained executable form. It does a complete schema sync, so internally we run on every build, and externally can upgrade from every previous release, without the pesky "in between" evolution you tend to get with ordered migration scripts. It's run as part of our upgrade process for every app version, and our deploy package is built from the 'initial database create' SQL script in source control.

To make a schema change such as adding a new column, you modify the relevant `CREATE TABLE` script, commit it to source, and that's it.

We also use a hand-built pre- and post-deploy script to do anything tricky that can't be automated (renaming a column, copying/converting data from one column to another, etc). Importantly, these scripts are idempotent (eg: `if (old_column_exists) { start transaction; create new column; copy+transform data; drop old_column; commit transaction; }`). We generally avoid major changes like this as much as we can, but it's possible when necessary. We also have DBGhost configured not to drop anything (to avoid mistakes, or dropping customer customization that they do even though our support contract explicitly says not to), and instead write those by hand.

This process has been in many dozens of customer-facing releases -- including on-premise customers -- for several years, and 'just works'. DBGhost is a clunky app, but our interaction with it is now entirely scripted (via either build or deployment scripts), and the compare engine part of it is actually very good.

----

In another pretty simple app (that runs on our cloud infrastructure only, with 'production' deployments mirrored in a couple different regions, and usually a single 'dev' deployment though sometimes there are other temporary ones) we opted to only use an idempotent, hand-written script, executed as part of deployment. It has `create table if not exists` statements to setup from scratch, and it also has the transforms for changes done over time (`create index if not exists;` `drop column if exists` etc). We periodically remove the transform statements to clean it up after all deployments have been updated past that point.

Even though it's manual, it's actually quite easy to maintain, so long as you're careful about ensuring it's idempotent. The nice part is it typically gets deployed to the dev infrastructure multiple times during the course of working on something, so if there's a problem it becomes obvious very quickly when the deployment fails.

----

There's also another app which uses the more traditional ordered migrations files, which I liked at first but over time I find it annoying. Deploying from scratch installs a really old initial database schema, then proceeds to apply dozens of modifications, many of which overwrite previous modifications.

----

I've also worked on an ordered migrations file app where there was a separate 'create' step for the initial creation (instead of running through each migration). The first time I deployed from scratch I found a bunch of problems where the create script wasn't 100% synchronized with the migrations, but also was synchronized enough that it wasn't possible to 'fix' by running each migration (one of the early ones failed). The only fix was to go through by hand and manually run the relevant migration files. I'm sure there can be better practices to help prevent this (eg some compare step on build), but this happened years ago and still sticks with me as a terrible way to do migrations.

ben_jones7y ago

Poorly

mbostleman7y ago

SSDT / DACPAC

j / k navigate · click thread line to collapse

146 comments

124 comments · 55 top-level

liyanchang7y ago· 18 in thread

I've been really happy with how my current company[0] has been doing migrations and I've seen a couple others do it but it seems like it should be more widespread.

Database Schema as Code

From there, we run it through a safety check. Any unsafe migration (either for data loss or performance issues e.g. DROP TABLE) requires an extra approval file.

DML, that is data changes, are handled via standard check in a sql file and CI will run it before the code deploy and after the schema migration.

Alembic is the one other place I've seen this concept (a couple others mentioned this) so it's not new, but surprised I haven't seen it more places.

[0] Shameless plug: We're hiring if you're interested in changing how healthcare is paid for, delivered, and experienced. https://www.devoted.com/about/tech-jobs/

evanelias7y ago

[1] https://skeema.io

[2] https://sendgrid.com/blog/schema-management-with-skeema/

liyanchang7y ago

Git would never have worked it required devs to write the up/down patches - why should we have to write the up/down migrations for my schema?

Excited to see more tooling around declarative schema!

meowface7y ago

2 more replies

nickserv7y ago

Curious, how do you deal with renaming fields or tables?

This is a (minor) pain point for traditional migration systems.

2 more replies

barrkel7y ago

The concept appeals to me, but it only seems to work for trivial migrations.

liyanchang7y ago

I’ll openly admit that we don’t have everything figured out. You’re absolutely right that currently, we constrain ourselves on what we migrate to admittedly simple migrations.

evanelias7y ago

1 more reply

PaulMest7y ago

I am guessing that you are probably not using Python/Django... but is this any different than what Django offers?

There are also tools that will take an existing database and generate the Django models for you.

All of these operations can also export the exact SQL that will run on each supported RDBMS platform in case you want to be extra sure on what exactly will be executed.

nickserv7y ago

A system like Saltstack or Puppet for databases would not have checked in migrations, these would be generated on the fly at deploy time.

So you could very well have multiple state changes in a single run, by comparing actual DB state and desired DB state, then creating the SQL code as needed for that deployment.

Honestly not having to fiddle with the migrations table on a live server seems pretty nice ;-)

This could very well turn out to be Django's next gen migration tool...

2 more replies

evanelias7y ago

That's great initially, but problems definitely crop up at scale:

* Depending on the DBMS, with large tables and certain types of ALTERs, using an external online schema change tool is greatly preferable to just running an ALTER directly.

* Does Django support sharding / the notion that a change to a sharded model must be made to multiple shards?

1 more reply

henning7y ago

Prepare for the exciting future of DevOps transformation: it's `./manage.py makemigrations`.

daoxid7y ago

liyanchang7y ago

That's correct. The DML changes as part of CI are somewhat new so we haven't ironed it all out yet.

Here's the scenario that I think you're laying out: 1. Commit A creates column foo 2. Commit B has DML that reference column foo 3. Commit C removes column foo

This works fine if our CI deployer does each commit individually. First roll out any schema changes, then run any DML SQL.

1 more reply

wvenable7y ago

Can that handle column renames? Most schema-to-schema diff tools can't tell the difference between a rename and a delete/add.

liyanchang7y ago

I’ll openly admit that we don’t have everything ironed out. In fact my next big project is to tackle derived columns (rename is a column where the transformation is the identity function).

It requires a bit more finesse and integration into our code base as it requires multiple deploys with code that knows how to handle both columns.

djrobstep7y ago

Strong agree. You can do much better than than Rails/Django migrations.

I have been advocating for this approach for a while now: https://djrobstep.com/talks/your-migrations-are-bad-and-you-...

safog7y ago

atsaloli7y ago

> This is just how the industry started managing server configurations (Puppet) ...

Yes, and CFEngine pioneered this in 1993 (Mark Burgess).

I make my living as a CFEngine consultant.

jolmg7y ago· 7 in thread

Would be cool to have git for databases.

"oh no! our migration deleted columns without re-representing that data in the new manner, and our users have already done changes to the database so we can't simply restore from a backup!"

quick!

  dbgit checkout -b fixed-migration before-original-migration
  # *run fixed migration*
  dbgit rebase --onto fixed-migration after-original-migration master

day saved!

If only it were so simple.

ben5097y ago

The main issue is you can't "reverse migrate" creating a table or adding a column as that deletes prod data. So your base tables are always forward migrated, and probably have only basic constraints.

Then the system can do a checkpoint and delete obsolete columns and tables, moving forward the earliest version you can roll back to.

Of course, that all requires good updateable view support, and I don't think any products offer that.

awinder7y ago

mchristen7y ago

You can run/test your migrations in a transaction and roll back if it doesn't work.

jolmg7y ago

Not all problems are obvious, though. You can have an app work 100%, your database be completely coherent, and then realize some of the data is missing.

I can go completely wild with git, deleting random files, overwriting others with random junk, sharing them to my coworkers, etc. It would only take a few minutes to fix that.

1 more reply

jyounker7y ago

This isn't always true. In MySQL many DDL statements can't be rolled back.

See https://dev.mysql.com/doc/refman/5.7/en/cannot-roll-back.htm... for an example.

baq7y ago

sometimes, maybe, if you happen to run the right kind of DBMS...

whitepoplar7y ago

isn't this what datomic does?

smoyer7y ago· 6 in thread

[0] https://www.martinfowler.com/articles/evodb.html

[1] https://flywaydb.org/

[2] http://www.liquibase.org/

mschaef7y ago

elken7y ago

It tracks changes that have been applied in changelog tables so you don't apply them multiple times.

Flyway was not as good for me because it lacked a domain language, raw SQL makes it less easy to interpret but I know developers who preferred that.

freedomben7y ago

> HN is the new StackOverflow?

Could you clarify what you mean by this?

smoyer7y ago

This is actually a good place to ask important questions that will be closed by the moderators there.

1 more reply

abetlenOP7y ago

Thank you, the Fowler article is very helpful. Does your team have someone act as a DBA to oversee each migration?

smoyer7y ago

We differentiate between DBas and DBAs (administrators verses architects). Our team has DBAs who make sure what we're doing is sane and more importantly tune SQL emitted by ORMs as needed.

hestefisk7y ago· 6 in thread

mattmanser7y ago

If you think any tool can't handle this better than you are woefully misinformed.

I was doing this method back in 2006, let me assure you that the automated tools available today make this look like amateur hour.

Just streets ahead of hand coded scripts. Yes, there's a learning curve, yes you have to pay attention to your design still, but it is so good.

I was skeptical at first, but honestly I'm never going back and I don't miss it. I'm using Entity Framework Migrations, I assume other languages have similar/better tools.

hestefisk7y ago

ec1096857y ago

How do you know which ones have already been run on the environment you are targeting?

hestefisk7y ago

tylerrobinson7y ago

Have you experienced any problems with how long it might take to run the whole list of incremental changes?

hestefisk7y ago

No never. It is as fast as you can pipe data through to psql.

theclaw7y ago· 4 in thread

[0] https://dbup.github.io/

[1] https://dbup.readthedocs.io/en/latest/philosophy-behind-dbup...

[2] https://octopus.com/

brlewis7y ago

>If you need a down migration, take a backup before running your up migrations.

Aren't you missing some steps? Or does your db back up schema structure and data separately?

  1. Make sure nobody can insert/update/delete anything
  2. Take a backup
  3. Run your up migration
  4. Thoroughly test everything
  5. Allow insert/update/delete again

theclaw7y ago

That is our upgrade process, yes.

keithnz7y ago

a lot of time writing down migrations? why was it so hard?

I use https://fluentmigrator.github.io/ and also used dbup..

theclaw7y ago

1 more reply

louthy7y ago· 4 in thread

We use the RedGate SQL compare tools [1] to compare our new schema to our old one and auto-apply the diffs to the production DB (this is done automatically by our deployment process).

To reduce the chance of error we don’t destroy columns or tables.

Our application then has an update step which runs on startup for any data migrations (or new data additions), and then updates a version number stored in the DB. The data migrations are super rare.

This all means we can migrate from any past version to our latest one: because we have all previous schemas stored in git and n update functions in our app that can walk the versions

It’s worked reliably for over a decade and is pretty much entirely pain free

[1] https://www.red-gate.com/products/sql-development/sql-compar...

atwebb7y ago

teddyuk7y ago

This is such a common complaint!

beart7y ago

> To reduce the chance of error we don’t destroy columns or tables.

Does this mean you have a lot of unused tables and columns deprecated in the database?

louthy7y ago

No. It’s very rare that columns or tables are deprecated. It’s a very mature web app that basically just grows.

alanfranz7y ago· 3 in thread

What's your precise problem?

abetlenOP7y ago

evanelias7y ago

For a "very large company dedicated to moving fast" example, here's what the process looked like at Facebook a few years ago. AFAIK same process today, with one improvement noted below.

Background:

Process to add or change a table:

1. Just add or change a CREATE TABLE statement, and commit in SCM.

2. Submit a diff (pull request). Someone on your team reviews it, same as a code review.

There was a separate flow (with extra steps, on purpose) for destructive actions like dropping tables or columns.

1 more reply

notJim7y ago

My current company uses mongodb, so migrations aren't a thing. It's pretty nice, TBH.

archarios7y ago· 3 in thread

ezekg7y ago

This sounds like a nightmare. Why?

VectorLock7y ago

>(mongodb)

archarios7y ago

Myrmornis7y ago· 3 in thread

nickserv7y ago

haihaibye7y ago

Makemigrations detects if the type is the same and asks you if you renamed it.

So, when changing columns do the type change and name change in separate migrations and you won't have to do it by hand.

djstein7y ago

really the best part about Django

gigatexal7y ago· 2 in thread

abetlenOP7y ago

We use Alembic as well and it does make the process easier. I like the suggestion of not applying destructive migrations immediately.

Just out of curiosity, has your team used Alembic's branching system with any success?

gigatexal7y ago

sakopov7y ago· 2 in thread

[1] https://github.com/fluentmigrator/fluentmigrator

Merad7y ago

sakopov7y ago

We use the CLI you're referring to on ci/cd servers. Frankly, I had no idea it was deprecated and couldn't find much on this after a little bit of googling. Would mind sharing a link?

1 more reply

aerojoe237y ago· 2 in thread

.net has entity framework an ORM has has migrations. It isn't problem free, but we found workable solutions to our issues.

Most of our stuff is adding a new column or table so the old code doesn't rely on it and everything goes fine anyway.

sebazzz7y ago

Yes, we have used that for a bit and for the most part this works fine. We had however a case where several migrations were not applied in production, and that is a mess to figure out.

ozim7y ago

But that is most important part of migrations to throw that error so you are aware of it. First thing you fix your db and then push to test server, and you never have broken db on production, NEVER.

1337shadow7y ago· 1 in thread

It doesn't matter what tool you use, as long as you have automated migrations as part of the automated deployment process.

A lot of the implementations look like this:

create a migrations directory; add an initial migration script in it; make a migrate command to execute before service starts but after the backup.

Bjartr7y ago

We have an ant script that does almost exactly this

rocgf7y ago· 1 in thread

For Java projects, the most common one tends to be Flyway, in my experience. There's also Liquibase that I've heard of, but never used.

Flyway is okay in my experience. Can't complain about it, but I can't praise it either, it just does what you'd expect.

cdepatie7y ago

Liquibase is pretty solid, my company has used it across a couple of projects. Similar to your Flyway experience, it's not anything particularly amazing but it works.

frankwiles7y ago· 1 in thread

Django migrations it’s a truly great tool.

Myrmornis7y ago

Yes it is. (Except when they made makemigrations be like O(2^N) or something in the number of models around 1.9).

jakswa7y ago· 1 in thread

Depends on the database, in my opinion. Each one has quirks to be mindful of, depending on the amount of data you're migrating/touching.

For example, if you accidentally put a 'default' on a column when you add it to postgres, it will lock the entire table while it rewrites every row, inserting that default value.

Another common postgres blunder is creating indexes on big tables without using 'concurrently'. This also locks the table for writing and you'll have a bad time.

adwf7y ago

Just for reference, as of PG11 it no longer locks or rewrites for default column creation.

In the upcoming PG12 there is REINDEX CONCURRENTLY, so we'll finally be able to fix corrupt/bloated indexes on the fly without having to drop and recreate them.

snicker77y ago· 1 in thread

My team works with five database systems containing more or less the same data because we "switch" to a new DB every year without investing enough in migrating off the old one.

edoceo7y ago

It's "job security" ;)

stock_toaster7y ago· 1 in thread

We use migrate[1].

[1]: https://github.com/golang-migrate/migrate

lovetocode7y ago

I second this. migrate works great for me.

wesleytodd7y ago· 1 in thread

If anyone is looking for a javascript specific solution, I maintain a module originally written (and abandoned) by TJ.

$ npm i migrate

https://www.npmjs.com/package/migrate

smt887y ago

Why does it matter what the tool is written in?

If anything, I'm a lot less likely to use something written in JS when correctness is absolutely paramount.

thrower1237y ago· 1 in thread

SQL scripts. It is painful.

nickserv7y ago

wutman7y ago· 1 in thread

teddyuk7y ago

3) ha ha

barrkel7y ago

Migrations aren't what really slows us down though. Instead, it's data volume, and the difficulty in writing efficient queries while also providing a flexible UI experience.

GlennS7y ago

I think that numbered sequential migration steps are still the correct way to go. They should be written at the time the feature/change is developed, ideally by the person doing that development.

Whether they are in SQL or some sort of ORM thing probably doesn't matter that much.

I don't know how to handle branching/merging issues with this approach beyond "pay attention, be careful, talk to your colleagues about database changes you're making".

At my first company we stored the whole schema instead of the incremental steps to get there. We used SQL Delta to synchronize the real database to what we wanted.

It's also a great tool for actually inspecting if the result of your migration is what you expected.

darkr7y ago

All hooked up into CI, as well as being available from the CLI for developers, so they can open a PR and get feedback, as well as seeing what locks are required for a given migration.

1: https://sqitch.org/

config_yml7y ago

rails db:migrate

It’s always one of the first things I miss when I have to work on a non-rails codebase.

tln7y ago

Migrate during continuous deployment.

We use a staging environment, so CD deployed migrations are always run at least once before running on production.

Migrations have to be written so that currently deployed code will work after the migration. Eg, to rename a column, add+copy, deploy, then in a second migration drop the column.

sylvain_7y ago

https://stripe.com/fr/blog/online-migrations

sebringj7y ago

Foober2237y ago

bahador7y ago

I've used Flyway[0] at my current and previous shops. It seems to work well.

[0] https://flywaydb.org/

GordonS7y ago

Using DbUp, which is basically a library that you use to wrap SQL scripts in a console app.

This gives you full control over your migrations, while still being incredibly simple - it's just SQL!

I like DbUp very much.

was_boring7y ago

I don't think there is a way to change the database without slowing down development. After all, the data is the most valuable thing you have.

Besides that, I tend to follow the same general principals RainforestHQ presented in 2014 https://www.rainforestqa.com/blog/2014-06-27-zero-downtime-d...

tluyben27y ago

stephenr7y ago

Guillaume867y ago

.NET/MSSQL shop here:

Our source controlled files contain the database definition:

- a dacpac file generated by SSDT for the schema

- sql scripts for seed data (which changes a lot in our case), generated by a custom CLI tool (uses bcp.exe).

It works well enough that we run automated migrations at night (with backup before/restore after in case of issue).

sqldba7y ago

Scripts.

And I hate to tell you but despite what you’ll read on HN every single company I’ve seen and every single vendor I’ve dealt with - is using scripts.

bungie47y ago

I recently had to update 3 db machines with the updated schemas from one, and the data from a third. Then copy the finished version over to the first and finally the third.

Turns out Visual Studio has a 'diff' generator for both schema and data. Holy hell that worked the treat.

namelosw7y ago

We use fluentMigrator for .Net, but basically the workflow is similar to Rails migration. It is working fine and did not slow us down.

tshanmu7y ago

we have had good success with liquibase[0] and flyway[1].

[1] http://www.liquibase.org/ [2] https://flywaydb.org/

systematical7y ago

We use https://github.com/cakephp/phinx works pretty well. It just runs on deployment. Thankfully I've never had to try using the rollback feature.

sheeshkebab7y ago

- Flyway for relational databases, and

- flyway-like go based homegrown process for Cassandra (that includes support for executing go scripts for data migrations etc)

Above work pretty well - for many, many microservices and continuous deployments all the way to prod.

tylerFowler7y ago

sebazzz7y ago

We're going to research DACPACs to deploy databases. We hope it will be better.

speedplane7y ago

Is this question asking how to move from one schema to another, or one database backend to another? Most answers here focus on schema changes but I'm not sure that was the question.

lmiller19907y ago

It does NOT integrate with your server framework so you need to figure an ORM solution out, independent of Sqitch.

[1] https://sqitch.org/

ajcodez7y ago

https://www.ajostrow.me/articles/how-to-run-migrations

quickthrower27y ago

Previous shop we used entity framework migrations in .net. So migrations are applied by octopus deploy.

This place we do it manually but because of how the app is architected migrations are rare. If we migrate we try to make the code compatible with before and after schemes.

based27y ago

https://github.com/goldmansachs/obevo

https://www.schemacrawler.com/

wenbin7y ago

Django and other modern web development framework handles db migrations pretty well.

$ python manage migrate

paulddraper7y ago

We use dbmate [1].

SQL migration management with in-database version tracking.

Super simple, but works for us.

[1] https://github.com/amacneil/dbmate

pier257y ago

We use https://github.com/rubenv/sql-migrate for Go

You basically write SQL queries for up and down.

ilaksh7y ago

For my current project I have not needed to migrate data.

I have a system called timequerylog and a tool tql-cli where data is just appended to JSONL files organized by key, date, and time.

gregmac7y ago

Ohh, this is a favorite topic of mine and I'm of the opinion there's no clear best solution (possibly good market opportunity here), only a series of trade-offs.

----

To make a schema change such as adding a new column, you modify the relevant `CREATE TABLE` script, commit it to source, and that's it.

----

ben_jones7y ago

Poorly

mbostleman7y ago

SSDT / DACPAC

j / k navigate · click thread line to collapse