story

Migrating 1200 databases from MySQL to Postgres (opens in new tab)

psyenix.blogspot.com

250 pointsyupyup8y ago82 comments

82 comments

I wrote an open source tool that migrates between any 2 relational databases.

https://github.com/seanharr11/etlalchemy

This includes automatic migration of the schema, constraints, indexes and obviously data. It leverages fast bulk loading tools like mysqlimport, and PostgreSQL's COPY FROM. It doesn't need any config. Just plug and play.

The tool uses SQLAlchemy to abstract column types, and handles various column type conversions between any flavor of SQL.

The one thing it needs are tests, and scalability support for tables bigger than a few GB!

mkez008y ago

Going to give this a try. Thanks for posting!

icelancer8y ago

Thanks much - going to test it between MSSQL and MySQL in a few days here!

seanharr118y ago

Let me know how it goes...please open issues if you find them!

willcodeforfoo8y ago

I'm working on migrating an app right now (just one MySQL database) but can highly recommend [pgloader](http://pgloader.io/). It has a bunch of built-in and configurable rules to cast different column types to PostgreSQL and the developer has been very responsive in helping me make weird column types work, too.

erjiang8y ago

Pgloader can be very powerful. Despite never having using Common Lisp, I found it easy to add in a custom column conversion functions without modifying pgloader itself. You can write additional lisp files with custom procedures and load them at runtime.

ganeshkrishnan8y ago

If I may ask, what are you reasons to migrate from MySQL to Postgres?

We run MySQL on product and am in the process of de-normalizing it. If I have compelling reasons, I might push for migration to Postgres but I never found any genuine reasons. Uber even migrated the opposite way from Postgres to MySQL (Although I think reddit engineers disagreed with their reasons for move).

lbotos8y ago

+1 for PGLoader. Still wrestling with getting it to work for a conversion, but the experience has been a pleasure so far and surprisingly easy.

sigi458y ago

I like the approach and think something like this is fun but i don't understand the businsess case at all.

I would have migrated MySQL 5.5 to 5.6 first, than cleaned the shit out of it, normalized everything, added a few indezes and suddently realize that there is not much left to do.

I like to work with postgresql but if you have 1,2k dbs which are more or less not claned up and crappy, what are they used anyway?

richdougherty8y ago

I agree. It sounds like they decided to do a cleanup ahead of an investment and not much of a real business case. Your migration path seems much, much simpler - and that's assuming the cleanup is even needed in the first place.

SOLAR_FIELDS8y ago

One thing author did not touch on: since the tables were denormalized-ish in the original MySQL DB, did his application lose significant performance by having to perform the joins for every single query in the renormalized PostGres instance? Or were the DB's small enough and indexed properly enough so it didn't really matter?

Might have been worthy of testing this to see if it was worth it to materialize certain views and refresh them every 15-30 minutes or so.

koolba8y ago

Alternatively did the application speed up after it was normalized? A normalized data set can be substantially smaller which may allow the entire data set to fit in memory.

Also, it's officially "PostgreSQL" or informally "Postgres". Not "PostGres".

Amezarak8y ago

At least in my experience, joins in 100s of GB databases are not a performance problem. I've always been somewhat baffled by the claims I see pretty regularly that joins are a major performance concern. Was this a big deal in the past?

gaius8y ago

Was this a big deal in the past?

In MySQL, yes. It lacked the concept of foreign keys.

1 more reply

berns8y ago

One thing that Postgres lacks is accent insensitive collations. Having clients with databases in Spanish, this is one of the reasons I wouldn't consider migrating to Postgres. I know I can use the unaccent extension, but I consider it a poor substitute to proper collations. I guess this isn't a problem for most people because it's never mentioned.

claar8y ago

It appears work is active (May 2017) in this area, but accent-insensitive collations are still a ways off: https://blog.2ndquadrant.com/icu-support-postgresql-10/

edit: fulafel corrects me below that you can get accent-insensitive collations working using PostgreSQL 10's new ICU collation support. Note that PostgreSQL 10 is currently pre-release.

fulafel8y ago

That post says that they currently work, but there is a new os-independent collation method coming.

Edit re your edit: the already working collations are in PG 9.x - the OS independent collations are coming in 10.x. (Hence the post title, "More robust collations with ICU support in PostgreSQL 10")

1 more reply

fulafel8y ago

The docs say that locales and LC_COLLATE work, are you sure?

1 more reply

bmn__8y ago

> One thing that Postgres lacks is accent insensitive collations.

So what if the functionality is not in core, but in an extension? You make it sound as if it does not exist at all and all we can do is gnash our teeth in frustration.

http://www.public-software-group.org/pg_collkey

first release in 2006 for PostgreSQL 8

karmakaze8y ago

Why not just use a functional index? I'd expect it to have much better performance.

medius8y ago

If you are migrating to AWS RDS, I recommend AWS Data Migration service. I migrated my live database (~50GB) from Mysql to Postgres (both RDS) with zero downtime.

I used AWS Schema Conversion Tool for initial PG schema. I customized the generated schema for my specific needs.

seanharr118y ago

A few of these steps could be solved with this tool, including schema/index migrations, and even initial (fast) data transfer: https://github.com/seanharr11/etlalchemy

tkyjonathan8y ago

I'm not against the idea of using postgres, but I have a sneaky suspicion, that you could have gotten that 30% speed benefit from MySQL, if you did some performance work on it. Or in other words, I don't feel that migrating from one relational database to another similar relational database was worth that much trouble.

collyw8y ago

Thats interesting, as my dev experience would allow me to tune queries better in Postgres (the visual query planner makes it easier and there are more join types), while tuning them through configuration on either database would involve a lot more trail and error reading. I would more likely know what to do by migrating.

aidos8y ago

I went through a similar process a few years ago (4k dbs into a single db, all mysql). You'll get something working fairly quickly and then spend days debugging data anomalies that have crept in over the years.

We ended up with a frankenstein mixture of bash scripts, a sequence of sql transformations, some python and some php to decode some of the data that had been stored in pickled / php encoded packages.

It's not something I want to repeat any time soon :-)

fermuch8y ago

It truly sounds like a nightmare. How much time did it took to move everything to psql, and how long until it was stable enough to not keep an eye on it?

aidos8y ago

To clarify, it all stayed on mysql — just pulled everything into a single db.

I can't remember exactly but the whole project was done in a real hurry, about 4 weeks I think. It included a rewrite of the application / website (about 20k lines of backend code, html and js / css), migration of all the data, creation of new transcoders, migration of a lot of data asset/video data from another server.

No idea how we got through it all in retrospect, but I had to. A few days after we finished I got married and went off on my honeymoon, so I guess I decided it was stable enough the day I released it...! :-)

JelteF8y ago

One of the main things I ran into when migrating from MySQL to postgres was that the default text columns (TEXT, VARCHAR, etc) behave differently when searching. MySQL matches text case insensitive, while postgres matches it case sensitive. This resulted in searches for content suddenly not returning data. Luckily there's an official CITEXT extension for postgres [1], which matches text case insensitive and adds the correct indices for doing so efficiently.

[1] https://www.postgresql.org/docs/9.1/static/citext.html

pasta8y ago

"MySQL matches text case insensitive"

This is of course only when you set a case insensitive collation (_ci).

But I agree this can be a problem because case insensitive collations are used a lot in MySQL.

Also other collations can be a trouble. For example the collation that returns results for both 'ß' and 'ss' (German)

ehxcaet8y ago

I'm not entirely sure what the technical reason for this change was. Of course, cleaning up your DBs and whatnot is a good thing. But why did they move it to Postgres instead of simply clean?

noir_lord8y ago

Would love to do this to the monstrosity I inherited in new job but it's a sispheyan task.

One day.

cntlzw8y ago

One time I saw a database, with a single table, with a single text column that contained the old database.

shrikant8y ago

This sounds similar to what I saw at a client a few years back. All XML responses from a REST API were inserted into a single VARCHAR(MAX) column (one row per response, but a response could have multiple "records") in a SQL Server database.

All interaction with this was through a byzantine web of XPath expressions using the severely neutered dialect of XQuery that shipped with SQL Server 2008.

Apologies for the mini-rant, your comment appears to have triggered some painful memories.

ishi8y ago

It's databases all the way down...

overcast8y ago

You should see the one here. An old flat file database monstrosity from the 80s, imported into oracle, complete with all of the duplicated data. It's the most non relational, relational database ever.

zimpenfish8y ago

Would a database stored in an XML file shared between 60+ clients over NFS beat that for monstrositiness?

5 more replies

noir_lord8y ago

I have 177 tables, 764 stored procedures and 44 joins.

Every single table has either a uniqid() or a UUID() which isn't the primary key (they all have incrementing integer keys) but is 'joined' on in code.

When he needs to 'join' he used coalesce() on selects.

Referential integrity doesn't exist.

Oh and the cherry on the shit sandwhich, we have four seperate systems that talk to this database written in 3 different languages (Java (Android scanners), C# (Factory scanners) and PHP (the main system).

So there is no way to accurately know which SP's are used by what without grepping the entire codebase looking for call <foo>.

The PHP is written badly in the old PHP pure procedural style running on an outdated version of PHP/Debian/MySQL.

Basically if you took the absolute worst approach to everything this would be the end result.

It's good fun!.

TekMol8y ago

Is it an universally accepted truth now that it's generally better to use Postgres then to use MySql?

Would anybody here use MySql for a new project? If so, why?

junkilo8y ago

Having used both from the prototyping stage to very large scale my answer: It doesn't matter.

They are both plenty feature complete to work well for relational data type problems. Grab the one you have more exposure to and you'll be fine.

That said, I'm still using mysql-innodb (latest percona) because the built-in replication has had more time to bake and its concurrency model (think 100k qps+) is plain better. For further reading see:

https://eng.uber.com/mysql-migration/

falcolas8y ago

Yes, I would use MySQL. I'm very familiar with it for one, and the extra storage engine options turn a sure thing into a no-brainer decision.

Ultimately, I think familiarity is and will always be the biggest driver for experienced devs who choose a DB.

jeffdavis8y ago

What storage engines and options do you use?

2 more replies

leesalminen8y ago

I chose MySQL (well, MariaDB) for a project that started in 2013. I chose it because I have the most experience maintaining it and have a pretty good handle on how to tune.

porker8y ago

Ditto. I still choose it for small to medium sized webapps because - let's face it - the data storage is not their pain point.

MySQL is fast enough, usually has enough features, I know how to tune it and queries, and I normally only bump into 2-3 things in a project which would've been easier with PostgreSQL. And can work around them.

The backwards-incompatible SQL_MODE changes between MySQL 5.6 and 5.7 made me reconsider, as legacy apps needed updating.

tl;dr: yes. But PostgreSQL would be easier for some items

1 more reply

brightball8y ago

Nope, because using MySQL instead of Postgres essentially cripples your dev tools once you know about everything PG opens up for you.

http://www.brightball.com/articles/why-should-you-learn-post...

sandGorgon8y ago

Most e-commerce systems start off on magento, which is MySQL only. So a LOT of MySQL projects are simply the answer to :

1. Did you use magento/Drupal/WordPress/etc PHP framework

2. Did you host on a 20$ per month server with unlimited bandwidth and free database ?

e12e8y ago

Like other comments here state, in general, use what you know. That said, postgresql has ~always been the better rdbms. Some would argue mysql at times have been a better ~key-value store with ~sql interface.

Apparently postgresql is difficult to run in a properly restricted bsd jail. Other than that, I'd be hard pressed to come up with reasons to prefer mysql on technical merit over either postgresql, or, say redis or something.

Avernar8y ago

I'm running postgresql in a FreeBSD vnet jail without issue. The network stack is not shared as it's a vnet jail. I have it set for per jail sysvipc, new in FreeBSD 11, which was the last thing I needed to fully isolate it from other jails.

Is there anything else that's needed to properly restrict the jail?

1 more reply

deathanatos8y ago

https://grimoire.ca/mysql/choose-something-else

That's essentially my reasons for preferring PostgreSQL.

nrmitchi8y ago

> Now is when one of the main reasons to take Postgres as our new database server. Postgres allows the use of Views(include link) supporting INSERT, UPDATE and DELETE meanwhile the relation between views and tables are one to one, ....isn’t it awesome?

Just a heads up, I think you missed a link in there

ibejoeb8y ago

When talking about replaying live activity on the test system:

>We took the third option because the other two options were quite intrusive and they might interfere at the normal use of our app because in both cases they were going to be in between our users request and our app

One of those two rejected options was goreplay. It is passive, like tcpdump. Right from the docs:

"Gor is not a proxy: you do not need to put 3-rd party tool to your critical path. Instead Gor just silently analyzes the traffic of your application and does not affect it anyhow."

StreamBright8y ago

Now that is what I call a proper migration. Great content! I was wondering how could somebody end up with corrupt data, I guess it was due to MySQL not strict enforcing the types probably.

ris8y ago

https://github.com/the4thdoctor/pg_chameleon

adrianlmm8y ago

I tried to migrate from Firebird to Postgres once and I failed, it needs more administration than Firebird.

ZeusNuts8y ago

What was the reason to first migrate and then repair data? Repair scripts could've been ran against the old setup too and you would've had less stuff to migrate.

j / k navigate · click thread line to collapse

82 comments

seanharr118y ago

I wrote an open source tool that migrates between any 2 relational databases.

https://github.com/seanharr11/etlalchemy

The tool uses SQLAlchemy to abstract column types, and handles various column type conversions between any flavor of SQL.

The one thing it needs are tests, and scalability support for tables bigger than a few GB!

mkez008y ago

Going to give this a try. Thanks for posting!

icelancer8y ago

Thanks much - going to test it between MSSQL and MySQL in a few days here!

seanharr118y ago

Let me know how it goes...please open issues if you find them!

willcodeforfoo8y ago

erjiang8y ago

ganeshkrishnan8y ago

If I may ask, what are you reasons to migrate from MySQL to Postgres?

lbotos8y ago

+1 for PGLoader. Still wrestling with getting it to work for a conversion, but the experience has been a pleasure so far and surprisingly easy.

sigi458y ago

I like the approach and think something like this is fun but i don't understand the businsess case at all.

I would have migrated MySQL 5.5 to 5.6 first, than cleaned the shit out of it, normalized everything, added a few indezes and suddently realize that there is not much left to do.

I like to work with postgresql but if you have 1,2k dbs which are more or less not claned up and crappy, what are they used anyway?

richdougherty8y ago

SOLAR_FIELDS8y ago

Might have been worthy of testing this to see if it was worth it to materialize certain views and refresh them every 15-30 minutes or so.

koolba8y ago

Alternatively did the application speed up after it was normalized? A normalized data set can be substantially smaller which may allow the entire data set to fit in memory.

Also, it's officially "PostgreSQL" or informally "Postgres". Not "PostGres".

Amezarak8y ago

gaius8y ago

Was this a big deal in the past?

In MySQL, yes. It lacked the concept of foreign keys.

1 more reply

berns8y ago

claar8y ago

It appears work is active (May 2017) in this area, but accent-insensitive collations are still a ways off: https://blog.2ndquadrant.com/icu-support-postgresql-10/

edit: fulafel corrects me below that you can get accent-insensitive collations working using PostgreSQL 10's new ICU collation support. Note that PostgreSQL 10 is currently pre-release.

fulafel8y ago

That post says that they currently work, but there is a new os-independent collation method coming.

Edit re your edit: the already working collations are in PG 9.x - the OS independent collations are coming in 10.x. (Hence the post title, "More robust collations with ICU support in PostgreSQL 10")

1 more reply

fulafel8y ago

The docs say that locales and LC_COLLATE work, are you sure?

1 more reply

bmn__8y ago

> One thing that Postgres lacks is accent insensitive collations.

So what if the functionality is not in core, but in an extension? You make it sound as if it does not exist at all and all we can do is gnash our teeth in frustration.

http://www.public-software-group.org/pg_collkey

first release in 2006 for PostgreSQL 8

karmakaze8y ago

Why not just use a functional index? I'd expect it to have much better performance.

medius8y ago

If you are migrating to AWS RDS, I recommend AWS Data Migration service. I migrated my live database (~50GB) from Mysql to Postgres (both RDS) with zero downtime.

I used AWS Schema Conversion Tool for initial PG schema. I customized the generated schema for my specific needs.

seanharr118y ago

A few of these steps could be solved with this tool, including schema/index migrations, and even initial (fast) data transfer: https://github.com/seanharr11/etlalchemy

tkyjonathan8y ago

collyw8y ago

aidos8y ago

We ended up with a frankenstein mixture of bash scripts, a sequence of sql transformations, some python and some php to decode some of the data that had been stored in pickled / php encoded packages.

It's not something I want to repeat any time soon :-)

fermuch8y ago

It truly sounds like a nightmare. How much time did it took to move everything to psql, and how long until it was stable enough to not keep an eye on it?

aidos8y ago

To clarify, it all stayed on mysql — just pulled everything into a single db.

JelteF8y ago

[1] https://www.postgresql.org/docs/9.1/static/citext.html

pasta8y ago

"MySQL matches text case insensitive"

This is of course only when you set a case insensitive collation (_ci).

But I agree this can be a problem because case insensitive collations are used a lot in MySQL.

Also other collations can be a trouble. For example the collation that returns results for both 'ß' and 'ss' (German)

ehxcaet8y ago

I'm not entirely sure what the technical reason for this change was. Of course, cleaning up your DBs and whatnot is a good thing. But why did they move it to Postgres instead of simply clean?

noir_lord8y ago

Would love to do this to the monstrosity I inherited in new job but it's a sispheyan task.

One day.

cntlzw8y ago

One time I saw a database, with a single table, with a single text column that contained the old database.

shrikant8y ago

All interaction with this was through a byzantine web of XPath expressions using the severely neutered dialect of XQuery that shipped with SQL Server 2008.

Apologies for the mini-rant, your comment appears to have triggered some painful memories.

ishi8y ago

It's databases all the way down...

overcast8y ago

zimpenfish8y ago

Would a database stored in an XML file shared between 60+ clients over NFS beat that for monstrositiness?

5 more replies

noir_lord8y ago

I have 177 tables, 764 stored procedures and 44 joins.

Every single table has either a uniqid() or a UUID() which isn't the primary key (they all have incrementing integer keys) but is 'joined' on in code.

When he needs to 'join' he used coalesce() on selects.

Referential integrity doesn't exist.

So there is no way to accurately know which SP's are used by what without grepping the entire codebase looking for call <foo>.

The PHP is written badly in the old PHP pure procedural style running on an outdated version of PHP/Debian/MySQL.

Basically if you took the absolute worst approach to everything this would be the end result.

It's good fun!.

TekMol8y ago

Is it an universally accepted truth now that it's generally better to use Postgres then to use MySql?

Would anybody here use MySql for a new project? If so, why?

junkilo8y ago

Having used both from the prototyping stage to very large scale my answer: It doesn't matter.

They are both plenty feature complete to work well for relational data type problems. Grab the one you have more exposure to and you'll be fine.

https://eng.uber.com/mysql-migration/

falcolas8y ago

Yes, I would use MySQL. I'm very familiar with it for one, and the extra storage engine options turn a sure thing into a no-brainer decision.

Ultimately, I think familiarity is and will always be the biggest driver for experienced devs who choose a DB.

jeffdavis8y ago

What storage engines and options do you use?

2 more replies

leesalminen8y ago

I chose MySQL (well, MariaDB) for a project that started in 2013. I chose it because I have the most experience maintaining it and have a pretty good handle on how to tune.

porker8y ago

Ditto. I still choose it for small to medium sized webapps because - let's face it - the data storage is not their pain point.

The backwards-incompatible SQL_MODE changes between MySQL 5.6 and 5.7 made me reconsider, as legacy apps needed updating.

tl;dr: yes. But PostgreSQL would be easier for some items

1 more reply

brightball8y ago

Nope, because using MySQL instead of Postgres essentially cripples your dev tools once you know about everything PG opens up for you.

http://www.brightball.com/articles/why-should-you-learn-post...

sandGorgon8y ago

Most e-commerce systems start off on magento, which is MySQL only. So a LOT of MySQL projects are simply the answer to :

1. Did you use magento/Drupal/WordPress/etc PHP framework

2. Did you host on a 20$ per month server with unlimited bandwidth and free database ?

e12e8y ago

Avernar8y ago

Is there anything else that's needed to properly restrict the jail?

1 more reply

deathanatos8y ago

https://grimoire.ca/mysql/choose-something-else

That's essentially my reasons for preferring PostgreSQL.

nrmitchi8y ago

Just a heads up, I think you missed a link in there

ibejoeb8y ago

When talking about replaying live activity on the test system:

One of those two rejected options was goreplay. It is passive, like tcpdump. Right from the docs:

"Gor is not a proxy: you do not need to put 3-rd party tool to your critical path. Instead Gor just silently analyzes the traffic of your application and does not affect it anyhow."

StreamBright8y ago

Now that is what I call a proper migration. Great content! I was wondering how could somebody end up with corrupt data, I guess it was due to MySQL not strict enforcing the types probably.

ris8y ago

https://github.com/the4thdoctor/pg_chameleon

adrianlmm8y ago

I tried to migrate from Firebird to Postgres once and I failed, it needs more administration than Firebird.

ZeusNuts8y ago

What was the reason to first migrate and then repair data? Repair scripts could've been ran against the old setup too and you would've had less stuff to migrate.

j / k navigate · click thread line to collapse