Show HN: Stellar – Git for PostreSQL and MySQL (opens in new tab)

(github.com)

325 pointsobsession11y ago77 comments

77 comments

67 comments · 23 top-level

robert_tweed11y ago· 8 in thread

Generally the hardest thing with version control on a database (for an evolving codebase) is separating unrelated changes - such as schema changes vs content updates - and branching and merging those changes in sync with the code dependencies. Another issue is non-destructively replaying development changes into test/production environments.

So for example, you might have a feature branch that includes some schema changes and some value modifications, and a content branch that includes a bunch of inserts into a few content tables that happen to include foreign key references to each other (so you need to maintain referential integrity when replaying those updates/inserts).

I don't see anything in the description that indicates this tool address those problems. For me, those are really the only problems that a DB version control system ought to be focused on. Speed of snapshotting is not all that important in a development environment as you typically work on a cut-down dataset anyway. A minute or so to take a snapshot a few times a day isn't a huge deal, whereas taking more frequent snapshots doesn't seem like something that adds any value, if it doesn't address any of the other problems.

davesque11y ago

"Another issue is non-destructively replaying development changes into test/production environments."

"...you need to maintain referential integrity when replaying those updates/inserts"

I think most of your misgivings about this tool are grounded in the expectation that it would provide merge features like Git. Since it claims to be "git for databases", I suppose that's a fair expectation.

However, the tool does provide the core functionality of Git, which is to implement a tracking system for hash-tagged objects (or, in this case, tables). This does have value. Not only is it faster than standard db dumps, it's also more space-efficient. For some people, this could be really valuable.

I think it's an interesting tool and could serve as a launching point for more powerful/useful functionality.

baudehlo11y ago

This sounds like something Sqitch would help with.

http://sqitch.org

mbucc11y ago

FWIW, I found shmig [1] much simpler than sqitch.

Last I used sqitch, it returned a zero value on error, which made me leery of using it for scripted deployments.

[1] https://github.com/naquad/shmig

1 more reply

shkkmo11y ago

Sqitch sounds like just another migration tool, not a DB change tracking tool.

I have been looking for a tool that will allow me to track, diff and revert changes to the content of specific "business logic" tables so that we can acurately track and test those changes.

It doesn't look like anything like that exists, so eventually I'll have to roll my own.

bigpeopleareold11y ago

I back up this recommendation whole-heartedly. Sqitch is a fantastic database tool. I stopped using Rails-type migrations because of it (I write PL/pgSQL functions and wanted a migration tool to handle that well, amongst other reasons.)

abhishivsaxena11y ago

I would disagree that referential integrity and schemas are the only issues a DB version control should focus on.

For example, I would be very interested in having git like branching on top of something like this - http://sandbox.substance.io/docs/lorem_ipsum.json. Basically version controlled schemaless object tree.

I recently implemented something like this, backed by mongodb, and by exposing a HTTP api which mimicked git. I had to relax these two requirements, but it is still worth it.

However I would be very much interested in using libgit2 with a database backend instead of filesystem.

EDIT: not affiliated with substance in any way.

cbsmith11y ago

You can implement a change DAG for a DB inside the DB. It works pretty well.

1 more reply

akurilin11y ago

I'd love a tool that solves that problem. Schema / data / user / permission migrations, support for both production and development environments and integration with CM tools like ansible would be fantastic.

lucian190011y ago· 8 in thread

This sort of thing is useful, but already supported by Postgres through transactional DDL. Migrations that fail will have their transaction reverted.

michaelmior11y ago

Suppose I'm working on a new feature branch and I run a bunch of migrations and add new data. Then I need to go make a bug fix on the master branch. Using Stellar, I could just take a snapshot before starting my feature branch, restore the DB state, do my bug fix, then go back to the DB as it was in my feature branch. I haven't tried Stellar yet, but this seems like a scenario where it would be useful. (And definitely not supported through Postgres transactions.)

masklinn11y ago

You could use schemas or multiple databases for that.

1 more reply

obsessionOP11y ago

Transaction is only reverted if the migration fails. Stellar helps you if your migration succeeds but does the wrong thing (deleting wrong column, missing WHERE in UPDATE statement).

killing_time11y ago

The use case which this really excites me about is automated testing from the GUI level. If the performance is good enough, this would be really useful for restoring DB state in between tests.

j_s11y ago

ASP.NET developers using TransactionScope can use http://amnesia.codeplex.com/ to manage distributed transactions for automated UI testing.

obsessionOP11y ago

While doing single restore is fast enough, you will have problems doing multiple restores in a row as the background process needs to finish before another restore is possible (which may take several seconds when you are dealing with big databases).

mdellabitta11y ago

What if you're writing a migration that includes DML, and it fails partway through? Wouldn't it be good to be able to restore the DB state automatically and try again?

AlterEgo2011y ago

Postgres has transactional DDL and DML. If anything fails - everything is rolled back.

1 more reply

squigs2511y ago· 7 in thread

The implications for this extend beyond backing up your database.

Imagine a world where daily time-series data can be stored efficiently: This is a lesser known use case, but it works like this: I'm a financial company and I want to store 1000 metrics about a potential customer. Maybe the number of transactions in the past year, the number of defaults, the number of credit cards, etc.

Normally I would have to duplicate this row in the database every day/week/month/year for every potential customer. With some kind of git-like storing of diffs between the row today and the row yesterday, I could easily have access to time series information without duplicating unchanged information. This would accomplish MASSIVE storage savings.

FWIW efficiently storing time series data is big problem at my company. No off the shelf solution makes this easy for us right now, and we would rather throw cheap hard disk at the problem rather than expensive engineers.

michaelmior11y ago

There are a lot of existing compression algorithms for time series data that do just this. I'm not sure how well any of these are implemented however. I think the problem is not necessarily how the data is stored, since that's fairly easy to fix with a bit of engineering effort if you're willing to write your own system. The harder part is rewriting query engines to take advantage of this sort of compression. Although ideally this could just be abstracted away by the storage layer.

beagle311y ago

Kx systems kdb+ does this incredibly quickly and easily. I'm sure OneTick, Vhayu and others do too, though I son't have experience with them.

If you insist on standard SQL databases for time series, you'll have a lot more pain

symfrog11y ago

Have you looked at Datomic? It seems to fit your problem description well.

AlwaysBCoding11y ago

+1 for Datomic, seems to be right in the wheelhouse for this problem.

habosa11y ago

What about Cassandra? I believe it efficiently stores multiple time values for each (row, column) value as it changes. Google's BigTable design does this, and I believe you can use BigTable through Appengine.

Terr_11y ago

Sounds like a case where Event-sourcing & CQRS might've been handy. (Not reality something you can easily bolt-on afterwards, though.]

personZ11y ago

Column-oriented databases virtually all feature this in the form of column compression (e.g. "repeat this value for the next 1000 rows"). And if you don't want column compression, they have sparse data filling/interpolation -- e.g. use the last available value from a time series. This is pretty much their bread and butter. Interpolation is essentially making the query engine smarter, so you don't end up in the situation you're apparently facing where you have to insert duplicate records purely to satisfy a simplistic join.

Back to this product (which appears to simply wholesale copy databases?), I use LVM for exactly what it is doing -- I create and rollback and access and update LVM snapshots of databases. The snapshots are instant, and in most situations the data duplications is very limited. LVM is one of the coolest, most under-appreciated facets of most Linux installs -- http://goo.gl/J2mIvG

mbrock11y ago· 6 in thread

I wish projects like these would always include some basic info in their README about: (1) how it works, and (2) how it might fail.

danudey11y ago

Seconded. The extent of what I know about this project is:

1. It's like git somehow?

2. I can run some commands which presumably do something?

3. That something happens faster than something postgres can do, which I assume accomplishes the same thing?

4. I assume there's nothing else I need to know?

ireflect11y ago

That's the first thing I looked for too. Anyway, I opened an issue: https://github.com/fastmonkeys/stellar/issues/13

areski11y ago

Totally agree, peoples needs to know how it works before starting using it.

davesque11y ago

From what little I've gathered, the internal implementation is basically a system which tracks hash-tagged copies of database tables.

davesque11y ago

There are only like 700 lines of Python code. Figure it out and submit a pull request to update the README.

mbca11y ago

So it's yet another toy project with grandiose, overblown claims ("git for databases"? seriously?!)

An honest update to the README would be to take the whole thing down several notches, if nothing else then to avoid the kind of confusion the post you're replying to expresses.

1 more reply

amirmc11y ago· 3 in thread

If anyone's interested in git-like storage systems then it's work checking out Irmin [1]. Previous discussion is at [2].

Excerpt: "Irmin is a library to persist and synchronize distributed data structures both on-disk and in-memory. It enables a style of programming very similar to the Git workflow, where distributed nodes fork, fetch, merge and push data between each other. The general idea is that you want every active node to get a local (partial) copy of a global database and always be very explicit about how and when data is shared and migrated

Irmin is

not, strictly speaking, a full database engine. It is, as are all other components of Mirage OS, a collection of libraries designed to solve different flavours of the challenges raised by the CAP theorem. Each application can select the right combination of libraries to solve its particular distributed problem."

[1] http://openmirage.org/blog/introducing-irmin

[2] https://news.ycombinator.com/item?id=8053687

sgrove11y ago

I've spent a day looking at Irmin, trying to figure out the JSON CRUD interface, but there's just no documentation, and the irc channel is pretty much dead. The code looks good, but it's my first introduction to OCaml, coming from Clojure. Would love to see some more documentation on it.

amirmc11y ago

Irmin's still being developed, hence not many docs at the moment. Apologies there was no-one responding in irc (I assume you mean #mirage - there are usually 30+ folks there). The mailing list is quite active and is a good place to pose questions.

http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageo...

mamcx11y ago

This could work to syncronize data from multiple clients for a invoice, for example?

falcolas11y ago· 3 in thread

So, it appears to just copy tables around within the database. I wouldn't want to use this on a DB over a few MB in size. Sure, restores are "fast" (a table rename), but copies are not so much.

I can't imagine this would be kind to a production database (lots of cleanup from copied & deleted tables), and would consume a lot more space than a gripped logical backup of the tables in question.

obsessionOP11y ago

I have regularly used this with database that's nearing 1000 megabytes. I don't particularly mind slow snapshotting because my workflow is more about restoring database back to baseline than taking copies.

Please don't use this for production. It is not stable enough and you only end up with lost data.

falcolas11y ago

Why not use a binary backup method? Faster to backup and restore.

1 more reply

sciurus11y ago

For those curious, on PostgreSQL it executes

    CREATE DATABASE "snapshot" WITH TEMPLATE "source";

and on MySQL it loops creates the new database and loops over all the tables running

    INSERT INTO snapshot.table SELECT * FROM source.table;

https://github.com/fastmonkeys/stellar/blob/master/stellar/o...

Gigablah11y ago· 2 in thread

From the code:

    INSERT INTO %s.%s SELECT * FROM %s.%s

Yeah, good luck with that.

kstrauser11y ago

What's wrong with that? Assuming you escape the table names correctly, that seems reasonable. And if you can't escape the table names, you're going to have a hard time dynamically generating queries anyway. Parameterized queries are a baseline requirement for values, but are rarely supported by client libraries for things like table or schema names.

Gigablah11y ago

Nothing wrong with the syntax, it's just that if you're going to create an entire mirror of your database, you might as well use mysqldump. (Plus it'll actually be portable).

codeoclock11y ago· 2 in thread

Unfortunate name, excellent project :)

gexla11y ago

To elaborate on what this comment may be referring to. A Ripple (cryptocurrency) fork was recently released with the name of Stellar.

uptown11y ago

It's also the phonetical-match to an iPhone app:

https://steller.co/

JohnDotAwesome11y ago· 2 in thread

How does it work? Where does it breakdown? Why are these things not in the README?

mixologic11y ago

Because its not doing anything magic whatsoever.

https://github.com/fastmonkeys/stellar/blob/master/stellar/o...

davesque11y ago

There isn't that much code. It's a new project. Do some spelunking and find out.

swehner11y ago· 1 in thread

Line 53 of https://github.com/fastmonkeys/stellar/blob/master/stellar/o... is

                CREATE TABLE %s.%s LIKE %s.%s

This made me think of a table called

                create table `a; drop table users;`  (col int);

... which works in mysql.

I don't know if the stellar code will trip over something like this. But mysql (SQL) shouldn't even allow names like that.

obsessionOP11y ago

Yes, Stellar will probably trip over this. Similarly, if the attacker can edit stellar.yaml, they probably can edit your .bashrc as well.

Im adding this to my TODO list.

jamesmoss11y ago· 1 in thread

Interestingly they don't show MySQL benchmarks in the readme; I suspect it might be because the MySQL implementation is pretty basic

https://github.com/fastmonkeys/stellar/blob/master/stellar/o...

obsessionOP11y ago

MySQL support was definitely an afterthought and could probably be improved (maybe tracking the binary files directly?).

iso8859-111y ago· 1 in thread

how does this compare to time travel queries? http://en.wikipedia.org/w/index.php?title=Temporal_database#...

walterbell11y ago

Any databases that you would recommend for for temporal logic?

m3h11y ago

Why does the author compare it to Git? The functions this software performs are no where near those performed by Git. Nor it is a proper version control system.

bronson11y ago

Nice. I wrote a similar tool for Rails / ActiveRecord models: https://github.com/bronson/table_differ

It takes snapshots and computes diffs between snapshots or the live database. It lets me drop and re-import some of my app's tables, then compute the minimum set of changes between the previous import and the new import. I wouldn't call it "git for ActiveRecord models" but it appears to be similar to this project.

Comments welcome! The docs, as always, could use some help.

crad11y ago

Maybe I'm missing something, but I didn't see anything with regard to indexes, users, stored procedures, views or what not.

Seems like it's for table schema snapshotting in a database without any external storage.

Browsing through the code, I see that it's highly table centric using SQLAlchemy.

jdc058911y ago

Shameless plug for mite: https://github.com/jdc0589/mite-node Simple migrations that take advantage of everything you already know about git and sql, plus some other cool stuff.

It's not too mature yet, the readme is mediocre at best, and it has some issues that will popup when working with a team, but it's pretty damn useful.

jimktrains211y ago

While not exactly the same thing, I've recently found and started using https://github.com/nkiraly/DBSteward to specify schema and then store the spec in my repo with the code. It also supports diffing the current schema against a previous one, so that nice upgrade sql scripts can be generated.

iurisilvio11y ago

I expected something related with Stellar coins.

Looks like a good project, I definitely want an easy way to manage development databases.

level0911y ago

This is a nice project. I used to have my database dump tracked by git (in binary mode). anytime my db changes I'll have to overwrite the file with the new database dump and include it with the commit.

I'm just wondering if this project offers anything special/better than the method I described.

ZenoArrow11y ago

Just a small correction; it's not PostreSQL, it's PostgreSQL.

edem11y ago

Folks might confuse this with the Stellar currency (stellar.org). You might give some emphasis in the title.

mosselman11y ago

Looks very nice, could you put up some practical examples?

josephcooney11y ago

Typo? Shouldn't it be PostgreSQL not PostreSQL?

j / k navigate · click thread line to collapse

77 comments

67 comments · 23 top-level

robert_tweed11y ago· 8 in thread

davesque11y ago

"Another issue is non-destructively replaying development changes into test/production environments."

"...you need to maintain referential integrity when replaying those updates/inserts"

I think it's an interesting tool and could serve as a launching point for more powerful/useful functionality.

baudehlo11y ago

This sounds like something Sqitch would help with.

http://sqitch.org

mbucc11y ago

FWIW, I found shmig [1] much simpler than sqitch.

Last I used sqitch, it returned a zero value on error, which made me leery of using it for scripted deployments.

[1] https://github.com/naquad/shmig

1 more reply

shkkmo11y ago

Sqitch sounds like just another migration tool, not a DB change tracking tool.

I have been looking for a tool that will allow me to track, diff and revert changes to the content of specific "business logic" tables so that we can acurately track and test those changes.

It doesn't look like anything like that exists, so eventually I'll have to roll my own.

bigpeopleareold11y ago

abhishivsaxena11y ago

I would disagree that referential integrity and schemas are the only issues a DB version control should focus on.

I recently implemented something like this, backed by mongodb, and by exposing a HTTP api which mimicked git. I had to relax these two requirements, but it is still worth it.

However I would be very much interested in using libgit2 with a database backend instead of filesystem.

EDIT: not affiliated with substance in any way.

cbsmith11y ago

You can implement a change DAG for a DB inside the DB. It works pretty well.

1 more reply

akurilin11y ago

lucian190011y ago· 8 in thread

This sort of thing is useful, but already supported by Postgres through transactional DDL. Migrations that fail will have their transaction reverted.

michaelmior11y ago

masklinn11y ago

You could use schemas or multiple databases for that.

1 more reply

obsessionOP11y ago

Transaction is only reverted if the migration fails. Stellar helps you if your migration succeeds but does the wrong thing (deleting wrong column, missing WHERE in UPDATE statement).

killing_time11y ago

The use case which this really excites me about is automated testing from the GUI level. If the performance is good enough, this would be really useful for restoring DB state in between tests.

j_s11y ago

ASP.NET developers using TransactionScope can use http://amnesia.codeplex.com/ to manage distributed transactions for automated UI testing.

obsessionOP11y ago

mdellabitta11y ago

What if you're writing a migration that includes DML, and it fails partway through? Wouldn't it be good to be able to restore the DB state automatically and try again?

AlterEgo2011y ago

Postgres has transactional DDL and DML. If anything fails - everything is rolled back.

1 more reply

squigs2511y ago· 7 in thread

The implications for this extend beyond backing up your database.

michaelmior11y ago

beagle311y ago

Kx systems kdb+ does this incredibly quickly and easily. I'm sure OneTick, Vhayu and others do too, though I son't have experience with them.

If you insist on standard SQL databases for time series, you'll have a lot more pain

symfrog11y ago

Have you looked at Datomic? It seems to fit your problem description well.

AlwaysBCoding11y ago

+1 for Datomic, seems to be right in the wheelhouse for this problem.

habosa11y ago

Terr_11y ago

Sounds like a case where Event-sourcing & CQRS might've been handy. (Not reality something you can easily bolt-on afterwards, though.]

personZ11y ago

mbrock11y ago· 6 in thread

I wish projects like these would always include some basic info in their README about: (1) how it works, and (2) how it might fail.

danudey11y ago

Seconded. The extent of what I know about this project is:

1. It's like git somehow?

2. I can run some commands which presumably do something?

3. That something happens faster than something postgres can do, which I assume accomplishes the same thing?

4. I assume there's nothing else I need to know?

ireflect11y ago

That's the first thing I looked for too. Anyway, I opened an issue: https://github.com/fastmonkeys/stellar/issues/13

areski11y ago

Totally agree, peoples needs to know how it works before starting using it.

davesque11y ago

From what little I've gathered, the internal implementation is basically a system which tracks hash-tagged copies of database tables.

davesque11y ago

There are only like 700 lines of Python code. Figure it out and submit a pull request to update the README.

mbca11y ago

So it's yet another toy project with grandiose, overblown claims ("git for databases"? seriously?!)

An honest update to the README would be to take the whole thing down several notches, if nothing else then to avoid the kind of confusion the post you're replying to expresses.

1 more reply

amirmc11y ago· 3 in thread

If anyone's interested in git-like storage systems then it's work checking out Irmin [1]. Previous discussion is at [2].

Irmin is

[1] http://openmirage.org/blog/introducing-irmin

[2] https://news.ycombinator.com/item?id=8053687

sgrove11y ago

amirmc11y ago

http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageo...

mamcx11y ago

This could work to syncronize data from multiple clients for a invoice, for example?

falcolas11y ago· 3 in thread

So, it appears to just copy tables around within the database. I wouldn't want to use this on a DB over a few MB in size. Sure, restores are "fast" (a table rename), but copies are not so much.

I can't imagine this would be kind to a production database (lots of cleanup from copied & deleted tables), and would consume a lot more space than a gripped logical backup of the tables in question.

obsessionOP11y ago

Please don't use this for production. It is not stable enough and you only end up with lost data.

falcolas11y ago

Why not use a binary backup method? Faster to backup and restore.

1 more reply

sciurus11y ago

For those curious, on PostgreSQL it executes

    CREATE DATABASE "snapshot" WITH TEMPLATE "source";

and on MySQL it loops creates the new database and loops over all the tables running

    INSERT INTO snapshot.table SELECT * FROM source.table;

https://github.com/fastmonkeys/stellar/blob/master/stellar/o...

Gigablah11y ago· 2 in thread

From the code:

    INSERT INTO %s.%s SELECT * FROM %s.%s

Yeah, good luck with that.

kstrauser11y ago

Gigablah11y ago

Nothing wrong with the syntax, it's just that if you're going to create an entire mirror of your database, you might as well use mysqldump. (Plus it'll actually be portable).

codeoclock11y ago· 2 in thread

Unfortunate name, excellent project :)

gexla11y ago

To elaborate on what this comment may be referring to. A Ripple (cryptocurrency) fork was recently released with the name of Stellar.

uptown11y ago

It's also the phonetical-match to an iPhone app:

https://steller.co/

JohnDotAwesome11y ago· 2 in thread

How does it work? Where does it breakdown? Why are these things not in the README?

mixologic11y ago

Because its not doing anything magic whatsoever.

https://github.com/fastmonkeys/stellar/blob/master/stellar/o...

davesque11y ago

There isn't that much code. It's a new project. Do some spelunking and find out.

swehner11y ago· 1 in thread

Line 53 of https://github.com/fastmonkeys/stellar/blob/master/stellar/o... is

                CREATE TABLE %s.%s LIKE %s.%s

This made me think of a table called

                create table `a; drop table users;`  (col int);

... which works in mysql.

I don't know if the stellar code will trip over something like this. But mysql (SQL) shouldn't even allow names like that.

obsessionOP11y ago

Yes, Stellar will probably trip over this. Similarly, if the attacker can edit stellar.yaml, they probably can edit your .bashrc as well.

Im adding this to my TODO list.

jamesmoss11y ago· 1 in thread

Interestingly they don't show MySQL benchmarks in the readme; I suspect it might be because the MySQL implementation is pretty basic

https://github.com/fastmonkeys/stellar/blob/master/stellar/o...

obsessionOP11y ago

MySQL support was definitely an afterthought and could probably be improved (maybe tracking the binary files directly?).

iso8859-111y ago· 1 in thread

how does this compare to time travel queries? http://en.wikipedia.org/w/index.php?title=Temporal_database#...

walterbell11y ago

Any databases that you would recommend for for temporal logic?

m3h11y ago

Why does the author compare it to Git? The functions this software performs are no where near those performed by Git. Nor it is a proper version control system.

bronson11y ago

Nice. I wrote a similar tool for Rails / ActiveRecord models: https://github.com/bronson/table_differ

Comments welcome! The docs, as always, could use some help.

crad11y ago

Maybe I'm missing something, but I didn't see anything with regard to indexes, users, stored procedures, views or what not.

Seems like it's for table schema snapshotting in a database without any external storage.

Browsing through the code, I see that it's highly table centric using SQLAlchemy.

jdc058911y ago

Shameless plug for mite: https://github.com/jdc0589/mite-node Simple migrations that take advantage of everything you already know about git and sql, plus some other cool stuff.

It's not too mature yet, the readme is mediocre at best, and it has some issues that will popup when working with a team, but it's pretty damn useful.

jimktrains211y ago

iurisilvio11y ago

I expected something related with Stellar coins.

Looks like a good project, I definitely want an easy way to manage development databases.

level0911y ago

I'm just wondering if this project offers anything special/better than the method I described.

ZenoArrow11y ago

Just a small correction; it's not PostreSQL, it's PostgreSQL.

edem11y ago

Folks might confuse this with the Stellar currency (stellar.org). You might give some emphasis in the title.

mosselman11y ago

Looks very nice, could you put up some practical examples?

josephcooney11y ago

Typo? Shouldn't it be PostgreSQL not PostreSQL?

j / k navigate · click thread line to collapse