So for example, you might have a feature branch that includes some schema changes and some value modifications, and a content branch that includes a bunch of inserts into a few content tables that happen to include foreign key references to each other (so you need to maintain referential integrity when replaying those updates/inserts).
I don't see anything in the description that indicates this tool address those problems. For me, those are really the only problems that a DB version control system ought to be focused on. Speed of snapshotting is not all that important in a development environment as you typically work on a cut-down dataset anyway. A minute or so to take a snapshot a few times a day isn't a huge deal, whereas taking more frequent snapshots doesn't seem like something that adds any value, if it doesn't address any of the other problems.
"...you need to maintain referential integrity when replaying those updates/inserts"
I think most of your misgivings about this tool are grounded in the expectation that it would provide merge features like Git. Since it claims to be "git for databases", I suppose that's a fair expectation.
However, the tool does provide the core functionality of Git, which is to implement a tracking system for hash-tagged objects (or, in this case, tables). This does have value. Not only is it faster than standard db dumps, it's also more space-efficient. For some people, this could be really valuable.
I think it's an interesting tool and could serve as a launching point for more powerful/useful functionality.
Last I used sqitch, it returned a zero value on error, which made me leery of using it for scripted deployments.
I have been looking for a tool that will allow me to track, diff and revert changes to the content of specific "business logic" tables so that we can acurately track and test those changes.
It doesn't look like anything like that exists, so eventually I'll have to roll my own.
For example, I would be very interested in having git like branching on top of something like this - http://sandbox.substance.io/docs/lorem_ipsum.json. Basically version controlled schemaless object tree.
I recently implemented something like this, backed by mongodb, and by exposing a HTTP api which mimicked git. I had to relax these two requirements, but it is still worth it.
However I would be very much interested in using libgit2 with a database backend instead of filesystem.
EDIT: not affiliated with substance in any way.
1. It's like git somehow?
2. I can run some commands which presumably do something?
3. That something happens faster than something postgres can do, which I assume accomplishes the same thing?
4. I assume there's nothing else I need to know?
An honest update to the README would be to take the whole thing down several notches, if nothing else then to avoid the kind of confusion the post you're replying to expresses.
Excerpt: "Irmin is a library to persist and synchronize distributed data structures both on-disk and in-memory. It enables a style of programming very similar to the Git workflow, where distributed nodes fork, fetch, merge and push data between each other. The general idea is that you want every active node to get a local (partial) copy of a global database and always be very explicit about how and when data is shared and migrated
Irmin is not, strictly speaking, a full database engine. It is, as are all other components of Mirage OS, a collection of libraries designed to solve different flavours of the challenges raised by the CAP theorem. Each application can select the right combination of libraries to solve its particular distributed problem."
http://lists.xenproject.org/cgi-bin/mailman/listinfo/mirageo...
I can't imagine this would be kind to a production database (lots of cleanup from copied & deleted tables), and would consume a lot more space than a gripped logical backup of the tables in question.
Please don't use this for production. It is not stable enough and you only end up with lost data.
CREATE DATABASE "snapshot" WITH TEMPLATE "source";
and on MySQL it loops creates the new database and loops over all the tables running INSERT INTO snapshot.table SELECT * FROM source.table;
https://github.com/fastmonkeys/stellar/blob/master/stellar/o... INSERT INTO %s.%s SELECT * FROM %s.%s
Yeah, good luck with that.It takes snapshots and computes diffs between snapshots or the live database. It lets me drop and re-import some of my app's tables, then compute the minimum set of changes between the previous import and the new import. I wouldn't call it "git for ActiveRecord models" but it appears to be similar to this project.
Comments welcome! The docs, as always, could use some help.
Imagine a world where daily time-series data can be stored efficiently: This is a lesser known use case, but it works like this: I'm a financial company and I want to store 1000 metrics about a potential customer. Maybe the number of transactions in the past year, the number of defaults, the number of credit cards, etc.
Normally I would have to duplicate this row in the database every day/week/month/year for every potential customer. With some kind of git-like storing of diffs between the row today and the row yesterday, I could easily have access to time series information without duplicating unchanged information. This would accomplish MASSIVE storage savings.
FWIW efficiently storing time series data is big problem at my company. No off the shelf solution makes this easy for us right now, and we would rather throw cheap hard disk at the problem rather than expensive engineers.
If you insist on standard SQL databases for time series, you'll have a lot more pain
Back to this product (which appears to simply wholesale copy databases?), I use LVM for exactly what it is doing -- I create and rollback and access and update LVM snapshots of databases. The snapshots are instant, and in most situations the data duplications is very limited. LVM is one of the coolest, most under-appreciated facets of most Linux installs -- http://goo.gl/J2mIvG
Seems like it's for table schema snapshotting in a database without any external storage.
Browsing through the code, I see that it's highly table centric using SQLAlchemy.
CREATE TABLE %s.%s LIKE %s.%s
This made me think of a table called create table `a; drop table users;` (col int);
... which works in mysql.I don't know if the stellar code will trip over something like this. But mysql (SQL) shouldn't even allow names like that.
Im adding this to my TODO list.
It's not too mature yet, the readme is mediocre at best, and it has some issues that will popup when working with a team, but it's pretty damn useful.
https://github.com/fastmonkeys/stellar/blob/master/stellar/o...
Looks like a good project, I definitely want an easy way to manage development databases.
I'm just wondering if this project offers anything special/better than the method I described.
https://github.com/fastmonkeys/stellar/blob/master/stellar/o...