Each time, I wonder why people are recreating the database's journal, within the database, where this meta-journal will itself be getting journaled.
Why not just consume the actual DB journal?
I'm not sure how that'd work for SQLite—it'd probably require a hack of some sort, since SQLite's journal is an ephemeral part of the DB file. But for more full-fledged RDBMSes, this would just be a matter of hijacking a streaming-replication-based backup system to act as a store of record for events. E.g., in the case of Postgres, you'd just have an ETL pipeline pointed at your WAL-E bucket, parsing through your WAL files either to load into a separate OLAP store, or for stateless map-reduction whenever you need to answer a question.
It's decidedly non-trivial to do so, as usually the journal doesn't contain all the information to do so. Journal writes are often a bottleneck so DB engines try to restrict their contents to just the necessary parts. Typically it'll e.g. not contain information about table schemas etc.
You can do so, see e.g. postgresql's logical decoding, but it's plenty of additional work.
The point I was trying to make is that the alternative—writing all your DB activity into the DB—means that you're already, 100% guaranteed, creating more overhead than the heaviest possible journal-safety properties would create. You're writing unabridged journal entries into the DB, where they then get even more journal attached.
There is no world where it makes more sense to write a copy of everything you do into an "everything I did" table, than it does to use an append-only / immutable / log-centric DBMS (where "Postgres with an ETL pipeline pointed at its WAL logs" is a rather hacky immutable DBMS.)
If you know, before you ever start coding, that you will need to know everything that ever happened, writing everything twice—when your DBMS is already writing everything twice—is a rather silly way to choose to go about solving your problems. You're already in a world where you need different guarantees than a traditional (ephemeral) RDBMS gives you; you just haven't realized it yet.
select * from journal
Neither stuff like select * from index_customer
and many other things. This is good and bad. Good because is less likely to people to mess with the engine and bad when you think in your DB as more that a restricted data store.My dream is build one more flexible, starting with a relational lang:
https://bitbucket.org/tablam/tablam/wiki/Home
But this is still in prototyping.
I have lots of wishes in that direction. For example, I wish that my programs could ask 'has this table been written to since I last checked' instead of having to read all the price lists into the application cache every n minutes whether there have been any changes or not etc.
I only want READ access to the change logs and audit logs...if databases only did this, it would be so much better.
Though, I get what you mean... the format in question would allow for querying changes against a given table for the last 5. With your solution, you'd have to go through the whole journal to track all the changes for a given record, and present it in a history format, with say diffing.
You can query the oplog yourself easily [0]. I also remember someone doing a version for MySql using the same approach.
[0] http://www.briancarpio.com/2012/04/21/mongodb-the-oplog-expl...
import sqlite3
def delta(*args):
ret = {}
for name, old, new in zip(args[::3], args[1::3], args[2::3]):
if old != new:
ret[name] = old
if ret:
return ret
db = sqlite3.connect(':memory:')
db.create_function('delta1', delta, 3)
db.create_function('delta2', delta, 6)
db.create_function('delta3', delta, 9)
print db.execute('select delta2("b", "20", "20", "c", 3.4, 3.6)').fetchone()
>> (u'{"c": 3.4}',) import sqlite3
import json
def delta(*args):
ret = {}
for name, old, new in zip(args[::3], args[1::3],
args[2::3]):
if old != new:
ret[name] = old
if ret:
return json.dumps(ret)
db = sqlite3.connect(':memory:')
db.create_function('delta1',3, delta)
db.create_function('delta2',6, delta)
db.create_function('delta3',9, delta)
print(db.execute('select delta2("b", 20, 20, "c", 3.4,
3.6)').fetchone())
# ('{"c": 3.4}',)