It really isn't if it's written in Python.
This... Loses that.
Is it semantically perfect? Probably not. Will Mongita be useful to Python developers who are accustomed to saving their data in idiosyncratic JSON files because they don't want the overhead of a MongoDB server? Maybe! I hope so.
I would never pick MongoDB over a traditional relational database, so I can't judge this well, but I can still see how people would find it useful. Nice work!
Seems like '.. to (My|Postgre)SQL' would be a better fit?
Mongita is only useful for Python developers.
I could be wrong on that though.
Your project looks very high quality -- benchmarks, tests, and comparisons are basically an indicator in my mind. Looking through the code you've also already left me (or someone else) space to try doing the SQLite as long as we implement it as an Engine[0] -- am I understanding that right? If I trace the code from database.py to engines/.py it looks like that. I really like the balance you've picked between pragmatism and space for expansion/modification.
A couple questions:
- How do you feel about type annotations in the code (as opposed to just the comments, as far as I can see)?
- PyPy? I wonder if you'd get a ~free speedup
[0]: https://github.com/scottrogowski/mongita/blob/master/mongita...
Unfortunately every time I see the misspelling in this thread I involuntarily cringe. I suppose "MongoDB" is named after a slur used to insult people with Down syndrome, so maybe calling this project the Spanish equivalent of "Magolia", "Mamalian", or "Meercat" is a clever reversal of the insult into a form of self-deprecation on the part of the author, who is wittily feigning illiteracy? Or perhaps it is intended to ridicule the speling of Spainards and other speekers of Spansh? Or programmers who decided to yoke their applications to fake open source?
Even if correctly spelled, perhaps the name would be more appropriate to a debugging tool than to a hash table implementation.
While your guess about the thought processes of the originator may well be correct, it is still the case that the result, "Mongita", is ① unambiguously Spanish and ② unambiguously pronounced in Spanish as [monxita], which is a real Spanish word, the diminutive of the common word monja, meaning "nun". But [monxita] is spelled "monjita".
The result is that what may well have been an incorrect application of the diminutive suffix (the correct result would be "Monguito") produced a misspelling of "monjita". It's just as clearly misspelled Spanish as "Ke keres aser?" or "yerba maté", if not more so. So you can expect most Spanish speakers to read it as ridiculing the literacy of an unspecified person—more so if they also know English, given that "mongo" has been an English word used for ridiculing someone's intelligence for many generations.
Whether you believe this claim or not, it's entirely down to whether you want to take a charitable or negative guess at the original intention.
I think it would be very difficult to completely beat SQLite with a Python library. My goal wasn't to beat it but to have performance that's within an order of magnitude which I think I've achieved.
In my opinion, the MongoDB interface has a lot of advantages over SQL that make sense in a lot of use cases. Certainly, there are times when a traditional relational database is the right choice. But I do think Mongita fills a niche.
Separately, the JSON1 extension, which the top-level comment refers to, is nice technically but has a challenging interface IMHO https://www.sqlite.org/json1.html
SQLite is a production-grade DBMS: though it has fewer features compared to e.g. PostgreSQL, if those features suffice, you can throw very sizable workloads at it.
The irony, of course, was that it turned out that if you used PostgreSQL and simply stored JSON in it, you could get better performance without giving up on a relational database... and even more damning was that the translation from MongoDB's query syntax to PostgreSQL's query syntax was trivial, leading to people building adapter layers that were drop-in compatible (and yet still faster and safer than MongoDB). So I guess there was something fitting about seeing someone post a project that is trying to be the SQLite of MongoDB, but with benchmarks right off the bat showing it slower than SQLite ;P.
I thereby continue to feel that if you want to use MongoDB, but want a library version of it, it would seem--based on this project's own benchmarks, unless you are doing a read-heavy workload of random documents by index--that what you probably want is a query translator to go from MongoDB's syntax to SQL (working with the SQLite json API), and then store your workload in SQLite... which is (critically) a battle-tested production-grade database engine used as a foundational storage layer for lots of projects.
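To make the "query translator" idea concrete, here is a minimal sketch, not anything taken from Mongita or from an existing adapter. It assumes a single table `docs` with one JSON text column `doc`, a SQLite build that includes the JSON functions (`json_extract`), and only flat filters with a handful of comparison operators. The names `translate`, `find`, and `_OPS` are made up for illustration.

```python
import json
import sqlite3

# Hypothetical mapping from a few MongoDB comparison operators to SQL operators.
_OPS = {"$eq": "=", "$ne": "!=", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}


def translate(filter_doc):
    """Turn a flat MongoDB-style filter into (where_clause, params)."""
    clauses, params = [], []
    for field, cond in filter_doc.items():
        # {"age": 30} is shorthand for {"age": {"$eq": 30}}.
        if not isinstance(cond, dict):
            cond = {"$eq": cond}
        for op, value in cond.items():
            # The JSON path and the value are both bound parameters;
            # only the operator (looked up in _OPS) is interpolated.
            clauses.append(f"json_extract(doc, ?) {_OPS[op]} ?")
            params.extend([f"$.{field}", value])
    # An empty filter matches every document.
    return " AND ".join(clauses) or "1", params


def find(conn, filter_doc):
    """Run the translated filter against the assumed docs(doc) table."""
    where, params = translate(filter_doc)
    rows = conn.execute(f"SELECT doc FROM docs WHERE {where}", params)
    return [json.loads(row[0]) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (doc TEXT)")
people = [
    {"name": "ann", "age": 31},
    {"name": "bob", "age": 25},
    {"name": "cy", "age": 40},
]
conn.executemany("INSERT INTO docs VALUES (?)", [(json.dumps(p),) for p in people])

over_30 = find(conn, {"age": {"$gt": 30}})
```

A real translator would also need `$in`, `$or`, array semantics, sorting, and updates, which is where most of the work would be; this only shows that the basic mapping is mechanical.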
That said, it isn't like SQLite was a drop-in replacement for other database engines: when I think of "X is to SQLite as MongoDB is to PostgreSQL" I picture something that is attempting to bring benefits over SQLite -- in way of performance or scalability -- at the cost of losing the power of being a full relational database (and, because of Mongo's legacy, probably a lot of safety and stability guarantees ;P). (FWIW, I remember there being a project like this: UnQLite-- https://news.ycombinator.com/item?id=18101689 --where people ironically seem to have wanted to get it benchmarked against SQLite with its JSON API ;P.)
It sounds like maybe this project is just trying to provide MongoDB's query layer? You simply don't use SQLite until you "migrate to the full PostgreSQL"... they are designed for different scenarios, and while SQLite is good enough you might be able to use it in a place where PostgreSQL "was called for", a migration might be brutal (as the syntax and type system are different). The project tagline is thereby leading to the wrong mental space--as seemingly multiple other people have since now mentioned on this thread--particularly given that it is written in Python.
if stuff like mongita and sqlite existed for all kinds of databases (graph, kv, xml; for document and sql it already exists), couldn't we "just make distributed versions" by putting stuff on top of it? like dqlite/rqlite do with sqlite?
or does there have to be some inherent mechanisms withIN the database to support distributed versions?
Making it fast - or usable at all in the presence of heavy contention - is another story. Distributing a write-heavy workload over a cluster is useless if the cluster ends up rejecting most updates because they get preempted by some other write. Solving that problem usually means analyzing the underlying system to figure out which parts need to be truly atomic and which you can get away with doing in parallel. That job is a) really complex and b) filled with opportunities to make significant performance gains in exchange for weaker safety guarantees, like losing committed writes in a crash, or allowing individual nodes to reorder independent writes.
You should check out http://jepsen.io/analyses if this stuff interests you.
This is really interesting and is something I came across while writing this. It turns out that concurrency is actually quite difficult because either you have global locks, which means only one process can write to the database/indices at once and slows things down considerably, or you have to do a lot of clever things to avoid those locks.
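A rough illustration of that trade-off, as a sketch only: the two toy in-memory stores below are hypothetical and are not how Mongita is implemented. Threads stand in for processes here; coordinating separate processes would need something like file locks instead of `threading.Lock`. The first store serializes every write on one lock, the second uses one lock per collection so that writers to different collections do not block each other.

```python
import threading
from collections import defaultdict


class GlobalLockStore:
    """Every write, to any collection, serializes on a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = defaultdict(dict)

    def insert(self, collection, doc_id, doc):
        with self._lock:
            self._data[collection][doc_id] = doc

    def count(self, collection):
        with self._lock:
            return len(self._data[collection])


class PerCollectionLockStore:
    """Writers to different collections no longer block each other."""

    def __init__(self):
        self._meta_lock = threading.Lock()  # guards creation of collections only
        self._locks = {}
        self._data = {}

    def _lock_for(self, collection):
        # Creating the per-collection lock must itself be atomic.
        with self._meta_lock:
            if collection not in self._locks:
                self._locks[collection] = threading.Lock()
                self._data[collection] = {}
            return self._locks[collection]

    def insert(self, collection, doc_id, doc):
        with self._lock_for(collection):
            self._data[collection][doc_id] = doc

    def count(self, collection):
        with self._lock_for(collection):
            return len(self._data[collection])


def hammer(store, n_threads=4, n_docs=500):
    """Write from several threads into two collections; return the total count."""
    def worker(t):
        for i in range(n_docs):
            store.insert(f"coll{t % 2}", (t, i), {"n": i})

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return store.count("coll0") + store.count("coll1")


total_global = hammer(GlobalLockStore())
total_granular = hammer(PerCollectionLockStore())
```

Both versions end up with the same data; the second is already more code for a fairly small gain, and it still does nothing for two writers hitting the same collection, which is where the "clever things" start.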
Exactly!
> This is really interesting and is something I came across while writing this. It turns out that concurrency is actually quite difficult because either you have global locks, which means only one process can write to the database/indices at once and slows things down considerably, or you have to do a lot of clever things to avoid those locks.
well, that would also be the case with traditional db services. the question is whether they can have more granular locking mechanisms than embedded databases, or whether they can perhaps only have less granular locking?
The disk engine does store data in memory which is part of the design. You wouldn't want to use a database that doesn't utilize caching.
The last benchmark panel, https://raw.githubusercontent.com/scottrogowski/mongita/mast..., shows cold starts where I test it without cache. So in that, it does hit the disk.
Okay, so looking at the first two tests - "Retrieve all documents" and "Get 1000 documents by ID" ...
If you switch the order around, does it make a difference to the benchmark? Because I suspect that the first test preloads all records into RAM, and the second test simply searches RAM, which is not what we usually do with SQLite. We don't cache all records before searching.
Switch those first two tests around, and let's see if it makes a difference.
https://www.opencypher.org/projects
None of those appear to be a SQLite-like file-based embeddable database system.
How can I trust something that makes a comparison that's not right, or a product that compares itself to MongoDB?
> Mongita is a lightweight embedded document database that implements a commonly-used subset of the MongoDB/PyMongo interface.
Genuine question!
There is less happening algorithmically than you would think. Where the tricky slow bits do exist, they have largely fallen into the happy-path of fast data structures in the Python language/stdlib. I also use sortedcontainers for indexes which helped quite a bit (http://www.grantjenks.com/docs/sortedcontainers/).
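For anyone wondering why a sorted container helps with indexes, here is a small sketch. It is not Mongita's code, and it deliberately uses the stdlib `bisect` module as a stand-in for sortedcontainers so that it runs without extra dependencies. The class name `SortedIndex` and its methods are hypothetical. The point is that once keys are kept in sorted order, a range lookup is two binary searches plus a slice.

```python
import bisect


class SortedIndex:
    """Keeps keys sorted so that range queries are two binary searches."""

    def __init__(self):
        # Parallel lists: _keys[i] is the indexed value of document _ids[i].
        self._keys = []
        self._ids = []

    def add(self, key, doc_id):
        # Insert after any equal keys so ties keep insertion order.
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._ids.insert(i, doc_id)

    def range(self, low, high):
        """Return doc ids whose key satisfies low <= key <= high."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._ids[lo:hi]


index = SortedIndex()
for doc_id, age in [("a", 31), ("b", 25), ("c", 40), ("d", 31)]:
    index.add(age, doc_id)

in_thirties = index.range(30, 39)  # ids of documents with 30 <= age <= 39
```

Inserting into a plain list is O(n) because of the shifting, which is one reason a purpose-built sorted container is the better choice for a real index.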
If you're curious, the benchmark code is in the repo: https://github.com/scottrogowski/mongita/blob/master/benchma...
SQLite is an embeddable SQL implementation which has been ported to dozens of platforms with no requirements.
Mongita is a Python library.
I like Python as much as the next guy, but the comparison is pretty far off whack. SQLite is popular because it embeds everywhere easily. This doesn't. I can't use this on my iPhone app. It's likely way too fat for Android and awkward at best on Android.
- From: "Mongita is to MongoDB as SQLite is to SQL"
- To: "Mongita is to MongoDB as SQLite is to MySQL"
When I see "SQL" I think of the textual query language (not a server SQL process/engine).
I dunno.. wouldn't touch it with a pole wearing a hazmat suit, sorry.
Using sqlite for storage and querying would've been better. Heck, that would be pretty great for moving a few smaller (server) applications off of mongodb. Although they're using Ruby.
Not sure this project understands that