> This directory contains source code to an experimental "version 4" of SQLite that was being developed between 2012 and 2014.
> All development work on SQLite4 has ended. The experiment has concluded.
> Lessons learned from SQLite4 have been folded into SQLite3 which continues to be actively maintained and developed. This repository exists as an historical record. There are no plans at this time to resume development of SQLite4.
I'd be interested to hear why they stopped. Presumably reimplementing SQL on a KV store was seen as not worth it, when applications that are satisfied with an embedded KV store backend (which is much faster and simpler to write!) already have many options.
I think part of this is because of a fundamental limitation of sqlite that it's an embedded database that has to persist data on disk at all times: The design of LSM trees works well with databases with a resident in-memory component because it's an approximation of just dumping every new thing you see at the end of an unordered in-memory array. This is as opposed to a data structure like a b-tree where you have to /find/ exactly where to put the data first, and then put it there. This finding bit means you're doing a lot of random access in memory, which is thrashing all of your caches (CPU / disk etc). LSM trees avoid this thrashing by just dumping stuff at the end of an array. However this means you have to scan that array to do lookups (as opposed to something easier like binary search). Then as your array gets big, you merge and flush it down to a lower "layer" of the lsm tree which is slightly bigger and sorted. And when that one fills, you flush further. And these merge-flushes are nice big sequential writes so that's nice too.
Anyway, with SQLite, the highest layer of your LSM tree would probably (this is conjecture) have to be on disk because of the way that there is no server component, versus in an in-memory system it'd probably be in your L2/L3 cache or at least your main memory. So this could be one reason why that model didn't work out as well for them.
Regarding the LSM engine, you can find all the relevant implementation details here: https://sqlite.org/src4/doc/trunk/www/lsm.wiki#summary
> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system. Under normal circumstances, the in-memory tree is not allowed to grow very large.
I also hoped for big win on the insertion-heavy loads, and I also haven't succeed in that. The problem is that every insert statement must read back something from DB to verify DB state against schema for correctness. As reads in LSM are slower, the net win is either absent or negligible. I have to say I wrote "must" in sentence above because you sometimes can get away without reading back, but not always. In the end, worst case scenario is always "read and write", not just "write".
But!
I devised a scheme to lay out layers' data so that they are as contiguous as they can be. Or get a very good approximation to that contiguousness, basically (O(1) "pages" per level). Thus contiguous reads got very high performance and beat old storage on read scheme, despite the need of level merging, etc.
https://www.sqlite.org/inmemorydb.html
Perhaps the key (ha!, pun) is that you're talking about using RAM _and_ disk with the RAM being for caching/fast access that eventually hits the disk. Whereas, I think, in this case sqlite is either on the disk, or in RAM. There is no multiple tiers.
Correct me if I'm off here. Thanks.
Which, while a totally alien concept in the modern software world, is actually a pretty cool thought.
(I'm sure under the hood bugs are getting fixed and all)
[0] https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mappin...
Presto is a distributed SQL query engine for big data, so basically the complete opposite of SQLite, though it often gets used in federation scenarios, as does SQLite.
An interesting anecdote is that the team working on what would become osquery (https://osquery.io/) asked if they could reuse the SQL parser from Presto. We get that question a lot, and after explaining that the parser is the easy part (semantic analysis and execution is the real work), I determined that what they really wanted were SQLite virtual tables: https://sqlite.org/vtab.html (and those worked out great for them)
I created a logsql.py plugin for skybot, it just logs to the DB instead of to text files. Is that something you'd be interested in merging back in?
To internalize it better I invented a "project" for myself - http://thredis.org/ which was (and is, but I'm not maintaining it) a Redis/SQLite hybrid. It was fun to hack on.
Another invaluable source of DB internals information is PostgreSQL. Both projects have amazingly well written and detailed comments.
The number of oddball, often critical, places where I've found SQLite being used would defy belief. As far as I can tell, the "expected" place for SQLite to work seems to be almost anything that's not your normal dB driving some web-based CRUD app...all kinds of embedded systems, easy to manipulate in-memory scratch pads for bioinformatics, lots of data analysis tools in mobile communications.
It's so good, and so obvious, that I think sometimes it makes other tools that might be simpler fits for many use-cases less likely to be used, like leveldb.
That can totally be handled with SQLite.
https://www.hwaci.com/sw/sqlite/prosupport.html
> Paid support options and products are provided by Hipp, Wyrick & Company, Inc., (Hwaci), a Georgia corporation with headquarters in Charlotte, North Carolina and has been in business since 1992. Hwaci has an international team of employees and associates representing the best available talent. We are a 100% engineering company. There is no sales staff. Our goal is to provide outstanding service and honest advice without spin or sales-talk.
> Hwaci is a small company but it is also closely held and debt-free and has low fixed costs, which means that it is largely immune to buy-outs, take-overs, and market down-turns. Hwaci intends to continue operating in its current form, and at roughly its current size until at least the year 2050. We expect to be here when you need us, even if that need is many years in the future.
It is not as shiny, but in the long run, you still get all the goodness. Nevermind the name / version number.
Example: industrial equipment, military stuff, bridges, aircraft, etc.
[1] For pretty much complete nonsense NIH and standards-lawyering reasons.
Doesn't look like this is fixed in sqlite4 though...
Other DBMSes often have to do complete table rewrites for certain types of ALTER anyway - so while, yes, it's faintly annoying to have to do that in userspace rather than having the db engine do it for you, it's not nearly as big a disadvantage as it first appears.
Since you were nominally optimizing for migration, a zoom-out perspective may be to note that upgrading SQLite3 versions vs. upgrading major RDBMS versions is trivial/fast, relatively rarely required, also cohabitation of multiple versions works a lot easier, any kind of CI/CD process is going to be orders of magnitude faster and use much less CPU/memory/disk space, which means smaller build artifacts and thus faster transfer/download.
This was a partially right assumption, but only for writes.
If you write something in a DB you check some constraints and those checks are reads.
So most DB writes come with a bunch of reads.
The reads were slower with the LSMs, so the B-Trees performed better in "real world" writes (which come with reads) and LSMs only performed better in "artificial" writes (without reads).
Shameless plug https://github.com/maxpert/lsm-windows (I did port the LSM storage to windows).
https://www.sqlite.org/testing.html
"The TH3 test harness is a set of proprietary tests, written in C that provide 100% branch test coverage (and 100% MC/DC test coverage) to the core SQLite library. The TH3 tests are designed to run on embedded and specialized platforms that would not easily support TCL or other workstation services. TH3 tests use only the published SQLite interfaces. TH3 consists of about 57.3 MB or 782.3 KSLOC of C code implementing 42213 distinct test cases. TH3 tests are heavily parameterized, though, so a full-coverage test runs about 1.7 million different test instances. The cases that provide 100% branch test coverage constitute a subset of the total TH3 test suite. A soak test prior to release does hundreds of millions of tests. Additional information on TH3 is available separately."
https://www.sqlite.org/th3.html
"TH3 License
SQLite itself is in the public domain and can be used for any purpose. But TH3 is proprietary and requires a license.
Even though open-source users do not have direct access to TH3, all users of SQLite benefit from TH3 indirectly since each version of SQLite is validated running TH3 on multiple platforms (Linux, Windows, WinRT, Mac, OpenBSD) prior to release. So anyone using an official release of SQLite can deploy their application with the confidence of knowing that it has been tested using TH3. They simply cannot rerun those tests themselves without purchasing a TH3 license."
<from more info>
Obtaining A License To Use SQLite
Even though SQLite is in the public domain and does not require a license, some users want to obtain a license anyway. Some reasons for obtaining a license include:
Your company desires warranty of title and/or indemnity against claims of copyright infringement.
You are using SQLite in a jurisdiction that does not recognize the public domain.
You are using SQLite in a jurisdiction that does not recognize the right of an author to dedicate their work to the public domain.
You want to hold a tangible legal document as evidence that you have the legal right to use and distribute SQLite.
Your legal department tells you that you have to purchase a license.
If you feel like you really need to purchase a license for SQLite, Hwaci, the company that employs all the developers of SQLite, will sell you one. All proceeds from the sale of SQLite licenses are used to fund continuing improvement and support of SQLite.</from more info>
How is it possible that they can sell licenses to the code that was put into the public domain by other contributors?
A contributor must attach the following declaration[1] to contribute. So now their contributions are in public domain. Now in a place where the law doesn't recognize public domain, doesn't the code belong to the original authors? How can an unaffiliated company license it as if they wrote the code?
[1]: "The author or authors of this code dedicate any and all copyright interest in this code to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this code under copyright law."
SQLite is developed and maintained by the SQLite Consortium, who's members include Mozilla, Adobe, Oracle and Adobe. The embedded SQL database is used in a number of well-known applications, such as Adobe's Lightroom, Apple's Mac OS X operating system and Mozilla's Firefox and Thunderbird.
http://www.h-online.com/open/news/item/SQLite-3-7-0-adds-Wri...
And since the link above is fairly old, I suspect since then the list of big companies supporting the project has probably grown.
https://github.com/LMDB/sqlightning
The performance there shows either little to no performance difference, up to substantial speed increases.
http://charlesleifer.com/blog/using-sqlite4-s-lsm-storage-en...
In peewee 3.0a I've also added built-in support for using the lsm1 virtual table if you're interested.
EDIT Seeing as I am getting slammed by downvotes, my comment here was simply pointing out that the headline I saw on HN could be read in multiple ways. As a long time user of SQLite3, I was initially excited when I read the title as I had thought it meant something good coming from the SQLite team. Turns out not to be. That, to me, still entails doubt.
Right there on the web page. Highlighted in green even.
> This repository exists as an historical record. There are no plans at this time to resume development of SQLite4.
(To clarify, this is directed at all the downvoters, not the commentator I'm replying to.)