Show HN: Stanchion – Column-oriented tables in SQLite (opens in new tab)

(github.com)

273 pointsdgllghr2y ago60 comments

Hello HN!

I built stanchion to scratch my own itch. I have personal applications running on SQLite where I want to store log and metric data (basically forever) but don't want to add complexity by bringing in a new database or whole separate server. Some of these applications are running on a Raspberry Pi, where storage and compute are limited, but I still want data warehouse like capabilities.

I envision stanchion being used in similar scenarios: on phones and resource-limited devices in applications that are already using SQLite. I know that there are alternatives like DuckDB (which is very cool), but I want stanchion to be "good enough" that it is useful without having to add whole new database technology.

If you think stanchion may be a good fit for your use case and you are interested in contributing, please test it and provide feedback by opening issues for any bugs, difficulties, or missing features you would need! Ideas are also welcome in this thread or as a github issue. Of course stars are always appreciated as well. The CONTRIBUTING doc in the repository has more details.

- Dan

Show HN: Stanchion – Column-oriented tables in SQLite

(github.com)

273 pointsdgllghr2y ago60 comments

Hello HN!

- Dan

60 comments

47 comments · 15 top-level

jitl2y ago· 6 in thread

Note that DuckDB can ingest SQLite tables. If you need OLAP on your SQLite system today instead of whenever this is stable, consider pairing up DuckDB. In the simplest case if you don’t need amazing latency and have small data (<1gb), I would copy the whole table from SQLite into an in-memory or temp file DuckDB, do your OLAP queries, then throw it away. For larger datasets, you can incrementally replicate the SQLite table into DuckDB by adding a logical clock column or something to the SQLite table, and then copy rows where logical_clock>last_duckdb_change before running your next DuckDB query.

We’re currently doing a bake-off between a few databases for small datasets (<10m rows) with dynamic schemas, and have pretty comparable read latencies between DuckDB and SQLite for our workflow, so you might be able to get away with switching wholesale from SQLite to DuckDB if you don’t do a lot of update.

dgllghrOP2y ago

DuckDB is great! But row-oriented storage is also great for many use cases. My goal is to provide a way for people to have access to the benefits of column-oriented storage without leaving behind the benefits of OLTP. Many people are already using sqlite and might not have the time/energy/budget to make a full switch to a new database. I also think there should be many options for embedded columnar storage, and right now there really aren't.

jitl2y ago

Absolutely, it's cool to see a real extension in this space, I didn't mean to diminish your work at all. If it were a few years more mature I'd try it out. I just happen to be messing around with DuckDB today for this exact use-case, but your thing is new, so I can't adopt it.

dgllghrOP2y ago

No diminishment taken :) I hope you find something that works for your needs

chrisjc2y ago

Adding to this.

https://duckdb.org/2024/01/26/multi-database-support-in-duck...

Unsure what the parent means by "ingest" SQLite tables (although i believe you can use COPY with sqlite tables in DuckDB), but you can interact with sqlite tables in DuckDB using the extension to attach to a sqlite db.

https://duckdb.org/docs/extensions/sqlite

jitl2y ago

Yeah by ingest I mean COPY FROM or attach via the SQLite extension. Then you can `INSERT INTO duckdbTable (select * from sqliteDb.sqliteTable)` or similar. I don’t think duckdb aggregate on native SQLite format will have any advantage; I think the native duckdb format would be necessary to see a big advantage over doing the aggregate in SQLite directly based on my reading of the duckdb extension code.

hantusk2y ago

That's not true for postgres scanning: https://duckdb.org/2022/09/30/postgres-scanner.html

So i would think we'd see similar speedups for sqlite even without copying any data

datadrivenangel2y ago· 6 in thread

I'm excited to see how this compares to DuckDB. More local/embedded analytics databases will be good for the industry.

lainga2y ago

Well, if Wikipedia's anything to go by, it seems to be the first in-mem column-oriented DB written in Zig

Iridescent_2y ago

As far as I can tell, the linked github repository is written only in C. Zig is used only as build system

EDIT: nvm, the github stats seem to be completely wrong? It shows 96% C to me...

Found it, I believe this is the result of including code from SQLite in the repo

2 more replies

BiteCode_dev2y ago

DuckDB can read from several heterogenous sources, but is read only.

SQLite is limited to its own format but can write.

So very different use cases.

speedgoose2y ago

Actually DuckDB has its own database format for persistence and can support writes. It’s not obvious from their documentation. People told me that on HN not long ago, as I thought the same than you.

2 more replies

orlp2y ago

DuckDB isn't optimized for writing speed, but it is not read-only.

jitl2y ago

I’m looking at some DuckDB code right here that does `INSERT INTO … ON CONFLICT (id) DO UPDATE SET` so it can definitely do upserts :)

1 more reply

alexgarcia-xyz2y ago· 5 in thread

This is an awesome project, I love SQLite extensions and I think they have a ton of use. Giants props to Dan here, I haven't seen many SQLite extensions written in Zig, and I'm learning a ton reading through the source code.

The column-oriented data is stored in large BLOBs inside of regular SQLite tables. It uses the SQLite incremental BLOB I/O API [0] to incrementally read/write data in a column oriented way.

However, this project (and other SQLite extensions) will eventually hit a limit with SQLite's virtual table API. When you create a virtual table, you can perform a number of optimizations on queries. For examples, SQLite will tell your virtual table implementation the WHERE clauses that appear on the virtual table, any ORDER BYs, which columns are SELECT'ed, and other limited information. This allows extension developers to do things like predicate + projection pushdowns to make queries faster.

Unfortunately, it doesn't offer many ways to make analytical queries faster. For example, no matter what you do, a `SELECT COUNT(*) FROM my_vtab` will always iterate through every single row in your virtual table to determine a count. There's no "shortcut" to provide top-level counts. Same with other aggregate functions like SUM() or AVERAGE(), SQLite will perform full scans and do calculations themselves.

So for this project, while column-oriented datasets could make analytical queries like that much faster, the SQLite API does limit you quite a bunch. I'm sure there are workarounds around this (by custom UDFs or exposing other query systems), but would be hefty to add.

That being said, I still love this project! Really would love to see if there's any size benefit to this, and will definitely contribute more when I get a chance. Great job Dan!

[0] https://www.sqlite.org/c3ref/blob_open.html

dgllghrOP2y ago

Thanks for the kind words, Alex! You have probably written more great sqlite extensions than anyone, so your feedback is very meaningful.

You are right that there are limitations. The sqlite virtual table API is very clearly designed with row-oriented access in mind. However, there are still ways that stanchion makes analytical queries faster: lazily loading segments (which are the BLOBs that contain the actual values) so data that is not accessed is skipped, using encodings like bit packing to reduce the size of data that needs to be traversed on disk, and allowing users to specify a clustered index to ensure records are ordered in a way that allows querying to minimize data access.

One area that I want to explore more deeply is the xFindFunction callback in the virtual table API. It allows the a virtual table to "overload" a function with its own implementation. I believe there are some opportunities to work around some of the limitations you are describing, but I'm not even sure at this point if they can apply to aggregate functions.

This is all theoretical until there are solid benchmarks, which is something that I want to add in the near term. And if you know of any workarounds to the limitations that you think may be useful, I am all ears!

alexgarcia-xyz2y ago

That's great to hear! The clustered index sounds really cool. Especially since SQLite tells you about ORDER BYs in xBestIndex (with the nOrderBy[0]), so it would be super cool to have super-fast ORDER BYs with those.

Very interested to see how xFindFunction works for you. One limitation I've found is that you don't know if a user uses a xFindFunction inside of xBestIndex (ie at query time), unless 1) it's part of a WHERE clause and 2) only two arguments are provided, the first being a column value and the 2nd any literal. I've found this limiting in the past, only having 1 argument to work with in that narrow case. But I'm sure there's clever tricks there!

One trick I've noticed: You can kindof detect a COUNT(*) with the `colUsed` field in xBestIndex. In that case, `colUsed` will be 0 (ie 0 columns are requested), so you can use that as a signal to just iterate over N times instead of accessing the underlying data. Still slow, but you can probably do something like ~1 million/sec, but better than accessing the data that many times!

[0] https://www.sqlite.org/vtab.html#order_by_and_orderbyconsume...

2 more replies

jerrysievert2y ago

> The column-oriented data is stored in large BLOBs inside of regular SQLite tables.

this is similar to how citus and hydra columnar engines for postgres work - it seems to be a fairly successful use-case.

zzo38computer2y ago

This is true, and I had the similar issues with the virtual table API. They did add some things, but they won't do everything. There are some other problems with SQLite as well. However, there are benefits of SQLite, and I wrote some extensions, too.

m1sta_2y ago

Is there any value in just treating rows as columns and vice versa?

breadchris2y ago· 4 in thread

It seems like there are a lot of extensions that are being built for sqlite. I would like to use these extensions, but I am skeptical about their support over time. I like sqlite for how freakin stable it is. How do people feel about sqlite extensions?

mikepurvis2y ago

My limitation with it is that it means I have to recompile sqlite for my use case. That's sometimes easy and obvious to do, but it's a lot more of a pain if it's, say, the sqlite embedded in my language interpreter and I just signed myself up for compiling a custom Python to support my project.

That said, I just googled it and it turns out I'm being a bit dramatic— it's actually not super hard to dynamically link Python to a custom-built sqlite: https://charlesleifer.com/blog/compiling-sqlite-for-use-with...

yellowapple2y ago

SQLite being as stable as it is means that even an unmaintained extension will probably continue to work for a very long time.

breadchris2y ago

that is what i was thinking about, which is encouraging.

justinclift2y ago

I guess it depends on what kind of support time scale you're wanting?

Popular curated extension collections like sqlean (https://github.com/nalgeon/sqlean) seem like they'll have a shelf life of many years.

tbragin2y ago· 2 in thread

Interesting project! Thank you for open sourcing and sharing. Agree that local and embedded analytics are an increasing trend, I see it too.

A couple of questions:

* I’m curious what the difficulties were in the implementation. I suspect it is quite a challenge to implement this support in the current SQLite architecture, and would curious to know which parts were tricky and any design trade-off you were faced with.

* Aside from ease-of-use (install extension, no need for a separate analytical database system), I wonder if there are additional benefits users can anticipate resulting from a single system architecture vs running an embedded OLAP store like DuckDB or clickhouse-local / chdb side-by-side with SQLite? Do you anticipate performance or resource efficiency gains, for instance?

* I am also curious, what the main difficulty with bringing in a separate analytical database is, assuming it natively integrates with SQLite. I may be biased, but I doubt anything can approach the performance of native column-oriented systems, so I'm curious what the tipping point might be for using this extension vs using an embedded OLAP store in practice.

Btw, would love for you or someone in the community to benchmark Stanchion in ClickBench and submit results! (https://github.com/ClickHouse/ClickBench/)

Disclaimer: I work on ClickHouse.

dgllghrOP2y ago

Thanks for your thoughtful questions!

* SQLite has rock solid support for transactionally persisting data to disk, which is a difficult and complex thing that stanchion gets almost for free. But the downside is fitting the architecture of stanchion into sqlite's way of doing things. So that is a long way of saying yes, the biggest difficulty was first learning how sqlite works and the implementing features to work with sqlite.

* It's a good question and certainly an area worth exploring. I don't know the answer, but doing, for instance, a comparison of power consumption and performance on a phone for the same use case between sqlite+stanchion and chdb running side-by-side with sqlite would be very interesting. To spitball some ideas of areas that may benefit stanchion: caching (both code and data, application caching and hardware caching), data sharing (no need for decoding and re-encoding either within the host application or between databases), and unified transactions (this one is a stretch). As you mention, chDB (and also DuckDB) benefit from having an architecture designed for analytics.

* As mentioned elsewhere in these comments, the sqlite virtual table system does have some limitations, so I think you're right when it comes to query performance. However, those limitations are limited to the way sqlite queries data in virtual tables, so I think stanchion can be competitive on data size on disk and potentially on insert performance.

I do plan to run and publish some benchmarks of stanchion against chDB and DuckDB in the near term. So far, I haven't focused on performance with stanchion, but that will be more of a focus going forward. Plus it's good to measure first and use that to track improvements. Stay tuned!

ncruces2y ago

Judging from [1] I think your intuition about performance and particularly storage improvements is correct.

1: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html

didgetmaster2y ago· 2 in thread

Is this really an important feature for DB users here? I built my own hobby database system that was column oriented (using the metadata tagging system I invented for a file system replacement I built). It does lightning fast analytics and even outperforms SQLite in a bunch of other operations. I posted a benchmark video on my YouTube channel and made a free beta download available; but the response has been lukewarm at best.

https://YouTube.com/@didgetsdemos https://www.didgets.com

lukevp2y ago

It’s not open source (or even source available), and it’s targeted at users who, what, use csv files but can’t use excel very well, but are searching for a new revolutionary file system?

I don’t really see a use case for your product based on the pitch page, just a lot of marketing speak that (to an engineer) seems like an apples and oranges comparison (eg. You’re running a local db, could be in memory even, and benchmarking it against Postgres, which is an incredibly flexible and comprehensive piece of software with multitenancy, transactions, replication, schema management, query planning, and a full on webserver? How is that the same product? Why not benchmark this against Excel? Cuz I’ve already got excel installed and I can open a CSV and grep for a string and count the matches in like 5 seconds. That seems to be what you’re competing against.

If you open source the code and make it a general purpose library (and not windows only) and market it toward developers, you might get some more interest.

didgetmaster2y ago

I never claimed that my project in its current form is a drop-in replacement for Postgres or even SQLite. It still lacks many features that those databases have accumulated over decades of development by large communities.

It is not just an in-memory DB. All its data is persisted on disk and I made sure whenever I did a benchmark on any query; that I was doing an apples-to-apples comparison.

Even though the beta consists of a Windows executable; there is nothing Windows specific about it. It is built to be cross-platform and it has been tested on Linux. With my small team, we just maintain a single platform at the moment for simplicity sake.

We have not yet open-sourced the code as you pointed out; but we have considered it. But if everyone just wants to dismiss it out of hand because it fails to check some box and just focuses on what it can't do, instead of what it does well; then there might not be any point in doing so.

hifikuno2y ago· 1 in thread

This is great. I love using DuckDB to do initial analytics on new datasets we get at work. The issue I run into a lot is that our sources can give us data that has non UTF-8 characters in it which DuckDB won't read so I spend a bit of time just trying to find the few non UTF-8 characters in a multi-GB file.

SQLite does not seem to care about the bogus characters so I would be really interested in using Stanchion and SQLite to see if it speeds up my exploration.

Also, I do not know how I have never known about SQLite extensions before now. I feel like I just found out that Narwhals are actually real.

gregw22y ago

Re non-UTF8 inputs, more could be said (resolve root encoding conversion issues with iconv), but see: https://unix.stackexchange.com/questions/201751/replace-non-...

nattaylor2y ago· 1 in thread

This is particularly interesting to me for Android/iOS. I can't even picture the use case where there'd be enough data on the device for the row-based format to b a bottleneck, but maybe some case that involves many, many aggregations

simcop23872y ago

Maybe not a bottleneck but maybe it could improve battery life there for those tasks by being better performing

speedgoose2y ago· 1 in thread

If you are into alternative storage engines for SQLite, there is also an LSM (Log-Structured Merge-tree) extension in the main repository that is not announced nor documented but seems to work. It’s based on the SQLite 4 project.

https://github.com/sqlite/sqlite/tree/master/ext/lsm1

https://www.charlesleifer.com/blog/lsm-key-value-storage-in-...

conradev2y ago

"SQLite4 was an experimental rewrite of SQLite that was active from 2012 through 2014. All development work on SQLite4 has ended. Lessons learned from SQLite4 have been folded into the main SQLite3 product. SQLite4 was never released. There are no plans to revive it. You should be using SQLite3."

https://sqlite.org/src4/doc/trunk/www/index.wiki

devdiary2y ago· 1 in thread

Can you ELI5, how did you make this possible? I see the readme asking to download a binary and then also mentioning sqlite extension. I have never had exp with sqlite ext, so not quite sure how do they work. Appreciate how does this solution work under the hood.

dgllghrOP2y ago

SQLite has a built in mechanism for loading extensions at runtime. The extensions themselves are just dynamic libraries. The main entry point for the extension is an init function that SQLite calls when the library is loaded. Within that init function, the extension can register a number of different kinds of functionality:

* custom functions, including aggregate functions, window functions, and scalar functions: https://sqlite.org/appfunc.html

* virtual tables: https://sqlite.org/appfunc.html (This is how stanchion and other extensions like FTS and sqlite-vss are implemented)

* table valued functions (also implemented through the virtual table mechanism)

* virtual file systems: https://www.sqlite.org/vfs.html

It's really impressive how extensible SQLite is, and it's the power of that extensibility that makes stanchion possible.

thamer2y ago· 1 in thread

The "Data Storage Internals" section[1] of the README sounds to me like it has its own column-oriented format for these tables, at least that's how I'm reading the part about segments. Is that the case? If so, have you tried using Apache Arrow or Parquet to see how they compare?

[1] https://github.com/dgllghr/stanchion#data-storage-internals

dgllghrOP2y ago

Yes it does. I have found it easier to start simple than try to integrate the complexity of Parquet whole hog. The Parquet format is also designed with the idea of having multiple columns in the same "file". I'm sure there are ways to split the components of a Parquet file across multiple BLOBs and be able to append those BLOBs together to create a valid Parquet file, but that is more complexity and does not lend itself to existing Parquet code. Keeping the blobs separate is valuable because it means not touching data for columns that are not being read in queries.

My understanding is that Arrow has really focused on an in-memory format. It is binary, so it can be written to disk, but they are seemingly just scratching the surface on compression. Compression is a big advantage of columnar storage because really cool compression schemes like bit packing, run-length encoding, dictionary compression, etc. can be used to significantly reduce the size of data on disk.

Arrow interop especially is a great idea, though, regardless of the storage format. And if they are developing a storage-friendly version of arrow, I would certainly consider it.

thecowgoes2y ago· 1 in thread

Can't wait to see compression!

Any timeline for that?

dgllghrOP2y ago

Stanchion already has support for a few kinds of compression that leverage patterns in a sequence of data of the same type (DuckDB calls this "lightweight compression"). Stanchion supports: bit packing for booleans and integers and constant compression (if all the values in a run of data are the same, only store the value once). The next type of lightweight compression that is planned is Run Length Encoding (RLE). And I also want to add support for lossless byte encoding (probably zstd) at some point.

skissane2y ago· 1 in thread

Question: Why do you choose LGPL-3.0? For many, one of the most attractive features of SQLite is its license (or should I say lack thereof).

I realise some people view public domain as legally problematic. I think the best answer for that is public-domain equivalent licenses such as 0BSD [0] or MIT-0 [1] – technically still copyrighted, but effectively not, since they let you do anything under zero conditions. (There are other, possibly more well-known options such as Unlicense or WTFPL or CC0; however, those tend to upset lawyers more than 0BSD and MIT-0 do.)

Of course, it is your work, and you are free to license it however you like. Still, some potential users are going to be put off by the licensing.

[0] https://opensource.org/license/0bsd/

[1] https://opensource.org/license/mit-0/

ncruces2y ago

SQLite extensions are mostly meant to be dynamically linked, so the LGPL is entirely appropriate, IMO.

If anything I'd ask for an exception to allow statically linking unmodified builds (e.g. my use case, WASM, requires static linking all extensions).

Or a dual MPL+LGPL license.

vincent-manis2y ago

This sounds like an excellent project, and I hope it continues to a production release.

I do have one qualm, though. SQLite is written in C, and if I were writing plugins for it, I would seriously consider myself bound to using C for them, regardless of the merits of the language. I can easily imagine myself trying to build an extended SQLite library where one plugin is written in Zig, another in Rust, and perhaps a third in Hare. A fourth plugin might be written in C, but be built using Meson. Yet another plugin written in C is built with Bazel. And here I come, writing my plugin in Chibi Scheme!

Eventually, the toolchain needed to build the library and all its plugins overflows.

I would strongly recommend that people who write plugins for programs or libraries give serious consideration to using the programming language(s) and build system(s) of the parent software.

ngrilly2y ago

Great example of Zig code, and of seamless interaction with C!

j / k navigate · click thread line to collapse

60 comments

47 comments · 15 top-level

jitl2y ago· 6 in thread

dgllghrOP2y ago

jitl2y ago

dgllghrOP2y ago

No diminishment taken :) I hope you find something that works for your needs

chrisjc2y ago

Adding to this.

https://duckdb.org/2024/01/26/multi-database-support-in-duck...

https://duckdb.org/docs/extensions/sqlite

jitl2y ago

hantusk2y ago

That's not true for postgres scanning: https://duckdb.org/2022/09/30/postgres-scanner.html

So i would think we'd see similar speedups for sqlite even without copying any data

datadrivenangel2y ago· 6 in thread

I'm excited to see how this compares to DuckDB. More local/embedded analytics databases will be good for the industry.

lainga2y ago

Well, if Wikipedia's anything to go by, it seems to be the first in-mem column-oriented DB written in Zig

Iridescent_2y ago

As far as I can tell, the linked github repository is written only in C. Zig is used only as build system

EDIT: nvm, the github stats seem to be completely wrong? It shows 96% C to me...

Found it, I believe this is the result of including code from SQLite in the repo

2 more replies

BiteCode_dev2y ago

DuckDB can read from several heterogenous sources, but is read only.

SQLite is limited to its own format but can write.

So very different use cases.

speedgoose2y ago

2 more replies

orlp2y ago

DuckDB isn't optimized for writing speed, but it is not read-only.

jitl2y ago

I’m looking at some DuckDB code right here that does `INSERT INTO … ON CONFLICT (id) DO UPDATE SET` so it can definitely do upserts :)

1 more reply

alexgarcia-xyz2y ago· 5 in thread

The column-oriented data is stored in large BLOBs inside of regular SQLite tables. It uses the SQLite incremental BLOB I/O API [0] to incrementally read/write data in a column oriented way.

That being said, I still love this project! Really would love to see if there's any size benefit to this, and will definitely contribute more when I get a chance. Great job Dan!

[0] https://www.sqlite.org/c3ref/blob_open.html

dgllghrOP2y ago

Thanks for the kind words, Alex! You have probably written more great sqlite extensions than anyone, so your feedback is very meaningful.

alexgarcia-xyz2y ago

[0] https://www.sqlite.org/vtab.html#order_by_and_orderbyconsume...

2 more replies

jerrysievert2y ago

> The column-oriented data is stored in large BLOBs inside of regular SQLite tables.

this is similar to how citus and hydra columnar engines for postgres work - it seems to be a fairly successful use-case.

zzo38computer2y ago

m1sta_2y ago

Is there any value in just treating rows as columns and vice versa?

breadchris2y ago· 4 in thread

mikepurvis2y ago

yellowapple2y ago

SQLite being as stable as it is means that even an unmaintained extension will probably continue to work for a very long time.

breadchris2y ago

that is what i was thinking about, which is encouraging.

justinclift2y ago

I guess it depends on what kind of support time scale you're wanting?

Popular curated extension collections like sqlean (https://github.com/nalgeon/sqlean) seem like they'll have a shelf life of many years.

tbragin2y ago· 2 in thread

Interesting project! Thank you for open sourcing and sharing. Agree that local and embedded analytics are an increasing trend, I see it too.

A couple of questions:

Btw, would love for you or someone in the community to benchmark Stanchion in ClickBench and submit results! (https://github.com/ClickHouse/ClickBench/)

Disclaimer: I work on ClickHouse.

dgllghrOP2y ago

Thanks for your thoughtful questions!

ncruces2y ago

Judging from [1] I think your intuition about performance and particularly storage improvements is correct.

1: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html

didgetmaster2y ago· 2 in thread

https://YouTube.com/@didgetsdemos https://www.didgets.com

lukevp2y ago

It’s not open source (or even source available), and it’s targeted at users who, what, use csv files but can’t use excel very well, but are searching for a new revolutionary file system?

If you open source the code and make it a general purpose library (and not windows only) and market it toward developers, you might get some more interest.

didgetmaster2y ago

It is not just an in-memory DB. All its data is persisted on disk and I made sure whenever I did a benchmark on any query; that I was doing an apples-to-apples comparison.

hifikuno2y ago· 1 in thread

SQLite does not seem to care about the bogus characters so I would be really interested in using Stanchion and SQLite to see if it speeds up my exploration.

Also, I do not know how I have never known about SQLite extensions before now. I feel like I just found out that Narwhals are actually real.

gregw22y ago

Re non-UTF8 inputs, more could be said (resolve root encoding conversion issues with iconv), but see: https://unix.stackexchange.com/questions/201751/replace-non-...

nattaylor2y ago· 1 in thread

simcop23872y ago

Maybe not a bottleneck but maybe it could improve battery life there for those tasks by being better performing

speedgoose2y ago· 1 in thread

https://github.com/sqlite/sqlite/tree/master/ext/lsm1

https://www.charlesleifer.com/blog/lsm-key-value-storage-in-...

conradev2y ago

https://sqlite.org/src4/doc/trunk/www/index.wiki

devdiary2y ago· 1 in thread

dgllghrOP2y ago

* custom functions, including aggregate functions, window functions, and scalar functions: https://sqlite.org/appfunc.html

* virtual tables: https://sqlite.org/appfunc.html (This is how stanchion and other extensions like FTS and sqlite-vss are implemented)

* table valued functions (also implemented through the virtual table mechanism)

* virtual file systems: https://www.sqlite.org/vfs.html

It's really impressive how extensible SQLite is, and it's the power of that extensibility that makes stanchion possible.

thamer2y ago· 1 in thread

[1] https://github.com/dgllghr/stanchion#data-storage-internals

dgllghrOP2y ago

Arrow interop especially is a great idea, though, regardless of the storage format. And if they are developing a storage-friendly version of arrow, I would certainly consider it.

thecowgoes2y ago· 1 in thread

Can't wait to see compression!

Any timeline for that?

dgllghrOP2y ago

skissane2y ago· 1 in thread

Question: Why do you choose LGPL-3.0? For many, one of the most attractive features of SQLite is its license (or should I say lack thereof).

Of course, it is your work, and you are free to license it however you like. Still, some potential users are going to be put off by the licensing.

[0] https://opensource.org/license/0bsd/

[1] https://opensource.org/license/mit-0/

ncruces2y ago

SQLite extensions are mostly meant to be dynamically linked, so the LGPL is entirely appropriate, IMO.

If anything I'd ask for an exception to allow statically linking unmodified builds (e.g. my use case, WASM, requires static linking all extensions).

Or a dual MPL+LGPL license.

vincent-manis2y ago

This sounds like an excellent project, and I hope it continues to a production release.

Eventually, the toolchain needed to build the library and all its plugins overflows.

I would strongly recommend that people who write plugins for programs or libraries give serious consideration to using the programming language(s) and build system(s) of the parent software.

ngrilly2y ago

Great example of Zig code, and of seamless interaction with C!

j / k navigate · click thread line to collapse