pragma journal_mode = WAL;
pragma synchronous = normal;
pragma temp_store = memory;
pragma mmap_size = 30000000000;
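For reference, a minimal Python sketch (assuming the built-in sqlite3 module, as in the linked test file; the file name bench.db and the table are made up) of applying these pragmas and timing serialized single-row inserts:

```python
import os
import sqlite3
import time

# Start from a clean database so the numbers are comparable run to run.
for suffix in ("", "-wal", "-shm"):
    try:
        os.remove("bench.db" + suffix)
    except FileNotFoundError:
        pass

con = sqlite3.connect("bench.db", isolation_level=None)  # autocommit: every INSERT is its own transaction
con.execute("pragma journal_mode = WAL")
con.execute("pragma synchronous = normal")
con.execute("pragma temp_store = memory")
con.execute("pragma mmap_size = 30000000000")
con.execute("create table kv (k integer primary key, v text)")

n = 100_000
start = time.perf_counter()
for i in range(n):
    con.execute("insert into kv (v) values (?)", (f"row-{i}",))
elapsed = time.perf_counter() - start
print(f"{n / elapsed:,.0f} inserts/s, {elapsed / n * 1e6:.1f} us/insert")
con.close()
```

Each loop iteration is a full commit, so this measures per-transaction latency rather than batched throughput.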
As a result, the average number of inserts per second was 80K.

> WAL mode is always consistent with synchronous=NORMAL, but WAL mode does lose durability. A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash.
But then I realised that "pragma fullfsync = off" is the default; fullfsync is a macOS-specific pragma.
So, on macOS only, WAL with synchronous=FULL (the default for that pragma) has the same durability issue as synchronous=NORMAL: committed transactions might be rolled back following a power loss or system crash, albeit with different probabilities.
If you think about this in latency terms, you are able to insert a row and be done with the entire ceremony in about 12 microseconds. This is serialized throughput too.
I think it is unlikely you would get this kind of performance with a hosted solution across the network. You could cheat with batching & caching, but for a cold, one-record non-query, nothing comes close to the latency of SQLite on a local NVMe device.
Source code of Python test file: https://gist.github.com/maksadbek/2385b002b439e03dc948b05593...
I’m open to being convinced otherwise, but I would expect that inserts/s would vary a bunch by filesystem.
But I do agree that local NVMe with any filesystem is absurdly better than what you're likely to find in typical cloud environments.
If memory serves, I could insert 10M rows in roughly 6.8 seconds (give or take), and bear in mind I didn't use WAL at all.
Not bad I would say, not bad at all.
https://github.com/onthegomap/planetiler/blob/db0ab02263baaa...
It batches inserts into bulk statements and is able to do writes in the 500k+ per second range, and reads are 300-400k/s using those settings.
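As a rough illustration of that batching approach (a Python sketch with a made-up table and batch size, not Planetiler's actual Java code), bulk inserts wrapped in a single transaction look like this:

```python
import os
import sqlite3
import time

try:
    os.remove("batch.db")
except FileNotFoundError:
    pass

con = sqlite3.connect("batch.db")
con.execute("pragma journal_mode = WAL")
con.execute("pragma synchronous = off")  # speed-over-durability trade-off (assumption; check the linked settings)
con.execute("create table tiles (id integer primary key, data blob)")

rows = [(i, b"x" * 64) for i in range(200_000)]
batch = 10_000
start = time.perf_counter()
with con:  # one transaction around all the batches
    for off in range(0, len(rows), batch):
        con.executemany("insert into tiles values (?, ?)", rows[off:off + batch])
print(f"{len(rows) / (time.perf_counter() - start):,.0f} inserts/s")
con.close()
```

The big win is amortizing the commit: one fsync-equivalent for the whole run instead of one per row.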
I can make a car go really fast if I eliminate the weight of having safety equipment on it like a bumper, seatbelt and airbag.
This is one of those use cases where SQLite isn’t replacing <database> - it’s replacing fopen.
1. Ugh this thing is too integrated, we should break it up into modular parts so we can re-assemble them using only the parts we need. (Examples: most OS kernels, sqlite apparently, big excel spreadsheets, web frameworks like Rails)
2. Ugh this thing is too modular, you need a bazillion parts just to get a useful system. (Examples: analytics platforms consisting of 20+ unix tools strapped together in a pipeline, most microservices architectures, everything in the node ecosystem)
Measuring software in human capital isn't a good idea. Software becomes crusty very quickly, and once you have a huge user base, re-architecting becomes almost impossible without a rewrite.
A lot of velocity can come, in the short term, from completely freeing yourself from an existing solution and its customers.
These knobs, by the way, are par for the course in generic storage systems, including filesystems to some extent and definitely every database I've ever seen (because different hardware will have different optimal settings).
actually, i believe it's quite the contrary: most of what we need nowadays has already been invented and developed.
you can be incredibly successful as an engineer at many companies just by knowing what knobs are there and how to turn them properly.
and i think this is relevant because most people end up reinventing a square wheel that only does 10% of what the most used, widely spread, already-open-source wheel does.
[1]: https://phiresky.github.io/blog/2021/hosting-sqlite-database...
WAL mode has some issues where, depending on the write pattern, the WAL file can grow without bound, slowing performance down a lot. I think this usually happens when you have lots of writes that lock the table, so SQLite never gets a chance to run wal_autocheckpoint.
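One mitigation, if your write pattern allows it, is to run a blocking checkpoint yourself during a quiet period. A sketch using the standard wal_checkpoint pragma (file and table names made up):

```python
import os
import sqlite3

try:
    os.remove("app.db")
except FileNotFoundError:
    pass

con = sqlite3.connect("app.db")
con.execute("pragma journal_mode = WAL")
con.execute("create table t (x)")
con.execute("insert into t values (1)")
con.commit()

# TRUNCATE blocks until it can flush the entire WAL into the main
# database file, then resets the WAL file to zero bytes.
busy, wal_frames, ckpt_frames = con.execute(
    "pragma wal_checkpoint(TRUNCATE)"
).fetchone()
print(busy, wal_frames, ckpt_frames)  # busy == 0 means the checkpoint completed
con.close()
```

In a real app you'd run this on a timer or after a write burst, not after every commit.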
I believe that WAL2 fixes this:

> Wal2 mode does not have this problem. In wal2 mode, wal files do not grow indefinitely even if the checkpointer never has a chance to finish uninterrupted.
https://sqlite.org/cgi/src/doc/wal2/doc/wal2.md

I wonder if this could be further extended to better support concurrent writes. Depending on the implementation, with wal2 readers may be reading from both hot and cold files without blocking. So this may potentially be extendable to read from two hot files, or two hot files and two cold files.
sqlite> PRAGMA journal_mode = delete;
delete
sqlite> PRAGMA journal_mode = wal2;
delete
Does this mean wal2 is not available?

[0]: https://www.sqlite.org/src/doc/begin-concurrent/doc/begin_co...
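For what it's worth, an unrecognized journal mode is simply ignored, and the pragma reports back the mode actually in effect, so that transcript does indicate wal2 is unavailable in that build (wal2 currently lives on a separate SQLite branch, not in mainline releases). A feature-detection sketch in Python (file name made up; an in-memory database won't work here because it can't use WAL):

```python
import os
import sqlite3

def supports_wal2(path="probe.db"):
    con = sqlite3.connect(path)
    try:
        # Unknown journal modes are ignored; the pragma echoes back
        # the mode actually in effect.
        mode = con.execute("pragma journal_mode = wal2").fetchone()[0]
        return mode.lower() == "wal2"
    finally:
        con.close()

print(supports_wal2())  # prints False on a stock SQLite build
```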
Based on https://www.sqlite.org/wal.html it seems the WAL index is mmapped as a workaround for some edge cases that aren't relevant to many application developers. They say it shouldn't matter, but with the larger page sizes you're using, the original implementation approach they describe (volatile shared memory) might actually improve performance slightly. Do you know if your WAL index ever exceeds 32KiB? Not sure how hard it would be to resurrect that old functionality, though.
Also, this case seems like something that could happen from time to time depending on what you’re doing. Did you encounter it? Case:
> When the last connection to a particular database is closing, that connection will acquire an exclusive lock for a short time while it cleans up the WAL and shared-memory files. If a second database tries to open and query the database while the first connection is still in the middle of its cleanup process, the second connection might get an SQLITE_BUSY error.
Both the WAL docs and the article mention blocking checkpointing/a need for reader gaps to ensure the WAL flushes, or a possibility that WAL files will grow indefinitely. I had some speculation that this was an implementation limitation, and it turns out another comment mentions WAL2 may relax this requirement by using two WAL files split between “hot” and “cold”. Curious how the performance might compare with this: https://sqlite.org/cgi/src/doc/wal2/doc/wal2.md
I have explained my rationale and approach here - https://avi.im/blag/2021/fast-sqlite-inserts/
the repo link - https://github.com/avinassh/fast-sqlite3-inserts
(Not suggesting it’s wrong, just interesting to hear from others as to why)
To answer the person's question: there are easy ways and hard ways, with all the trade-offs you would expect. The easy way is to use an ODBC connector. That makes it easier to change the DB engine, but it's going to hurt performance. Chances are, though, that the performance will still be good enough for applications like database-as-file-format.
I would consider data loss on crash to be "corruption", for sure. And synchronous=normal + journal_mode=WAL can lose data:
https://www.sqlite.org/pragma.html#pragma_synchronous
> A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash.
It's _easier_ to write IO with mmap, but hand-optimized file IO can be even faster. And DBMSes have historically cared a lot about optimizing file access.
So I'd expect SQLite to be faster without mmap, as I'd expect its developers to have tuned file access carefully rather than relying on OS-provided mmap.
PS: For example, say your code has a byte array memory-mapped from a file. If the code needs to do a random read from it, it has no way of knowing whether that read will have to wait for a page to be read from disk or whether the page is already cached. Hand-optimized file IO has the option of doing other things instead of waiting for the disk (or during, or before).
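To make that last point concrete, a toy Python illustration (file name made up): the mmap read is a plain memory access that can silently stall on a page fault, while the explicit read is a visible call boundary where the caller could choose to prefetch or do other work first.

```python
import mmap
import os

path = "data.bin"
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))  # 1 MiB of test data

fd = os.open(path, os.O_RDONLY)
mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)

# mmap: looks like an ordinary array access; if the page isn't resident,
# the thread blocks inside this line on a page fault, invisibly to the code.
b1 = mm[123_456]

# explicit IO: the potential wait is an explicit call, so the caller can
# schedule prefetching or other work around it (os.pread is POSIX-only).
b2 = os.pread(fd, 1, 123_456)[0]

print(b1 == b2)  # prints True: both paths read the same byte

mm.close()
os.close(fd)
```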