The perils of UUID primary keys in SQLite

(andersmurphy.com)

184 points · emschwartz · 21d ago · 108 comments

67 comments · 18 top-level

blopker20d ago· 22 in thread

UUIDs are way overused. There is almost always a better key to use, usually a bigint for databases. If you're making some kind of leaderless distributed data store, then maybe, but even then there are other ID sharding strategies I'd go for first depending on the constraints.

For a single database, bigints are smaller and faster, with less footguns.

UUIDs can be nice for an opaque public ID, however I'd still prefer something like a Sqid for space and usability.

Fabricio2020d ago

> bigints are smaller and faster, with less footguns

But be careful!! Javascript WILL interpret your bigints as Number() and round them down because they are too big without telling you!!!

Famously seen by every snowflake user that has interacted with Javascript, quite an annoying problem.
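
For illustration, a minimal Python sketch of the rounding described here. It assumes that a Python float is the same IEEE 754 double as a JavaScript Number, and the ID value is made up.

```python
import json

# Largest integer a 64-bit IEEE 754 double (and so a JS Number) holds exactly.
MAX_SAFE_INTEGER = 2**53 - 1

big_id = 9007199254740993  # hypothetical 64-bit ID, equal to 2**53 + 1

# Forcing the ID through a double silently lands on a neighbouring value.
round_tripped = int(float(big_id))

# Sending the ID as a string keeps every digit.
payload = json.dumps({"id": str(big_id)})
restored = int(json.loads(payload)["id"])

print(big_id > MAX_SAFE_INTEGER)  # the ID is outside the safe range
print(round_tripped == big_id)    # the double did not preserve it
print(restored == big_id)         # the string round trip did
```

Passing IDs across the boundary as strings, as in the last step, is one way to avoid the lossy conversion.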

silvestrov20d ago

A good trick is to prefix all such keys with magic, i.e. a couple of letters that identify the type of key.

Then it will always be a string and you will be free to change the format/type of the key in the future to UUID or whatever you like.

1 more reply

Terr_19d ago

> Javascript WILL interpret your bigints as Number()

A similar horror story from PHP, which I discovered by diagnosing a test failure. (Or maybe it was in production? Long ago, can't remember.)

I think the code in question was for some kind of web auth, comparing random 32-character hexadecimal strings. PHP has a "feature" where its == operator falls back to trying certain strings as numbers... and that includes a version with scientific notation. (12000 == "12000" == "12e3")

Such a collision through bad comparison may seem unlikely, but there are two islands of higher odds: 0*10^X is zero for any X, and X*10^0 is X no matter how many zeros pad the exponent. Finally, leading zeros can be included. ("0e1234" == "00000e1" and "1234e0" == "01234e000")

The fix was simply going to stricter ===, but it definitely reinforced my dislike of "loose" languages.
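
A rough Python emulation of the comparison behaviour described above, for readers without PHP at hand. The regular expression is only an approximation of what PHP treats as a numeric string, and both function names are made up for this sketch.

```python
import re

# Approximation of PHP's "numeric string" rule: optional sign, digits,
# optional fraction, optional exponent, optional surrounding whitespace.
NUMERIC = re.compile(r"^\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?\s*$")

def loose_equals(a: str, b: str) -> bool:
    """Emulate == on two strings: numeric strings are compared as numbers."""
    if NUMERIC.match(a) and NUMERIC.match(b):
        return float(a) == float(b)
    return a == b

def strict_equals(a: str, b: str) -> bool:
    """Emulate === on two strings: plain character-by-character comparison."""
    return a == b

# Hex-looking tokens that happen to parse as scientific notation.
pairs = [("0e1234", "00000e1"), ("1234e0", "01234e00"), ("12000", "12e3")]
for a, b in pairs:
    print(a, b, loose_equals(a, b), strict_equals(a, b))
```

Each pair compares equal under the loose rule and unequal under the strict one, which is the collision risk for random hex tokens.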

paulddraper20d ago

Node.js drivers will correctly read int64 as string or bigint, not number.

E.g. pg for PostgreSQL

Maybe there’s a buggy driver but I don’t know it.

1 more reply

Piezoid20d ago

Using a Feistel cipher and base 32 encoding at the boundaries of the system can help catch vibe-coded edge code that attempts to decode identifiers in JavaScript. It also somewhat obfuscates the cardinalities and fill rate of the tables.
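
A minimal sketch of that idea in Python, under stated assumptions: a four-round balanced Feistel network over 64-bit integers with an HMAC-SHA256 round function, and unpadded lowercase base 32 at the edge. The key, round count, and function names are all illustrative choices, and this is obfuscation rather than vetted cryptography.

```python
import base64
import hashlib
import hmac

SECRET = b"demo-key-change-me"  # hypothetical key; load from config in practice
ROUNDS = 4
MASK32 = 0xFFFFFFFF

def _round(half: int, i: int) -> int:
    """Keyed round function: 32 bits in, 32 bits out."""
    msg = bytes([i]) + half.to_bytes(4, "big")
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def encrypt_id(n: int) -> int:
    """Permute a 64-bit integer with a balanced Feistel network."""
    left, right = (n >> 32) & MASK32, n & MASK32
    for i in range(ROUNDS):
        left, right = right, left ^ _round(right, i)
    return (left << 32) | right

def decrypt_id(c: int) -> int:
    """Undo encrypt_id by running the rounds in reverse order."""
    left, right = (c >> 32) & MASK32, c & MASK32
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, i), left
    return (left << 32) | right

def to_public(n: int) -> str:
    """Internal integer key -> opaque 13-character public ID."""
    raw = encrypt_id(n).to_bytes(8, "big")
    return base64.b32encode(raw).decode().rstrip("=").lower()

def from_public(s: str) -> int:
    """Opaque public ID -> internal integer key."""
    padded = s.upper() + "=" * (-len(s) % 8)
    return decrypt_id(int.from_bytes(base64.b32decode(padded), "big"))

print(to_public(1), to_public(2), from_public(to_public(1)))
```

Because the network is a permutation of the 64-bit space, distinct keys always map to distinct public IDs, and consecutive keys no longer look consecutive from outside.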

sheept20d ago

This can be avoided by supplying a reviver:

    // Needs a runtime that passes the reviver a context with the raw source text.
    const json = '{ "a": 9007199254740993 }'
    const parsed = JSON.parse(json, (_key, value, context) =>
      /^-?\d+$/.test(context.source) ? BigInt(context.source) : value)

1 more reply

spiffytech20d ago

Fortunately we're seeing more JS DB libraries offering to read large numbers as the BigInt type.

1 more reply

JamesSwift20d ago

UUIDs also have the nice benefit that it's practically impossible to query the wrong table with one if you mix up what an FK goes to.

chrismorgan20d ago

You can achieve this with numeric sequences too, by having a consistent step and unique offset in all your sequences. For example, if you will never exceed 16 types, reserve four bits as the type discriminant. (You don’t have to use powers of two, but it may be convenient.)

All sequences use step 16.

Type A has discriminant/offset 0, yielding IDs {0, 16, 32, 48, 64, …}.

Type B has discriminant/offset 1, mapping to IDs {1, 17, 33, 49, 65, …}.

All the way up to Type P with discriminant/offset 15 and IDs {15, 31, 47, 63, 79, …}.

This is also trivially invertible so that you can determine the type from the ID.

A more common approach is to make IDs opaque strings and put a type prefix—A0, B12, P34, that kind of thing. But this way you can keep it as a number, if you wish.
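
The step-and-offset scheme above can be sketched in a few lines of Python. The table names and helper functions are hypothetical; only the step of 16 and the offsets 0, 1, and 15 come from the comment.

```python
STEP = 16  # four bits reserved as the type discriminant

# Hypothetical table names, each with its own offset in 0..STEP-1.
OFFSETS = {"account": 0, "invoice": 1, "payment": 15}
TYPES = {offset: name for name, offset in OFFSETS.items()}

def make_id(type_name: str, n: int) -> int:
    """Return the n-th ID (counting from 0) in the given type's sequence."""
    return n * STEP + OFFSETS[type_name]

def type_of(id_: int) -> str:
    """Invert the scheme: the remainder modulo STEP names the type."""
    return TYPES[id_ % STEP]

def check_fk(id_: int, expected: str) -> None:
    """Reject an ID that belongs to a different table's sequence."""
    actual = type_of(id_)
    if actual != expected:
        raise ValueError(f"{id_} is a {actual} ID, not a {expected} ID")

print([make_id("invoice", n) for n in range(5)])
print(type_of(79))
```

A check like `check_fk` at the data-access layer is where a mixed-up foreign key would be caught, rather than silently matching a row in the wrong table.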

throwawayo2oe20d ago

Alternatively just use a shared sequence for all tables.

sgarland19d ago

Or just write tests, instead of relying on statistical improbability to prevent disaster.

1 more reply

pyuser58320d ago

Yeah this is nice - also helps with grepping dump files.

mamcx20d ago

How is this done?

4 more replies

tannenfreund8714d ago

On the contrary, the need for UUIDs is growing. Once you have multiple users collecting and editing data for a central database, you'll run into primary key conflicts. Not, of course, if your users are constantly online and never lose the connection to the database. But modern use cases have a remote database over the internet and distributed users, often with slow or spotty connections.

I've developed a field survey app for foresters. They use it on toughbooks, tablets and phones. They are collecting spatial data, so the geometry column in the tables gets quite big. The app on the device uses a SQLite (Spatialite) database, the central database is Postgres (PostGIS). They will often edit the same area, so without UUIDs, there will be duplication of primary keys, thus making the database inconsistent. Then I will be flooded with support tickets, and that will cause more slowdown than just using UUID4. And the performance drop of UUID7 is negligible compared to bigint for primary key.

willtemperley20d ago

UUIDs make client code so much simpler. Just create a UUID, use it client side to create your object graph and commit or not as appropriate. No need to retrieve an incremented integer.

sgarland19d ago

Every DB, even MySQL, can return the auto-incrementing integer for you as part of the insert. Postgres, SQLite, and MariaDB (likely others, I’m just not familiar) can even return the rest of the data, should you need that.
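
As a small illustration for the SQLite case, here is a sketch using Python's built-in sqlite3 module. The `person` table is made up, and the RETURNING branch is guarded because that clause needs SQLite 3.35 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Portable route: the cursor exposes the rowid assigned by the insert.
cur = conn.execute("INSERT INTO person (name) VALUES (?)", ("Ada",))
first_id = cur.lastrowid

# RETURNING hands back the inserted row itself on recent SQLite builds.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    rows = conn.execute(
        "INSERT INTO person (name) VALUES (?) RETURNING id, name", ("Grace",)
    ).fetchall()
    row = rows[0]
else:
    row = None  # older SQLite: use lastrowid plus a follow-up SELECT instead

conn.commit()
print(first_id, row)
```

Either way the generated key comes back in the same round trip as the insert, so no separate "read the ID" query is needed.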

IME, most of the arguments for why UUIDs make things better are due to developer ignorance of RDBMS features (or B+tree performance).

1 more reply

bob102920d ago

I am finding UUIDs help a lot if your primary schema consumer is an LLM.

Inappropriate aliasing of integer keys allows for silent errors in queries because it will actually return some result a lot of the time. A UUID is immune to this problem. The model recognizes its mistake a lot more reliably when previously non-empty tables start showing up empty after attempting a join.

andersmurphy20d ago

Yes, this matters even more if you are doing a lot of joins. Naive string UUIDs are 32 bytes as bare hex (36 with the hyphens), though I use binary UUIDs in the post, which are 16, compared to 8 bytes for a 64-bit int. This matters even more with SQLite, as it uses varint encoding, so small integers take fewer than 8 bytes. The upshot of all this is your indexes take up a lot less space in memory.
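
The size differences are easy to check with Python's standard uuid module; this sketch only measures the representations and says nothing about SQLite's own per-value overhead.

```python
import uuid

u = uuid.uuid4()

as_text = str(u)   # canonical form, hyphens included
as_hex = u.hex     # hex digits only
as_blob = u.bytes  # raw 128-bit value

# Bytes needed per key for each representation.
sizes = {
    "text with hyphens": len(as_text),
    "hex text": len(as_hex),
    "binary blob": len(as_blob),
    "64-bit integer": 8,
}
for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

That works out to 36, 32, 16, and 8 bytes respectively, and the key is repeated in every index and foreign key column that references it.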

Fire-Dragon-DoL20d ago

Providing an ID from the client is a big advantage that's missing though. Especially if you want a UI with optimistic rendering that's dealing with something async

PUSH_AX20d ago

What are uuid foot guns?

JamesSwift18d ago

They are generally distributed in such a way that values created at nearly the same time don't cluster together, which is less efficient for DBs.

crubier20d ago

No one ever got fired for using UUIDs

dumbledorf21d ago· 5 in thread

Wait how is sqlite doing a million inserts a second?

kg20d ago

sqlite is really fast. I'm surprised it's only a million.

andersmurphy20d ago

It's running on an M1 mac with synchronous full. Wouldn't surprise me if it's possible to get higher numbers.

JSR_FDED20d ago

In batches

smitty1e20d ago

':memory:'

https://sqlite.org/inmemorydb.html

KPGv220d ago

Except this source code is not using :memory:. The linked source code has

    (defonce db
      (d/init-db! "db/db.db"
        {:pool-size 4 :pragma {:synchronous "FULL"}}))

That's writing to disk.

1 more reply

yepyoukno21d ago· 4 in thread

Perils of “UUIDv4”. Everyone knows that’s what UUIDv7 was really for, and you should always convert that to binary to optimize everything.

themafia20d ago

> and you should always convert that to binary to optimize everything

I disagree. I tried this once. Now you need a client access layer to touch the DB in any context. All your console tools no longer work well or at all. If they show up in URLs you need to deoptimize them for transport.

You give up a lot of convenience for this optimization. You should be absolutely sure your design requires it before using it.

JSR_FDED20d ago

Small nit: uuid7 is 128 bits (16 bytes) by definition. So there’s no need to convert it to binary. It already is. Unless you’re working with a stringified version of the uuid7.

yepyoukno20d ago

Oh yes, I meant don’t store as an ID in its string format!

1 more reply

antihero20d ago

Doesn't Postgres' UUID type just do this for you anyway?

Why would you store it as a str column and not the inbuilt type for this?

https://www.postgresql.org/docs/current/datatype-uuid.html

If you are using SQLite, well, I guess that doesn't work.

adityaathalye20d ago· 3 in thread

Thanks for the benching, Anders! So grateful for the stuff you've shared over the years. Invariably, every single post has been useful and/or educational to me.

I read this post more as an illustration of the *value* of UUIDv7 as primary key, over integer primary keys, in exchange for a minimal loss of read/write performance and marginally more on-disk bloat.

SQLite's automatic integer rowID primary key is a no-brainer, when the SQLite application is local-only, such as application storage format (mobile and desktop). Or is never intended to grow beyond a single server instance. Basically, where each SQLite file is private to a singular instance of the application.

However, if there is even an outside chance of needing to cooperate across application instances, e.g. the minimal limit case of a personal knowledge base that should seamlessly sync across a person's devices, as well as a hosted service, then a high-quality sequential random ID starts to make a lot more sense. (No-brainer arbitrary table merges / splits / remerges, de-duplication, etc.)

Random ID primary key is a bad idea period, whether it be the UU kind or the SQ kind, or any other kind. As far as my DB knowledge goes, this class of ID destroys all tree-algorithms, and we are stuck with the fact that there is no practically better way, than an appropriate tree-structure, to group and organise a meaningful amount of data, efficiently and effectively.

andersmurphy20d ago

I've updated the article with the correct rowid alias (integer not int) so the rowid version is now 715ms. I've also added an example of rowid and a secondary index UUID4, and that also seems to be bad for performance (as although it's not a clustered index it's still random inserts into a b-tree).

adityaathalye20d ago

Well, I expect to never need WITHOUT ROWID. And even if such an arcane situation hits my system, WITHOUT ROWID has so many ifs and buts that I'll probably elect to eat the $$$ cost of running an un-optimised normie SQLite as far as possible.

cf. https://sqlite.org/withoutrowid.html

> The WITHOUT ROWID syntax is an optimization. It provides no new capabilities. Anything that can be done using a WITHOUT ROWID table can also be done in exactly the same way, and exactly the same syntax, using an ordinary rowid table. The only advantage of a WITHOUT ROWID table is that it can sometimes use less disk space and/or perform a little faster than an ordinary rowid table.

As of now, I am doing the following in my (Bitemporal data system) experiment (When will it see the light of day? Nobody knows.).

All data are globally uniquely identified by a UUIDv7. However, all tables have `rowid` integer primary key asc (which is just an alias for SQLite's automatically assigned integer rowid). The `rowid` is the basis for joins, and is the foreign key reference. This lets me offload some useful disambiguation work to the DB as well as have it enforce global (across data systems) record uniqueness guarantees, while retaining local (within process) query efficiency by retaining the ability to use integer rowids.
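
A minimal sketch of that layout using Python's sqlite3 module. The table and column names are hypothetical, not taken from the experiment described, and `uuid.uuid4()` is only a stand-in for the UUIDv7 generator so the example runs on any recent Python.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE entity (
    id        INTEGER PRIMARY KEY,   -- alias for the rowid; used for joins
    global_id BLOB NOT NULL UNIQUE   -- 16-byte UUID; identity across systems
);
CREATE TABLE attribute (
    id        INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    name      TEXT NOT NULL
);
""")

def add_entity(global_id: bytes) -> int:
    """Insert an entity and return its local integer key."""
    cur = conn.execute("INSERT INTO entity (global_id) VALUES (?)", (global_id,))
    return cur.lastrowid

gid = uuid.uuid4().bytes  # stand-in for a UUIDv7 value
eid = add_entity(gid)
conn.execute(
    "INSERT INTO attribute (entity_id, name) VALUES (?, ?)", (eid, "colour")
)

# Resolve the global ID once at the edge, then join on the integer key.
rows = conn.execute(
    """SELECT a.name
       FROM entity e JOIN attribute a ON a.entity_id = e.id
       WHERE e.global_id = ?""",
    (gid,),
).fetchall()
print(eid, rows)
```

The UNIQUE constraint on `global_id` is what lets the database reject a record that already exists elsewhere, while every join stays on compact integer keys.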

While the idealised insert performance in your bench is indeed mind-boggling, the DB Schema isn't doing anything CPU-intensive during inserts (checks, constraints, triggers etc.). My schema / query pattern yields comparatively meagre throughput, but I am happy with the ballpark it has landed in, given all the work I'm making SQLite do for me on each `assert!` and `redact!`.

cf. my dirty-but-useful-enough bench, with production-like record content:

A poor man's napkin-mathy, append-only SQLite write/read benchmark

https://gist.github.com/adityaathalye/3c8195dc70626b33c23867...

Summary:

  ;; Okay, I think I can live with this...

  ;; - "facts" table: 12M+ records
  ;;     - single process writes to it
  ;;     - ~ 400 transactions/second
  ;;     - append-only table, enforced via SQLite "before" triggers
  ;; - "now" table: 
  ;;     - updates on every assert/redact on "facts" table, via triggers
  ;;     - currently at "limit case": for each read it is empty, or very small, because writes do back-to-back assert/redact of the same fact
  ;;     - gets reads from two reader threads (evenly split)
  ;;     - ~41,000 reads/second
  ;; - all reads are concurrent with writes (poor man's futures)

adityaathalye20d ago

Aside: Specific to SQLite...

Thanks to its oh so convenient automatic integer rowIDs, I believe one can amortise some of the other overheads of UUIDv7s for "in-between" queries, viz. indices, joins, ctes, virtual tables etc., with appropriate schema / query design.

andersmurphy20d ago· 3 in thread

This is actually a draft. I wanted to add more details about how this changes with row size etc. I might get time to update it later today.

ysleepy20d ago

Maybe you could explain why one would use "without rowid" in the first place.

I get saving 8 bytes per row seems attractive, but the tradeoff is not explained.

andersmurphy20d ago

Updated the article: there's now a section for UUID4 with rowid. It's less bad than UUID4 without rowid but it's still about 4-6x slower than UUID7 without rowid.

keynha20d ago

The reason to use it is that it skips the double lookup. A normal rowid table with a UUID primary key keeps two B-trees: the table itself keyed by the hidden rowid, and a separate index from your UUID to that rowid. A lookup by UUID walks the index to find the rowid, then walks the table to find the row. WITHOUT ROWID makes the UUID the table's key directly, so the row sits in that leaf and you walk one tree instead of two, and you don't store the UUID a second time.

The tradeoff is what the benchmark is hitting. Once the table is physically ordered by the key, a random v4 scatters every insert across the tree and you pay for the page splits. A plain rowid table keeps that churn in the secondary index, which is just the key plus a rowid, while the table itself stays append-ordered. So it only really pays off when the key is something you look up directly and is roughly sequential, which is why v7 comes back near baseline.
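
To make the two layouts concrete, here is a small sketch with Python's sqlite3 module. The table names are made up and `os.urandom(16)` stands in for a UUID. The query plan text is printed rather than asserted, since its wording varies between SQLite versions.

```python
import os
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Rowid table: rows live in a B-tree keyed by the hidden rowid, and the
-- PRIMARY KEY is enforced by a second, automatically created index.
CREATE TABLE doc_rowid (id BLOB PRIMARY KEY, body TEXT);

-- WITHOUT ROWID: the table itself is the B-tree keyed by id.
CREATE TABLE doc_clustered (id BLOB PRIMARY KEY, body TEXT) WITHOUT ROWID;
""")

key = os.urandom(16)  # stand-in for a 16-byte UUID
for table in ("doc_rowid", "doc_clustered"):
    conn.execute(f"INSERT INTO {table} (id, body) VALUES (?, ?)", (key, "hello"))

def lookup(table: str):
    """Fetch the row by key; identical SQL for both layouts."""
    return conn.execute(
        f"SELECT body FROM {table} WHERE id = ?", (key,)
    ).fetchone()

def plan(table: str) -> str:
    """Return SQLite's description of how it will run the lookup."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT body FROM {table} WHERE id = ?", (key,)
    ).fetchall()
    return "; ".join(r[-1] for r in rows)

for table in ("doc_rowid", "doc_clustered"):
    print(table, lookup(table), plan(table))
```

The queries are the same in both cases; what differs is whether the lookup goes through a separate index before reaching the row.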

michaelcampbell20d ago· 3 in thread

How much time is `(random-uuid7-bytes)` taking?

andersmurphy20d ago

An insignificant amount for the comparison (why I didn't mention it), it's a fast implementation and the JVM C2 JIT has kicked in by the time the first batch has completed.

u1hcw9nx20d ago

I can't believe I had to scroll down this far to see someone making this point.

Also INSERT speed instead of SELECT? Typically most time is spent in SELECT or UPDATE.

andersmurphy20d ago

Although the effect is not as prominent as with INSERT, SELECT and UPDATE both benefit from page cache locality, assuming rows that are stored near each other are often selected/updated together.

jdthedisciple20d ago· 2 in thread

So UUID isn't the problem but UUID v4 is, just like any random ID-scheme, correct?

UUID v7 so far seems like the best solution if you want UUID benefits and ordering.

scotty7920d ago

It's a "WITHOUT ROWID" problem.

Why would you force the database to order rows on the drive according to a random id?

chromatin20d ago

If you had read the article, you'd have seen that UUIDv4 with Rowid was slightly slower than UUIDv7

1 more reply

bambax20d ago· 2 in thread

Why would you use UUIDs as primary keys? Let SQLite use rowids internally (which is automatic and invisible), and have a different (indexed) column with UUID if you need that for publishing the ID somewhere.

elcomet20d ago

UUID as key is useful when you have a distributed system where multiple workers create items independently

victorbjorklund20d ago

Because another app can then create the id and add it to the db later.

ItsBob20d ago· 2 in thread

My rule for primary keys and IDs is simple: Sequential integer (or bigint) as the PK and if I need to make it public, I have a GUID (or UUID) in the row too, e.g. tbl_person would have Id (int|bigint) and person_guid as (UUID).

The integer ID is used for joins and lookups and such but that's it. If I need to send anything to the frontend or outside of the app/DB then that's the UUID.

gvkhna19d ago

I agree technically but in most use cases the timestamp from uuidv7 is not a security leak. Especially where you’re already sharing that data in some way or another. A default guid is unnecessary if you use uuidv7 I think (in most situations).

ItsBob18d ago

It's more for performance that you shouldn't use them as PK's - If you insert a lot you'll get massive fragmentation over time. A sequential Id avoids that and still gives you a unique row.

The Guid is purely for an external system to grab onto something that I can tie back to an actual row in the database but the external system does not need to know anything about the backend other than <guid>.

w10-120d ago· 1 in thread

Isn't the solution just to use the rowid (after doing the read-id-after-insert dance)?

How much trouble does SQLite reusing rowids actually cause?

andersmurphy20d ago

You don't even need to do that. SQLite auto-increments the ids and is a single writer (which you should be coordinating at the application level).

Regular rowids are definitely the way to go if you can use them.

kjgkjhfkjf20d ago· 1 in thread

The script to create the benchmark numbers appears to be inserting 100 batches, not 10. (The benchmark numbers in the table appear to be consistent with the text, so I guess the actual script used to create them was correct.)

andersmurphy20d ago

Yeah that was just a holdover from when I was playing with smaller batch sizes. It's not in the actual linked source.

cropcirclbureau20d ago· 1 in thread

Is this relevant for other databases? For postgres for example, which supports concurrent writers, wouldn't sequential keys lead to contention on the page at the frontier?

andersmurphy20d ago

That's a good question. I don't know the answer. I will say, generally you can get higher write throughput with a single writer. Even more so if you're prepared to shard along boundaries where you don't need atomic transactions.

Contention and coordination are real killers, concurrent writes (that require coordination like postgres) often underdeliver.

rajnathani7d ago

Isn’t this the case with Postgres too, if one doesn’t change the default primary key index type, which is a B+ tree? At least Postgres supports a hash index IIRC, but I’m not sure if most developers set it to that when they use UUIDs as primary keys.

sedatk20d ago

UUIDv7 and sequential integers are quite similar. Sequential integers disclose count and neighboring IDs while UUIDv7 discloses timestamp. Either can be a security issue in certain cases.
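
To show what "discloses timestamp" means in practice, here is a sketch in Python. `uuid7_sketch` is a hand-rolled, illustrative generator following the UUIDv7 field layout (48-bit millisecond timestamp, version and variant bits, random remainder), not a library function.

```python
import os
import time
import uuid
from datetime import datetime, timezone
from typing import Optional

def uuid7_sketch(ts_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value: 48-bit ms timestamp, then random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    raw = bytearray(ts_ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = (raw[6] & 0x0F) | 0x70  # version nibble = 7
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC variant bits = 10
    return uuid.UUID(bytes=bytes(raw))

def created_at(u: uuid.UUID) -> datetime:
    """Anyone holding the ID can read the creation time back out."""
    ts_ms = int.from_bytes(u.bytes[:6], "big")
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)

earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)

print(earlier, created_at(earlier).isoformat())
print(earlier.bytes < later.bytes)  # byte order follows creation order
```

The same leading timestamp that keeps inserts roughly sequential in a B-tree is what leaks the creation time to anyone who sees the ID.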

So, UUIDv4 as a PK on a clustered index can be perfectly feasible for cases where you want to avoid disclosing stuff and row insertion performance isn’t that important.

pyuser58320d ago

Oh gosh, the ints v uuids debate for pks. This is worse than vim v Emacs or brackets v braces.

ac50hz20d ago

I enjoy these carefully worded posts from Anders Murphy, illustrative and informative, not opinionated and preachy. Very useful, it’s great to see the process, and ofc bookmarkable material for sharing with others.

gvkhna19d ago

UUIDv7 as pk is inherently the best option besides just a bigint where you really don’t need a public id.

But a Url62 as a url safe public id from the pk is simple and straightforward to use and comes with few risks of leak issues. Wish postgres had native base62 encoding for url62 now that it has uuidv7 native.

wood_spirit20d ago

If you need (or want the convenience of) a uuid and the time of creation is not secret, then use ULIDs, e.g. UUID v7.
