For a single database, bigints are smaller and faster, with less footguns.
UUIDs can be nice for an opaque public ID, however I'd still prefer something like a Sqid for space and usability.
But be careful!! Javascript WILL interpret your bigints as Number() and round them down because they are too big without telling you!!!
Famously seen by every snowflake user that has interacted with Javascript, quite an annoying problem.
Then it will always be a string and you will be free to change the format/type of the key in the future to UUID or whatever you like.
A similar horror story from PHP, which I discovered by diagnosing a test failure. (Or maybe it was in production? Long ago, can't remember.)
I think the code in question was for some kind of web auth, comparing random 32-character hexadecimal strings. PHP has a "feature" where its == operator falls back to trying certain strings as numbers... and that includes a version with scientific notation. (12000 == "12000" == "12e3")
Such a collision through bad comparison may seem unlikely, but there are two islands of higher odds: 0*10^X is zero for any X, and X*10^0 is one for any X. Finally, leading zeros can be included. ("0e1234" == "00000e1" and "1234e0" == "9e0000")
The fix was simply going to stricter ===, but it definitely reinforced my dislike of "loose" languages.
Node.js drivers will correctly read int64 as string or bigint, not number.
E.g. pg for PostgreSQL
Maybe there’s a buggy driver but I don’t know it.
const json = '{ "a": 9007199254740993 }'
JSON.parse(json, (_key, value, context) => /^\d+$/.test(context.source) ? BigInt(context.source) : value)All sequences use step 16.
Type A has discriminant/offset 0, yielding IDs {0, 16, 32, 48, 64, …}.
Type B has discriminant/offset 1, mapping to IDs {1, 17, 33, 49, 65, …}.
All the way up to Type P with discriminant/offset 15 and IDs {15, 31, 47, 63, 79, …}.
This is also trivially invertible so that you can determine the type from the ID.
A more common approach is to make IDs opaque strings and put a type prefix—A0, B12, P34, that kind of thing. But this way you can keep it as a number, if you wish.
I've developed a field survey app for foresters. They use it on toughbooks, tablets and phones. They are collecting spatial data, so the geometry column in the tables gets quite big. The app on the device uses a SQLite (Spatialite) database, the central database is Postgres (PostGIS). They will often edit the same area, so without UUIDs, there will be duplication of primary keys, thus making the database inconsistent. Then I will be flooded in support tickets and it will cause more slowdown than just using UUID4. And the performance drop of UUID7 is negligible compared to bigint for primary key.
IME, most of the arguments for why UUIDs make things better are due to developer ignorance of RDBMS features (or B+tree performance).
Inappropriate aliasing of integer keys allows for silent errors in queries because it will actually return some result a lot of the time. A UUID is immune to this problem. The model recognizes its mistake a lot more reliably when previously non-empty tables start showing up empty after attempting a join.
I disagree. I tried this once. Now you need a client access layer to touch the DB in any context. All your console tools no longer work well or at all. If they show up in URLs you need to deoptimize them for transport.
You give up a lot of convenience for this optimization. You should be absolutely sure your design requires it before using it.
Why would you store it as as str column and not the inbuilt type for this?
https://www.postgresql.org/docs/current/datatype-uuid.html
If you are using SQLite well I guess that doesn't work.
I read this post more as an illustration of the *value* of UUIDv7 as primary key, over integer primary keys, in lieu of minimal loss of read/write performance, and marginally more data on disk bloat.
SQLite's automatic integer rowID primary key is a no-brainer, when the SQLite application is local-only, such as application storage format (mobile and desktop). Or is never intended to grow beyond a single server instance. Basically, where each SQLite file is private to a singular instance of the application.
However, if there is even an outside chance of needing to cooperate across application instances, e.g. the minimal limit case of a personal knowledge base that should seamlessly sync across a person's devices, as well as a hosted service, then a high-quality sequential random ID starts to make a lot more sense. (No-brainer arbitrary table merges / splits / remerges, de-duplication, etc.)
Random ID primary key is a bad idea period, whether it be the UU kind or the SQ kind, or any other kind. As far as my DB knowledge goes, this class of ID destroys all tree-algorithms, and we are stuck with the fact that there is no practically better way, than an appropriate tree-structure, to group and organise a meaningful amount of data, efficiently and effectively.
cf. https://sqlite.org/withoutrowid.html
> The WITHOUT ROWID syntax is an optimization. It provides no new capabilities. Anything that can be done using a WITHOUT ROWID table can also be done in exactly the same way, and exactly the same syntax, using an ordinary rowid table. The only advantage of a WITHOUT ROWID table is that it can sometimes use less disk space and/or perform a little faster than an ordinary rowid table.
As of now, I am doing the following in my (Bitemporal data system) experiment (When will it see the light of day? Nobody knows.).
All data are globally uniquely identified by a UUIDv7. However all tables have `rowid` integer primary key asc (which is just an alias for SQLite's autoincrement int id). The `rowid` is the basis for joins, and is the foreign key reference. This lets me offload some useful disambiguation work to the DB as well as have it enforce global (across data systems) record uniqueness guarantees, while retaining local (within process) query efficiency by retaining the ability to use integer rowids.
While the idealised insert performance in your bench is indeed mind-boggling, the DB Schema isn't doing anything CPU-intensive during inserts (checks, constraints, triggers etc.). My schema / query pattern yields comparatively meagre throughput, but I am happy with the ballpark it has landed in, given all the work I'm making SQLite do for me on each `assert!` and `redact!`.
cf. my dirty-but-useful-enough bench, with production-like record content:
A poor man's napkin-mathy, append-only SQLite write/read benchmark
https://gist.github.com/adityaathalye/3c8195dc70626b33c23867...
Summary:
;; Okay, I think I can live with this...
;; - "facts" table: 12M+ records
;; - single process writes to it
;; - ~ 400 transactions/second
;; - append-only table, enforced via SQLite "before" triggers
;; - "now" table:
;; - updates on every assert/redact on "facts" table, via triggers
;; - currently at "limit case": for each read it is empty, or very small, because writes do back-to-back assert/redact of the same fact
;; - gets reads from two reader threads (evenly split)
;; - ~41,000 reads/second
;; - all reads are concurrent with writes (poor man's futures)Thanks to its oh so convenient automatic integer rowIDs, I believe one can amortise some of the other overheads of UUIDv7s for "in-between" queries, viz. indices, joins, ctes, virtual tables etc., with appropriate schema / query design.
I get saving 8 bytes per row seems attractive, but the tradeoff is not explained.
The tradeoff is what the benchmark is hitting. Once the table is physically ordered by the key, a random v4 scatters every insert across the tree and you pay for the page splits. A plain rowid table keeps that churn in the secondary index, which is just the key plus a rowid, while the table itself stays append-ordered. So it only really pays off when the key is something you look up directly and is roughly sequential, which is why v7 comes back near baseline.
Also INSERT speed instead of SELECT? Typically most time is spend in SELECT or UPDATE.
UUID v7 so far seems like the best solution if you want UUID benefits and ordering.
The Integer id is used for joins and looks ups and such but that's it. If I need to send anything to the frontend or outside of the app/DB then that's the UUID.
The Guid is purely for an external system to grab onto something that I can tie back to an actual row in the database but the external system does not need to know anything about the backend other than <guid>.
How much trouble does SQLite reysing rowid's actually cause?
Regular rowids are definitely the way to go if you can use them.
Contention and coordination are real killers, concurrent writes (that require coordination like postgres) often underdeliver.
So, UUIDv4 as a PK on a clustered index can be perfectly feasible for cases where you want to avoid disclosing stuff and row insertion performance isn’t that important.
But a Url62 as a url safe public id from the pk is simple and straightforward to use and comes with few risks of leak issues. Wish postgres had native base62 encoding for url62 now that it has uuidv7 native.