You know, you think that, but it's never that simple. The field was added incorrectly and nobody noticed until the value was in countless tables that you now need to update simultaneously. Or the value is something that's supposed to be semi-secret, so now low-level support staff can't reference the row when dealing with a request. Or the table's requirements change and now you need to track two different kinds of data, or data that is missing the field.
Me, I always just have the table make its own ID. It is just simpler, even when you think it is overkill.
Except that I knew a coworker who had a duplicate ID. An extremely rare event: they messed up the pre-assignment, and there is another dude somewhere with the same ID. So from time to time, some system would tell him that his ID was already registered. A lot of banks, and things like private healthcare systems, like to use the DNI as a username.
He tried to get his ID changed, but that was such a foreign concept to any of the involved institutions, that he had to give up because there simply is no such procedure. I guess he could have taken it to court, but the guy decided to just live with it (the justice system is quite slow here).
For instance, a driver's license number is printed on the card itself, so a human sees it. Therefore, it's a natural key, just like a name.
When you decide that whatever natural keys already exist aren't good enough for your organization, and you make a new key, it's not good to think of that as a surrogate key. The number will make it out somehow (as a "record locator" in a customer support call or something), and eventually become a natural key.
It's best to just plan for any new key to be a natural key, which means using best practices for natural keys. That means it should be something reasonable to print, read, say, and hear; and it should also follow a pattern so it can be distinguished from other special numbers.
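As an illustration of those best practices, here's a minimal sketch of what such a key might look like: a fixed prefix so it's distinguishable from other special numbers, an alphabet without easily confused characters so it survives being printed, said, and heard, and a check character to catch typos. The prefix, alphabet, and checksum scheme are my own choices, not anything prescribed above.

```python
import secrets

# Unambiguous alphabet: no 0/O or 1/I/L, so the key survives being
# printed, read aloud, and typed back in.
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"

def make_key(prefix="C", length=8):
    """Human-friendly key: fixed prefix + random body + check character."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(length))
    # Position-weighted checksum so most typos and transpositions are caught.
    total = sum((i + 1) * ALPHABET.index(ch) for i, ch in enumerate(body))
    return prefix + body + ALPHABET[total % len(ALPHABET)]

def is_valid(key, prefix="C"):
    """Reject keys whose check character doesn't match."""
    if not key.startswith(prefix) or len(key) < len(prefix) + 2:
        return False
    body, check = key[len(prefix):-1], key[-1]
    if any(ch not in ALPHABET for ch in body) or check not in ALPHABET:
        return False
    total = sum((i + 1) * ALPHABET.index(ch) for i, ch in enumerate(body))
    return ALPHABET[total % len(ALPHABET)] == check
```

A support rep can read a key like `C7JQM2WPRX` over the phone, and a mistyped character fails validation immediately instead of silently pointing at the wrong row.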
Auto-increment is a shortcut, but usually not great in the long term unless it's something that will be well-contained inside the database as an implementation detail (e.g. a join key designed to refer to rarely-accessed fields of a wide table).
> You know, you think that, but it's never that simple.
It’s that simple if you’re the Social Security Administration and it’s a table of Social Security accounts, not people.
Other than that, using SSNs as a primary key is just plain wrong.
Nah, they keep track of duplicate usage: https://www.nbcnews.com/technolog/odds-someone-else-has-your...
> The IRS often knows when this happens, when the imposter pays taxes. The Social Security Administration knows, too, for the same reason. And the nation's credit bureaus usually know, because the imposter often ends up applying for some form of credit. Plenty of financial institutions also have access to this information.
We moved to a new HR system and they have a set of "reserved" employee numbers that cannot be used and we have employee numbers in that range. Arg!
And we were supposed to teach our customers good design principles...
A projection in ES (event sourcing) is more of a cache, not your primary store. The primary store is the event log. The latter should, obviously, never use natural IDs.
Essentially, they observed sizeable performance improvements by using UUID generators that are tweaked to produce more sequential results. That yields better indexes. The article compares sequences, random UUIDs, and two kinds of sequential-ish UUID generators.
There's another benefit to UUIDs: you can generate them anywhere, including on the application side. Doing so has tremendous benefits for batching, and for inserting objects with relationships at the same time (vs. waiting for the first insert to return an ID to be used in the FK).
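A quick sketch of that batching benefit (the table shape and column names here are made up): because the client mints the IDs itself, parent and child rows can be constructed together and sent in one batch, with the foreign keys already filled in.

```python
import uuid

# The client mints the IDs itself, so parent and child rows can be
# built together and sent to the database in one batch -- no waiting
# for the parent insert to return a generated key.
order_id = uuid.uuid4()
order_row = {"id": order_id, "status": "new"}
item_rows = [
    {"id": uuid.uuid4(), "order_id": order_id, "sku": sku}
    for sku in ("A-100", "B-200")
]
# Both INSERTs can now go out in a single batch/transaction.
```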
There are a few mitigations but my favourite is the "casino chips" approach: pregenerate them server side, and allocate to clients on demand, including en masse if need be ("here kid, have a few million UUIDs to get you started"). Verify with whatever simple signature scheme comes with your application server framework, or at small scale just toss them in a crude LRU store.
Or, remember where the UUID came from, and apply their organisational scope to any trust you place upon it. This might work particularly for multi-tenanted SaaS. However it requires that all usage is tenant-bounded end-through-end throughout your application. This may be in conflict with a) your framework, b) your zenlike contemplation of simplicity in data management, or c) programmers in a hurry forgetting to scope their queries properly.
Ultimately, relying on UUIDs as intrinsically unguessable security tokens is probably not a great idea, but it's one that remains thoroughly embedded in the programming zeitgeist. As ever, nothing useful comes without a compromise. Keep your eyes open to the systemic consequences of design choices, and don't leave traps for your fellow developers.
Allowing simple caching, easy async, easy handling of offline, or far simpler clientside code for dealing with those objects. etc.
For mobile app development which relies on an online (http) backend, clientside generatable UUIDs offer almost only benefits.
You can also use one sequence for everything on the server, and then you can also pre-create ID based relationships client side.
Clocks aren't reliable enough for timestamps anyways so garbage collection is the only thing you kinda wanna rely on them for.
A good sweet spot seems to be 32 bits of milliseconds + 96 bits of entropy. This overflows approximately every 50 days, allowing for a 50-day rolling data retention.
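A minimal sketch of such an ID (my own construction of what's described): 4 bytes of wrapping millisecond timestamp followed by 12 random bytes, for 128 bits total.

```python
import os
import time

def rolling_id() -> bytes:
    """128-bit ID: 4 bytes of wrapping millisecond timestamp followed
    by 12 random bytes. The timestamp wraps every 2^32 ms (~49.7 days),
    which is what allows a ~50-day rolling retention window."""
    ms = int(time.time() * 1000) & 0xFFFFFFFF
    return ms.to_bytes(4, "big") + os.urandom(12)
```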
I ended up deciding to use sharded bigints because it enables better indexing, even though there are drawbacks when first writing the instances; The benefit of faster search was more important for me.
[1]: https://instagram-engineering.com/sharding-ids-at-instagram-...
[2]: http://www.livinginthepast.org/2016/02/24/sharding-into-bigi...
They are not serially incrementing but are still sortable, so they prevent the index fragmentation issues observed with UUIDs. They are 8 bytes in length, so the index size is smaller compared to UUIDs. You get all the benefits of serial IDs, but they are not easily guessable, which prevents sequential-access attacks.
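This describes a Snowflake-style scheme. A simplified, single-threaded sketch (real implementations also handle sequence exhaustion within a millisecond and clock skew; the epoch constant here is Twitter's, but any fixed epoch works):

```python
import time

class SnowflakeLike:
    """Sketch of a Snowflake-style 64-bit ID:
    41-bit ms timestamp | 10-bit worker id | 12-bit per-ms sequence.
    Time-sortable, fits in a bigint, not trivially guessable in sequence."""
    EPOCH = 1288834974657  # Twitter's custom epoch in ms; any fixed epoch works

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024  # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        ms = int(time.time() * 1000) - self.EPOCH
        if ms == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence within one ms
        else:
            self.last_ms, self.seq = ms, 0
        return (ms << 22) | (self.worker_id << 12) | self.seq
```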
What unimaginable scale
Today, a decade later, they're at 1.074B
I don't see how that's true. From reading the article you linked, you only need a valid shard ID (which you can extract from known IDs), the millisecond (which is guessable) and a 10-bit sequence (which you can easily brute-force).
(And that's completely fine if their security model doesn't require unguessable IDs.)
It will result in a very high number of 404s. These can be monitored and the origin IPs banned.
So it's good to know that performance is not bad.
Yes, they are guessable but your application should not rely solely on the "secrecy" of the ID to authorize access to a record. If you are worried about someone crawling your public API with wget or curl and an incrementing counter you should re-think whether your data are really public or not, or maybe rate-limit anonymous users, etc.
They also reveal something about the total number of records in your database, I guess that could matter in some contexts but it's never really been an issue in practice for me.
I have definitely used the technique of defining non-overlapping sequences in different shards (with Oracle, not Postgres, but they are very similar in this regard). It worked very well and was easy to reason about.
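The usual trick (in Postgres or Oracle, a `CREATE SEQUENCE` per shard with a different start and an increment equal to the shard count) can be sketched with plain arithmetic: each shard draws from its own disjoint arithmetic progression, so IDs never collide across shards.

```python
from itertools import count, islice

def shard_sequence(shard: int, num_shards: int):
    """Equivalent of CREATE SEQUENCE ... START WITH shard+1 INCREMENT BY
    num_shards: each shard gets a disjoint arithmetic progression of IDs."""
    return count(start=shard + 1, step=num_shards)

# Four shards; no ID can ever be issued by two shards:
ids = [list(islice(shard_sequence(s, 4), 3)) for s in range(4)]
# ids == [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
```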
As a developer, the big issue I have with UUIDs is that they are impossible to read or type. You have to copy/paste and it isn't easy to look at two UUIDs and tell if they are different.
I use integers in general unless the specific use case demands something else.
Any information you give to a potentially malicious actor can help them attack you. If you have a choice between leaking information and not leaking information, I can’t imagine why you would ever intentionally go with the former, unless you didn’t actually have a choice (feasibility, etc.).
As an example, maybe I needed to CSRF someone but needed their ID (say, in a b2b app where IDs are not generally public) - with sequential IDs I have a decent chance of cracking it with a small number of requests, especially if it’s a small userbase. Sure, the CSRF was the main issue, but this is a contrived example to illustrate the point.
Admittedly, IDs are oftentimes public information by necessity - but there’s no need to allow them to leak extra information.
Of course, there are many ways to solve that situation, but UUIDs is one.
They can inherently leak how much data you do or don't have, which you may not want your competitors to know.
> Doing this on application side would have tremendous batching benefits or inserting objects with relationships at the same time (Vs waiting first insert to return an ID to be used in the FK).
By separating that out you can get a lot:
1. You can extend your delegate system by modifying the delegate table, rather than having to muddy your data model with authority information
2. You can TTL the mappings in the delegate table
3. You can have many mappings to the same data, but each one can have its own restrictions.
It's a bit more complex, but you end up with a system that really homes in on the benefit you're bringing up.
Use integer primary keys internally for identifiers and relationships.
Use English/Other Language permalinks for URL's
Use UUID's in places like API's one-time action links and "private" links that you only want to share with other people.
Worked fine for me for many, many years.
IME it's much more often I've quickly made a table with a serial PK and later wished it were uuid; just about never made a uuid and later wished for the compactness or natural clustering of bigint. Maybe for a table of millions and millions of time-ordered events.
Unless you're changing a foreign key, joins will always be correct.
Unless I'm doing something wrong in the last 30 years of using SQL.
Side question: can I get Postgres to throw an error if I try to join on two IDs where neither of the IDs have a foreign key reference to the other?
I end up doing this a lot when I'm trying to figure out how my applications are currently being used.
Strictly incrementing UUIDs can offer the same benefit.
I used to work on a reconciliation system which inserted all its results into the database. Only the most recent results were heavily queried, with a long tail of occasional lookups into older results. We never had a problem with primary key indexes (though this was in MySQL, which uses a clustered index on the primary key for row storage, so it's an even bigger benefit); the MD5 column used for identifying repeating data, on the other hand, would blow out the cache on large customers' instances.
PG will say it's doing a hash lookup and you'd think it'd be fast, but it will take quite some time relative to joining two large tables with integer IDs. With UUIDs, PG will sometimes give up on the hash lookup and do table scans instead, unless you adjust random_page_cost.
In general joining on UUIDs for large tables is a bad idea. It can be great if you are joining a single row to another row.
I eventually came across the idea of using maximal period linear-feedback shift registers to transform an integer variable through every possible value (minus one), but in a non-incremental sequence that depends on the LFSR arrangement.
I never ended up putting the idea to use, but I've always been curious about people who have and how it worked out for them. [Edit to clarify: It was meant for obfuscation, not security against a determined attacker.]
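For the curious, a minimal sketch of the idea using the standard 16-bit maximal-period Galois LFSR (taps at bits 16, 14, 13, 11, i.e. feedback mask 0xB400): each step moves the state to the next value in a fixed but scrambled order, and the cycle visits every nonzero 16-bit value exactly once before repeating.

```python
def lfsr_step(state: int) -> int:
    """One step of a 16-bit Galois LFSR with feedback mask 0xB400
    (taps at bits 16, 14, 13, 11 -- a maximal-period arrangement)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

# Hand out successive LFSR states as IDs: sequential to generate,
# but scrambled from the outside. Obfuscation only -- the next ID is
# trivially computable by anyone who knows the tap arrangement.
state = 1
first_ids = []
for _ in range(5):
    state = lfsr_step(state)
    first_ids.append(state)
```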
This works well against the German Tank Problem when there's no oracle allowing an attacker to guess lots of IDs quickly (such as when there are reasonable rate limits). It does not provide enough entropy when such an oracle exists (especially an offline one).
For something like a password reset token, it still needs to be paired with suitably random bytes.
So far I haven't encountered any problems in the short term by using the approach described.
1 https://blog.twitter.com/engineering/en_us/a/2010/announcing...
I was thinking of generating random character strings and simply retry when the db throws duplicate key error on insert. No sharding is necessary and I’d like to have efficient foreign keys. Any thoughts?
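That approach is workable as long as the keyspace is large enough that retries stay rare. A sketch of the retry loop (using an in-memory SQLite database; the table and helper names are hypothetical):

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_uppercase + string.digits

def insert_with_retry(conn, name, key_len=8, max_tries=5):
    """Insert a row under a fresh random key; on a duplicate-key
    error, generate a new key and try again."""
    for _ in range(max_tries):
        key = "".join(secrets.choice(ALPHABET) for _ in range(key_len))
        try:
            conn.execute("INSERT INTO items (id, name) VALUES (?, ?)", (key, name))
            return key
        except sqlite3.IntegrityError:
            continue  # collision -- vanishingly rare if the keyspace is large
    raise RuntimeError("too many key collisions; use a longer key")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT)")
key = insert_with_retry(conn, "first")
```

With 36 characters and length 8 there are about 2.8 trillion keys, so a retry essentially never fires until the table is enormous; and plain short strings still index and join reasonably as PKs.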
However, I cannot imagine creating table entries without a datestamp. No matter what else you are doing, or what you index by, I would want YYYY-MM-DD_HH-MM-SS in every row.
Maybe I'm just weird that way ...
Recently I built a new system (a typical business-type backend) and was forced to use SQLite + C# + Dapper. With this combination I cannot use GUIDs/UUIDs, as Dapper cannot properly map them back to C# from SQLite, and my dislike of ints got me thinking. I have a random string generator (I've used it for years for things like OTPs and other reference numbers) where I give it an alphabet plus the length of the desired string. Using 8 to 12 characters, I can get a few million unique permutations; that is, if used as a primary key, a few million per database table. Then I hear, in the back of my head, the guys from work arguing that I would run out of unique combinations or would have to do lookups to see if a key already exists. So I decided to slap the year and month on as a prefix, so a key might look like this: 2105HSUAMWPA. This gets indexed really well too, and there is some inherent information visible in the key: year 21, month 5, then the unique bits. It's basically 4 lines of code that gets called on every new database entity. I think it will be easy to shard/partition the data too, if the need arises in the future, by simply looking at the first 4 digits.
Thus to summarize:
Data is sliced by entity type (customer, invoice, etc), then by date (2105 for May 2021) then by unique string.
What do you guys think about this approach? Anyone been burnt by something like this?
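For reference, here's a guess at roughly the four lines described (the exact alphabet is an assumption; only the shape of the output matches the 2105HSUAMWPA example given):

```python
import secrets
from datetime import datetime, timezone

# Assumed alphabet -- the comment doesn't say which characters are used.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ"

def generate_key(length=8, now=None):
    """Key like '2105HSUAMWPA': two-digit year + two-digit month,
    followed by `length` random characters."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%y%m") + "".join(
        secrets.choice(ALPHABET) for _ in range(length)
    )
```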
Are you sure about this? That would be a pretty poor showing for such a well-known solution in the ORM world, SQLite or not.
If you were going for sortability/understandability then I understand slapping your own together, but why not generate v1/v4/v6[0] UUIDs in your application and then send them along to the database, possibly prefixed with whatever you want it to be sorted by (though IMO you should just add that metadata to the thing being saved and sort on that properly)?
Bit of a rabbit hole: https://github.com/DapperLib/Dapper/pull/1082
It would be nice to see real benchmarking on millions of rows to compare the three, but my gut tells me you use int by default, bigint if you outgrow int, and UUID if you have plenty of money for hardware and need distribution capabilities a UUID would enable.
I'm generally fairly comfortable using int for business relationships, and bigint (long) for transaction data.
For performance, insertion speed often seems to be dominated by 'commit latency' to sync to the disk; rather than by record size. I would agree that record size affects table scan, but for many datamodels keying may often be a relatively small proportion compared to the size of text fields and other data.
I like to model keyspaces to work for 200 years, for the largest forseeable market growth, times at least a factor of 10 for safety.
One of the most memorable anecdotes of my professional career is a production environment going down because we hit maxint on an important (and busy) table. The dirty hack we used to get the site back up (hint: int is _signed_), and the weeks it took to plan, test, and execute the migration.
in one sense I agree with the author that things are generally just easier when you use surrogate primary keys, however they really should note here that the FOREIGN KEY constraint itself is not a problem at all as you can just use ON UPDATE CASCADE.
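To see ON UPDATE CASCADE fixing a natural key in one place (sketched here with SQLite's in-memory database and made-up tables; Postgres and others behave the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FKs
conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE cities (
        name TEXT,
        country_code TEXT REFERENCES countries(code) ON UPDATE CASCADE
    )""")
conn.execute("INSERT INTO countries VALUES ('UK')")
conn.execute("INSERT INTO cities VALUES ('London', 'UK')")
# Fixing the natural key in one place updates every referencing row:
conn.execute("UPDATE countries SET code = 'GB' WHERE code = 'UK'")
```

After the UPDATE, `cities.country_code` reads 'GB' without any manual fan-out, which is exactly the "countless tables to update simultaneously" problem handled declaratively.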
(Actually, all serials are bigserials, but the “base type” they add to the table differs, and it’ll always come back to bite you later. Ask me how I know…)
[1] https://en.wikipedia.org/wiki/Birthday_problem#Probability_t...
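The linked birthday-problem math reduces to the handy approximation p ≈ 1 − exp(−n²/2^(b+1)) for n uniformly random b-bit IDs:

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation: chance of at least one collision
    among n uniformly random `bits`-bit IDs."""
    return 1.0 - math.exp(-(n * n) / (2.0 * 2.0 ** bits))

# A billion IDs with 122 random bits (a v4 UUID) are effectively
# collision-free; a million random 32-bit IDs collide almost surely.
```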
It's an interesting tradeoff. The UX of the smaller YouTube video id links is probably of some benefit to them. Plus they have private videos for when you really don't want your video to be viewed, with unlisted being the middle ground of easy sharing but also keeping it exclusive.
What’s the use case for this where UUIDv4 or sequential ID isn’t better? Because it sounds like a solution in search of a problem.
> Are there any downsides besides making the application side a tiny bit more complex?
Are there any upsides to warrant the complexity?
Seems too easy to screw up.
https://www.postgresql.org/docs/current/plpgsql-control-stru...
For your use-case you could use incremental IDs.
For one thing you can represent a range of data more efficiently by just storing offsets. This means that instead of having to store a 'start' and 'end' at 8 + 8 bytes you can store something like 'start' and 'offset', where offset could be based on your window size, like 2 bytes.
You can leverage those offsets in metadata too. For example, I could cache something like 'rows (N..N+Offset) all have field X set to null' or some such thing. Now I can query my cache for a given value and avoid the db lookup, but I can also store way more data in the cache since I can encode ranges. Obviously which things you cache are going to be data dependent.
Sequential ints make great external indexes for this reason. Maybe I tombstone rows in big chunks to some other data store - again, I can just encode that as a range, and then given a lookup within that range I know to look in the other datastore. With a uuid approach I'd have to tombstone each row individually.
These aren't universal optimizations but if you can leverage them they can be significant.
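A sketch of that range encoding (the helper is hypothetical): contiguous sequential IDs collapse into (start, length) pairs, so tombstoning a big chunk becomes one entry instead of one per row.

```python
def encode_ranges(ids):
    """Collapse sequential integer IDs into (start, length) pairs --
    the compact range encoding that sequential ints make possible."""
    ranges = []
    for i in sorted(ids):
        if ranges and i == ranges[-1][0] + ranges[-1][1]:
            # Extends the current run: just bump its length.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + 1)
        else:
            ranges.append((i, 1))
    return ranges

# Tombstoning rows 1000..9999 is one (1000, 9000) entry, not 9000 keys;
# a UUID keyspace offers no such compression.
```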
- isntauuid[1] (mentioned in this thread, I've given it a name here)
- timeflake[2]
- HiLo[3][4]
- ulid[5]
- ksuid[6] (made popular by segment.io)
- v1-v6 UUIDs (the ones we all know and some love)
- sequential interval based UUIDs in Postgres[7]
Just add a UUID -- this almost surely isn't going to be what bricks your architecture unless you have some crazy high write use case like time series or IoT or something maybe.
[0]: http://gh.peabody.io/uuidv6/
[1]: https://instagram-engineering.com/sharding-ids-at-instagram-...
[2]: https://github.com/anthonynsimon/timeflake
[3]: https://en.wikipedia.org/wiki/Hi/Lo_algorithm
[4]: https://www.npgsql.org/efcore/modeling/generated-properties....
[5]: https://github.com/edoceo/pg-ulid
[6]: https://github.com/segmentio/ksuid
[7]: https://www.2ndquadrant.com/en/blog/sequential-uuid-generato...