Exploring PostgreSQL 18's new UUIDv7 support

(aiven.io)

282 points · s4i · 8mo ago · 226 comments

115 comments · 22 top-level

crazygringo · 8mo ago · 34 in thread

> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time... Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.

So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.

In other words, you can only use UUIDv7 for rows that never need to be looked up by any data coming from the user. And maybe that exists sometimes for certain data in JOINs... but it seems like it might be more the exception than the rule, and you never know when an internal ID might need to become an external one in the future.

tracker1 · 8mo ago

This is only really true if leaking the creation time of the record is itself a security concern.

donjoe · 8mo ago

To me, the most important question is: how do I scale v7 in an environment of 20+ engineers?

When using v7, I need some sort of audit that checks in every API contract for the usage of v7 and potential information leakage.

Detecting V7 uuids in the API contract would probably require me to enforce a special key name (uuidv7 & uuid for v4) for easier audit.

Engineers will get this wrong more than once - especially in a mixed team of Jr/sr.

Also, the API contracts will look a bit inconsistent: some resources will get addressed by v7, others by v4. On top, by using v4 on certain resources, I'd leak the information that those resources addressed by v4 will contain sensitive information.

By sticking to v4, I'd have the same identifier for all resources across the API. When needed, I can expose the creation timestamp in the response separately. Audit is much simpler since the fields state explicitly what they will contain.

AdieuToLogic · 8mo ago

>>> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs.

>> So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.

> This is only really true if leaking the creation time of the record is itself a security concern.

No, as "leaking the creation time" is not a concern when APIs return resources having properties representing creation/modification timestamps.

The security risk in exposing predictable identifiers, such as UUIDv7 or serial[0] values used as database primary keys, is that it lets attackers synthesize identifiers matching arbitrary resources much more quickly than when random identifiers are employed.

0 - https://www.postgresql.org/docs/current/datatype-numeric.htm...

MikeNotThePope · 8mo ago

Exactly. I wrote about that a few days ago.

Primary keys using UUID v7 are (potentially) an HR violation.

https://mikenotthepope.com/primary-keys-using-uuid-v7-are-po...

kvirani · 8mo ago

Which I have to assume is rare, right?

oconnor663 · 8mo ago

It's relatively common for it to be a privacy concern. Imagine if I'm making an online payment or something, and one of the IDs involved tells you exactly when I created my bank account. That's a decent proxy for my age.

whalesalad · 8mo ago

Yeah if you’re relying on unguessable public IDs as your security model you’re not doing security.

dethos · 8mo ago

Exactly

nitwit005 · 8mo ago

It was a concern in the past, as people used password creation tools that were deterministic based on the current time.

There was previously an article linked here about recovering access to some bitcoin by feeding all possible timestamps in a date range to the password creation tool they used, and trying all of those passwords.

matthew16550 · 8mo ago

Using UUIDv4 as primary key has unexpected downsides because data locality matters in surprising places [1].

A UUIDv7 primary key seems to reduce / eliminate those problems.

If there is also an indexed UUIDv4 column for external id, I suspect it would not be used as often as the primary key index so would not cancel out the performance improvements of UUIDv7.

[1] https://www.cybertec-postgresql.com/en/unexpected-downsides-...

AdieuToLogic · 8mo ago

> Using UUIDv4 as primary key has unexpected downsides because data locality matters in surprising places.

Very true, as detailed by the link you kindly provided. Which is why a technique I have found useful is to have both an internal `id` PK `serial`[0] column (never externalized to other processes) and another column with a unique constraint having a UUIDv4 value, such as `external_id`, explicitly for providing identifiers to out-of-process collaborators.

0 - https://www.postgresql.org/docs/current/datatype-numeric.htm...
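A minimal sketch of that two-column layout, using Python's built-in `sqlite3` as a stand-in for Postgres (an assumption made only so the example runs anywhere; in Postgres the `id` column would be `serial`/`bigserial` and `external_id` a native `uuid`). The `account` table, the `name` column, and both helper functions are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal only, never externalized
        external_id TEXT NOT NULL UNIQUE,               -- random UUIDv4 handed to API clients
        name        TEXT NOT NULL
    )
    """
)


def create_account(name: str) -> str:
    """Insert a row and return only the external identifier."""
    external_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO account (external_id, name) VALUES (?, ?)",
        (external_id, name),
    )
    return external_id


def find_account(external_id: str):
    """Look up by the public identifier; the internal id stays server-side."""
    return conn.execute(
        "SELECT id, name FROM account WHERE external_id = ?", (external_id,)
    ).fetchone()


public_id = create_account("alice")
internal_id, name = find_account(public_id)
```

Note that the `UNIQUE` constraint is backed by its own index on random values, which is the extra insert cost debated elsewhere in this thread.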

crazygringo · 8mo ago

> I suspect it would not be used as often as the primary key index

That doesn't matter because it's the creation of the index entry that matters, not how often it's used for lookup. The lookup cost is the same anyways.

oconnore · 8mo ago

If this is a concern, pass your UUIDv7 ID through an ECB block cipher with a 0 IV. 128 bit UUID, 128 bit AES block. Easy, near zero overhead way to scramble and unscramble IDs as they go in/out of your application.

There is no need to put the privacy preserving ID in a database index when you can calculate the mapping on the fly

10000truths · 8mo ago

This is, strictly speaking, an improvement, but not by much. You can't change the cipher key because your downstream users are already relying on the old-key-scrambled IDs, and you lose all the benefits of scrambling as soon as the key is leaked. You could tag your IDs with a "key version" to change the key for newly generated IDs, but then that "key version" itself constitutes an information leak of sorts.

blackenedgem · 8mo ago

Then that's just worse and more complicated than storing a 64 bit bigint + 128 bit UUIDv4. Your salt (AES block) is larger than a bigint. Unless you're talking about a fixed value for the AES (is that a thing?) but then that's peppering, which is security through obfuscation.

jongjong · 8mo ago

Great point. Also, having to support multiple IDs is a maintenance headache.

IMO, a major problem solved by UUIDs is the ability to create IDs on the client-side, hence, they are inherently user-facing. A major reason why this is an important use case for UUIDs is because it allows clients to avoid accidental duplication of records when an insertion fails due to network issues. It provides insertion idempotence.

For example, when the user clicks on a button on a form to insert a record into a database, the client can generate the UUID on the client-side, then attach it to a JSON object, then send the object to the server for insertion; in the meantime, if there is a network issue and it's unclear whether or not the record was inserted, the code can automatically retry (or user can manually retry) and there is no risk of duplication of data if you use the same UUID.

This is impossible to do with auto-incrementing IDs because those are generated by the database in a centralized way so the user cannot know the ID ahead of time and thus, if there is a network failure while submitting a form, the client cannot automatically know whether or not the record was successfully inserted; if they retry, they may create a duplicate record in the database. There is no way to make the operation idempotent without relying on some kind of fixed ID which has a uniqueness constraint on the database side.
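A minimal sketch of the retry-safe insert described here, again using Python's built-in `sqlite3` as a stand-in database (an assumption; in Postgres the equivalent statement would use `ON CONFLICT DO NOTHING`). The `orders` table and the `insert_order` helper are hypothetical names.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, item TEXT NOT NULL)")


def insert_order(order_id: str, item: str) -> bool:
    """Insert once; return False if this id was already stored (i.e. a retry)."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO orders (id, item) VALUES (?, ?)",
        (order_id, item),
    )
    return cur.rowcount == 1


# The client picks the id before the first attempt and reuses it on every retry.
order_id = str(uuid.uuid4())
first = insert_order(order_id, "widget")
retry = insert_order(order_id, "widget")  # e.g. resent after a timeout
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The uniqueness constraint on the primary key is what makes the second call a no-op, so the table ends up with one row no matter how many times the request is resent.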

macote · 8mo ago

You don't need to add a UUIDv4 column, you could just encrypt your UUIDv7 with format-preserving encryption (FPE).

whattheheckheck · 8mo ago

What's the computational complexity of doing that conversion vs the lookup table of uuidv4 for each uuidv7?

tekne · 8mo ago

Question: why not use UUIDv7 but encrypt the user-facing ID you hand out? Then it's just a quick decrypt-on-lookup, and you have the added bonus of e.g. being able to give different users different IDs

gigatexal · 8mo ago

In a well normalized setup idk maybe not. Uuidv4 for your external ids and then have a mapping table to correspond that to something you’d use internally. Then you can torch an exposed uuid, update the mapping table, and generate a new one, and none of your pointers and foreign keys need to change internally.

crazygringo · 8mo ago

The point is, that mapping table incurs the same indexing cost that we were trying to eliminate in the first place. Normalization is irrelevant.

Quekid5 · 8mo ago

I wonder if there is a name for such a mapping table in RDBMS-land...?

lukebechtel · 8mo ago

how risky is exposing creation time really though? I feel like for most applications this is uncritical

Biganon · 8mo ago

I wouldn't say necessarily "risky", it's more that it forces your hand when you wouldn't want to reveal an entity's creation time. Say you use these IDs for users of your site, and they're used in API queries / URLs etc., then it's trivial to know when a user created their account. Sure, many sites already expose this information, but not all of them do; what if you don't want it exposed? What if you consider that a user's seniority is nobody's business, that it could bias the behavior of other users towards them, etc.?

morshu9001 · 8mo ago

It takes consideration. There are plenty of systems like Facebook and Twitter that use IDs somewhat exposing time, but the things they're IDing already have public creation timestamps.

sgarland · 8mo ago

Who are these "experts?" I'm a DBRE, and also very security conscious, and think this is an absurd what-if for most companies.

If it does matter for your application, then don't expose it - use an opaque id with something like AEAD, and expose that.

sverhagen · 8mo ago

When you see v7 vs. v4, you'd expect the higher number to be better, hopefully better in all aspects. I wouldn't have expected such a thoughtful consideration to be required before upgrading. UUID-b would've been a better name then ;)

jpalawaga · 8mo ago

that is pretty common with uuid. for example in many cases you'll still want a plain uuid4 instead of e.g. uuid 5. maybe you want 5. it's use-case dependent.

for a specification such as uuid, there is not much to improve upon--just rearranging the bytes and their meanings.

ownagefool · 8mo ago

Meh.

You probably shouldn't / don't need to use v7 for your Users table because the age of your User probably has limited to no bearing on the look up patterns. For example, our Steam and Amazon accounts are pretty old, but we likely still use them.

However, your Orders table is significantly more likely to be looked up based on time, so a v7 makes a lot of sense here.

Now I'd argue the security implications are overblown, but in general terms you might also allow someone to look up a user, i.e. you can view my Steam profile, or maybe my Amazon wishlist. You probably don't need to be looking up another User's Order.

Alternatively, if you're building an Enterprise Risk Solution, you could take a view that you don't want people knowing how old the risk is, but most solutions would show you some history and would believe that to be pertinent information.

There will be instances of getting it wrong, but it isn't actually _that_ complicated.

saaspirant · 8mo ago

I am using it in a table where sorting by id (primary key) should also sort it by created time (newer records should have "bigger" id).

The id would be exposed to users. An integer would expose the number of records in it.

Am I using it right, guys?

djantje · 8mo ago

DB multi-master, or the DB not being responsible for primary key generation, is the use case, I think.

And then having uuidv7 as primary and foreign keys, can give you a performance gain.

Illniyar · 8mo ago

If leaking creation time is a concern, can we not just fake the timestamp? We can do so in a way that most performance benefits remain - so like starting with a base time of 1970 and then adding base time to it intermittently, having random months and days to new records (or maybe based on the user's id - so the user's records are temporally consistent but they aren't with other user records).

I'm sure there might be a middle ground where most of the performance gains remain but the deanonymizing risk is greatly reduced.

Edit: encrypting the value in transit seems a simpler solution really

hu3 · 8mo ago

In that case, auto increments can also be bumped from time to time. And start from a billion.

They're more performant than uuidv7. Why would I still use UUID? Perhaps I would still want uuids because they can be generated on the client and because they make incorrect JOINs return no rows.

tonyhart7 · 8mo ago

Yeah, just use uuidv4 and another "ULID" if that's the case

which is pointless

morshu9001 · 8mo ago · 18 in thread

The article compares UUIDv7 vs v4, but doesn't say why you'd do either instead of just serial/bigserial, which has always been my go-to. Did I miss something?

molf · 8mo ago

Good question. There's a few reasons to pick UUID over serial keys:

- Serial keys leak information about the total number of records and the rate at which records are added. Users/attackers may be able to guess how many records you have in your system (counting the number of users/customers/invoices/etc). This is a subtle issue that needs consideration on a case by case basis. It can be harmless or disastrous depending on your application.

- Serial keys are required to be created by the database. UUIDs can be created anywhere (including your backend or frontend application), which can sometimes simplify logic.

- Because UUIDs can be generated anywhere, sharding is easier.

The obvious downside to UUIDs is that they are slightly slower than serial keys. UUIDv7 improves insert performance at the cost of leaking creation time.

I've found that the data leaked by serial keys is problematic often enough; whereas UUIDs (v4) are almost always fast enough. And migrating a table to UUIDv7 is relatively straightforward if needed.

MBCook · 8mo ago

Not only can you make a good guess at how many customers/etc exist, you can guess individual ones.

World’s easiest hack. You’re looking at /customers/3836/bills? What happens if you change that to 4000? They’re a big company. I bet that exists.

Did they put proper security checks EVERYWHERE? Easy to test.

But if you’re at /customers/{big-long-hex-string}/bill the chances of you guessing another valid ID are basically zero.

Yeah it’s security through obscurity. But it’s really good obscurity.

edoceo · 8mo ago

So the client side can create the ID before insert - that's the case that (mostly) drives it for me. The other is where you have distributed systems and then later want to merge the data and not have any ID conflicts.

saagarjha · 8mo ago

Allowing the client to generate IDs for you seems like a bad idea?

jrochkind1 · 8mo ago

yup, I'd say those are the two biggies.

Deadron · 8mo ago

For when you inevitably need to expose the ids to the public the uuids prevent a number of attacks that sequential numbers are vulnerable to. In theory they can also be faster/convenient in a certain view as you can generate a UUID without needing something like a central index to coordinate how they are created. They can also be treated as globally unique which can be useful in certain contexts. I don't think anyone would argue that their performance overall is better than serial/bigserial though as they take up more space in indexes.

xienze · 8mo ago

People really overthink this. You can safely expose internal IDs by doing a symmetric cipher, like a Feistel cipher. Even sequential IDs will appear random.
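A minimal sketch of the Feistel idea for 64-bit internal IDs, using only Python's standard library. The key, the round count, and the HMAC-SHA256 round function are illustrative assumptions, not a vetted cipher design; `scramble` and `unscramble` are hypothetical helper names.

```python
import hashlib
import hmac

KEY = b"demo-key-change-me"  # hypothetical secret; load from configuration in practice
ROUNDS = 4                   # assumed round count for illustration only
MASK32 = 0xFFFFFFFF


def _round(half: int, round_no: int) -> int:
    """Keyed round function: HMAC-SHA256 of one 32-bit half, truncated to 32 bits."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    digest = hmac.new(KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def scramble(internal_id: int) -> int:
    """Map a 64-bit internal id to a random-looking 64-bit public id."""
    left, right = internal_id >> 32, internal_id & MASK32
    for r in range(ROUNDS):
        left, right = right, left ^ _round(right, r)
    return (left << 32) | right


def unscramble(public_id: int) -> int:
    """Invert scramble() by running the rounds in reverse order."""
    left, right = public_id >> 32, public_id & MASK32
    for r in reversed(range(ROUNDS)):
        left, right = right ^ _round(left, r), left
    return (left << 32) | right
```

Because a Feistel network is a permutation, distinct internal IDs always map to distinct public IDs, and no lookup table or second index is needed. The scheme is only as secret as the key.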

morshu9001 · 8mo ago

But these are internal IDs only, and public ones should be a separate col. Being able to generate uuid7 without a central index is useful in distributed systems, but this is a Postgres DB already.

Now, the index on the public IDs would be faster with a uuid7 than a uuid4, but you have a similar info leak risk that the article mentions.

ibejoeb · 8mo ago

If you need an opaque ID like a uuid because, for example, you need the capability to generate non-colliding IDs generated by disparate systems, the best way I've found is to separate these two concerns. Use a UUIDv4 for public purposes and a bigint internally. You don't need to worry about exposing creation time, and you can still manage your data in the home system with all the properties that a total ordering affords.

tracker1 · 8mo ago

Now coordinate those sequential ids on a sharded or otherwise clustered database system.

nextaccountic · 8mo ago

uuids can be generated by multiple services across your stack

bigserial must be generated by the db

coolspot · 8mo ago

But what if we just use milliseconds as our bigserial? And maybe add some hw-random number at the end to avoid conflicts? Wait

simongr3dal · 8mo ago

I believe the concern is if your primary key in the database is a serial number it might be exposed to users unless you do extra work to hide that ID from any external APIs and if there are any flaws in your authorization checks it can allow enumeration attacks exposing private or semi-private info. With UUIDs being virtually unguessable that makes it less of a concern.

morshu9001 · 8mo ago

uuid7 is still guessable though, as the article says. The assumption is that these are internal only PKs.

mhuffman · 8mo ago

>why you'd do either instead of just serial/bigserial, which has always been my goto. Did I miss something?

So the common response is sequential ID crawling by bad actors. UUIDs are generally un-guessable and you can throw them into slop DBs like Mongo or storage like S3 as primary identifiers without worrying about permissions or having a clever interested party pwn your whole database. A common case of security through obscurity.

martinky24 · 8mo ago

You don’t scale horizontally, do you?

rcfox · 8mo ago

Do most people? Not everyone is Google.

morshu9001 · 8mo ago

This is Postgres. There is Citus, but that still supports (maybe recommends?) serial PKs.

pqdbr · 8mo ago · 9 in thread

Great article, especially for this part:

> What can go wrong with using UUIDv7

> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time.

> This leakage is primarily a privacy concern. Attackers can use the timing data as metadata for de-anonymization or account correlation, potentially revealing activity patterns or growth rates within an organization. While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach. Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.

SahAssar · 8mo ago

> Experts recommend

What experts? For what scenarios specifically? When do they consider time-of-creation to be sensitive?

dgb23 · 8mo ago

Or just generate them in bulk and take them from a list?

hn_throwaway_99 · 8mo ago

> Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.

So then what's the point? How I always did things in the past was use an auto increment big int as the internal primary key, and then use a separate random UUID for the external facing key. I think this recommendation from "experts" is pretty dumb because you get very little benefit using UUIDV7 (beyond some portability improvements) if you're still using a separate internal key.

While I wouldn't use UUIDV7 as a secure token like I would UUIDV4, I don't see anything wrong with using UUIDV7 as externally exposed object keys - you're still going to need permissions checks anyway.

morshu9001 · 8mo ago

I asked a similar question, and yeah it seems like this is entirely for distributed systems, even then only some of them. Your basic single DB Postgres should just have a serial PK.

crazygringo · 8mo ago

For distributed databases where you can't use autoincrement.

Or where, for some reason, the ID needs to be created before being inserted into the database. Like you're inserting into multiple services at once.

andy_ppp · 8mo ago

I wish Postgres would just allow you to look up records by the random component of the field, what are the chances of collisions with 80 bits of randomness? My guess is it’s still enough.

jagged-chisel · 8mo ago

You can certainly create that index.

mamcx · 8mo ago

What could be better is to allow creating a type with custom display and in/out functions, and setting its internal native type, in SQL (this currently requires doing it in C)

themafia · 8mo ago

> growth rates

I honestly don't see how.

rvitorper · 8mo ago · 6 in thread

Does anyone have performance issues with uuidv4? I worked with a db with 10s of billions of rows, no issues whatsoever. Would love to hear the mileage of fellow engineers

zerd · 8mo ago

I've had issues in a database with billions of rows where the PKs were a UUID. Indices on PK, and also foreign keys from other tables pointing to that table were pretty big, so much so that the indices themselves didn't all fit in memory. Like we would have an index on customer_id, document_id, both UUIDv4. DB didn't have UUID support, so they were stored as strings, so just 1 billion rows took ~30 GiB memory for PK index, 60GiB for the composite indices etc. So eventually the indices would not fit in memory. If we had UUID support or stored as bytes it might have halved it, but it would eventually have become too big anyway.

If you needed to look up say the 100 most recent documents, that would require ~100+ disk seeks at random locations just to look up the index due to the random nature of UUIDv4. If they were sequential or even just semi-sequential that would reduce the number of lookups to just a few, and they would be more likely to be cached since most hits would be to more recent rows. Having it roughly ordered by time would also help with e.g. partitioning. With no partitioning, as the table grows, it'll still have to traverse the B-Tree that has lots of entries from 5 years ago. With partitioning by year or year-month it only has to look at a small subset of that, which could fit easily in memory.

cipehr · 8mo ago

What database were you using? For example with SQL server, by default it clusters data on disk by primary key. Random (non-sequential) PKs like uuidv4 require random cluster shuffling to insert a row “in the middle” of a cluster, increasing io load and causing performance issues.

Postgres on the other hand doesn’t do clustered indexing on the PK… if I recall correctly.

rvitorper · 8mo ago

Postgres. It was also a single instance, which made it significantly easier. But nice to know that this is an issue on SQL Server

ahoka · 8mo ago

Then cluster it differently? The whole problem uuidv7 in databases solves is a non-issue in most cases.

crazygringo · 8mo ago

Honestly not really. Yes random keys make inserts slower. But if inserts are only 1% of your database load, then yeah it's basically no issues whatsoever.

On the other hand, if you're basically logging to your database so inserts are like 99% of the load, then it's something to consider.

rvitorper · 8mo ago

Makes sense. Thanks for the comment

gopalv · 8mo ago · 5 in thread

UUIDv7 is only bad for range partitioning and privacy concerns.

The "naturally sortable" is a good thing for postgres and for most people who want to use UUID, because there are no sorted distribution buckets where the last bucket always grows when inserting.

I want to see something like HBase or S3 paths when UUIDv7 gets used.

vlovich123 · 8mo ago

> UUIDv7 is only bad for range partitioning and privacy concerns.

It's no worse for privacy than other UUID variants if the "privacy" you're worried about leaking is the creation time of the UUID.

As for range partitioning, you can of course choose to partition on the hash of the UUIDv7 at the cost of giving up cheaper writes / faster indices. On the other hand, that of course gives up locality which is a common challenge of partitioning schemes. It depends on the end-to-end design of the system but I wouldn't say that UUIDv7 is inherently good or bad or better/worse than other UUID schemes.

saghm · 8mo ago

Isn't it at least a bit worse than v4, which has no timestamp at all? There might be concerns around non-secure randomness being used to generate the bits, but I don't feel like it's accurate to claim that's indistinguishable from a literal timestamp.

ibejoeb · 8mo ago

UUIDv4 doesn't leak creation time.

parthdesai · 8mo ago

Why is it bad for range partitioning? If anything, it's better? With UUIDv7, you basically can partition on primary key, thus you can have "global" unique constraint.

wara23arish · 8mo ago

confused why it would be worse for range partitioning?

I assume there would be some type of index on the timestamp portion & the uuid portion?

wouldn’t that make it better for partitioning since we’d only need to query partitions that match the timestamp portion

stickfigure · 8mo ago · 4 in thread

It never occurred to me that Postgres is more efficient when inserting monotonic values. It's the nature of B+ trees so it makes sense. But in the world of distributed databases, monotonic inserts create hot partitions and scalability problems, so evenly-distributed ids are preferred.

In other words, "don't try this with CRDB".

chuckadams · 8mo ago

It's the nature of B+ trees, multiplied by the nature of clustered indexes: if you use a UUIDv4 as a primary key, your entire row gets moved to random locations, which really sucks when you normally retrieve them sequentially. With a non-clustered index (say, your UUIDv4 id you use for public APIs when you don't want to leak the v7 info) then you'll still get more fragmentation with the random data, but it's something autovacuum can usually keep up with. But it's more work it has to do on top of everything else it does.

masklinn · 8mo ago

Gp mentioned Postgres, which does not have clustered indexes. It has table clustering, which is a point operation rewriting the entire table but not a persistent property.

baq · 8mo ago

Leaky abstractions in databases are one of the reasons every developer should read the table of contents of the hot databases used by the things he’s working on. IME almost no one does that.

therealdrag0 · 8mo ago

Can you elaborate on the hot partition bit?

perrygeo · 8mo ago · 4 in thread

Is there an unavoidable tradeoff here? Keys that order nicely (auto-incrementing integers, UUIDv7) naturally leak information. Keys that are more secure (UUIDv4) can have performance problems because they have poor locality.

Or are there any random id generators that can compromise, remain sequential-ish without leaking exact timestamps and global ordering?

inopinatus · 8mo ago

Symmetric encryption of IDs at the edge. Optional embedded HMAC. Optional text encoding. For monotonic bigserial values I'm somewhat fond of base58(AES_K1(id{8} || HMAC_K2(id{8})[0..7])) with purpose/table-salted HKDF subkeys from a scrypt'd system passphrase. The hot path of this is pretty fast. As with all cryptographic solutions it comes with a whole new jungle of pitfalls, caveats, and tradeoffs, but it works.

morshu9001 · 8mo ago

How big would the resulting public ID be?

mjb · 8mo ago

Yes. The spatial locality benefits drop off quite quickly. A hashed uuidv7-like scheme with a rotating salt, for example, would keep short term locality and its performance benefits while not having long term locality and its downsides.

AlotOfReading · 8mo ago

The tradeoff is unavoidable. At one end is UUIDv4. At the far end is a gray code with excellent locality, but inherently allows you to know which half of the indices the record is from (even without inverting it). UUIDv7 is a pretty good middle ground.

qntmfred · 8mo ago · 4 in thread

any thoughts on uuidv7 vs ulid, nanoid, etc for url-safe encodings?

nikisweeting · 8mo ago

ULID is the best balance imo, it's more compact, can be double clicked to select, and case-insensitive so it can be saved on macOS filesystems without conflicts.

Now someone should make a UUIDv7 -> ULID adapter lib that 1:1 translates UUIDv7 <-> ULID preserving all the timestamp resolution and randomness bits so we can use the db-level UUIDv7 support to store ULIDs.

masklinn · 8mo ago

A uuid is a 128b number with a specific structure. You can encode them in base32 if you want, there is no need for any sort of conversion scheme.

clintonb · 8mo ago

I prefer TypeID: https://github.com/jetify-com/typeid

thewisenerd · 8mo ago

i guess that depends on what you mean by url-safe

uuidv7 (-) and nanoid (_-) have special characters which urlencode to themselves.

none are small enough that you want someone reading them over the phone; but from a character legibility standpoint, ulid makes more sense.

pilif · 8mo ago · 3 in thread

One thing that’s not quite clear to me is how safe it is to generate v7 uuids on the client.

That’s one of the nice properties of v4 uuids: you can make up a primary key of a new entity directly on the client and the database can use it directly. Sure: there is tiny collision risk, but it’s so small, you can get away with mostly ignoring it

With v7 however, such a large chunk of the uuid is based on the time, so I’m not sure whether it’s still safe to ignore collisions in any application, especially when you consider clients’ clocks to probably be very inaccurate.

Am I overthinking things here?

PhilippGille · 8mo ago

How many client requests do you get in the same millisecond?

With UUIDv7 it's split into:

- 48 bits: Unix timestamp in milliseconds

- 12 bits: Sub-millisecond timestamp fraction for additional ordering

- 62 bits: Random data for uniqueness

- 6 bits: Version and variant identifiers

So >4,600,000,000,000,000,000 IDs per fraction of a millisecond.

And imprecise time on the client doesn't matter, because some are ahead and some behind, but that doesn't make them more likely to clash.
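A minimal sketch of that bit layout, assembled by hand with Python's standard library so it does not depend on any particular `uuid7` implementation. `make_uuid7` and `creation_time` are hypothetical helper names, and as a simplifying assumption the 12-bit field is filled with random bits (which RFC 9562 permits) rather than a sub-millisecond fraction.

```python
import os
import time
import uuid
from datetime import datetime, timezone


def make_uuid7(unix_ms=None):
    """Assemble a UUIDv7: 48-bit ms timestamp, version, 12 bits, variant, 62 bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the top 48 bits
    value |= 0x7 << 76                            # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


def creation_time(u):
    """Recover the embedded creation time from the top 48 bits."""
    unix_ms = u.int >> 80
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


first = make_uuid7(1_700_000_000_000)
later = make_uuid7(1_700_000_000_001)
print(first, creation_time(first))  # timestamp part decodes to 2023-11-14 22:13:20+00:00
```

IDs created in a later millisecond always compare greater, which is the index-friendly ordering discussed in this thread, and `creation_time` shows how little work it takes for anyone holding the ID to read the timestamp back out.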

cenamus · 8mo ago

Does that factor in the birthday paradox?
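A rough way to put numbers on the birthday question, assuming the 62 random bits quoted in the parent comment and the standard approximation p ≈ 1 − exp(−n(n−1)/(2·2^bits)) for n IDs generated within the same timestamp tick. The function name is hypothetical.

```python
import math

RANDOM_BITS = 62  # random bits per ID within one timestamp tick (from the parent comment)


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday approximation: chance of any collision among n IDs from 2**bits values."""
    return -math.expm1(-n * (n - 1) / (2 * 2**bits))


p_million = collision_probability(1_000_000)      # a million IDs in the same tick
p_half = collision_probability(2_528_000_000)     # roughly the 50% point
```

Under these assumptions a million IDs in a single tick collide with probability of about one in ten million, and it takes on the order of 2.5 billion IDs in the same tick to reach even odds.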

qeternity · 8mo ago

If the client can generate a uuid4 they can also reuse a known uuid4

gnatolf · 8mo ago · 2 in thread

For me, the sheer length of uuids is annoying in payloads of tokens etc. I wish there was a common way to abbreviate those, similar to the git way.

pmontra · 8mo ago

It's a 128 bit number. If you express that number in base 62 (26 upper case letters + 26 lowercase letters + 10 digits) you need only 22 characters. You can compress it further by increasing the base and using other printable ASCII characters.
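A minimal sketch of the base 62 idea using only the standard library. The alphabet order and the fixed width are choices made here for illustration; `to_base62` and `from_base62` are hypothetical helper names.

```python
import string
import uuid

# 62 symbols in ascending ASCII order: digits, upper case, lower case.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
WIDTH = 22  # smallest n with 62**n >= 2**128


def to_base62(u: uuid.UUID) -> str:
    """Encode the 128-bit value as a fixed-width, zero-padded base 62 string."""
    n = u.int
    chars = []
    for _ in range(WIDTH):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def from_base62(text: str) -> uuid.UUID:
    """Decode a string produced by to_base62 back into a UUID."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return uuid.UUID(int=n)


u = uuid.uuid4()
short = to_base62(u)
```

Padding to a fixed 22 characters keeps every ID the same length, and because the alphabet is in ASCII order the encoded strings sort the same way as the underlying 128-bit values.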

Merad · 8mo ago

Crockford base32 [0] is the best compromise, IMO. Reasonable length of 26 chars. Uses only alphanumeric characters and avoids issues with case sensitivity and confusing characters (0 vs O, etc.).

0: https://www.crockford.com/base32.html
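A minimal sketch of Crockford base32 applied to a UUID, using only the standard library. `encode` and `decode` are hypothetical helper names; the alphabet and the look-alike handling on input (O read as 0, I and L read as 1, hyphens ignored, case ignored) follow the linked specification.

```python
import uuid

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, or U
DECODE = {ch: i for i, ch in enumerate(CROCKFORD)}
DECODE.update({"O": 0, "I": 1, "L": 1})  # confusable characters accepted on input


def encode(u: uuid.UUID) -> str:
    """Encode 128 bits as 26 characters, 5 bits each (the first carries only 3)."""
    n = u.int
    return "".join(CROCKFORD[(n >> shift) & 31] for shift in range(125, -1, -5))


def decode(text: str) -> uuid.UUID:
    """Decode case-insensitively, ignoring hyphens and mapping look-alikes."""
    n = 0
    for ch in text.upper().replace("-", ""):
        n = (n << 5) | DECODE[ch]
    return uuid.UUID(int=n)


u = uuid.UUID("01234567-89ab-cdef-0123-456789abcdef")
text = encode(u)
```

Since decoding is case-insensitive and tolerant of the usual misreadings, the same ID survives being retyped or stored on a case-insensitive filesystem.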

lucasyvas · 8mo ago · 2 in thread

These are all non-issues - don’t allow an end user to determine a serial primary key as always.

And the amount of information it leaks is negligible - they might know the oldest and the newest and there’s an infinite gulf in between.

It’s better and more practical than SERIAL or BIGSERIAL in every way - if you need a random/external ID, add a second column. Done.

morshu90018mo ago

Why not serial PK with uuid4 secondary? Every join uses your PK and will be faster.

Biganon8mo ago

> if you need a random/external ID, add a second column. Done.

As others have stated, it completely defeats the performance purpose, if you need to lookup using another ID.

bearjaws8mo ago· 1 in thread

I really disagree that the privacy risk is enough to not use it at all, even in a healthcare setting.

There are wild scenarios you can come up with where you may leak something, but that assumes the information isn't coming over anyway.

"Reveals account creation time" - most APIs return this in API responses by default.

When have you seen just a list of UUIDs and no other more revealing metadata?

Meanwhile what pwns 99% of companies? Phishing.

sverhagen8mo ago

API responses should be limited to authenticated users. IDs are often present in hyperlinks that are included in insecure emails, or in URLs that, being routed through all sorts of networking hops may be captured and available as metadata.

MaKey8mo ago· 1 in thread

Interesting that aiven is still around after they've lost customer data a few years back.

oskari8mo ago

I believe you're referring to our January 2020 Kafka incident where a logic bug caused data loss for a customer. It was a serious failure and a huge learning moment for us.

The platform we operate today is fundamentally different and far more resilient than it was five years ago. We've scaled significantly (recently passing $100M ARR) because we took those early lessons seriously and continue to prioritize reliability.

caymanjim8mo ago

Tangential, but I'm grateful to this article for teaching me that Postgres has "table foo" as shorthand for "select * from foo". I won't use that in code, but I'll happily use it for interactive queries.

pmontra8mo ago

My customers return created_at attributes in all their API calls so UUIDv7 won't harm them at all. They also use sequential ids. Only one of them ever used UUIDv4 as primary key. We didn't have any performance problem but the whole production system was run by one psql instance and one Elixir application server. Probably almost any architectural choice is good at that scale.

mfrye08mo ago

I can confirm on the performance benefits. I wanted to start with uuidv7 for a new DB earlier this year, so I put together a function to use in the meantime. Once the function is available natively, we'll just migrate to use it instead.

For anyone interested:

CREATE FUNCTION uuidv7() RETURNS uuid AS $$
  -- Get base random UUID and overlay timestamp
  select encode(
    set_bit(
      set_bit(
        overlay(uuid_send(gen_random_uuid())
                placing substring(int8send((extract(epoch from clock_timestamp())*1000)::bigint) from 3)
                from 1 for 6),
        52, 1),  -- Set version bits to 0111
      53, 1),
    'hex')::uuid;
$$ LANGUAGE sql volatile;
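The same overlay trick can be sketched outside the database, for readers who want to see what the SQL does byte by byte. This is an illustrative Python equivalent and not part of the original comment: it starts from a random v4 UUID, overwrites the first 6 bytes with the millisecond timestamp, and turns the version nibble from 4 into 7. The name `uuidv7_overlay` and the optional `now_ms` parameter are assumptions added for testability.

```python
import time
import uuid
from typing import Optional


def uuidv7_overlay(now_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-shaped value by overlaying a timestamp on a random UUIDv4."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    raw = bytearray(uuid.uuid4().bytes)   # variant bits are already binary 10
    raw[0:6] = now_ms.to_bytes(6, "big")  # bytes 0-5: 48-bit Unix time in ms
    raw[6] = (raw[6] & 0x0F) | 0x70       # high nibble of byte 6: version 7
    return uuid.UUID(bytes=bytes(raw))


fixed = uuidv7_overlay(now_ms=1_645_557_742_000)
print(fixed)
```

Like the SQL function, this leaves all 74 non-timestamp bits random, so values created within the same millisecond are not ordered relative to each other.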

Rafert8mo ago

> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs.

I would not call this “generally discouraged” when APIs generally surface a created_at timestamp in their responses. A real life example are Stripe IDs which have similar properties (k-sorted) as UUIDv7: https://brandur.org/nanoglyphs/026-ids#ulids

delifue8mo ago

I disagree with this

> While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach

The correct way is 1. generate ID on server side, not client side 2. always validate data access permission of all IDs sent from client

Predictable ID is only unsafe if you don't validate data access permission of IDs sent from client. Also, UUIDv7 is much less predictable than auto-increment ID.

But I do agree that having create time in public-facing ID can leak analytical information.

turrini8mo ago

Something like this [1] or an adaptation may address their security considerations. Discussed here [2]

[1] https://github.com/stateless-me/uuidv47

[2] https://news.ycombinator.com/item?id=45275973

burnt-resistor8mo ago

Sequential primary keys are pretty important for scalable, stable sorting by record creation time using the primary key's index, similar to serial (int) but without the guessing vulnerability. For this use case, a UUID "v9"-like approach can be a better option: https://uuidv9.jhunt.dev

klysm8mo ago

I don’t care at all about “leaking” the creation time for records. I think the documentation is overly zealous

6r178mo ago

Great read - short, effective ; I know what I learned. Very good job
