Goodbye integers, hello UUIDv7

(buildkite.com)

726 points · juanfatas · 2y ago · 363 comments

202 comments · 44 top-level

jonhohle · 2y ago · 41 in thread

This is great for internal distributed systems where having ordered keys is useful; however, it should be noted that these probably shouldn't be used as public identifiers (even though this will probably become the de facto standard and be used publicly without thought).

Having any information, specifically time information, leaking from your systems may or may not have unanticipated security or business implications. (e.g. knowing when session tokens or accounts are created).

oconnore · 2y ago

Given that a UUID identifier fits in a single cipher block, and the whole point is that these are unique by construction (no IV needed so long as that holds true), it seems like a single round of ECB-mode AES-128 would enable quickly converting between internal/external identifiers.

128 bits -> 128 bits
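
A minimal sketch of the reversible internal/external mapping described above. Python's standard library has no AES, so this swaps in a 4-round Feistel network keyed with HMAC-SHA256 as a stand-in 128-bit keyed permutation. It is not AES and has not been reviewed; a real deployment would presumably use a vetted AES implementation. The names `to_external`, `to_internal`, and `SECRET_KEY` are hypothetical.

```python
import hashlib
import hmac
import uuid

MASK64 = (1 << 64) - 1
ROUNDS = 4  # assumption: illustrative round count for the stand-in permutation


def _round_value(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256 truncated to 64 bits."""
    msg = bytes([round_no]) + half.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:8], "big")


def to_external(key: bytes, internal: uuid.UUID) -> uuid.UUID:
    """Map an internal ID to an opaque 128-bit external ID."""
    left, right = internal.int >> 64, internal.int & MASK64
    for i in range(ROUNDS):
        left, right = right, left ^ _round_value(key, i, right)
    return uuid.UUID(int=(left << 64) | right)


def to_internal(key: bytes, external: uuid.UUID) -> uuid.UUID:
    """Invert to_external by running the rounds backwards."""
    left, right = external.int >> 64, external.int & MASK64
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round_value(key, i, left), left
    return uuid.UUID(int=(left << 64) | right)


SECRET_KEY = b"example-key-do-not-reuse"  # hypothetical key, for illustration only
internal_id = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
external_id = to_external(SECRET_KEY, internal_id)
assert to_internal(SECRET_KEY, external_id) == internal_id
```

The output has arbitrary version and variant bits, and changing the key changes every external ID, which are the two objections raised in the replies.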

andix · 2y ago

I like the idea, but I think it's not possible to rotate the key with that approach, without introducing a breaking change. Eternal secrets are usually a very bad idea, because at some point they are going to be leaked.

dragontamer · 2y ago

Why not just use the AES-128 result as the UUID then? What's the benefit of the internal structure at all?

If AES-128 is an acceptable external UUID (and likely an acceptable internal one), then you might as well just stick with a faster RNG.

lysium · 2y ago

Neat idea.

I’m afraid you won’t ever be able to rotate that key, will you? Since its result is used externally as an identifier, you would have to rotate the external identifiers, too.

dajonker · 2y ago

That's an interesting idea. How would you deal with the bits in the UUID that are used for the version? Setting them to random bits may cause issues for clients that try to use the identifier in their own database or application, as mentioned in the article.

MattPalmer1086 · 2y ago

This seems overly complex, and you need some kind of key too.

Why not just hash it with pretty much any hash function?

fodkodrasz · 2y ago

No IV, ECB mode... why bother with encryption at all? Just expose the internal id.

nhoughto · 2y ago

Yep have used an approach just like that, worked quite well if you have a strong pattern to easily translate from one to the other. Gives you an id with the right properties for internal use, efficient indexing etc, and in its encrypted form gives you the properties you want from an external identifier being unpredictable etc, all from one source id.

It is true that your encryption key is now very long-lived and effectively part of your public interface, but depending on your situation that could be an acceptable tradeoff, and there are quite a few pragmatic reasons why that might be true, as other comments have described.

Edit: you can even do 64bit snowflakes internally to 128bit AES encrypted externally, doesn’t have to be 128-128 obvs

nindalf · 2y ago

One key for all tokens or one key per token? If it’s the latter a simple XOR would do because it would be the equivalent of a one time pad.

noduerme · 2y ago

Thinking that harder-to-guess IDs will mitigate attacks is an example of security by obscurity. It's better to think of any IDs in your database as being public knowledge, because they will leak anyway. Assuming that no one can guess another ID leads to shoddy practices. I generally keep IDs sequential and build security around the basic assumption that IDs are not keys, passwords, sessions, or secrets - they're just the public matching identifier for those things.

To that end, I think it's neat to be able to improve indexing on UUIDs, but it's not a security solution.

dalore · 2y ago

Having sequential IDs is more than just a security risk, it's an information risk. Competitors can use them to estimate the size of your business, the number of customers you have, and all sorts of stuff.

This was used in World War II to estimate the number of German tanks based on their sequential serial numbers.

https://en.wikipedia.org/wiki/German_tank_problem

So just for business intelligence you don't want to leak your IDs.
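
To make the estimate concrete, here is a small sketch of the standard frequentist estimator from the linked article, m + m/k - 1, where m is the largest observed ID and k is the number of IDs observed. It assumes IDs are sequential integers starting at 1 and that the observer sees a uniform random sample of them; `estimate_total` is a hypothetical name.

```python
def estimate_total(observed_ids: list) -> float:
    """Estimate the total count from a sample of sequential IDs (1..N)."""
    k = len(observed_ids)  # sample size
    m = max(observed_ids)  # largest ID seen
    return m + m / k - 1


# Four leaked customer IDs are enough for a rough size estimate.
print(estimate_total([19, 40, 42, 60]))  # 74.0
```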

wolletd · 2y ago

Just on Friday I had a discussion with a colleague about filenames.

We do a lot of computer vision and in his project, each processed object is assigned a UUID and he wanted to save images to files for each one.

So we took some time to go over various timestamp formats to be embedded into the filename to make the files sort chronologically. UUIDv7 is just spot-on solving our problem. In this use case, there are no real security considerations.

berkes · 2y ago

Doesn't that still leak (statistical) information?

It may not be technically security, but e.g. knowing your competitor just added N products to their shop, might be a security issue for the business.

Nevermark · 2y ago

Security by obscurity is a necessary step in most software security.

It hardens, completes and complements other measures.

Examples of every day security using obscurity: every password and encryption key

EDIT: Thanks for the replies.

Ignore above!

Obscurity is the low bit of security. But when it’s convenient, it still helps.

vidarh · 2y ago

You should think of them as public, but that doesn't mean it isn't still helpful to obscure aspects of the information they carry.

Obscurity can be helpful as part of defence in depth, to reduce the impact when someone does something stupid, or to make it more difficult to extract information that might be helpful as a means to attack the system from another angle.

If you're already thinking about the implications, you can likely ensure people don't jump to the conclusion that the IDs can be trusted just because they look complex.

ivan_gammel · 2y ago

Security by obscurity is a working solution if implemented with other measures. It increases the cost of attack, which in the presence of unknown vulnerabilities gives you precious time to respond.

imiric · 2y ago

I'm a fan of Cuid2[1] for this reason.

They are compact, don't leak information, and make a good case why k-sortable IDs are unnecessary, or even harmful for performance.

I'm using sequential integers and created_at/updated_at timestamps for internal use, and Cuid2 IDs externally.

[1]: https://github.com/paralleldrive/cuid2

tveita · 2y ago

> But not too fast: If you can hash too quickly you can launch parallel attacks to find duplicates or break entropy-hiding. For unique ids, the fastest runner loses the security race.

> Cuid2 has been audited by security experts and artificial intelligence, and is considered safe to use for use-cases like secret sharing links.

I'm getting some snake oil vibes from this... There absolutely shouldn't be anything like a random ID that is 'too fast' to compute. You might need a rate limit to stay within your collision bounds, but CPU usage is a poor way to do it.

And there is currently no publicly available "artificial intelligence" that would be useful in a security audit, unless you want to call fuzzers "AI".

sgarland · 2y ago

The comments on performance are utterly incorrect, modulo discussions on hotspots, but you shouldn’t be sharding randomly anyway. If you get to the point where you _need_ to shard for anything other than geolocality, doing so randomly will rapidly reveal your hotspots.

> One reason for using sequential keys is to avoid id fragmentation, which can require a large amount of disk space for databases with billions of records.

Disk is cheap but not free at higher tiers. But more importantly, record fragmentation means more pages (unless you take the time to do a full table lock and rewrite it, and who’s doing that?) which means more index bloat. I assure you, that adds up once you’re into the billions of records level.

> the ids will be generated in a sequential order, causing the tree to become unbalanced, which will lead to frequent rebalancing.

Given the width of B+trees used in DBs, I doubt they generally need to go more than one or at most two levels up. I’ll take the ability to rapidly follow the leaf nodes and have a good shot at sequential reads in cache from prefetch, thanks.

ndriscoll · 2y ago

Nearly everything in this README about security or performance is wrong. I'd be very wary of using this.

BiteCode_dev · 2y ago

What's the benefit over UUIDv4?

BillinghamJ · 2y ago

>Having any information, specifically time information, leaking from your systems may or may not have unanticipated security or business implications. (e.g. knowing when session tokens or accounts are created).

I don't think this is really true? These are not serially incrementing, they just indicate the time it happened. If you have an ID that you know exists, having the ability to know _when_ it was created is very rarely meaningful.

What could present more of a risk is being able to predict a large part of IDs that will be created. Even then though, you shouldn't depend on your IDs for secrecy - best to ensure the IDs are never used as protection by themselves (ie treat them like they're just a simple autoincrementing number, even if they're not)

WorldMaker · 2y ago

One real world security problem is the "elder account" problem: as the age of an account increases the likelihood increases that it uses an insecure old password and/or that the account owner isn't paying as much attention to the account in the present. Depending on what the account represents age may also imply more "value" in the account. (Including just "sentimentality" value in the case of ransom operations, not just financial value.) Being able to tell from an ID alone that an account is at least X years older than some other ID in the system can be a handy way to find "potentially high value/low security" accounts to focus on to social engineer.

There are certainly mitigations that can be made and not all things are equally valuable as they age. (Plus many public APIs include created/modified timestamps anyway. The information is often easy to discover even when not embedded in an ID.) I don't find it a strong reason to avoid timestamp-based IDs for the threat models of that many things beyond user accounts and other things susceptible to social engineering, but it is something to stay aware of.
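
For a sense of how little effort the "elder account" lookup takes, here is a sketch that reads the creation time straight out of a UUIDv7, assuming the leading 48 bits are a Unix timestamp in milliseconds as described elsewhere in the thread. The function name `uuid7_created_at` and the example value are illustrative.

```python
import uuid
from datetime import datetime, timezone


def uuid7_created_at(value: uuid.UUID) -> datetime:
    """Read the leading 48-bit Unix-millisecond field of a UUIDv7."""
    if value.version != 7:
        raise ValueError("not a UUIDv7")
    millis = value.int >> 80  # top 48 of the 128 bits
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(uuid7_created_at(example).isoformat())  # 2022-02-22T19:22:22+00:00
```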

Someone · 2y ago

One business implication is that third parties can detect whether your sales increase or decrease from sampling those IDs (a variation on https://en.wikipedia.org/wiki/German_tank_problem)

rtsil · 2y ago

Don't create them each time a record is created; create a sufficiently large batch in advance, and do the same every time the previous batch has run out. A UUIDv7 is 128 bits, so you can store a large number of them without major penalty.

Macha · 2y ago

But then you need to have the client communicate with the server to identify its newly created object, or complicate your logic to have incomplete objects in a pending state, which is one of the things people were using UUIDs to avoid.

pnpnp · 2y ago

They’re also incredibly cheap to create & don’t need knowledge of each other. I mostly see batching of IDs like this when a lock is involved to prevent collisions & maintain performance.

With UUIDv7, you are reasonably sure that there won’t be collisions (check your use-case first), and can just generate them wherever on-demand (no locks required).

I’d argue batching IDs is actually more complicated than UUIDv7 for most use-cases.

0xEFF · 2y ago

Every access and ID token issued by OIDC already has issued-at (iat) and expiration (exp) fields.

throwaway167 · 2y ago

Just randomise your clock.

Persistent IDs are a security and information risk. If that's a concern, don't persist IDs.

greatgib · 2y ago

Fyi, the timestamp is already encoded inside uuid4. But for having a good distribution of values, the low bits half of the timestamp is stored before the high bits half.

Here uuidv7 will just re-order that. So the content of the uuid in itself does not change.

mafuy · 2y ago

No, this was the case in earlier uuids. In v4, there is no timestamp.

woile · 2y ago

Could you explain a bit more how it would be a risk? Maybe for session tokens it's understandable. But why is leaking account-creation info a problem?

logicchains · 2y ago

Leaking a monotonic ID could allow outside observers to estimate e.g. number of accounts created or products sold over certain timeframe. Competitors (or traders, for a public company) could use this like a form of inside information on the company (e.g. sell the stock if the rate was falling).

andrewmatte · 2y ago

jonhohle, thanks. Do you know of examples where millisecond timestamps in session tokens or account IDs have been exploited?

tgv · 2y ago

The German tank production capacity was estimated by serial numbers of captured tanks. There are ways to read all kinds of information by observing energy usage. High resolution time and sequence data undoubtedly reveal more than you’d like.

BWStearns · 2y ago

I could imagine using the timestamp segment of publicly observable ids to estimate activity patterns in an organization. Probably not super crucial and there are probably easier ways in most cases but it could be a big deal at the right moment and for the right target. This could be like a more refined version of PIZZAINT (where you can detect impending policy/operational movements by the quantity of food deliveries to a government organization).

tarjei_huse · 2y ago

I know of people who used leaked customer ids in public facing chatbot solutions (like Intercom) to estimate how fast their competitors were growing and/or how many customers they had.

rvnx · 2y ago

It’s as convenient as manipulating IPv6 addresses.

tabtab · 2y ago

Plus IDs often have to be given over the phone for support. Long or complicated IDs will drive both sides bonkers. #BeenThereDoneThat.

jacobgorm · 2y ago

Not to speak of the increased risk of key collisions. https://en.wikipedia.org/wiki/Birthday_problem

hannasanarion · 2y ago

In the same millisecond, in the same database system, having rolled the same 74 bit (~22 digit) number?

rockwotj · 2y ago · 21 in thread

I find it interesting that random IDs are said to be bad for performance, because they're actually better for distributed storage systems, since you don’t hotspot on a single node. For example see: https://stackoverflow.com/a/53901549 and https://medium.com/google-cloud/cloud-spanner-choosing-the-r...

chacham15 · 2y ago

They can be bad for performance. It all depends on your access patterns. A common caching pattern is called "temporal locality", which means that there's a high likelihood that data created at the same time will be accessed at the same time. Therefore, if these pieces of information are on the same machine, they can be queried / returned much faster than if they were on separate machines. This is doubly true if there's a data dependency between them, e.g. SELECT x + y or SELECT x WHERE y = 'foo'.

stepanhruda · 2y ago

Yes but if that machine with sequential data receives 100x the traffic of other machines, it can be worse than splitting this traffic evenly across all available machines.

berkes · 2y ago

We solved that with UUIDs and updated_at and created_at columns. The latter was the default sort in all views and queries, so the btree/indexing issues were hardly an issue. Whenever you fetch a set of rows, they will be bounded by these timestamps.

We even sharded on these columns, because of this (our business case made it so that hardly ever did people need data over multiple months)

But we never encountered distribution issues. I don't think the locality issue will be solved, as postgres doesn't consider other columns when distributing data, only the primary key IIRC. I don't know why we never saw this, though.

hinkley · 2y ago

If you're working on a multi-user system, particularly one with hundreds of requests per second, there is no locality of ids. Two of my actions are separated by a sea of actions by other users.

akira2501 · 2y ago

UUIDv7: Timestamp up front, random in the back.

wenc · 2y ago

I know HN doesn't like jokes, but this is really funny. And the subcomment about mullets too.

(for folks who don't get it, mullets are a 1980s haircut (think MacGyver) with a short front but a long tail in the back. A funny description of them is "business in the front, party in the back")

labster · 2y ago

Truly, the mullet of unique identifiers.

sj26 · 2y ago

It Depends(tm).

If you're using a system which is built for distribution, random is great.

When you're leaning on a Postgres database which has powered your startup through scaling but expects right-leaning btree indexes, it's a bad time.

Rearchitecting to use a new data store is ideal, but often impractical as an immediate step. UUIDv7 is a great increment walking that road via sharding etc.

fnordpiglet · 2y ago

In all the distributed systems I’ve built I hashed the keys to ensure good distribution. A nice thing about ordered keys is that you can use part of the ordering to distribute keys with a tunable amount of key locality in each node for efficiency.

jillesvangurp · 2y ago

Depends how you query it. In a lot of systems, recently added data is also the most queried data or data typically gets pulled out sorted by time. Having that data on disk in more or less the order it is going to be queried makes sorting it a bit easier. Even in a sharded system, each of the shards would have less work to do for sorting. Of course a lot of these systems would have an append only write model which would effectively sort things by time anyway, even with completely random ids.

Somebody posted an interesting article about Instagram's IDs, which do something similar. They use 41 bits for a time from a custom epoch, followed by two more groups of bits for a shard id and a sequential number. Each shard has an incrementing sequence for the sequential bits, which guarantees that things on a shard are sorted by time.

This UUIDv7 is slightly weaker than that but sorting things published in the same millisecond is mostly going to be very light work. The lack of a dedicated sharding group of bits is not that important as you could just take the n least significant bits at the end for that without too much effort. Those are random so you end up with nice consistent hashing. Having 48 instead of 41 bits for the time means we won't run out of time any time soon (nearly 9K years vs. 70 years).
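
The "nearly 9K years vs. 70 years" figures can be checked with a few lines of arithmetic. This sketch assumes millisecond resolution and a 365.25-day year; `years_of_range` is a hypothetical helper name.

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25


def years_of_range(timestamp_bits: int) -> float:
    """Years until a millisecond counter of the given width wraps around."""
    return 2 ** timestamp_bits / MS_PER_YEAR


print(f"{years_of_range(48):.1f}")  # 8919.4
print(f"{years_of_range(41):.1f}")  # 69.7
```

The range is counted from whatever epoch the scheme uses, so a 41-bit field with a recent custom epoch lasts about 70 years from that epoch rather than from 1970.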

gregw2 · 2y ago

An explicit shard id can ensure all related data across all tables can be on the same shard. Helpful for SQL JOIN operations.

Picking the N least significant bits gives only a single table good distribution and sort qualities; it provides no cross-table properties.

stingraycharles · 2y ago

Depends on the use case. If, for example, you store things on disk ordered by these IDs, and access patterns to your dataset are related to time (e.g. more recently created data is accessed more frequently), it will help a lot to have this data ordered by time.

This is especially useful when your underlying database stores data in large "chunks", such as LSM-trees you find with e.g. rocksdb.

tveita · 2y ago

It's great for performance, up until you reach the point where a single device becomes a bottleneck, at which point it's terrible for performance.

As a sibling comment says, you ideally want to shard on some other key to get "just enough" distribution that all your machines/disks have work to do, but you are still only hitting a limited number of hot sectors on each disk that can be effectively cached. But that requires active monitoring and rebalancing of your data as it grows. Totally random keys are a safe default that will scale with any kind of data distribution and access patterns.

sroussey · 2y ago

It can be bad for performance due to how b-trees work in databases, and more pronounced when you have a clustered index.

dheera · 2y ago

It's bad for performance if you frequently access large consecutive sets of records.

hinkley · 2y ago

UUIDs are good for data where I want either lots of different users being able to insert without collision, or lots of users who I want to keep their peepers off of other users' metadata (e.g., how many X they add to the system per day).

In both cases I'm melding highly disjointed data into a single schema. There are no large consecutive sets of records.

If you're using UUIDs, there's probably a reason. And that reason invalidates the justifications for not using them.

moralestapia · 2y ago

Agree, but the solution is easy peasy with a simple hash function.

(Or just reverse the bits, take the last n, etc)

conradludgate · 2y ago

As far as I understand, you want to have a random shard position, but once you have found a shard you want that index operation to be cache friendly. When choosing a shard, you can always use the last N bits or use some consistent hashing strategy[1]

[1]: https://en.m.wikipedia.org/wiki/Consistent_hashing
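
A sketch of the "last N bits" option, assuming the trailing bits of the ID are random (as they are in UUIDv7). This is plain bit masking, not the consistent hashing scheme from the linked article, and `shard_for` is a hypothetical name.

```python
import uuid


def shard_for(value: uuid.UUID, shard_bits: int) -> int:
    """Pick one of 2**shard_bits shards from the low (random) bits of the ID."""
    return value.int & ((1 << shard_bits) - 1)


example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(shard_for(example, 4))  # 15, i.e. the last hex digit 'f' of 16 shards
```

Within a shard the IDs still begin with the timestamp, so the per-shard index keeps its time-ordered, cache-friendly insert pattern.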

andix · 2y ago

There are many other options that usually scale better than random distribution. For example distribution by user or tenant id.

EGreg · 2y ago

They likely mean it’s good for latency and not necessarily for throughput.

I still think that graph databases are way better for this sort of thing.

rockwotj · 2y ago

They later note most of this traffic is going to a single postgres instance. Having all the keys go to the same range probably helps throughput because they can do a better job of grouping fsync. But that probably depends on the type of drives they are using (even fast NVMe benefit from locality).

declan_roberts · 2y ago · 14 in thread

It’s 2023. Why aren’t we using more characters from the UTF-8 keyspace to make things like UUIDs use fewer characters?

chungy · 2y ago

You could, if you are only optimizing for display size. UUIDs are very infrequently presented as user-facing data, and pretty much every sensible system will be storing them as a 128-bit value, not the ASCII representation you see.

So really, what are you trying to optimize?

declan_roberts · 2y ago

UUIDs show up absolutely everywhere as strings in logs.

They're also often used as part of a URL parameter:

http://myservice/orders/<uuid> etc etc

chewbacha · 2y ago

UUIDs are 128-bits, not characters. The string representation is just for humans.

treve · 2y ago

I think it's a fair question, because yes you should store them as numbers, but they are still often sent in text formats and urls. Wanting a shorter representation is reasonable.

The easiest is probably to just base64 the binary representation of the 128-bit number, which results in a ceil(128/6) = 22 character string (once padding is dropped), which is a bit smaller.

If glyph-length and not byte-length is more important you could go even smaller but I'm less sure if that's a good idea.
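
A small sketch of the base64 idea using only the standard library. It uses the URL-safe alphabet and drops the two `=` padding characters, which gives the 22-character form; `to_short` and `from_short` are hypothetical names.

```python
import base64
import uuid


def to_short(value: uuid.UUID) -> str:
    """URL-safe base64 of the 16 raw bytes, with the '==' padding dropped."""
    return base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")


def from_short(text: str) -> uuid.UUID:
    """Restore the padding and decode back to a UUID."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + "=="))


example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
short = to_short(example)
print(len(str(example)), len(short))  # 36 22
assert from_short(short) == example
```

One caveat: the base64 alphabet is not in ASCII order, so sorting these strings does not reproduce the byte order of the underlying IDs.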

gabereiser · 2y ago

This is the way. Look not at the characters but at the hex.

shepherdjerred · 2y ago

I think the parent is saying that we can make UUIDs more human-readable by displaying the underlying 128 bits with a larger set of characters.

sapling-ginger · 2y ago

Base8192: https://alicecengal.github.io/uuid-hangul/

LAC-Tech · 2y ago

That's awesome. "Aesthetic, Cosmopolitan" LOL

larschdk · 2y ago

Not very useful. Always decodes to 00NaN-0NaN-0NaN-0NaN-000000NaN.

xdennis · 2y ago

Firstly, "it's current year" is never a good argument, but you seem to be confusing things (UTF-8 vs Unicode). UTF-8 can take as many as 4 bytes to encode a Unicode code point (the original design allowed up to 6).

If you want to store UUIDs as compactly as possible you'd use 16 bytes.

If you want to store them as text, mapping them to Unicode would be a terrible idea because: many characters are from scripts you've never heard of, many characters look identical (Α vs A), many characters are decomposed and it can change the encoding if they're decomposed[1], &c.

[1]: https://en.wikipedia.org/wiki/Precomposed_character

__MatrixMan__ · 2y ago

Because it's 2023. You're looking at the UUID through a giant pile of interpreters and renderers and buffers... plenty of opportunity to give each one a tartan without changing it in the data.

Or better yet, only decorate one after it has been clicked by the user, that way when it appears again elsewhere, it stands out. If you make each one pretty you'll have made all of them ugly when viewed together.

kiitos · 2y ago

UUIDs are 128 bits, or 16 bytes. They have infinitely many possible string representations. Those strings are not the value, they're a transformation of the value.

declan_roberts · 2y ago

The strings are how HUMANS not machines interact with the UUID. When your Java stacktrace spits out a log with a UUID it's going to spit out a STRING because it's written for you and not the computer.

When you take that UUID and go start sniffing around internal systems you're going to copy the UTF-8 string representation.

LAC-Tech · 2y ago

we should display them as a monochrome block of 8*16 pixels.

erik_seaberg · 2y ago · 12 in thread

> first component (prefix) of the identifier is a sortable timestamp

> values generated are practically sequential

These statements aren’t strict enough to be relied on. Maybe you have engineered the hell out of your distributed clock scheme, and your IDs actually are completely monotonic, which is great. But you probably haven’t done that, which means conflicts will surely happen and you must handle them gracefully.

jhealy · 2y ago

I'm not the author but I work at the same company.

None of our systems require perfect ordering of IDs generated across our distributed system. Most of the system was built with random UUIDv4 identifiers so no code assumes the ID ordering is significant.

However, in much of our system recent data is frequently accessed while old data is rarely accessed. In that world, just having the IDs *approximately* clustered in creation order has been a huge performance boost for many queries, and we've seen significant reduction in postgres Write Ahead Log rates, because writes to UID indexes happen in a smaller number of pages.

sgarland · 2y ago

> In that world, just having the IDs approximately clustered in creation order has been a huge performance boost for many queries, and we've seen significant reduction in postgres Write Ahead Log rates, because writes to UID indexes happen in a smaller number of pages.

Thank you. I’m so tired of seeing the same groupthink on UUIDv4 trotted out - “it only matters if you have a clustered index, Postgres is immune!” The hell it is.

XCSme · 2y ago

Just curious, if there is no ordering of the data (no sequential ID), how is it even "ordered" on the file system? Is it random? Or, even without a sequential ID can you know which data was written earlier and which later?

__MatrixMan__ · 2y ago

When I'm using integer surrogate keys to manipulate subsets of a table, they usually correspond to some kind of out-of-band predicate. Like this is the data that appeared today, or this is everything after the bug happened, etc.

Maybe there are applications where the monotonicity matters, but in my experience reasoning by surrogate key is rather coarse grained and you manually scrutinize the boundaries, so unless your clocks are quite wrong, your worries are probably better placed elsewhere.

erik_seaberg · 2y ago

I’ve seen too many people write a query like

  where ts between txn_start and txn_end
  order by ts

and not even realize that what they’re seeing is incomplete and misleading. Clock skew is very common, and we shouldn’t sweep that under the rug to promote time ordering, because people want to believe this works the way they think.

LAC-Tech · 2y ago

I feel like it's well beyond the scope of UUIDs to get into "are your clocks really monotonic?". They give you 48 bits for a timestamp, what that timestamp signifies (transactional time, valid time...) and how that's generated is up to you.

Out of curiosity, are you into hybrid logical clocks?

erik_seaberg · 2y ago

I only raise this when I see promises that IDs are ordered and can be sorted, not merely grouped by approximate time for storage locality.

Yeah, though I’m more likely to go with a region ID and monotonic version number to compare-and-set and verify gapless data, where versions from different regions aren’t comparable. Actually I think earlier UUID RFCs talk about a “clock sequence” to distinguish timestamps from separate monotonic sources, but this paper doesn’t bring that up (or mention multiple clocks at all).

Dylan16807 · 2y ago

Well yes, but were they even implying that? Even if it was infinitely strict, clocks being perfect, two server processes can touch the same data at the same time.

In other words: Sorting by millisecond-or-so is just as good as sorting by picosecond in most situations. The reason you have to deal with conflicts gracefully isn't particularly because timestamps can be imperfect.

ianbutler · 2y ago

This was my first reaction as well. The keys use Unix timestamps, which are clearly not going to be synchronized by default, so for event ordering purposes across a distributed system this is dodgy.

For providing better query locality it probably doesn't matter significantly though which seems to be the main benefit here while preserving the other benefits UUIDs provide.

QuadrupleA · 2y ago

But the timestamp is less than half the bits. The rest are random. So timestamp conflicts don't matter.

erik_seaberg · 2y ago

By “conflict” I don’t mean a UUID collision, I agree with the logic that 2^128 is so much entropy that memory corruption is the more likely culprit.

I mean that you can’t rely for correctness on time(X) < time(Y) when X happened before Y. It’s damn hard to keep two commodity server clocks within ±1 ms of each other even within a single LAN, and across production you’re more likely to see ±10 ms, or worse if your sysadmins don’t realize you intend to bet the farm on no clock skew.
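
A toy illustration of that point, assuming two nodes whose clocks differ by 10 ms. The helper `uuid7_at` is hypothetical and builds a UUIDv7-shaped value with the random bits left at zero so the result is deterministic.

```python
import uuid


def uuid7_at(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-shaped value for a given clock reading (random bits zeroed)."""
    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit millisecond timestamp
    value |= 0x7 << 76                         # version 7
    value |= 0b10 << 62                        # RFC variant
    return uuid.UUID(int=value)


true_ms = 1_700_000_000_000
SKEW_A_MS = 10  # assumption: node A's clock runs 10 ms fast

x = uuid7_at(true_ms + SKEW_A_MS)  # X really happens first, on node A
y = uuid7_at(true_ms + 5)          # Y really happens 5 ms later, on node B

assert y < x  # sorting by ID puts Y before X, the reverse of what happened
```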

oittaa · 2y ago

https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc...

pyrolistical · 2y ago · 7 in thread

And you can use it today with Postgres uuid type. Postgres doesn’t care what you store in it as long as it has the correct length. So you can generate a uuidv7 and store it natively

hardwaresofton · 2y ago

Yup this is one of the reasons I put together a light extension for this:

https://github.com/VADOSWARE/pg_idkit

There are a lot of options for UUID extensions (lots of great pure SQL ones!), but I wanted to get as many ID generation strategies in one place

Also note that native UUID v7 is slated to land in pg17:

https://commitfest.postgresql.org/44/4388/

perfmode · 2y ago

What are the benefits of using the Postgres uuid type (versus using TEXT or VARCHAR)?

gabereiser · 2y ago

Stored in binary format, validation, more efficient due to non-cast, faster access due to non char*, being able to split the high-low, indexing and uniqueness at the byte level.

jvolkman · 2y ago

It's 16 bytes versus 36 bytes.

thangngoc89 · 2y ago

It’s stored in binary format (16 bytes) instead of text (36 bytes)

jolux · 2y ago

Wouldn’t the index types need to be updated to support ordering on UUIDs?

jpgvm · 2y ago

No, because the ordering is purely byte level which will work perfectly fine here.

jiggawatts · 2y ago · 7 in thread

It seems insane to me to “validate” GUIDs/UUIDs.

Half the point of these things is that they’re treated as opaque identifiers.

threeseed · 2y ago

Because in SPAs if a user creates new entities it can be easier to generate the UUIDs client side.

So then just a simple validation server side to ensure the data isn't malicious.
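
What that "simple validation" might look like on the server, as a hedged sketch: parse the string with the standard `uuid` module and reject anything that is malformed or not one of the versions the service expects. `ALLOWED_VERSIONS` and `parse_client_uuid` are hypothetical names, and the accepted versions are an assumption.

```python
import uuid

ALLOWED_VERSIONS = {4, 7}  # assumption: the versions this service accepts


def parse_client_uuid(text: str) -> uuid.UUID:
    """Parse a client-supplied ID; reject malformed or unexpected values."""
    value = uuid.UUID(text)  # raises ValueError on a malformed string
    if value.variant != uuid.RFC_4122 or value.version not in ALLOWED_VERSIONS:
        raise ValueError("unexpected UUID variant or version")
    return value


print(parse_client_uuid("017f22e2-79b0-7cc3-98c4-dc0c0c07398f").version)  # 7
```

This only checks the shape of the value. The server still has to enforce uniqueness and authorization itself, since a client can send any well-formed ID it likes.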

devoutsalsa · 2y ago

Never trust the client.

kijin · 2y ago

If UUIDv4 was all that ever existed, there would be no need to validate anything apart from the fact that it's supposed to contain 32 hexadecimal characters.

All other versions, including the new v7, attach meaning to certain bits of the identifier. That cat has been out of the bag for a long time, so now everyone needs to maintain code to ensure that some rogue node doesn't spew back-dated identifiers belonging to the wrong department.

hawski2y ago

UUIDv4 attaches meaning to certain 6 or 7 bits (depending on a variant) of the identifier. UUIDv4 is a UUID after all.
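As a rough illustration of the fixed bits mentioned above, Python's standard `uuid` module can show which bits of a v4 UUID never vary. This is only a sketch for the common RFC variant; `FIXED_MASK` and `fixed_bits` are names made up for the example.

```python
import uuid

# Hypothetical mask: the 4 version bits (76-79) plus the top 2 variant bits (62-63).
FIXED_MASK = (0xF << 76) | (0b11 << 62)


def fixed_bits(u):
    """Return only the version and variant bits of a UUID, as an integer."""
    return u.int & FIXED_MASK


sample = uuid.uuid4()
print(sample.version)                    # version field of the UUID
print(sample.variant == uuid.RFC_4122)   # whether it uses the RFC variant
print(128 - bin(FIXED_MASK).count("1"))  # bits left over for randomness
```

Every v4 UUID produced this way shares the same six fixed bits, which is why only 122 of the 128 bits carry randomness.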

groestl2y ago

UUIDv4 also contains some bits with meaning. But joke's on them, I tend to even randomize the version bits and call it UUIDv0.

masklinn2y ago

I’d assume the validation is parsing the uuid with a uuid library (to decode it), and the library eagerly validates the version field, either to check for garbage or because it wants to yield a different subtype for each version.

philsnow2y ago

but why decode it at all, if it's meant to be opaque?

2 more replies

Lazare2y ago· 7 in thread

UUIDv7 is a nice idea, and should probably be what people use by default instead of UUIDv4 for internal facing uses.

For the curious:

* UUIDv4 are 128 bits long, 122 bits of which are random, with 6 bits used for the version and variant. Traditionally displayed as 32 hex characters with 4 dashes, so 36 alphanumeric characters, and compatible with anything that expects a UUID.

* UUIDv7 are 128 bits long, 48 bits encode a unix timestamp with millisecond precision, 6 bits are for the version and variant, and 74 bits are random. You're expected to display them the same as other UUIDs, and should be compatible with basically anything that expects a UUID. (Would be a very odd system that parses a UUID and throws an error because it doesn't recognise v7, but I guess it could happen, in theory?)

* ULIDs (https://github.com/ulid/spec) are 128 bits long, 48 bits encode a unix timestamp with millisecond precision, 80 bits are random. You're expected to display them in Crockford's base32, so 26 alphanumeric characters. Compatible with almost everything that expects a UUID (since they're the right length). Spec has some dumb quirks if followed literally but thankfully they mostly don't hurt things.

* KSUIDs (https://github.com/segmentio/ksuid) are 160 bits long, 32 bits encode a timestamp with second precision and a custom epoch of May 13th, 2014, and 128 bits are random. You're expected to display them in base62, so 27 alphanumeric characters. Since they're a different length, they're not compatible with UUIDs.

I quite like KSUIDs; I think base62 is a smart choice. And while the timestamp portion is a trickier question, KSUIDs use 32 bits which, with second precision (more than good enough), means they won't overflow for well over a century. Whereas UUIDv7s use 48 bits, so even with millisecond precision (not needed) they won't overflow for something like 8000 years. We can argue whether 100 years is future proof enough (I'd argue it is), but 8000 years is just silly. Nobody will ever generate a compliant UUIDv7 in which any of the first several bits aren't 0. The only downside to KSUIDs is the length isn't UUID compatible (and arguably, that they don't devote 6 bits to a compliant UUID version).

Still feels like there's room for improvement, but for now I think I'd always pick UUIDv7 over UUIDv4 unless there's a very specific reason not to. Which would be, mostly, if there's a concern over potentially leaking the time the UUID was generated. Although if you weren't worrying about leaking an integer sequence ID, you likely won't care here either.
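The UUIDv7 layout described in the list above (48-bit millisecond timestamp, 6 version/variant bits, 74 random bits) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `uuid7_sketch` is a made-up name, and it skips the monotonic-counter options that real libraries may offer.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-shaped value from a millisecond timestamp and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127-80: timestamp
    value |= 0x7 << 76                            # bits 79-76: version 7
    value |= rand_a << 64                         # bits 75-64: random
    value |= 0b10 << 62                           # bits 63-62: RFC variant
    value |= rand_b                               # bits 61-0: random
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)
print(earlier, later, earlier.bytes < later.bytes)
```

Because the timestamp occupies the most significant bits, comparing the raw 16 bytes orders values by creation millisecond; within the same millisecond the order is random in this sketch.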

kijin2y ago

100 years sounds short-sighted for something that's supposed to be "universally" unique. We're already having problems with the 32-bit Unix timestamp not being large enough. If you're willing to use 160-bit (or longer) identifiers, you might as well give a few more bits to the timestamp. Round it up to an even number of base-62 characters, too. That part of KSUID has always struck me as a weird decision.

I wish UUIDv7 pulled the version/variant bits up front, though, just to make sure that the identifiers don't all start with null bytes.

wolletd2y ago

Apparently, humanity is damned to repeat its mistakes over and over again.

"100 years should be enough" is what led us to a mountain of Y2K issues, because when would a two digit year ever be ambiguous?

But I guess it's a psychological issue. Unless you're a megalomaniac, it's just natural to assume that your decisions won't matter much outside of your life and lifetime. And in that case, 100 years totally is enough because I probably won't live that long. And even more, in a lot of cases, it's also the correct assumption and the project won't live longer than a few years.

So, thinking about it, unless you are developing a novel standard or something that you want the world to adopt, 100 years probably IS fine. Unfortunately, KSUID wants to be a novel standard, so there's an issue.

8organicbits2y ago

The timestamp is first.

https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...

jsf012y ago

If the version bits were up front, then switching to a hypothetical UUIDv8 in several years would be guaranteed to break the sortability. So I see that decision as a bit of future proofing.

kiitos2y ago

Second precision is too coarse for many (most?) use cases.

travisjungroth2y ago

How so? It seems like the only real use case for these timestamps is to get data from around the same time together. A second is fine for that. It's not about concurrency or avoiding collisions. A second can't handle that, but neither can a millisecond.

1 more reply

contravariant2y ago

If you need more than second precision then millisecond doesn't get you much further. The fact that the epoch ends in 120 years is a bit more worrying, but is also just about non-critical enough that it will be ignored for at least the next century.

Also, to all future historians of 2150, sorry about the mess, but yes we knew this was going to happen. Whatever it was.

1 more reply

phkahler2y ago· 5 in thread

Is there some reason new versions of UUID keep appearing? It seems like the desired properties are never quite achieved so new ones appear later. Is there a table with UUID version across the top and characteristics down the side, so I can see the differences and pick one that fits my needs? That might also help to explain why there are so many variants.

ricardobeat2y ago

Unless you have specific needs, the only type of UUID you should care about is v4.

v1: mac address + time + random

v4: completely random

v5: input + seed (consistent, derived from input)

v7: time + random (distributed sortable ids)

kozak2y ago

As someone who only cares about v4, I periodically wonder why don't I just use fully random 128-bit identifiers instead (without the version information).

3 more replies

xcrunner5292y ago

It would seem sequential keys for database performance is more than a 'specific' need.

1 more reply

epcoa2y ago

v4 being completely random has terrible properties even on a non distributed database. Probably better to use v7.

bhouston2y ago

https://en.wikipedia.org/wiki/Universally_unique_identifier#...

andersa2y ago· 4 in thread

> We use sequential primary keys for efficient indexing, and UUID secondary keys for external use. The upcoming UUIDv7 standard offers the best of both worlds

Unless you consider users being able to extract the generation time from the id to be an issue, of course.

giancarlostoro2y ago

I've been seeing a few different vendors do this already. MongoDB's ObjectIds are inherently timestamps (so you can actually generate generic MongoDB IDs to query based on time). There's also Discord's Snowflakes as well. I'm sure there's loads of others. All it tells you is when something was generated, not much else. I do love how MongoDB has it stored in such a way that it is easy to query against. I wonder if any RDBMS' will allow you to query these timestamps as well.

andersa2y ago

There are definitely many cases where it isn't an issue since you were going to tell the user the time anyway (like sent time on a message)

2 more replies

contravariant2y ago

And if you consider knowledge of the id sufficient for access.

Which, despite the fact that it really shouldn't be, still seems to occur every so often. Even in situations where the ids are very much not random.

Honestly if I have to read one more article about a 'hacker' who 'leaked' some secret government piece ahead of time because they thought to increment the date in the url of some yearly report, I'm going to lose my mind.

mort962y ago

I don't understand. What part of this requires that one considers knowledge of the ID sufficient for access? And what kind of access are you talking about?

The performance benefits of index friendly user IDs seem like they would apply even if all user info is secret and requires a token to access... The application still has to look up the user by ID after all?

If I imagine a basic authenticated "get information about me" style endpoint, that would take a user ID and an authentication token. Checking if the token is valid is faster if the user ID is index friendly. Getting the requested information is faster if the user ID is index friendly. Yet a user of the API still needs both the user ID and a token to access anything.

1 more reply

dajonker2y ago· 4 in thread

Similar to the old situation in the article, we are using sequential 64 bit primary keys, but we use an additional random 64 bit key for external usage (instead of 128 bit).

The external key is base64 encoded for use in URLs which results in an 11 byte string.

This hides any information about the size of the data, the creation date of customer accounts (which would be sort of visible with UUIDv7) and prevents anyone from attempting to enumerate data by changing the integer in URLs.

I thought about using UUIDs as external keys but the only compelling use case seems to be the ability to generate keys from many decoupled sources that have to be merged later.

64 bit should be enough for most things https://youtu.be/gocwRvLhDf8?si=QBheJCG21bAAV0Z7
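For illustration, a random 64-bit external key encoded as an 11-character string might look like the sketch below. The comment does not say which base64 alphabet is used; the URL-safe one without padding is an assumption here, and both function names are hypothetical.

```python
import base64
import secrets


def new_external_key():
    """Generate 8 random bytes and encode them as 11 URL-safe characters."""
    raw = secrets.token_bytes(8)
    # 8 bytes encode to 12 base64 characters, the last of which is '=' padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_external_key(text):
    """Restore the padding and return the original 8 bytes."""
    return base64.urlsafe_b64decode(text + "=")


key = new_external_key()
print(key, len(key))
```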

Kunix2y ago

I am using a variant of SnowflakeId [^1] in order to have 64 bit keys too.

It's similar to UUIDv7 (it leaks the creation time), but it's not an issue for me.

So I am able to have a single 64 bit key, which can easily be formatted into a small string for user-facing urls.

[^1]: https://instagram-engineering.com/sharding-ids-at-instagram-...

Waterluvian2y ago

It sounds like you basically just made your own 64 bit UUID. If you’re exposing this ID for manual use by a human (like URLs) then that sounds pretty helpful to be shorter!

RhodesianHunter2y ago

What's the risk of collisions with your external ID in this scenario?

sealeck2y ago

I would imagine that they enforce this using e.g. a unique constraint in their database.
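To put a rough number on the collision question, the usual birthday-bound approximation can be evaluated directly. This is a back-of-the-envelope sketch that assumes keys are drawn uniformly at random; `collision_probability` is a name invented for the example.

```python
import math


def collision_probability(n_keys, bits=64):
    """Approximate chance of at least one collision among n_keys random values."""
    space = 2 ** bits
    # Birthday bound: 1 - exp(-n(n-1) / (2N)); expm1 keeps precision for tiny results.
    return -math.expm1(-n_keys * (n_keys - 1) / (2 * space))


for n in (10**6, 10**9):
    print(n, collision_probability(n))
```

With 64 random bits the estimate is negligible at a million keys and on the order of a few percent at a billion, which is why a unique constraint plus retry on conflict is a sensible backstop.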

1 more reply

LAC-Tech2y ago· 3 in thread

Why use UUIDv7 over ULIDs?

As Lazare points out in this thread they're basically the same thing, except with ULIDs you get those 6 extra bits of randomness back that UUIDs have to use for metadata.

oittaa2y ago

https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc...

ULID isn't an "official" standard like UUID. Having a real standard usually promotes interoperability and makes it easier to use. Additionally as others have pointed out you can already use UUIDv7 with some databases since it's just 16 opaque bytes and the database doesn't care what's actually in the UUID field.

LAC-Tech2y ago

How much of a standard do ULIDs need? 6 byte timestamp, 10 bytes crypto randomness, stringify it using Crockford's base32 - and off we go.

1 more reply

jhealy2y ago

I'm not the author, but I work at the same company. ULIDs are nice, but we're a 10 year old company with many TBs of data across multiple logical databases and most rows had UUIDv4 ids.

Maybe if we were starting from scratch ULIDs would have been an option, but given where we were UUIDv7 was a much easier transition.

amanzi2y ago· 3 in thread

Can you take the first portion of the UUIDv7 string, and decode it to figure out the exact date and time that record was created? I'm wondering if there might be security/privacy concerns in some situations if the UUID codes are visible in your app?
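For what it's worth, the decoding asked about here takes only a few lines. The sketch below assumes a standard UUIDv7 layout with the millisecond timestamp in the top 48 bits; the function name and the sample value are made up for illustration.

```python
import uuid
from datetime import datetime, timezone


def uuid7_created_at(text):
    """Return the UTC creation time embedded in a UUIDv7 string."""
    u = uuid.UUID(text)
    if u.version != 7:
        raise ValueError("not a version 7 UUID")
    unix_ms = u.int >> 80  # the top 48 bits hold milliseconds since the Unix epoch
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


# Hand-built sample value: timestamp bits followed by zeroed "random" bits.
print(uuid7_created_at("018bcfe5-6800-7000-8000-000000000000"))
```

So yes: anyone who can see the identifier can recover the creation time to the millisecond, as reported by the generating machine's clock.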

afavour2y ago

Yeah, years from now we’re going to see some story about how a company fudged their timestamps in order to get away with X, only to be given away by the timestamp hidden in public UUIDs.

jonhohle2y ago

I just commented the same thing. I can't imagine most applications would want to leak time information in their identifiers but these undoubtedly will be used most places out of convenience. In a year or so we'll read about an attack and everyone will migrate back to v4 or have to maintain a cryptographic identifier in addition to their temporal identifier.

riffraff2y ago

Most applications may not want it, but will it hurt them?

I mean this is a similar concern to sequential IDs: many apps do not want to leak them, and in some cases it might cause issues, but in general it doesn't matter.

toni88x2y ago· 3 in thread

IMO the benefit of UUIDs over integers is that they can be generated client-side without clashing. But you cannot trust timestamps generated by clients and therefore the order. So what is the benefit over UUID4?

frederikb2y ago

For me the central benefit is that you can create them in a distributed manner and are not reliant on a central system as a single source of truth for creating your identifiers.

I can therefore easily generate a new UUID in a trusted backend service which just accepts the command received from the untrusted client and then forwards the request for asynchronous processing while returning the UUID to the client. This is a typical architecture and the only change is that I can now create UUIDs which may have performance benefits, depending on the data storage technology of my read models.

If you need to create the UUIDs on the client side to support specific requirements such as offline-first, then I would indeed consider adding some reconciliation which replaces the IDs provided by the client side with new ones generated by a trusted component as soon as synchronizing takes place.

frederikb2y ago

In any case regardless of UUIDv4, v7 or any other format you should not allow the untrusted client to determine the real ID - as long as there is at least one trusted component in the architecture which would take over this role. This should help eliminate a whole set of possible security issues.

XCSme2y ago

Would using the timestamp in the UUID be possible for date-range queries?
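One way this could work is to turn the date range into two boundary UUIDs and filter the ID column between them. The sketch below assumes the store compares UUIDs byte by byte from the most significant end (not every database does) and that the IDs are standard UUIDv7; `uuid7_range_bounds` is a hypothetical helper.

```python
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid7_range_bounds(start, end):
    """Return (low, high) so that low <= id < high covers start <= time < end."""
    start_ms = (start - EPOCH) // timedelta(milliseconds=1)
    end_ms = (end - EPOCH) // timedelta(milliseconds=1)
    # All bits below the timestamp are zero, so each bound is the smallest
    # possible 128-bit value for its millisecond.
    return uuid.UUID(int=start_ms << 80), uuid.UUID(int=end_ms << 80)


low, high = uuid7_range_bounds(
    datetime(2023, 11, 14, tzinfo=timezone.utc),
    datetime(2023, 11, 15, tzinfo=timezone.utc),
)
print(low, high)  # usable as: WHERE id >= low AND id < high
```

The result is only as trustworthy as the clocks that generated the IDs, so a dedicated timestamp column is still the safer choice when the time itself matters.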

Traubenfuchs2y ago· 3 in thread

Am I the only one instinctively upset by the communication/bandwidth/storage overhead of the dashes as well as the version and variant bits of UUIDs?

It might be insignificant, but to me it makes UUID feel tainted, dirty. 11.1% of a UUID's 36-character string form is dashes (4 of 36), and 15.3% is wasted if you also count the version and variant bits (6 bits, or 1.5 hex characters).

Anecdote: I worked for a company that used numeric primary ids internally and externally and increased the primary key by TWO to THREE for each new customer to make it appear to the outside world we had twice to three times the rate of customer growth.

rossant2y ago

The dashes do not take up any space, they are not encoded. I think it's fairly common to reserve some space to versions and ECC in uuids, packets etc. That's a price to pay to avoid many bugs and compatibility issues.

Traubenfuchs2y ago

UUIDs are often pushed around in JSON and as string types, including the non-random parts.

1 more reply

throwawayb4dc652y ago

The dashes are just there to help separate the groups visually. You can write code to remove/add the dashes if you want to shorten the URL. If you use the uuid data type when storing it in a database, it only uses 128 bits.

wolverine8762y ago· 3 in thread

> the random nature of standard non-time-ordered UUIDs (such as v4) can create database performance problems when used as primary keys. This problem is often referred to as poor database index locality.

Couldn't that be solved with incremented serial numbers, rather than leaking time data?

FPGAhacker2y ago

Yes. But I think part of the design requirements is minimizing coordination required among distributed nodes.

This solution attempts to solve the sort-ability issue of current uuids by moving the timestamp to the most significant bits.

Ginden2y ago

Incremented serial numbers can leak things like eg. volume of sales in your shop.

wolverine8762y ago

So increasing serial numbers with random gaps?

danwee2y ago· 3 in thread

So, how do you guys use UUIDs for real? I worked in a company in which they were using UUIDs in Mongo, and one of the most painful things was implementing API endpoints that filter resources. Imagine you have an endpoint in which you are filtering by resources A, B, C and D. Ideally you would end up with something like this:

    GET /filter?a_id=X&b_id=Y&c_id=Z&d_id=w

But in practice we were using POST and passing the ids in the body payload. Why? Because my old team said "the UUIDs are long, so we may reach the maximum URL length if we pass them as parameters". I didn't like it, and I still don't like it at all.

dgellow2y ago

Do you dislike the use of POST because of its effect on caching, or because it doesn’t feel semantically correct? If you don’t need the caching I wouldn’t mind using a POST to filter.

Note that you can also use GET with a body, it’s not spec compliant (a body is allowed but not supposed to have any meaning) but is used by products such as Elasticsearch. If you control both clients and servers that’s something you can safely do (and use an etag header for idempotency).

dgb232y ago

The maximum url length is typically quite long.

Another thing is that you don’t necessarily need to encode uuids canonically. They are just u128’s. It’s relatively straightforward to find a url friendly string representation that is shorter.
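As a sketch of what such a shorter representation could look like, the 16 raw bytes can be base64url-encoded into 22 characters and decoded back without any lookup table. The choice of base64url and both function names are assumptions made for this example; base58 or base62 would work along the same lines.

```python
import base64
import uuid


def uuid_to_short(u):
    """Encode the 16 raw bytes of a UUID as 22 URL-safe characters."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def short_to_uuid(text):
    """Reverse uuid_to_short by restoring the two stripped padding characters."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + "=="))


original = uuid.uuid4()
short = uuid_to_short(original)
print(original, short, short_to_uuid(short) == original)
```

The short form is purely a re-encoding, so nothing extra needs to be stored. One caveat is that base64url strings do not sort in the same order as the underlying bytes.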

danwee2y ago

> It’s relatively straightforward to find a url friendly string representation that is shorter.

Are we talking about shortening the whole URL or shortening specific UUIDs? If the latter then I imagine one would still need to keep track of the mapping UUID <-> shortened version, somewhere, right? If so, why not just add yet another field/column for a good old numeric integer that can be used for filtering? Would that work?

2 more replies

coolgoose2y ago· 3 in thread

I am confused how this is new. UUIDv1 is time based, you just need to be careful about entropy, and in MySQL 8 you have been able to use it as an ordered field for a longish time.

8organicbits2y ago

The use of a MAC address and fine grained timestamp are challenges of UUIDv1.

https://blog.devgenius.io/analyzing-new-unique-identifier-fo...

coolgoose2y ago

Sure, hence why I said entropy, but it's not like you can't use it :)

oittaa2y ago

And the crazy epoch instead of the better-known Unix epoch. Why would anyone want to create UUIDs around year 1500?

samatman2y ago· 3 in thread

Relying on timestamps to be sortable, when clock skew and ntpd guarantee that they won't always be, strikes me as poor design.

If you need to sort by insert order, use an autoincrementing integer, if you need uniqueness, UUIDv4 is fine, if you need both use both.

Use timestamps when you need to record the time, just don't commit the sin of presuming that clock time will never run backwards, I assure you, it does.

jmmv2y ago

The problem they describe is not about sorting: it’s about data locality. And for the latter, clock skew should not be a problem. Even with significant clock skew, data will end up clustered anyway, much better than with a random spread.

samatman2y ago

You get index locality with an autoincrement also, and it will actually reflect insert order. My point is that a timestamp won't do that, and worse, it will appear to most of the time. The failure can be fairly spectacular, a Unix clock can be set to any time at all, and it's good when the resulting bugs are limited to time. Having an actual insert order can be a real boon to figuring out what happened.

I hold to the principle that relational data should be normal, and combining uniqueness with a timestamp doesn't do that. To do any of the calculations we use timestamps for, you have to strip off the entropy, this complicates pushing it down to the database level, where the libraries don't expect such conflation.

You're going to have a bad time writing something like a join across tables with a restricted range of time if your time is embedded in UUIDv7.

I maintain this is good advice: if you need index locality and insert order, use an autoincrement. If you need to record and work with time, use a timestamp. If you need global uniqueness, you can use any of the UUIDs, but v4 is the one that doesn't conflate uniqueness with unrelated properties, and should be preferred.

If you think you need data locality but not insert order, think long and hard about what you're doing, because odds are you're wrong. If it turns out you're right, and the OP might be in that situation, sure, go ahead and use UUIDv7.

Just, please, for the sake of your future self and everyone you work with, don't use a timestamp for insert order. Ever.

1 more reply

atonse2y ago

That's still fine. Because even if there's skew and other such things going on, it's more likely to take advantage of cache locality since the page in the index that key would be stored in, is much more likely to still be in memory.

dgb232y ago· 2 in thread

As a beginner I treated and understood (SQL) databases as something I have to use in order to store stuff.

Later I was excited about the power and expressiveness of SQL and its extensions. There is a ton of leverage and you can make it so that interfacing with it directly becomes much more useful.

However now I’m in a different phase. I see it as a durable data structure. I think in terms of “what does it provide to make the overall system better?”

The issues around indexing and uuids that are discussed in the article fit nicely into this line of thinking.

In web development, database access and performance often dominates and infects the whole system.

foreigner2y ago

As a beginner I thought of the database as a backend for the app. Now I think of the app as a frontend for the database. :-D

kuchenbecker2y ago

Crud, you're right!

pknerd2y ago· 2 in thread

Speaking of RDBMSs, how good are UUIDs when making joins and fetching a certain record?

nevir2y ago

More or less identical to integer ids - they're stored and referenced as a 16-byte integer.

Unless you're manually storing them as strings... (Not ideal, but most dbs are pretty good at dealing with that too)

pknerd2y ago

so I have to pick a certain MySQL type?

jsf012y ago· 2 in thread

How long will it be before the “milliseconds since epoch” part of the uuid overflows or repeats?

jolmg2y ago

  $ date -ud @$(( 256 ** 6 / 1000 ))
  Tue Aug  2 05:31:50 AM UTC 10889

birracerveza2y ago

Well, at least it's not a Friday.
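The shell arithmetic above can be cross-checked in Python, alongside the KSUID rollover mentioned elsewhere in the thread. This is a rough sketch: the UUIDv7 year uses a mean Gregorian year because `datetime` stops at year 9999, and the KSUID epoch is taken here as Unix time 1,400,000,000 (13 May 2014).

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_YEAR = 31_556_952  # mean Gregorian year, so this is an approximation

# UUIDv7: 48 bits of milliseconds since the Unix epoch.
uuid7_span_seconds = 2**48 // 1000
uuid7_last_year = 1970 + uuid7_span_seconds // SECONDS_PER_YEAR

# KSUID: 32 bits of seconds since its custom epoch.
KSUID_EPOCH = datetime(2014, 5, 13, 16, 53, 20, tzinfo=timezone.utc)
ksuid_rollover = KSUID_EPOCH + timedelta(seconds=2**32)

print(uuid7_last_year)      # year in which the 48-bit field wraps
print(ksuid_rollover.year)  # year in which the 32-bit field wraps
```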

dataangel2y ago· 2 in thread

why bother with any version of the uuid standard? just generate a random 128-bit number and use it. that's all the newer ones are anyway

lelanthran2y ago

> why bother with any version of the uuid standard? just generate a random 128-bit number and use it. that's all the newer ones are anyway

Good question.

Won't random 128-bit numbers actually be superior to UUIDs in every way except predictability?

mholt2y ago

Sorry to be this guy, but did you read the article? Only UUIDv4 is "just generating a random 128-bit number" (almost) -- and there are valid reasons that's not a good choice. Which the article explains. :) Like database insertion performance. Sequential IDs have benefits.

wvh2y ago· 1 in thread

A few years back, I wrote some code that generates a sortable 128-bit UUID-like identifier starting with a milliseconds-since-epoch timestamp, a node number and a random byte tail. It has been working fine in Postgresql, using its builtin UUID type. I suppose downstream systems have been using the string representation though. The main reason for going with such an identifier was being able to generate them from different, non-centralised places. A nice side effect is that you can't accidentally get an erroneous ID that happens to work the way you can with a sequential integer primary key.

For another project, I've also used sortable 64-bit snowflake-like identifiers; they have the added benefit of being able to use 64-bit integer representation in code and database identifiers, even if you might want to externally represent them in base58 or similar encoding.

The original UUID types aren't as useful as they once were, so it'd be worth writing a new RFC and extending those original types.

ahoka2y ago

Isn’t that almost the same as v1?

tzahifadida2y ago· 1 in thread

To me it sounds like a corner case. Example:

a) UUID4, CreatedTime/UpdatedTime.

b) Bigint, CreatedTime/UpdatedTime.

c) UUID7 internal (which also includes time badly), UUID4 external/whatever short ID.

How exactly does this help if you need external ids (which you usually do today)? It doesn't even make it a short ID.

Even if there is a corner case, are we just saving a few bytes while adding more complication?

Clustered Index is a myth in PostgreSQL, not practical since you have to run a special program to reorder. So, a regular index might suffer but not really. Why? Because I am not ordering by the ID most of the time, I am ordering by "Created Date/Updated Date" or Name or whatever. Who cares about ordering IDs?

WAIT!!! But what about Next Tokens? ok, these are painful, but easily solved: Next can be (>=Created Date,>ID). Same result. Pagination, stays the same since it is sorted by Created Date.
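The next-token idea above is usually called keyset pagination: the token is the (created, id) pair of the last row served, and the next page takes rows that sort strictly after it. Below is a minimal sketch using SQLite from the standard library; the `items` table, its columns and both function names are invented for the example.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table where two rows share a created timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, created TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [
            ("c3", "2024-01-01T10:00:00", "first"),
            ("a1", "2024-01-01T10:00:05", "second"),
            ("f9", "2024-01-01T10:00:05", "third"),
            ("b2", "2024-01-01T10:00:09", "fourth"),
        ],
    )
    return conn


def fetch_page(conn, after=None, limit=2):
    """Return (rows, next_token); after is the (created, id) of the last row seen."""
    if after is None:
        sql = "SELECT created, id, name FROM items ORDER BY created, id LIMIT ?"
        params = (limit,)
    else:
        created, item_id = after
        # Later timestamp, or same timestamp with a larger id as the tie-breaker.
        sql = (
            "SELECT created, id, name FROM items "
            "WHERE created > ? OR (created = ? AND id > ?) "
            "ORDER BY created, id LIMIT ?"
        )
        params = (created, created, item_id, limit)
    rows = conn.execute(sql, params).fetchall()
    next_token = (rows[-1][0], rows[-1][1]) if rows else None
    return rows, next_token


conn = make_demo_db()
page, token = fetch_page(conn)
print(page, token)
print(fetch_page(conn, token)[0])
```

The ID only breaks ties between rows with the same timestamp, so it does not need to be time-ordered for this to work.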

otherme1232y ago

I understood it as c) only UUID7, no secondary external UUID.

The external Id is used instead of Bigint because you don't want your external users to query 1, then 2, then 3 (IDOR)... But the random part of the Uuid7 makes this impossible.

Uuid7 isn't a substitute for Created/Updated, but a substitute for the dual field Uuid4/Bigint.

zooFox2y ago· 1 in thread

One benefit of an epoch is that it's easily readable (or comparable, at the very least). I am not sure I can read epoch in hexadecimal format though.

okl2y ago

Need a new clock? https://retr0.id/stuff/2038/

xarope2y ago· 1 in thread

I am just about to wrap up some prototyping comparing snowflake, typeids, uuidv4 and ulid. Why did I not bump into uuidv7 earlier?!?

RobIII2y ago

Don't know, because uuidv7 has been coming for ages... https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...

jimmySixDOF2y ago

Discussion here a couple months ago :

Analyzing New Unique Identifier Formats (UUIDv6, UUIDv7, and UUIDv8) (2022) https://news.ycombinator.com/item?id=36438367

dkubb2y ago

If it helps anyone, at work, I open sourced the UUID v7 postgresql function that I wrote: https://github.com/Betterment/postgresql-uuid-generate-v7

We've seen some amazing benefits, especially around improving the speed of batch inserts.

jakewins2y ago

A useful/horrifying pattern on this topic: you can use UUIDv1 as a prefixed id, giving you a way to generate tagged IDs in a system that uses UUIDs.

You set the node field to a broadcast MAC address, and use that as a namespace/prefix. This inches close to the boundary of the RFC, but is arguably compliant.

As an example, you may generate demo or “canary” data items that are UUIDv1s with a well known node field, which then lets you do distributed “isDemoData()” checks by just looking at the UUID.
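A sketch of that pattern with Python's standard `uuid` module is below. The broadcast address as the tag and the function names are assumptions taken from the description above, not a recommendation.

```python
import uuid

# Assumed tag: the all-ones broadcast MAC address used as the node field.
DEMO_NODE = 0xFFFFFFFFFFFF


def new_demo_id():
    """Generate a UUIDv1 whose node field marks it as demo data."""
    return uuid.uuid1(node=DEMO_NODE)


def is_demo_data(u):
    """Check the tag without consulting any database."""
    return u.version == 1 and u.node == DEMO_NODE


demo = new_demo_id()
print(demo, is_demo_data(demo), is_demo_data(uuid.uuid4()))
```

The node field appears as the last 12 hex characters of the string form, so tagged IDs are also easy to spot by eye.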

user39393822y ago

It’s nice for front end state. You post the new entity, the front provides the ID, and as long as you get a 200 you can update your state, or update optimistically and roll it back. You don’t need to wait for the API to figure out what your ID is.

tzahifadida2y ago

Not specifically the topic, but I looked for a library for golang and it is not that common; there is a library with <20 stars, which is too experimental for me. Also, I'm not sure the postgresql extension is in the main distribution; I couldn't find it if it is. For example, GCP only supports this one IIUC https://www.postgresql.org/docs/current/uuid-ossp.html Java has something, but again it's not really clear how well tested it is. So using this is a bit iffy...

0pteron2y ago

Is it not the case that having 128 bit primary keys take up 4 times as much memory as 32 bit integers when keeping the indices in RAM? I guess if you need the index to be clustered by time and also need a unique identifier in most queries then UUIDv7 fits your use-case but I still think having integer for the primary key will fit most use cases and be more efficient

rockwotj2y ago

The first time I heard about ordered string IDs was Firebase’s push IDs. They had an interesting solution to also address time skew to get better ordering for drivers: https://firebase.blog/posts/2015/02/the-2120-ways-to-ensure-...

jug2y ago

Haha this is what we came up with for our home-brewed unique IDs in a GIS application decades ago. For the same reasons.

insanitybit2y ago

> The nature of Buildkite's products mean recent data is accessed more frequently than old data. With non-sequential identifiers, the most recent data will be randomly dispersed within an index and lack clustering

I would assume that `serial` would solve this problem too.

mooreed2y ago

Feels like a spiritual successor to the ksuid [1] lib which I first heard of used in conjunction with DynamoDB

[1]: https://github.com/segmentio/ksuid which has very similar use cases.

miiiiiike2y ago

This is neat. I've been using a custom snowflake cluster for years. Having this in the language/DB would be great for smaller projects.

For bigger/public projects I'd like to be able to add a sequence, node, and data center id to the UUID too.

gwbas1c2y ago

Anyone ever try encrypting a database ID (i.e., a sequential int) and using that as a public ID?

I.e., take a 32 or 64 bit int that's the primary key, encrypt it, and then use that as the public ID in a web application, URL, API, etc.
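To make the idea concrete, here is a toy sketch that maps a 64-bit sequential ID to a scrambled 64-bit public ID and back, using a small Feistel network keyed with HMAC-SHA256. Everything here is an assumption made for illustration: the construction has not been reviewed as a cipher, and a real system would want a vetted block cipher or format-preserving encryption scheme instead.

```python
import hashlib
import hmac


def _round_value(key, round_no, half):
    """Keyed round function: 32 bits derived from HMAC-SHA256."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")


def encode_id(key, internal_id, rounds=4):
    """Map a 64-bit internal ID to a 64-bit public ID (toy Feistel network)."""
    left, right = internal_id >> 32, internal_id & 0xFFFFFFFF
    for r in range(rounds):
        left, right = right, left ^ _round_value(key, r, right)
    return (left << 32) | right


def decode_id(key, public_id, rounds=4):
    """Invert encode_id by running the rounds in reverse order."""
    left, right = public_id >> 32, public_id & 0xFFFFFFFF
    for r in reversed(range(rounds)):
        left, right = right ^ _round_value(key, r, left), left
    return (left << 32) | right


key = b"example-only-secret"  # placeholder; a real key would come from secret storage
public = encode_id(key, 12345)
print(public, decode_id(key, public))
```

Because the mapping is a keyed permutation, there are no collisions and no lookup table, but changing the key changes every public ID.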

hknmtt2y ago

I have been using ULID for years. Using digits now would feel very strange.

Pxtl2y ago

Frustrating, I looked up MS/C#'s implementation and they don't get stored in a proper semisequential fashion in MS SQL Server because MS stores UUIDs in an odd binary format.

eviks2y ago

Always wondered what the point of dash-separating uuid if the separated parts are unreadable anyway just like in this version, just makes it harder to select as a single blob of text

perfmode2y ago

I chose ULIDs for a recent project.

Hope it won’t bite me in the future.

markcollin2y ago

Interesting - have been using uuidv4 for a long time. Will explore further on uuidv7

JCharante2y ago

I wonder who this article is written for. Who would be reading about UUIDs but not know about cache hit rates?

> As a result, retrieving the most recent data from a large dataset will require traversing a large number of database index pages, leading to a poor cache hit ratio (how many requests a cache is able to fill successfully, compared to how many requests it receives).

MattPalmer10862y ago

This seems overly complex, and you need some kind of key too.

Why not just hash it with pretty much any hash function?

fodkodrasz2y ago

No IV, ECB mode... why bother with encryption at all? Just expose the internal id.

nhoughto2y ago

Edit: you can even do 64bit snowflakes internally to 128bit AES encrypted externally, doesn’t have to be 128-128 obvs

nindalf2y ago

One key for all tokens or one key per token? If it’s the latter a simple XOR would do because it would be the equivalent of a one time pad.

noduerme2y ago

To that end, I think it's neat to be able to improve indexing on UUIDs, but it's not a security solution.

dalore2y ago

This was used in the war to estimate the number of German tanks based on the sequential IDs

https://en.wikipedia.org/wiki/German_tank_problem

So just for business intelligence you don't want to leak your IDs.
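
The estimator behind the linked German tank problem can be sketched in a few lines. This is a minimal illustration, assuming IDs are issued sequentially starting at 1 and that the observed IDs are a uniform sample without replacement; the function name and the sample values are made up for the example.

```python
def estimate_total(observed_ids):
    """Estimate how many IDs have been issued so far.

    Uses the classic frequentist estimate N ~= m * (1 + 1/k) - 1,
    where m is the largest observed ID and k is the sample size.
    """
    k = len(observed_ids)
    m = max(observed_ids)
    return m * (1 + 1 / k) - 1


sample = [19, 40, 42, 60]  # hypothetical IDs spotted in URLs or invoices
print(estimate_total(sample))  # 74.0
```

Four leaked IDs are enough for a rough estimate, which is why sequential public IDs can reveal volume.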

wolletd2y ago

Just Friday I had a discussion with a colleague about filenames.

We do a lot of computer vision and in his project, each processed object is assigned a UUID and he wanted to save images to files for each one.

berkes2y ago

Doesn't that still leak (statistical) information?

It may not be technically security, but e.g. knowing your competitor just added N products to their shop, might be a security issue for the business.

Nevermark2y ago

Security by obscurity is a necessary step in most software security.

It hardens, completes and complements other measures.

Examples of every day security using obscurity: every password and encryption key

EDIT: Thanks for the replies.

Ignore above!

Obscurity is the low bit of security. But when it’s convenient, it still helps.

vidarh2y ago

You should think of them as public, but that doesn't mean it isn't still helpful to obscure aspects of the information they carry.

If you're already thinking about the implications, you can likely ensure people don't jump to the conclusion that the IDs can be trusted just because they look complex.

ivan_gammel2y ago

Security by obscurity is a working solution if implemented with other measures. It increases the cost of attack, which in the presence of unknown vulnerabilities gives you precious time to respond.

imiric2y ago

I'm a fan of Cuid2[1] for this reason.

They are compact, don't leak information, and make a good case why k-sortable IDs are unnecessary, or even harmful for performance.

I'm using sequential integers and created_at/updated_at timestamps for internal use, and Cuid2 IDs externally.

[1]: https://github.com/paralleldrive/cuid2

tveita2y ago

> But not too fast: If you can hash too quickly you can launch parallel attacks to find duplicates or break entropy-hiding. For unique ids, the fastest runner loses the security race.

> Cuid2 has been audited by security experts and artificial intelligence, and is considered safe to use for use-cases like secret sharing links.

And there is currently no publicly available "artificial intelligence" that would be useful in a security audit, unless you want to call fuzzers "AI".

sgarland2y ago

> One reason for using sequential keys is to avoid id fragmentation, which can require a large amount of disk space for databases with billions of records.

> the ids will be generated in a sequential order, causing the tree to become unbalanced, which will lead to frequent rebalancing.

ndriscoll2y ago

Nearly everything in this README about security or performance is wrong. I'd be very wary of using this.

BiteCode_dev2y ago

What's the benefit over uuid4?

Someone2y ago

One business implication is that third parties can detect whether your sales increase or decrease from sampling those IDs (a variation on https://en.wikipedia.org/wiki/German_tank_problem)

pnpnp2y ago

They’re also incredibly cheap to create & don’t need knowledge of each other. I mostly see batching of IDs like this when a lock is involved to prevent collisions & maintain performance.

With UUIDv7, you are reasonably sure that there won’t be collisions (check your use-case first), and can just generate them wherever on-demand (no locks required).

I’d argue batching IDs is actually more complicated than UUIDv7 for most use-cases.

0xEFF2y ago

Every access and id token issued by oidc already has an issued at (iat) and expiration (exp) fields.

throwaway1672y ago

Just randomise your clock.

Persistent IDs are a security and information risk. If that's a concern, don't persist IDs.

greatgib2y ago

Fyi, the timestamp is already encoded inside uuid4. But for having a good distribution of values, the low bits half of the timestamp is stored before the high bits half.

Here uuidv7 will just re-order that. So the content of the uuid in itself does not change.

mafuy2y ago

No, this was the case in earlier uuids. In v4, there is no timestamp.

woile2y ago

Could you explain a bit more how it would be a risk? For session tokens it's understandable, but why is leaking account creation info a problem?

andrewmatte2y ago

jonhohle, thanks. Do you know of examples where the millisecond timestamps of session tokens or account creation have been exploited?

tarjei_huse2y ago

I know of people who used leaked customer ids in public facing chatbot solutions (like Intercom) to estimate how fast their competitors were growing and/or how many customers they had.

rvnx2y ago

It’s as convenient as manipulating IPv6 addresses.

tabtab2y ago

Plus IDs often have to be given over the phone for support. Long or complicated IDs will drive both sides bonkers. #BeenThereDoneThat.

jacobgorm2y ago

Not to speak of the increased risk of key collisions. https://en.wikipedia.org/wiki/Birthday_problem

hannasanarion2y ago

In the same millisecond, in the same database system, having rolled the same 74 bit (~22 digit) number?
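
To put a number on the exchange above, here is a rough sketch using the standard birthday approximation, assuming the 74 random bits are uniformly distributed. The rate of one million IDs within a single millisecond is a deliberately extreme, made-up figure.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate P(at least one collision) among n uniform values of `bits` bits.

    Uses p ~= 1 - exp(-n(n-1) / (2 * 2**bits)); expm1 keeps precision for tiny p.
    """
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))


# Hypothetical load: one million IDs generated in the same millisecond.
p = collision_probability(1_000_000, 74)
print(f"{p:.3e}")
```

Even at that rate the probability per millisecond stays in the region of a few parts in 10^11.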

rockwotj2y ago· 21 in thread

chacham152y ago

stepanhruda2y ago

Yes but if that machine with sequential data receives 100x the traffic of other machines, it can be worse than splitting this traffic evenly across all available machines.

berkes2y ago

We even sharded on these columns, because of this (our business case made it so that hardly ever did people need data over multiple months)

hinkley2y ago

If you're working on a multi-user system, particularly one with hundreds of requests per second, there is no locality of ids. Two of my actions are separated by a sea of actions by other users.

akira25012y ago

UUIDv7: Timestamp up front, random in the back.

wenc2y ago

I know HN doesn't like jokes, but this is really funny. And the subcomment about mullets too.

(for folks who don't get it, mullets are a 1980s haircut (think MacGyver) with a short front but a long tail in the back. A funny description of them is "business in the front, party in the back")

labster2y ago

Truly, the mullet of unique identifiers.

sj262y ago

It Depends(tm).

If you're using a system which is built for distribution, random is great.

When you're leaning on a Postgres database which has powered your startup through scaling but expects right-leaning btree indexes, it's a bad time.

Rearchitecting to use a new data store is ideal, but often impractical as an immediate step. UUIDv7 is a great increment walking that road via sharding etc.

fnordpiglet2y ago

jillesvangurp2y ago

gregw22y ago

An explicit shard id can ensure all related data across all tables can be on the same shard. Helpful for SQL JOIN operations.

Picking the N least significant bits only gives a single table good distribution and sort qualities, with no cross-table properties.

stingraycharles2y ago

This is especially useful when your underlying database stores data in large "chunks", such as LSM-trees you find with e.g. rocksdb.

tveita2y ago

It's great for performance, up until you reach the point where a single device becomes a bottleneck, at which point it's terrible for performance.

sroussey2y ago

It can be bad for performance due to how b-trees work in databases, and more pronounced when you have a clustered index.

dheera2y ago

It's bad for performance if you frequently access large consecutive sets of records.

hinkley2y ago

In both cases I'm melding highly disjointed data into a single schema. There are no large consecutive sets of records.

If you're using UUIDs, there's probably a reason. And that reason invalidates the justifications for not using them.

moralestapia2y ago

Agree, but the solution is easy peasy with a simple hash function.

(Or just reverse the bits, take the last n, etc)

conradludgate2y ago

[1]: https://en.m.wikipedia.org/wiki/Consistent_hashing

andix2y ago

There are many other options that usually scale better than random distribution. For example distribution by user or tenant id.

EGreg2y ago

They likely mean it’s good for latency and not necessarily for throughput.

I still think that graph databases are way better for this sort of thing.

rockwotj2y ago

declan_roberts2y ago· 14 in thread

It’s 2023. Why aren’t we using more characters from the utf-8 keyspace to make things like UUIDs use fewer characters?

chungy2y ago

So really, what are you trying to optimize?

declan_roberts2y ago

UUIDs show up absolutely everywhere as strings in logs.

They're also often used as part of a URL parameter:

http://myservice/orders/<uuid> etc etc

chewbacha2y ago

UUIDs are 128-bits, not characters. The string representation is just for humans.

treve2y ago

I think it's a fair question, because yes you should store them as numbers, but they are still often sent in text formats and urls. Wanting a shorter representation is reasonable.

The easiest is probably to just base64 the binary representation of the 128 bit number, which results in a ceil(128/6) = 22 character string (once the padding is stripped), which is a bit smaller.

If glyph-length and not byte-length is more important you could go even smaller but I'm less sure if that's a good idea.
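
The base64 suggestion can be sketched with the standard library alone. The helper names are hypothetical, and the URL-safe alphabet is an assumption made so the result can sit in a path segment.

```python
import base64
import uuid


def uuid_to_short(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes as unpadded URL-safe base64 (22 characters)."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def short_to_uuid(s: str) -> uuid.UUID:
    """Inverse of uuid_to_short: restore the two padding characters and decode."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(s + "=="))


u = uuid.uuid4()
short = uuid_to_short(u)
print(len(str(u)), len(short))  # 36 22
```

The stored value stays the same 16 bytes; only the text shown in URLs and logs gets shorter.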

gabereiser2y ago

This is the way. Look not at the characters but at the hex.

shepherdjerred2y ago

I think the parent is saying that we can make UUIDs more human-readable by displaying the underlying 128 bits with a larger set of characters.

sapling-ginger2y ago

Base8192: https://alicecengal.github.io/uuid-hangul/

LAC-Tech2y ago

That's awesome. "Aesthetic, Cosmopolitan" LOL

larschdk2y ago

Not very useful. Always decodes to 00NaN-0NaN-0NaN-0NaN-000000NaN.

xdennis2y ago

Firstly, "it's current year" is never a good argument, but you seem to be confusing things (UTF-8 vs Unicode). UTF-8 can take as much as 6 bytes to encode a Unicode codepoint.

If you want to store UUIDs as compactly as possible you'd use 16 bytes.

[1]: https://en.wikipedia.org/wiki/Precomposed_character

__MatrixMan__2y ago

Because it's 2023. You're looking at the UUID through a giant pile of interpreters and renderers and buffers... plenty of opportunity to give each one a tartan without changing it in the data.

kiitos2y ago

UUIDs are 128 bits, or 16 bytes. They have infinitely many possible string representations. Those strings are not the value, they're a transformation of the value.

declan_roberts2y ago

When you take that UUID and go start sniffing around internal systems you're going to copy the UTF-8 string representation.

LAC-Tech2y ago

we should display them as a monochrome block of 8*16 pixels.

erik_seaberg2y ago· 12 in thread

> first component (prefix) of the identifier is a sortable timestamp

> values generated are practically sequential

jhealy2y ago

I'm not the author but I work at the same company.

sgarland2y ago

Thank you. I’m so tired of seeing the same groupthink on UUIDv4 trotted out - “it only matters if you have a clustered index, Postgres is immune!” The hell it is.

XCSme2y ago

__MatrixMan__2y ago

erik_seaberg2y ago

I’ve seen too many people write a query like

  where ts between txn_start and txn_end
  order by ts

LAC-Tech2y ago

Out of curiosity, are you into hybrid logical clocks?

erik_seaberg2y ago

I only raise this when I see promises that IDs are ordered and can be sorted, not merely grouped by approximate time for storage locality.

Dylan168072y ago

Well yes, but were they even implying that? Even if it was infinitely strict, clocks being perfect, two server processes can touch the same data at the same time.

ianbutler2y ago

This was my first reaction as well. The keys use a unix timestamp, which are clearly not going to be synchronized by default so for event ordering purposes across a distributed system this is dodgy.

For providing better query locality it probably doesn't matter significantly though which seems to be the main benefit here while preserving the other benefits UUIDs provide.

QuadrupleA2y ago

But the timestamp is less than half the bits. The rest are random. So timestamp conflicts don't matter.

erik_seaberg2y ago

By “conflict” I don’t mean a UUID collision, I agree with the logic that 2^128 is so much entropy that memory corruption is the more likely culprit.

oittaa2y ago

https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc...

pyrolistical2y ago· 7 in thread

And you can use it today with Postgres uuid type. Postgres doesn’t care what you store in it as long as it has the correct length. So you can generate a uuidv7 and store it natively

hardwaresofton2y ago

Yup this is one of the reasons I put together a light extension for this:

https://github.com/VADOSWARE/pg_idkit

There are a lot of options for UUID extensions (lots of great pure SQL ones!), but I wanted to get as many ID generation strategies in one place

Also note that native UUID v7 is slated to land in pg17:

https://commitfest.postgresql.org/44/4388/

perfmode2y ago

What are the benefits of using the Postgres uuid type (versus using TEXT or VARCHAR)?

gabereiser2y ago

Stored in binary format, validation, more efficient due to non-cast, faster access due to non char*, being able to split the high-low, indexing and uniqueness at the byte level.

jvolkman2y ago

It's 16 bytes versus 36 bytes.

thangngoc892y ago

It’s stored in binary format (16 bytes) instead of text (36 bytes)

jolux2y ago

Wouldn’t the index types need to be updated to support ordering on UUIDs?

jpgvm2y ago

No, because the ordering is purely byte level which will work perfectly fine here.
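
A small sketch of why plain byte-level ordering is enough. It assumes the UUIDv7 layout with a 48-bit big-endian millisecond timestamp in the leading bytes; `make_uuid7` is a hypothetical helper written for this example, not a library function.

```python
import os
import uuid


def make_uuid7(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-shaped value: 48-bit ms timestamp first, random bits after."""
    value = (unix_ms & 0xFFFFFFFFFFFF) << 80
    value |= int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value &= ~(0xF << 76)  # clear the version nibble ...
    value |= 0x7 << 76     # ... and set it to 7
    value &= ~(0x3 << 62)  # clear the variant bits ...
    value |= 0x2 << 62     # ... and set them to 0b10
    return uuid.UUID(int=value)


timestamps = [1_700_000_000_000, 1_600_000_000_000, 1_650_000_000_000]
ids = [make_uuid7(ms) for ms in timestamps]

by_bytes = sorted(ids, key=lambda u: u.bytes)
by_time = [u for _, u in sorted(zip(timestamps, ids), key=lambda pair: pair[0])]
print(by_bytes == by_time)  # True
```

Within a single millisecond the order falls back to the random bits, so only the ordering across distinct timestamps is guaranteed here.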

jiggawatts2y ago· 7 in thread

It seems insane to me to “validate” GUIDs/UUIDs.

Half the point of these things is that they’re treated as opaque identifiers.

threeseed2y ago

Because in SPAs if a user creates new entities it can be easier to generate the UUIDs client side.

So then just a simple validation server side to ensure the data isn't malicious.

devoutsalsa2y ago

Never trust the client.

kijin2y ago

If UUIDv4 was all that ever existed, there would be no need to validate anything apart from the fact that it's supposed to contain 32 hexadecimal characters.

hawski2y ago

UUIDv4 attaches meaning to certain 6 or 7 bits (depending on a variant) of the identifier. UUIDv4 is a UUID after all.
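
The reserved bits mentioned here can be inspected directly. A short sketch using Python's standard `uuid` module; the bit offsets assume the usual RFC 4122 variant layout.

```python
import uuid

u = uuid.uuid4()

version = (u.int >> 76) & 0xF   # 4-bit version field, 4 for UUIDv4
variant = (u.int >> 62) & 0b11  # leading variant bits, 0b10 for the RFC variant
random_bits = 128 - 4 - 2       # what is left for randomness

print(version, bin(variant), random_bits)  # 4 0b10 122
```

A server-side check on these two fields is about as far as validating a UUIDv4 can go, since the remaining 122 bits are random.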

groestl2y ago

UUIDv4 also contains some bits with meaning. But joke's on them, I tend to even randomize the version bits and call it UUIDv0.

masklinn2y ago

philsnow2y ago

but why decode it at all, if it's meant to be opaque?

Lazare2y ago· 7 in thread

UUIDv7 is a nice idea, and should probably be what people use by default instead of UUIDv4 for internal facing uses.

For the curious:

kijin2y ago

I wish UUIDv7 pulled the version/variant bits up front, though, just to make sure that the identifiers don't all start with null bytes.

wolletd2y ago

Apparently, humanity is damned to repeat its mistakes over and over again.

"100 years should be enough" is what led us to a mountain of Y2K issues, because when would a two digit year ever be ambiguous?

8organicbits2y ago

The timestamp is first.

https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...

jsf012y ago

If the version bits were up front, then switching to a hypothetical UUIDv8 in several years would be guaranteed to break the sortability. So I see that decision as a bit of future proofing.

kiitos2y ago

Second precision is too coarse for many (most?) use cases.

contravariant2y ago

Also, to all future historians of 2150, sorry about the mess, but yes we knew this was going to happen. Whatever it was.

phkahler2y ago· 5 in thread

ricardobeat2y ago

Unless you have specific needs, the only type of UUID you should care about is v4.

v1: mac address + time + random

v4: completely random

v5: input + seed (consistent, derived from input)

v7: time + random (distributed sortable ids)
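
The "time + random" shape of v7 in the list above can be sketched by hand. This is an illustration of the layout as commonly described (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits), not a vetted implementation; the `uuid7` function here is hypothetical and written for this example.

```python
import os
import time
import uuid


def uuid7(unix_ms=None) -> uuid.UUID:
    """Assemble a UUIDv7-style value from a millisecond timestamp and random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = rand >> 68                # 12 bits placed after the version
    rand_b = rand & ((1 << 62) - 1)    # 62 bits placed after the variant

    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                         # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


print(uuid7())
```

Because the timestamp occupies the most significant bits, IDs generated later compare greater, up to clock accuracy and ties within the same millisecond.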

kozak2y ago

As someone who only cares about v4, I periodically wonder why don't I just use fully random 128-bit identifiers instead (without the version information).

xcrunner5292y ago

It would seem sequential keys for database performance is more than a 'specific' need.

epcoa2y ago

v4 being completely random has terrible properties even on a non distributed database. Probably better to use v7.

bhouston2y ago

https://en.wikipedia.org/wiki/Universally_unique_identifier#...

andersa2y ago· 4 in thread

> We use sequential primary keys for efficient indexing, and UUID secondary keys for external use. The upcoming UUIDv7 standard offers the best of both worlds

Unless you consider users being able to extract the generation time from the id to be an issue, of course.
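
How little effort that extraction takes can be shown in a few lines. A sketch, assuming the ID follows the UUIDv7 layout with the Unix millisecond timestamp in the top 48 bits; the sample ID is just an illustrative value.

```python
import datetime
import uuid


def uuid7_timestamp(u: uuid.UUID) -> datetime.datetime:
    """Read the creation time back out of the leading 48 bits."""
    unix_ms = u.int >> 80
    return datetime.datetime.fromtimestamp(unix_ms / 1000, tz=datetime.timezone.utc)


example = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(uuid7_timestamp(example).isoformat())  # 2022-02-22T19:22:22+00:00
```

Anyone who sees the ID can do this, so the creation time should be treated as public wherever the ID is.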

giancarlostoro2y ago

andersa2y ago

There are definitely many cases where it isn't an issue since you were going to tell the user the time anyway (like sent time on a message)

contravariant2y ago

And if you consider knowledge of the id sufficient for access.

Which, despite the fact that it really shouldn't be, still seems to occur every so often. Even in situations where the ids are very much not random.

mort962y ago

I don't understand. What part of this requires that one considers knowledge of the ID sufficient for access? And what kind of access are you talking about?

dajonker2y ago· 4 in thread

Similar to the old situation in the article, we are using sequential 64 bit primary keys, but we use an additional random 64 bit key for external usage (instead of 128 bit).

The external key is base64 encoded for use in URLs which results in an 11 byte string.

I thought about using UUIDs as external keys but the only compelling use case seems to be the ability to generate keys from many decoupled sources that have to be merged later.

64 bit should be enough for most things https://youtu.be/gocwRvLhDf8?si=QBheJCG21bAAV0Z7
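
The random 64-bit external key described above might look like this. A minimal sketch: the function name is hypothetical, and URL-safe base64 without padding is an assumption that matches the 11-character figure.

```python
import base64
import secrets


def new_external_key() -> str:
    """Return 64 random bits as an 11-character URL-safe string."""
    raw = secrets.token_bytes(8)  # 8 bytes from the OS CSPRNG
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


key = new_external_key()
print(key, len(key))
```

With only 64 bits, collisions become plausible once the number of keys approaches 2^32, so a unique constraint on the column is a sensible backstop.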

Kunix2y ago

I am using a variant of SnowflakeId [^1] in order to have 64 bit keys too.

It's similar to UUIDv7 (it leaks the creation time), but it's not an issue for me.

So I am able to have a single 64 bit key, which can easily be formatted into a small string for user-facing urls.

[^1]: https://instagram-engineering.com/sharding-ids-at-instagram-...

Waterluvian2y ago

It sounds like you basically just made your own 64 bit UUID. If you’re exposing this ID for manual use by a human (like URLs) then that sounds pretty helpful to be shorter!

RhodesianHunter2y ago

What's the risk of collisions with your external ID in this scenario?

sealeck2y ago

I would imagine that they enforce this using e.g. a unique constraint in their database.

LAC-Tech2y ago· 3 in thread

Why use UUIDv7 over ULIDs?

As Lazare points out in this thread they're basically the same thing, except with ULIDs you get those 6 extra bits of randomness back that UUIDs have to use for metadata.

oittaa2y ago

https://datatracker.ietf.org/doc/html/draft-ietf-uuidrev-rfc...

LAC-Tech2y ago

How much of a standard do ULIDs need? 6-byte timestamp, 10 bytes of crypto randomness, stringify it using Crockford's base32 - and off we go.

jhealy2y ago

I'm not the author, but I work at the same company. ULIDs are nice, but we're a 10 year old company with many TBs of data across multiple logical databases and most rows had UUIDv4 ids.

Maybe if we were starting from scratch ULIDs would have been an option, but given where we were UUIDv7 was a much easier transition.

amanzi2y ago· 3 in thread

afavour2y ago

Yeah, years from now we’re going to see some story about how a company fudged their timestamps in order to get away with X, only to be given away by the timestamp hidden in public UUIDs.

jonhohle2y ago

riffraff2y ago

Most application may not want it, but will it hurt them?

I mean this is a similar concern to sequential IDs: many apps do not want to leak them, and in some cases it might cause issues, but in general it doesn't matter.

toni88x2y ago· 3 in thread

frederikb2y ago

For me the central benefit is that you can create them in a distributed manner and are not reliant on a central system as a single source of truth for creating your identifiers.

frederikb2y ago

XCSme2y ago

Would using the timestamp in the UUID be possible for date-range queries?
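
One way this could work, sketched under the assumption that the IDs are UUIDv7 and the column compares values byte-wise: turn each end of the date range into the smallest 128-bit value carrying that timestamp, then filter on the ID itself. The helper name and the query in the comment are hypothetical.

```python
import datetime
import uuid


def uuid7_lower_bound(moment: datetime.datetime) -> uuid.UUID:
    """Smallest 128-bit value whose leading 48 bits encode `moment` in Unix ms."""
    unix_ms = int(moment.timestamp() * 1000)
    return uuid.UUID(int=unix_ms << 80)


utc = datetime.timezone.utc
start = uuid7_lower_bound(datetime.datetime(2023, 1, 1, tzinfo=utc))
end = uuid7_lower_bound(datetime.datetime(2023, 2, 1, tzinfo=utc))

# Hypothetical query shape: SELECT ... WHERE id >= :start AND id < :end
print(start, end)
```

The bounds are not valid UUIDv7 values themselves (the version bits are zero); they only serve as comparison limits, and the result is only as accurate as the clocks that generated the IDs.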

Traubenfuchs2y ago· 3 in thread

Am I the only one instinctively upset by the communication/bandwidth/storage overhead of the dashes as well as the version and variant bits of UUIDs?

It might be insignificant, but to me it makes UUID feel tainted, dirty. 11.1% of a UUID are dashes. 15.3% of a UUID are wasted bits if you count version and variant bits.

rossant2y ago

Traubenfuchs2y ago

UUIDs are often pushed around in JSON and as string types, including the non-random parts.

throwawayb4dc652y ago

wolverine8762y ago· 3 in thread

Couldn't that be solved with incremented serial numbers, rather than leaking time data?

FPGAhacker2y ago

Yes. But I think part of the design requirements is minimizing coordination required among distributed nodes.

This solution attempts to solve the sort-ability issue of current uuids by moving the timestamp to the most significant bits.

Ginden2y ago

Incremented serial numbers can leak things like eg. volume of sales in your shop.

wolverine8762y ago

So increasing serial numbers with random gaps?

danwee2y ago· 3 in thread

    GET /filter?a_id=X&b_id=Y&c_id=Z&d_id=w

dgellow2y ago

Do you dislike the use of POST because of its effect on caching, or because it doesn’t feel semantically correct? If you don’t need the caching I wouldn’t mind using a POST to filter.

dgb232y ago

The maximum url length is typically quite long.

Another thing is that you don’t necessarily need to encode uuids canonically. They are just u128’s. It’s relatively straightforward to find a url friendly string representation that is shorter.

danwee2y ago

> It’s relatively straightforward to find a url friendly string representation that is shorter.

coolgoose2y ago· 3 in thread

I am confused how this is new. UUIDv1 is time based, you just need to be careful about entropy, and in MySQL 8 you can for a longish time use it as an ordered field.

8organicbits2y ago

The use of a MAC address and fine grained timestamp are challenges of UUIDv1.

https://blog.devgenius.io/analyzing-new-unique-identifier-fo...

coolgoose2y ago

Sure, hence why I said entropy, but it's not like you can't use it :)

oittaa2y ago

And the crazy epoch instead of the better-known Unix epoch. Why would anyone want to create UUIDs around the year 1500?

samatman2y ago· 3 in thread

Relying on timestamps to be sortable, when clock skew and ntpd guarantee that they won't always be, strikes me as poor design.

If you need to sort by insert order, use an autoincrementing integer, if you need uniqueness, UUIDv4 is fine, if you need both use both.

Use timestamps when you need to record the time, just don't commit the sin of presuming that clock time will never run backwards, I assure you, it does.

jmmv2y ago

samatman2y ago

You're going to have a bad time writing something like a join across tables with a restricted range of time if your time is embedded in UUIDv7.

Just, please, for the sake of your future self and everyone you work with, don't use a timestamp for insert order. Ever.

atonse2y ago

dgb232y ago· 2 in thread

As a beginner I treated and understood (SQL) databases as something I have to use in order to store stuff.

Later I was excited about the power and expressiveness of SQL and its extensions. There is a ton of leverage and you can make it so that interfacing with it directly becomes much more useful.

However now I’m in a different phase. I see it as a durable data structure. I think in terms of “what does it provide to make the overall system better?”

The issues around indexing and uuids that are discussed in the article fit nicely into this line of thinking.

In web development, database access and performance often dominate and infect the whole system.

foreigner2y ago

As a beginner I thought of the database as a backend for the app. Now I think of the app as a frontend for the database. :-D

kuchenbecker2y ago

Crud, you're right!

pknerd2y ago· 2 in thread

Speaking of RDBMSs, how good are UUIDs when making joins and fetching a certain record?

nevir2y ago

More or less identical to integer ids - they're stored and referenced as a 16-byte integer.

Unless you're manually storing them as strings... (Not ideal, but most dbs are pretty good at dealing with that too)

pknerd2y ago

so I have to pick a certain MySQL type?

jsf012y ago· 2 in thread

How long will it be before the “milliseconds since epoch” part of the uuid overflows or repeats?

jolmg2y ago

  $ date -ud @$(( 256 ** 6 / 1000 ))
  Tue Aug  2 05:31:50 AM UTC 10889
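
The same figure can be cross-checked with plain arithmetic. Python's `datetime` stops at year 9999, so this sketch divides by the average Gregorian year length instead of building a date.

```python
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.2425  # average Gregorian year in ms

max_ms = 2 ** 48                  # the timestamp field is 48 bits wide
years = max_ms / MS_PER_YEAR      # roughly 8919.6 years after the epoch
print(int(1970 + years))          # 10889
```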

birracerveza2y ago

Well, at least it's not a Friday.

dataangel2y ago· 2 in thread

why bother with any version of the uuid standard? just generate a random 128-bit number and use it. that's all the newer ones are anyway

lelanthran2y ago

> why bother with any version of the uuid standard? just generate a random 128-bit number and use it. that's all the newer ones are anyway

Good question.

Won't random 128-bit numbers actually be superior to UUIDs in every way except predictability?

mholt2y ago

wvh2y ago· 1 in thread

The original UUID types aren't as useful as they once were, so it'd be worth writing a new RFC and extending those original types.

ahoka2y ago

Isn’t that almost the same as v1?

tzahifadida2y ago· 1 in thread

To me it sounds like a corner case. Example:

a) UUID4, CreatedTime/UpdatedTime.

b) Bigint, CreatedTime/UpdatedTime.

c) UUID7 internal (which also includes time badly), UUID4 external/whatever short ID.

How exactly this helps if you need external ids (which you usually do today)? It doesn't even make it a short ID.

Even if there is a corner case, are we just saving a few bytes while adding more complication?

WAIT!!! But what about Next Tokens? ok, these are painful, but easily solved: Next can be (>=Created Date,>ID). Same result. Pagination, stays the same since it is sorted by Created Date.
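
The next-token idea at the end of this comment is usually called keyset pagination. A self-contained sketch with SQLite, using a made-up `items` table; the cursor is the `(created_at, id)` pair of the last row on the previous page, and the ID only breaks ties between equal timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, created_at INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("c", 100), ("a", 100), ("b", 200), ("d", 300)],
)


def next_page(after_created_at, after_id, limit=2):
    """Return rows strictly after the (created_at, id) cursor, in a stable order."""
    return conn.execute(
        """
        SELECT id, created_at FROM items
        WHERE created_at > ? OR (created_at = ? AND id > ?)
        ORDER BY created_at, id
        LIMIT ?
        """,
        (after_created_at, after_created_at, after_id, limit),
    ).fetchall()


page1 = next_page(-1, "")                      # start before every row
page2 = next_page(page1[-1][1], page1[-1][0])  # continue from the last row seen
print(page1, page2)
```

An index on `(created_at, id)` would be the natural companion for this query shape; that is an assumption about the schema, not something stated in the thread.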

otherme1232y ago

I understood it as c) only UUID7, no secondary external UUID.

The external Id is used instead of Bigint because you don't want your external users to query 1, then 2, then 3 (IDOR)... But the random part of the Uuid7 makes this impossible.

Uuid7 isn't a substitute for Created/Updated, but a substitute for the dual field Uuid4/Bigint.

zooFox2y ago· 1 in thread

One benefit of an epoch is that it's easily readable (or comparable, at the very least). I am not sure I can read epoch in hexadecimal format though.

okl2y ago

Need a new clock? https://retr0.id/stuff/2038/

xarope2y ago· 1 in thread

I am just about to wrap up some prototyping comparing snowflake, typeids, uuidv4 and ulid. Why did I not bump into uuidv7 earlier?!?

RobIII2y ago

Don't know, because uuidv7 has been coming for ages... https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...

jimmySixDOF2y ago

Discussion here a couple months ago :

Analyzing New Unique Identifier Formats (UUIDv6, UUIDv7, and UUIDv8) (2022) https://news.ycombinator.com/item?id=36438367

dkubb2y ago

If it helps anyone, at work, I open sourced the UUID v7 postgresql function that I wrote: https://github.com/Betterment/postgresql-uuid-generate-v7

We've seen some amazing benefits, especially around improving the speed of batch inserts.

jakewins2y ago

A useful/horrifying pattern on this topic: you can use UUIDv1 as a prefixed id, giving you a way to generate tagged IDs in a system that uses UUIDs.

You set the node field to a broadcast MAC address, and use that as a namespace/prefix. This inches close to the boundary of the RFC, but is arguably compliant.
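
A sketch of that pattern with Python's standard library, which accepts an explicit node value in `uuid.uuid1`. The tag constant is made up, and setting the multicast bit of the first octet is one reading of the "broadcast MAC address" remark; it is the same bit the RFC reserves for node IDs that are not real hardware addresses.

```python
import uuid

# Hypothetical namespace tag stored in the 48-bit node field.
# The lowest bit of the first octet (0x01) marks it as a non-hardware address.
ORDER_TAG = 0x01_00_00_00_00_2A


def new_order_id() -> uuid.UUID:
    """Generate a time-based UUIDv1 whose node field carries the tag."""
    return uuid.uuid1(node=ORDER_TAG)


def is_order_id(u: uuid.UUID) -> bool:
    """Check the tag without any lookup, just by inspecting the ID."""
    return u.version == 1 and u.node == ORDER_TAG


order_id = new_order_id()
print(order_id, is_order_id(order_id))
```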

jug2y ago

Haha, this is what we came up with for our home-brewed unique IDs in a GIS application decades ago. For the same reasons.

insanitybit2y ago

I would assume that `serial` would solve this problem too.

mooreed2y ago

Feels like a spiritual successor to the ksuid [1] lib, which has very similar use cases and which I first heard of used in conjunction with DynamoDB

[1]: https://github.com/segmentio/ksuid

miiiiiike2y ago

This is neat. I've been using a custom snowflake cluster for years. Having this in the language/DB would be great for smaller projects.

For bigger/public projects I'd like to be able to add a sequence, node, and data center id to the UUID too.

gwbas1c2y ago

Anyone ever try encrypting a database ID (i.e., a sequential int) and using that as a public-facing key?

I.e., take a 32- or 64-bit int that's the primary key, encrypt it, and then use that as the public ID in a web application, URL, API, etc.
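
To make the idea concrete, here is a toy sketch of a reversible, keyed mapping from a 64-bit internal ID to a 64-bit public ID. It is a small Feistel network with HMAC-SHA256 as the round function, chosen only because it runs on the standard library; it is not AES and has not been vetted, so a real system would more likely use an established block cipher or a format-preserving encryption scheme. All names are hypothetical.

```python
import hashlib
import hmac


def _round(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256 of the 32-bit half, truncated to 32 bits."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")


def encrypt_id(key: bytes, internal_id: int, rounds: int = 4) -> int:
    """Map a 64-bit internal ID to a 64-bit public ID."""
    left, right = internal_id >> 32, internal_id & 0xFFFFFFFF
    for r in range(rounds):
        left, right = right, left ^ _round(key, r, right)
    return (left << 32) | right


def decrypt_id(key: bytes, public_id: int, rounds: int = 4) -> int:
    """Invert encrypt_id by running the rounds backwards."""
    left, right = public_id >> 32, public_id & 0xFFFFFFFF
    for r in reversed(range(rounds)):
        left, right = right ^ _round(key, r, left), left
    return (left << 32) | right


secret = b"demo-key-not-for-production"  # placeholder key for the example
public = encrypt_id(secret, 42)
print(hex(public), decrypt_id(secret, public))
```

As noted elsewhere in the thread, the key can never be rotated without changing every public ID that has already been handed out.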
