This morning, our database flagged a duplicate UUID (v4). I checked, thinking it may have been a double-insert bug or something, but no.
The original UUID was from a record added in 2025 (about a year ago), and today the system inserted a new document with a fresh UUIDv4 and it came up with the exact same one:
b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd
We're using this: https://www.npmjs.com/package/uuid
I thought this was technically impossible and would never happen, and since we're not modifying the UUIDs in any way, I really wonder how this is possible!? We're literally only calling:
import { v4 as uuidv4 } from "uuid";
const document_id = uuidv4();
... and then insert into the database, that's it.
Additionally, the database only has about 15,000 records, and now one collision. Statistically... impossible.
Has that ever happened to anyone?! What in the...
The security of UUIDv4 is based on the assumption of a high-quality entropy source. This assumption is invalidated by hardware defects, normal software bugs, and developers not understanding what "high-quality entropy" actually means and that it is required for UUIDv4 to work as advertised.
It is relatively expensive to detect when an entropy source is broken, so almost no one ever does. They find out when a collision happens, like you just did.
UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.
Are you generating the UUID in the backend or the frontend? The frontend is fundamentally unreliable for many reasons, including deliberate collisions, so in that case you'll need to handle collisions somehow. You can still engineer around common sources of collisions, but the specifics depend on the environment.
On the other hand, making a backend reliable is feasible. What kind of environment is your code running in? Historically, VMs sometimes suffered from this problem, though this should be solved nowadays. Heavily sandboxed processes might still run into it, if the RNG library uses an unsafe fallback. Forking processes or VMs can cause state duplication and thus collisions.
<https://git-scm.com/book/en/v2>
"Here’s an example to give you an idea of what it would take to get a SHA-1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (6.5 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. Thus, an organic SHA-1 collision is less likely than every member of your programming team being attacked and killed by wolves in unrelated incidents on the same night."
Deliberate collisions are addressed in the following paragraph.
SHA-1 hashes are not random, so the issue of poor pseudo-random number generation doesn't apply as it does to uuidv4. And SHA-1 hashes are 160 bits, vs. 128 for uuidv4.
But I love the idea of unrelated wolf attacks.
https://github.com/uuidjs/uuid/issues/546
Eg:
> FWIW, I just tested crypto.getRandomValues() behavior on googlebot and it is also deterministic(!)
If the rng is not customized it will use:
const rnds8 = new Uint8Array(16);

export default function rng() {
  return crypto.getRandomValues(rnds8);
}
getRandomValues doesn't specify a minimum amount of entropy.

The only guess I have is that we originally generated UUIDv4s on a user's phone before sending them to the database, and the UUID generated this morning that collided was created on an Ubuntu server.
I don't fully know how UUIDv4s are generated, and whether anything about the machine they're generated on is part of the algorithm, but that's really the only change I can think of: it used to be generated on-device by users, and for many months now it has been generated on the server.
If the entire universe were turned into a giant computer and did nothing but generate uuids until its heat death, how many bits would you need for the ID space?
From what I skimmed, the package should just call the js runtime's crypto.randomUUID(). I think it should always be properly seeded.
I think it is extremely unlikely that the runtime has a bug here, but who knows? What js runtime do you use?
1 in 47.3 octillion.
I'd be suspecting a race condition or some other naive mistake; otherwise I'd be stocking up on lottery tickets.
(lol at the other user posting at the same time about the lottery ticket.. great minds and all that.)
If you want something that is difficult to guess, ask the cryptography guys. But if you need something that is _guaranteed_ unique, you must build it yourself.
A collision at 15,000 records is so unlikely that I would first suspect something else. Duplicate processing, replayed requests, reused objects, misleading logs, or another code path reusing the identifier.
Could you share a bit more of the surrounding code so we can check?
Consider if your ID can contain a timestamp besides a random value. The answer is usually yes. UUIDv7 is fine.
If you've spent the time to really work through the whole problem and have written down a proof of how that leads to an unacceptable info leak: congratulations, your system is complex and slow enough that you might as well use a strong cryptographic hash, or UUIDv5 if you're lazy.
> This module may generate duplicate UUIDs when run in clients with deterministic random number generators, such as Googlebot crawlers. This can cause problems for apps that expect client-generated UUIDs to always be unique. Developers should be prepared for this and have a strategy for dealing with possible collisions, such as:
> - Check for duplicate UUIDs, fail gracefully
> - Disable write operations for Googlebot clients
https://github.com/uuidjs/uuid/commit/91805f665c38b691ac2cbd...
I've moved on to something like TSID (where security isn't a factor) or UUIDv7 to make sure this never really occurs in practice, rather than over-engineering the code with retries.
I'd look at rng initialisation first.
No, very technically possible... though, with good randomness, very, very unlikely.
But nothing technically prevents a UUIDv4 from generating a duplicate value.
There are a bunch of constraints that must be strictly held for UUIDs to be collision resistant; I'd guess there's a problem with your random number generator.
Actually it's not impossible, but very very improbable.
P.S. You should buy a lottery/Powerball ticket.
P.P.S. Whenever I use the word improbable, the https://hitchhikers.fandom.com/wiki/Infinite_Improbability_D... comes in mind
Thoughts?
And yet, plenty of experienced devs, including team leads and CIOs, are convinced it's impossible. As in, they absolutely don't write code to deal with the condition. So a bad RNG can randomly destroy the system far sooner than expected at any time, and it won't be noticed, caught, re-genned, or anything, with concurrent corruption being entirely possible. They're fine with it. I feel like these are the same guys who don't check to see if malloc() succeeds.
I like to ask them, "If it's impossible, you're using too many bits, right?" I haven't talked any of them into hedging with a Brownian motion detector, or a lava lamp or something, for better randomness yet, but I'm still trying.
In an eternal universe, even the most unlikely of events will happen an infinite number of times.
There is no need to generate uuids through javascript or node imo
private static function createUUID() {
    $md5 = md5(uniqid('', true));
    return substr($md5, 0, 8) . '-' .
           substr($md5, 8, 4) . '-' .
           substr($md5, 12, 4) . '-' .
           substr($md5, 16, 4) . '-' .
           substr($md5, 20, 12);
}
Holy cow, how didn't this horror come and bite us in the juicy parts? I don't know.

Imagine the database having the old UUID in a memory buffer due to a recent index scan, and a bit flip somewhere in the logic that basically copied the old UUID into the memory location of the new UUID; or some buffer addresses got swapped; or the operation that allocated the new UUID received a memory buffer containing the old one and, due to a bit flip, the memcpy was skipped; or something along those lines.
Facebook wrote extensively about this, stuff like "if (false) { do_x(); }" with do_x being called anyway. For example, their critical RocksDB kv store has extensive redundant protections to defend against such "impossible" bugs.
Not at all! Just very unlikely. It's about odds and statistics. Not physics.
Also, numerous applications that use a unique ID per record need to check for ID collisions. I know I do for a short URL generator.
If everything is done properly, then this is very likely the one and only time anyone involved in the telling or reading of this account will ever experience this.
Super easy assignment, wrote it up probably in C++ (maybe just C?), and ran it on my linux box (probably Debian potato). It finished super quick and gave me an average of like 5.6 steps to return to the origin or something. Cool!
I copied it over to my account on the department's HP-UX machines where I was supposed to run and submit it to my instructor. Compiled fine. And then it... just ran forever. I was doing rand() % 4 or something, and the HP-SUX RNG had crazy bias in its last 2 bits, and it just walked away forever, never returning to the origin. Well crap!
Got an A for my writeup, though!
Why? There's a built-in for this.
I always hated this meme/mindset, because if you dig into the history of them you'll see that their original purpose was to collide. They were labels to identify messages in Apollo's distributed computing architecture. UIDs, and later UUIDs, were a reversible way to mark an intersection point between two dimensions.
Any two nodes in a distributed system would generate the same UID/UUID for the same two inputs, and a recipient of an identified message could reverse the identifier back into the original components. They were designed as labels for ephemeral messages so the two dimensions were time and hardware ID (originally Apollo serial number, later 802.3 hwaddress etc).
I think a lot of the confusion can be traced to the very earliest AEGIS implementation where the Apollo engineers started using “canned” (their term, i.e. static or well-known) UIDs to identify filesystems. Over time the popular usage of UUID fully shifted from ephemeral identifiers where duplicates were intentional toward canned identifiers where duplicates were unwanted and the two dimensions were random-and-also-random.