Guid Smash

(guidsmash.com)

180 points · nugzbunny · 10mo ago · 70 comments

34 comments · 12 top-level

twiss · 10mo ago · 13 in thread

> The chances of generating two GUIDs that are the same is astronomically small.

> The odds are 1 in 2^122, which is approximately 1 in 5,000,000,000,000,000,000,000,000,000,000,000,000.

This is true if you only generate two GUIDs, but if you generate very many GUIDs, the chance that any two of them are identical increases. E.g. if you generate 2^61 GUIDs, you have roughly a 40% chance of a collision (and 50% at about 1.18 × 2^61), due to the birthday paradox.

2^61 is still a very large number of course, but much more feasible to reach than 2^122 when doing a collision attack. This is the reason that cryptographic hashes are typically 256 bits or more (to make the cost of collision attacks >= 2^128).
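
The birthday arithmetic above can be checked numerically. This is a minimal sketch, assuming the usual approximation p ≈ 1 − exp(−n(n−1)/2N) for n uniform draws from N values; the name `collision_probability` is only illustrative.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one duplicate among n uniform draws
    from a space of 2**bits values (birthday approximation)."""
    space = 2 ** bits
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))

if __name__ == "__main__":
    print(collision_probability(2, 122))          # one pair: about 1 / 2**122
    print(collision_probability(2 ** 61, 122))    # about 0.39
    print(collision_probability(2 ** 128, 256))   # 256-bit hash, 2**128 tries: about 0.39
```

The same square-root relationship holds at any width, which is why doubling the digest size is what doubles the exponent of the collision cost.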

Xelbair · 10mo ago

I have seen two GUID collisions in my life already, and I'm not that old.

One of them was genuine: the GUIDs were generated by different systems, and it was caught when loading data from one into the other. The objects had the same ID but different underlying types.

The other one was due to an 'error': two systems (by different companies, supporting the same data exchange standard) used a magic hardcoded GUID that turned out to be the same.

Both of those systems have a full audit trail: each change created a new row in the database, and IDs were formatted as {NAMESPACE}.{GUID}.{TIMESTAMP}. Mutating an object created a new entry with a different {TIMESTAMP} part. Namespaces are mandated by the standard, so different systems can have the same namespace value.

_alternator_ · 10mo ago

There are either bugs in the system or the GUID isn’t random. The first case you mention is probably both TBH; the second case is probably due to non-randomness (generating via namespace/timestamp leads to collisions when two objects are generated simultaneously).

ivanstepanovftw · 10mo ago

Sorry, when I was young I did not know what these `public static final UUID` meant, so I copied them.

phyzome · 10mo ago

Both vendors probably copied that GUID from the same place.

habibur · 10mo ago

The birthday paradox simplified: if you generate n-bit random numbers, you can generate at most about 2^(n/2) of them before clashes start to occur. That's the square root of the number's range.

So if you need 1000 random numbers, generate from 1 to 1 million.

vbezhenar · 10mo ago

> So if you need 1000 random numbers, generate from 1 to 1 million.

If you don't check for clashes, the 50% chance of failure is too much. Probably even 0.1% is too much, so you'd need a more elaborate approach.

If you do check for clashes, you can generate from 1 to 2000 with little overhead.
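
Both cases can be sketched in a few lines. This is an illustration only, assuming uniform draws from Python's `random` module; `clash_probability` and `draw_unique` are hypothetical helper names. Under the birthday approximation the no-check case comes out near 39%, the same order as the 50% quoted above.

```python
import math
import random

def clash_probability(count: int, space: int) -> float:
    """Birthday approximation: chance that `count` uniform draws from
    `space` values contain at least one duplicate."""
    return -math.expm1(-count * (count - 1) / (2 * space))

def draw_unique(count: int, space: int, rng: random.Random) -> list[int]:
    """Draw `count` distinct numbers from 1..space, redrawing on a clash."""
    if count > space:
        raise ValueError("cannot draw more unique values than the range holds")
    seen: set[int] = set()
    result: list[int] = []
    while len(result) < count:
        candidate = rng.randint(1, space)
        if candidate not in seen:  # the clash check
            seen.add(candidate)
            result.append(candidate)
    return result

if __name__ == "__main__":
    # Without checking: 1000 draws from 1..1,000,000.
    print(clash_probability(1000, 1_000_000))
    # With checking: a range of 2000 is enough.
    numbers = draw_unique(1000, 2000, random.Random(0))
    print(len(numbers), len(set(numbers)))
```

In practice `random.sample(range(1, 2001), 1000)` gives the same guarantee without a hand-written loop.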

arcanemachiner · 10mo ago

I always assumed that intuitively... I think the number is 20 people for the birthday paradox. 20 x 20 = 400, and there are ~365 days in a year. Is that how that works?
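
For reference, the exact threshold can be computed directly. A small sketch, assuming 365 equally likely birthdays: the square-root intuition gives the right order of magnitude (sqrt(2 · ln 2 · 365) ≈ 22.5), and the exact 50% point works out to 23 people.

```python
def smallest_group_for_match(days: int = 365, target: float = 0.5) -> int:
    """Smallest group size whose chance of a shared birthday reaches `target`."""
    no_match = 1.0  # probability that everyone so far has a distinct birthday
    people = 0
    while 1.0 - no_match < target:
        no_match *= (days - people) / days
        people += 1
    return people

if __name__ == "__main__":
    print(smallest_group_for_match())  # 23
```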

Retr0id · 10mo ago

2^61 isn't even that large, well within the compute budget of mere mortals.

mmoskal · 10mo ago

Counting to 2^61 probably is.

To actually find a collision in a 128-bit cryptographic hash function it would take closer to 2^65 hashes. Back-of-the-envelope calculations suggest that with Pollard's rho it would cost a few million dollars of CPU time at Hetzner's super-low prices. Not nearly a mere mortal's budget, but not that far off I guess.
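
To make the rho idea concrete, here is a toy sketch of memoryless collision search using Floyd cycle detection. It is not the cost estimate above: it assumes a deliberately weak hash (the first 3 bytes of SHA-256, so 24 bits), and `truncated_hash` and `find_collision` are illustrative names.

```python
import hashlib

def truncated_hash(data: bytes, size: int = 3) -> bytes:
    """First `size` bytes of SHA-256: a deliberately weak 24-bit toy hash."""
    return hashlib.sha256(data).digest()[:size]

def find_collision(seed: bytes = b"seed") -> tuple[bytes, bytes]:
    """Find two distinct inputs with the same truncated hash while
    storing only a handful of values (Floyd cycle detection)."""
    # Assumption: the seed is 4 bytes and every hash output is 3 bytes, so the
    # seed can never lie on the cycle; that guarantees a non-empty tail.
    f = truncated_hash
    tortoise, hare = f(seed), f(f(seed))
    while tortoise != hare:           # phase 1: meet somewhere on the cycle
        tortoise = f(tortoise)
        hare = f(f(hare))
    tortoise = seed                   # phase 2: walk both toward the cycle entry
    while True:
        next_tortoise, next_hare = f(tortoise), f(hare)
        if next_tortoise == next_hare:
            return tortoise, hare     # distinct inputs, same output
        tortoise, hare = next_tortoise, next_hare

if __name__ == "__main__":
    a, b = find_collision()
    print(a.hex(), b.hex(), truncated_hash(a).hex())
```

At 24 bits this needs only thousands of hash evaluations. Every two extra output bits roughly double the expected work, which is how a 128-bit hash ends up in the 2^64 to 2^65 range.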

vlovich123 · 10mo ago

Depends on what “isn’t even that large” means. A modern 6 GHz machine would probably need 12 years of 24/7 operation to count that high. To me that seems like a lot.
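
The 12-year figure checks out under a simple assumption of one increment per clock cycle on a single core. A quick sketch (`years_to_count` is an illustrative helper):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_count(total: int, rate_per_second: float) -> float:
    """Years needed to count to `total` at a fixed rate, running nonstop."""
    return total / rate_per_second / SECONDS_PER_YEAR

if __name__ == "__main__":
    # Assumes one increment per cycle at 6 GHz, with no parallelism.
    print(round(years_to_count(2 ** 61, 6e9), 1))  # 12.2
```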

PaulHoule · 10mo ago

I think you might have trouble if you tried to assign one to every iron atom in an iron filing.

somat · 10mo ago

2^61 GUIDs is... about 37 exabytes, if my napkin math is correct, when storing them in binary format (16 bytes each). If doing the JavaScript thing and storing them as strings... (shudders) I don't even want to think about it.

Anyhow, that was my first thought when you mentioned 2^61 GUIDs: where are you even going to put them? Second thought: I don't think enumerating 2^61 GUIDs is trivial. In fact, I suspect it would take longer than anyone would be willing to spend, and if you are not storing them, why are you generating them?

And what even is a GUID collision attack? It is not like they are a hash, and since they tend to be public identifiers, it turns out that despite their stated use to prevent collisions, you can't really use GUIDs generated by others (if they wanted collisions they would straight up just copy yours), so you end up regenerating them anyway.
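
The storage estimate above is easy to reproduce. A small sketch, assuming 16 bytes per binary GUID and, for the string case, the 36-character hyphenated form at one byte per character (an optimistic assumption for strings):

```python
def storage_bytes(count: int, bytes_each: int) -> int:
    """Raw bytes needed to store `count` identifiers of `bytes_each` bytes."""
    return count * bytes_each

if __name__ == "__main__":
    binary = storage_bytes(2 ** 61, 16)
    text = storage_bytes(2 ** 61, 36)
    print(binary / 10 ** 18)  # decimal exabytes, about 36.9
    print(binary / 2 ** 60)   # binary exbibytes, exactly 32.0
    print(text / 10 ** 18)    # decimal exabytes, about 83.0
```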

NoahZuniga · 10mo ago

* not the birthday paradox, but the birthday bound.

8organicbits · 10mo ago · 3 in thread

Note that this only considers UUIDv4, the random UUID. Other forms can generate UUIDs that are much closer together. For UUIDv7, UUIDs generated within the same millisecond will have identical 48 bit prefixes (or up to 60 when the monotonic counter from section 6.2 is used).

https://www.rfc-editor.org/rfc/rfc9562.html#monotonicity_cou...
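
To see the shared prefix, here is a hand-rolled sketch of the UUIDv7 bit layout (48-bit millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). It is an illustration under stated assumptions, not a conformant generator: `make_uuid7` is a hypothetical helper and the optional monotonic counter is not implemented.

```python
import os
import time
import uuid
from typing import Optional

def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-style value from a millisecond timestamp plus random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF            # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)   # 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80   # bits 127..80: timestamp
    value |= 0x7 << 76                          # bits 79..76: version 7
    value |= rand_a << 64                       # bits 75..64: random
    value |= 0b10 << 62                         # bits 63..62: variant
    value |= rand_b                             # bits 61..0: random
    return uuid.UUID(int=value)

if __name__ == "__main__":
    same_ms = 1_700_000_000_000  # arbitrary fixed timestamp for the demo
    a, b = make_uuid7(same_ms), make_uuid7(same_ms)
    print(a)
    print(b)
    # The first 12 hex digits (48 bits) match; the remaining 74 random bits differ.
    print(str(a)[:13] == str(b)[:13])
```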

e1g · 10mo ago

You need to be generating >100M of them within the same millisecond before even remembering that collisions can theoretically happen.

sgentle · 10mo ago

Apparently there's 500 hours of video uploaded to YouTube every minute (30 seconds every millisecond). Assuming 4K@60fps, that works out to 14,929,920,000 pixels per millisecond.

If YouTube wanted to give every incoming pixel its own UUIDv7, they'd see a collision rate just under 0.6%.
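
The numbers above can be reproduced with a short sketch. It assumes 3840×2160 frames and 74 random bits per UUIDv7 within a millisecond (12 + 62, no counter), and reads "collision rate" as the chance of at least one duplicate per millisecond; the constant and function names are illustrative.

```python
import math

PIXELS_PER_FRAME = 3840 * 2160      # 4K UHD
FRAMES_PER_SECOND = 60
UPLOAD_HOURS_PER_MINUTE = 500
RANDOM_BITS = 74                    # assumption: no monotonic counter in use

def pixels_per_millisecond() -> int:
    video_seconds_per_second = UPLOAD_HOURS_PER_MINUTE * 3600 // 60  # 30,000
    pixels_per_second = video_seconds_per_second * FRAMES_PER_SECOND * PIXELS_PER_FRAME
    return pixels_per_second // 1000

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation for n uniform draws from 2**bits values."""
    return -math.expm1(-n * (n - 1) / (2 * 2 ** bits))

if __name__ == "__main__":
    n = pixels_per_millisecond()
    print(n)                                      # 14929920000
    print(collision_probability(n, RANDOM_BITS))  # just under 0.006
```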

charcircuit · 10mo ago

>You

The entire universe. Else it's not universally unique.

RS-232 · 10mo ago · 2 in thread

UUID > GUID.

Microsoft’s GUID standard is garbage.

lionkor · 10mo ago

Oh, why?

w-ll · 10mo ago

not OP but i already have fields for time ts and what model it is. i want my uuids random.

amingilani · 10mo ago · 1 in thread

Instead of picking a target UUID and evaluating new UUIDs against it, a better experiment would be finding duplicates in all the UUIDs you have generated.

This plays nicely with the birthday paradox.

whyever · 10mo ago

It would require a lot more memory, because you have to remember every generated UUID. And how would you do the partial match? You are not going to observe any collisions.
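
One way to make such an experiment observable is to compare only a short prefix of each UUID. A minimal sketch, assuming the top 32 bits of a UUIDv4 (all random) stand in for the full value; `draws_until_duplicate` is an illustrative name. It also shows the memory cost: every prefix seen so far has to be kept.

```python
import uuid

def draws_until_duplicate(prefix_bits: int = 32) -> int:
    """Generate UUIDv4 values until two share their top `prefix_bits` bits.
    Returns how many were generated, including the one that collided."""
    if not 1 <= prefix_bits <= 48:
        raise ValueError("keep the prefix within the fully random leading bits")
    seen: set[int] = set()
    while True:
        prefix = uuid.uuid4().int >> (128 - prefix_bits)
        if prefix in seen:
            return len(seen) + 1
        seen.add(prefix)

if __name__ == "__main__":
    # Expect a result on the order of 2**16 draws for a 32-bit prefix.
    print(draws_until_duplicate(32))
```

For the full 122 random bits the set would need to hold on the order of 2^61 entries before a duplicate became likely, which is the memory problem described above.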

nopassrecover · 10mo ago · 1 in thread

Reminds me of a problem I ran into once where someone had wanted unique but short codes as identifiers for relatively small counts, and picked a substring of a UUID:

http://mattmitchell.com.au/birthday-problems-friendly-identi...

kr2 · 10mo ago

> However, the overall takeaway was: Don’t use the MongoDB Increment value as a Unique Identifier.

However, the overall takeaway should be, as always: don't use MongoDB. Period. Every time I learn something new about it I'm baffled about why people continue to use it.

ahmedfromtunis · 10mo ago · 1 in thread

The proximity measure seems to be flawed.

If you want to see how close to a non-ordinal 123456 a random generator can get, you also need to look for stuff like 923456 or 123956, etc.

Also, would 223456 be considered a closer match compared to 323456? (It shouldn't in my opinion because, again, these are non-ordinal strings).

gammalost · 10mo ago

If it's a random ID then I'd argue that all of them are equally close to each other. With that said, I do not know how GUIDs are generated.
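
The difference between the two notions of closeness can be shown with a small sketch. It assumes the site scores proximity by matching leading characters, which this thread does not confirm; `matching_prefix` and `matching_positions` are illustrative names.

```python
def matching_prefix(a: str, b: str) -> int:
    """Number of leading characters two strings share (an ordinal-style score)."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

def matching_positions(a: str, b: str) -> int:
    """Number of positions where two equal-length strings agree
    (length minus Hamming distance), treating the strings as non-ordinal."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x == y for x, y in zip(a, b))

if __name__ == "__main__":
    target = "123456"
    for candidate in ("923456", "123956", "223456", "323456"):
        print(candidate, matching_prefix(target, candidate),
              matching_positions(target, candidate))
```

All four candidates agree with the target in five of six positions, yet a prefix score rates three of them as zero.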

webstrand · 10mo ago · 1 in thread

This is the chance that, given a specific GUID, you'll find a collision for it: an utterly minuscule chance. However, the birthday paradox governs the general case: if you generate 2^62.60 GUIDs, the chance that you've generated a collision is around 99%. That is still an enormous number of GUIDs, but way smaller than 2^122.

At a rate of comparing 400,000 guids per second, you have a 99% chance of seeing a collision within the next 553,750 years.
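
These figures follow from inverting the birthday approximation. A sketch, assuming 122 random bits and a steady 400,000 GUIDs per second; the helper names are illustrative, and the result lands within rounding of the number quoted above.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def draws_for_probability(p: float, bits: int) -> float:
    """Approximate number of uniform draws from 2**bits values needed
    for a collision to occur with probability p."""
    return math.sqrt(2 * (2 ** bits) * -math.log1p(-p))

def years_at_rate(draws: float, per_second: float) -> float:
    return draws / per_second / SECONDS_PER_YEAR

if __name__ == "__main__":
    n = draws_for_probability(0.99, 122)
    print(round(math.log2(n), 2))           # 62.6
    print(round(years_at_rate(n, 400_000))) # roughly 554,000
```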

jonathrg · 10mo ago

You would need a little more memory to see/detect that collision.

franky47 · 10mo ago

Easy, it should be listed there: https://everyuuid.com/

Joel_Mckay · 10mo ago

Most just pack down:

epoch time + MAC Address + transaction counter (catch NTP skew) + Thread PID + new Pointer address = GUID

Then increment global transaction counter, complete some ops, and check to ensure current epoch time is in the future before the transaction frees the memory locations.

This is often robust in highly concurrent distributed systems even under network degradation, or corrupted sync states. Has other interesting use-cases too. =3

ivanjermakov · 10mo ago

Reminds me of SHAllenge: https://news.ycombinator.com/item?id=40683564

nesk_ · 10mo ago

Nice experiment. Is the code available somewhere?

867-5309 · 10mo ago

please may all the death huggers go hug a tree. thanks
