Flake: A Decentralized, K-Ordered Unique ID Generator in Erlang (opens in new tab)

(blog.boundary.com)

64 pointsd2fn14y ago16 comments

16 comments

15 comments · 8 top-level

mjb14y ago· 4 in thread

A couple of questions about this:

- Why [time, node id, seq] and not [time, seq, node id]? That would improve ordering if you have approximately equal load on each box.

- Isn't a 16 bit seq number overkill? Handing some of those bits to the unique ID would have made unique ID assignment easier. Duplicate MACs can and do exist (especially if you buy a lot of hardware from the same vendors).

- The quality of the ordering is going to be restricted by the quality of time synchronization within the cluster. Relying on NTP for this is OK, but experience suggests that a secondary monitoring system will be needed. Similarly, relying on monotonic time needs some care in system administration - care that could potentially be avoided with a different unique host ID assignment scheme.

rdtsc14y ago

> Why [time, node id, seq] and not [time, seq, node id]

Would you rely on sub-millisecond synchronization between nodes _and_ an almost exact load amount?

In other words for a particular millisecond seq order is only relevant on that particular node, so node should come first.

mjb14y ago

> In other words for a particular millisecond seq order is only relevant on that particular node, so node should come first.

It still seems to me that T_0,0,A (issued by node A at T_0 with seq number 0) would tend to come before T_0,1,B so putting the sequence number first does add some value. On the other hand, the ordering T_0,A,0 < T_0,A,1 < T_0,B,0 < T_0,B,1 is arbitrary.

> Would you rely on sub-millisecond synchronization between nodes _and_ an almost exact load amount?

Not rely on. Absolutely not. Maybe it's better to go further, and make it more explicit that the IDs are K ordered. Say you could synchronize your host clocks reliably to a maximum delta of 512 ms, then you could chose a scheme like:

[ milliseconds >> 9, host id, milliseconds & 0x1ff, seq id ]

The value here is that you make it much harder for consumer of the IDs to make incorrect assumptions about the precision of their ordering. Basically taking away the temptation to make statements about their ordering with false precision, by making a simple sort only provide a meaningful ordering within the real available precision.

1 more reply

jerf14y ago

Funny, I thought the exact opposite about the sequence numbers; 16-bit sequence number is cutting it awfully fine for a "unique" ID in the space of a millisecond. Slow-as-molasses Perl can count to 2^16 in 3ms on my machine. Cutting bits out of the node id by hashing MAC, interface name, and a few other things and giving that to the sequence would have made more sense to me. (And you do that hash once at init time.)

In information theory terms, the node ids are carrying log2(number of nodes) bits of information, whereas the node ids are carrying something more like log2(maximum seq id). The latter is almost certainly larger than the former, and at 16 computer-bits allocated for the sequence number overflow would be a real concern of mine, if not on this year's hardware than in a few more years.

mumrah14y ago

You'd be hard pressed to generate 2^16-1 ids in a millisecond, so yea - it's probably overkill.

alpb14y ago· 2 in thread

For those interested, there is also Snowflake by Twitter on GitHub https://github.com/twitter/snowflake

thezilch14y ago

Snowflake is referenced in the article "Credits" with a distinction between the products -- though, the product name is a bit dubious. It [Flake] differs primarily in that we allow ourselves a wider address space in which to fit ids which means we don’t have to think about timestamp truncation or coordination of worker ids.

foobarbazetc14y ago

Yeah. But the hard part is the 64bit representation. If you have 128 bits, you can no longer be a standard int. And you can just use UUIDs if you don't care about k-ordering.

coolrhymes14y ago· 1 in thread

as mjb said, mac addresss can be cloned so not sure whats that buying we are using uuid1 which does more or like does what these guys are doing...

jpitz14y ago

UUID1s don't sort by their generation time.

rarrrrrr14y ago

There's also Instagram's implementation in PostgreSQL:

http://instagram-engineering.tumblr.com/post/10853187575/sha...

d2fnOP14y ago

Thanks for the interest and the feedback. I've updated the readme and added a roadmap as well as a faq after getting some good input over the last couple of days.

https://github.com/boundary/flake

google-114y ago

This is quite similar to the BSON ObjectId specification used by MongoDB:

http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSO...

rdtsc14y ago

Why not use uuid1 then reorder its bits ?

Raphael14y ago

Is it Erlang day already?

j / k navigate · click thread line to collapse

16 comments

15 comments · 8 top-level

mjb14y ago· 4 in thread

A couple of questions about this:

- Why [time, node id, seq] and not [time, seq, node id]? That would improve ordering if you have approximately equal load on each box.

rdtsc14y ago

> Why [time, node id, seq] and not [time, seq, node id]

Would you rely on sub-millisecond synchronization between nodes _and_ an almost exact load amount?

In other words for a particular millisecond seq order is only relevant on that particular node, so node should come first.

mjb14y ago

> In other words for a particular millisecond seq order is only relevant on that particular node, so node should come first.

> Would you rely on sub-millisecond synchronization between nodes _and_ an almost exact load amount?

[ milliseconds >> 9, host id, milliseconds & 0x1ff, seq id ]

1 more reply

jerf14y ago

mumrah14y ago

You'd be hard pressed to generate 2^16-1 ids in a millisecond, so yea - it's probably overkill.

alpb14y ago· 2 in thread

For those interested, there is also Snowflake by Twitter on GitHub https://github.com/twitter/snowflake

thezilch14y ago

foobarbazetc14y ago

Yeah. But the hard part is the 64bit representation. If you have 128 bits, you can no longer be a standard int. And you can just use UUIDs if you don't care about k-ordering.

coolrhymes14y ago· 1 in thread

as mjb said, mac addresss can be cloned so not sure whats that buying we are using uuid1 which does more or like does what these guys are doing...

jpitz14y ago

UUID1s don't sort by their generation time.

rarrrrrr14y ago

There's also Instagram's implementation in PostgreSQL:

http://instagram-engineering.tumblr.com/post/10853187575/sha...

d2fnOP14y ago

Thanks for the interest and the feedback. I've updated the readme and added a roadmap as well as a faq after getting some good input over the last couple of days.

https://github.com/boundary/flake

google-114y ago

This is quite similar to the BSON ObjectId specification used by MongoDB:

http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSO...

rdtsc14y ago

Why not use uuid1 then reorder its bits ?

Raphael14y ago

Is it Erlang day already?

j / k navigate · click thread line to collapse