Some of the details escape me (this was a decade ago) but it was a fun combination of statistical inference and CS knowledge that I don't get to use often. Whenever integer overflow comes up in a systems engineering context I get a little tickled.
The integer generation was pretty simple: there was a fixed id for each server, and unless I am mistaken we had 5 servers per datacenter. Each id was basically <time><offset><id>, where time was a millisecond timer, offset was the number of ids generated in that same millisecond by the same server, and id was the machine's unique identifier. When we first talked about this process I thought that offset was going to roll over continuously, with every id incrementing it by one. It was changed to reset every millisecond specifically so that it would obscure tweet volumes.
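A minimal sketch of that scheme (the field widths and the <time><offset><id> ordering here are my assumptions based on the description above, not Twitter's actual layout):

```python
import time

class SnowflakeIds:
    """Toy <time><offset><id> generator. Field widths are assumed
    (41/12/10 bits), not Twitter's real layout."""
    TIME_BITS, OFFSET_BITS, MACHINE_BITS = 41, 12, 10

    def __init__(self, machine_id):
        self.machine_id = machine_id
        self.last_ms = -1
        self.offset = 0

    def next_id(self, now_ms=None):
        ms = int(time.time() * 1000) if now_ms is None else now_ms
        if ms == self.last_ms:
            self.offset += 1   # same millisecond: bump the offset
        else:
            self.offset = 0    # new millisecond: reset (this is what obscures volume)
            self.last_ms = ms
        return (ms << (self.OFFSET_BITS + self.MACHINE_BITS)) \
             | (self.offset << self.MACHINE_BITS) \
             | self.machine_id
```

Resetting the offset each millisecond means the counter never reveals a running total, only per-millisecond activity.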
At the time I remember reading a LOT of articles estimating tweet volume and most of them were way, way off. I don't know that we ever really put effort into correcting them though. =)
* - Does not account for changes in the system post 2012.
The offset was actually how we calculated volume, because millisecond collisions become a variant of the German tank problem[1]. A few times when y'all made tweet volumes public it mapped pretty closely with our estimates.
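The estimator fits in a couple of lines (illustrative sketch, not Twitter's exact method; note the offsets described above are 0-based, so you'd shift them to 1-based serials first). Treating the offsets seen in colliding ids as serial numbers sampled from 1..N, the classic frequentist estimate of N is m + m/k - 1:

```python
def german_tank_estimate(serials):
    """Minimum-variance unbiased estimator of the population maximum N,
    given k serial numbers sampled uniformly without replacement from
    1..N: N_hat = m + m/k - 1, where m = max(serials)."""
    k, m = len(serials), max(serials)
    return m + m / k - 1
```

With one observed serial of 19, the estimate is 19 + 19/1 - 1 = 37, the textbook tank-counting example.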
This was around 2011, so your knowledge should be relevant.
But IEEE 754 doubles have a significand that supports a 53-bit range. What am I missing?
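Presumably the ids in question exceed 2^53; above that point a double can no longer represent every integer exactly, so a 64-bit id that round-trips through a double (e.g. JSON parsed in JavaScript) can get corrupted. The cliff is easy to demonstrate:

```python
# Doubles have a 53-bit significand, so every integer up to 2**53 is
# representable exactly, but above that consecutive integers collide:
assert float(2**53 - 1) != float(2**53 - 2)   # still exact below the cliff
assert float(2**53) == float(2**53 + 1)       # two distinct ids, one double
```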
In other words, what do you mean that it was done by doing mod 3 of a signed int32? If it was a monotonically increasing or random int32, I don't see how that unevenness would manifest in a meaningful way.
I don’t get how mod 3 affects anything if you’re just incrementing…
Also what number are they using to modulo and where is that happening? Because at that point don’t they already have an incrementing ID before generating another one?
0->A
1->B
2->C
3->A
4->B
5->C
6->A
7->B
A and B get hit three times while C only twice, so C will see 66% utilization compared to A and B.

EDITED: s/once/twice/ thanks CyberDildonics
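The skew is easy to reproduce: take any window of 8 consecutive ids, as in the list above, and bucket them by `i % 3`:

```python
from collections import Counter

# Bucket a snapshot of 8 consecutive ids across 3 shards by id % 3:
shards = Counter(i % 3 for i in range(8))
# shards 0 and 1 each receive 3 ids, shard 2 only 2
```

Of course, over a window whose length is a multiple of 3 the buckets even out again; the imbalance only shows up when the snapshot size isn't divisible by the shard count.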
Eventually, we learned to treat Twitter as a lead generation tool for off-platform activity and apply old school funnel mechanics to it. The next problem became how to build a follower count. Sadly, that problem is what I think led to extremism on the platform. Hence: https://madrox.substack.com/p/yet-another-quitting-twitter
Also collecting data like this can be useful if you want to beat markets.
Seems this was less likely a "someone else will deal with it" problem, and more of a development / QA testing problem.
Then 10 or so years went by...
Whenever I write code like that which may break in, say, 5 years, I'll sign it in the comments and put my personal email and phone number, inviting future people to call me and I'll fix it for free (because I take responsibility for my code pretty seriously). Nobody has ever taken me up on it though...
So am I being naive when I use 64 bit values for unique IDs? Or is it actually plausible that 64 bits is plenty of information for centuries to come?
Edit: Also, technically reddit was using signed int32s. So they really only had 2^16 unique ids. If they used unsigned int32s, then that would have bought them a lot of time.
Small correction: Signed int32s means you have 31 bits for the integer value (2^31 values), not 16 bits (2^16 values). There are 32 bits; 1 bit is used up for the sign, and 31 remain for the number.
I should also probably point out some ambiguity in your analysis. To be clear, under the faulty assumption of linear consumption, if Int32s get exhausted in 10 years, switching to UInt32s will last 20 years of which 10 are already spent. So, under the faulty assumption of linearity, switching to UInt32s buys you an extra 10 years.
64-bit identifiers will last quite a while. Another defensible choice would be to switch to 128-bit unguessable identifiers (such as assigning them via AES-256 in counter mode) in order to provide defence in depth. If something goes wrong with your authentication, but the attacker is likely to spend millennia hammering your REST API with invalid IDs before exploiting an actual asset, then the consequences aren't so bad.
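AES in counter mode needs a crypto library, so as a simpler stdlib sketch (my substitution, and it trades AES-CTR's guaranteed uniqueness for simplicity; at 128 bits, random collisions are astronomically unlikely anyway):

```python
import secrets

def unguessable_id():
    """128-bit random identifier from the OS CSPRNG. Unlike AES-CTR
    over a counter, uniqueness is only probabilistic -- but with 2**128
    values the collision risk is negligible for any realistic workload."""
    return secrets.randbits(128)
```

The defence-in-depth point stands either way: an attacker who can't enumerate ids can't walk your object space even if an authorization check slips.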
Signed integers have 2^31 positive values (well, just about). It doesn't take 16 bits to store the negative sign :-)
That said, it would still take quite a while to get to 64 bits.
Infinite options at that point.
Isn't the answer to your question buried in your assumptions? For example, you seem to assume the rate of consumption of unique IDs is static.
All I'm saying is the jump from 2^32 to 2^64 is astronomical. I don't see using 64 bit integers for uids in my hobby code as something to be concerned about. In production code for a company I would use something more significant, but even then I feel like 64 bits will last a very long time.
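A quick back-of-the-envelope supports that: even at an assumed allocation rate of a billion ids per second (deliberately aggressive), 64 bits last centuries:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
IDS_PER_SECOND = 1_000_000_000  # assumed, deliberately aggressive rate

years_to_exhaust = 2**64 / IDS_PER_SECOND / SECONDS_PER_YEAR
# comes out to roughly 584 years
```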
[0]: https://bernardmarr.com/how-much-data-do-we-create-every-day...
2^31 seconds is 68 years, a middling human lifetime.
2^63 seconds is roughly 292 billion years.
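Both figures are easy to verify (the second comes out closer to 292 billion):

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # mean Gregorian year

assert round(2**31 / SECONDS_PER_YEAR) == 68          # years
assert round(2**63 / SECONDS_PER_YEAR / 1e9) == 292   # billions of years
```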
It appears that the original developer thought that for 32 bits :)
I say that as someone who inherited a system that allowed for 2^32 session references stored in the db. Let's just say that sessions were created a lot faster than anticipated for the amount of traffic we had.
So one fine Sunday morning in a November a long time ago we ran out.
So, everything worked great until it didn't, and they spent a lot of time future-proofing it.
If you open a transaction, INSERT into a table with an AUTO_INCREMENT column, then roll back the transaction, no data is saved, but the auto-generated id is still consumed and the next INSERT uses id+1.
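A toy model of that allocator makes the gap obvious (this mimics InnoDB's behavior, where the auto-increment counter is advanced at INSERT time and never rewound on rollback; names here are made up for illustration):

```python
import itertools

class AutoIncrementTable:
    """Toy model of an auto-increment counter: ids are reserved the
    moment an INSERT happens, and a rollback discards the row but
    never returns the id, leaving a gap in the sequence."""
    def __init__(self):
        self._counter = itertools.count(1)
        self.rows = {}

    def insert(self, value):
        new_id = next(self._counter)   # id reserved immediately
        self.rows[new_id] = value
        return new_id

    def rollback(self, row_id):
        self.rows.pop(row_id, None)    # row gone, counter untouched
```

So heavy insert/rollback traffic burns through the id space much faster than the row count suggests.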
0: https://arstechnica.com/information-technology/2014/12/gangn...
Well, I say that, but actually, "adding an extra bit" is basically what going from signed to unsigned would do. So maybe they just added an extra (32nd) bit?