undefined | Better HN

0 pointsjcalvinowens1mo ago0 comments

> By the time you joined and benchmarked these systems, the continuous rolling deployment had taken over

Nope, I started in 2014.

> I don't recall ever talking to you on the matter.

I recall. You refused to believe the benchmark results and made me repeat the test, then stopped replying after I did :)

0 comments

adsharma1mo ago

The patches were written in 2011 and published in 2012. They did what they were supposed to at the time.

For the peanut gallery: this is a manifestation of an internal eng culture at fb that I wasn't particularly fond of. Celebrating that "I killed X" and partying about it.

You didn't reply to the main point: did you benchmark a server that was running several days at a time? Reasonable people can disagree about whether this a good deployment strategy or not. I tend to believe that there are many places which want to deploy servers and run for months if not days.

alexgartrell1mo ago

For the peanut gallery more: I worked with both of these guys at Meta on this.

The "servers are only on for a few hours" thing was like never true so I have no idea where that claim is coming from. The web performance test took more than a few hours to run alone and we had way more aggressive soaks for other workloads.

My recollection was that "write zeroes" just became a cheaper operation between '12 and '14.

A fun fact to distract from the awkwardness: a lot of the kernel work done in the early days was exceedingly scrappy. The port mapping stuff for memcached UDP before SO_REUSEPORT for example. FB binaries couldn't even run on vanilla linux a lot of the time. Over the next several years we put a TON of effort in getting as close to mainline as possible and now Meta is one of the biggest drivers of Linux development.

ot1mo ago

It's not just that zeroing got cheaper, but also we're doing a lot less of it, because jemalloc got much better.

If the allocator returns a page to the kernel and then immediately asks back for one, it's not doing its job well: the main purpose of the allocator is to cache allocations from the kernel. Those patches are pre-decay, pre-background purging thread; these changes significantly improve how jemalloc holds on to memory that might be needed soon. Instead, the zeroing out patches optimize for the pathological behavior.

Also, the kernel has since exposed better ways to optimize memory reclamation, like MADV_FREE, which is a "lazy reclaim": the page stays mapped to the process until the kernel actually need it, so if we use it again before that happens, the whole unmapping/mapping is avoided, which saves not only the zeroing cost, but also the TLB shootdown and other costs. And without changing any security boundary. jemalloc can take advantage of this by enabling "muzzy decay".

However, the drawback is that system-level memory accounting becomes even more fuzzy.

(hi Alex!)

menaerus1mo ago

I am trying to understand the reason behind why "zeroing got cheaper" circa 2012-2014. Do you have some plausible explanations that you can share?

Haswell (2013) doubled the store throughput to 32 bytes/cycle per core, and Sandy Bridge (2011) doubled the load throughput to the same, but the dataset being operated at FB is most likely much larger than what L1+L2+L3 can fit so I am wondering how much effect the vectorization engine might have had since bulk-zeroing operation for large datasets is anyways going to be bottlenecked by the single core memory bandwidth, which at the time was ~20GB/s.

Perhaps the operation became cheaper simply because of moving to another CPU uarch with higher clock and larger memory bandwidth rather than the vectorization.

2 more replies

adsharma1mo ago

[ Edit: "servers" in this context meant the HHVM server processes, not the physical server which of course had a longer uptime ]

People got promoted for continuous deployment

https://engineering.fb.com/2017/08/31/web/rapid-release-at-m...

I think it's fair to say the hardware changed, the deployment strategy changed and the patches were no longer relevant, so we stopped applying them.

When I showed up, there were 100+ patches on top of a 2009 kernel tree. I reduced the size to about 10 or so critical patches, rebased them at a 6 months cadence over 2-3 years. Upstreamed a few.

Didn't go around saying those old patches were bad ideas and I got rid of them. How you say it matters.

alexgartrell1mo ago

The linked article says they decided to do CD in 2016 fwiw so that's not inconsistent with what I said.

You reduced the number of patches a lot and also pushed very hard to get us to 3.0 after we sat on 2.6.38 ~forever. Which was very appreciated, btw. We built the whole plan going forward based on this work.

I'm not arguing that anyone should be nice to anyone or not (it's a waste of breath when it comes to Linux). I'm just saying that the benchmarking was thorough and that contemporary 2014 hardware could zero pages fast.

1 more reply

1bpp1mo ago

This is why I always read the comments here.

genxy1mo ago

That is, wow, a story.

At what point did you realize how different fb engineering was from what you expected?

hedayet1mo ago

For me it happened around my first week after the bootcamp, so about 6 weeks from joining.

An important nuance - most Facebook engineers don't believe that Facebook/Meta would continue to grow next year; and that disbelief had been there since as early as in 2018 (when I'd joined).

very few facebook employees use their products outside of testing, which is a big contributor to that fear - they just can't believe that there are billions of people who would continue to use apps to post what they had for lunch!

And as a result of that lack of faith, most of them believe that Meta is a bubble and can burst at any point. Consequently, everyone works for the next performance review cycle, and most are just in rush to capture as much money as they could before that bubble bursts.

1 more reply

nullpoint4201mo ago

This is why I love hacker news. I learn so much from these moments.

danudey1mo ago

Like "never work at Meta unless you can out-toxic your coworkers".

__turbobrew__1mo ago

Yea I knew meta was toxic, but publicly beefing over something over a decade ago is a whole other matter. I can’t even remember what I was working on 10 years ago, and even if I did I wouldn’t be bringing people down that much later.

baby1mo ago

The problem is a lot of very strong engineers are also very difficult to work with. I worked at Meta too and can tell you the other side of the coin is that people who were too toxic could get canned as well!

1 more reply

throwaway20371mo ago

Yeah, I am loving the public mudslinging over shit from 10 years ago, like high school girls fighting. This is like the FAANG version of the TV show Suits. We can call it FAANGs and use Midjorney to create the cover art and give the actors vampire fangs.

On a more serious note, it seems like any hyper competitive company eventually spirals into an awful, toxic working env.

hedayet1mo ago

Inside Meta, engineers are one of the kindest group of people.

This thread would've been way more fun with a couple of middle managers and product managers in the mix ;-)

lmm1mo ago

Funny, I was thinking what a relief it was to see people making their arguments frankly like on the HN of 10+ years ago.

CamperBob21mo ago

Like "Hey, I wonder if Conway's Law works both ways. Huh. Wow. It looks like that is indeed the case."

integricho1mo ago

I came here for the article, stayed for the drama.

j / k navigate · click thread line to collapse

0 comments

adsharma1mo ago

The patches were written in 2011 and published in 2012. They did what they were supposed to at the time.

For the peanut gallery: this is a manifestation of an internal eng culture at fb that I wasn't particularly fond of. Celebrating that "I killed X" and partying about it.

alexgartrell1mo ago

For the peanut gallery more: I worked with both of these guys at Meta on this.

My recollection was that "write zeroes" just became a cheaper operation between '12 and '14.

ot1mo ago

It's not just that zeroing got cheaper, but also we're doing a lot less of it, because jemalloc got much better.

However, the drawback is that system-level memory accounting becomes even more fuzzy.

(hi Alex!)

menaerus1mo ago

I am trying to understand the reason behind why "zeroing got cheaper" circa 2012-2014. Do you have some plausible explanations that you can share?

Perhaps the operation became cheaper simply because of moving to another CPU uarch with higher clock and larger memory bandwidth rather than the vectorization.

2 more replies

adsharma1mo ago

[ Edit: "servers" in this context meant the HHVM server processes, not the physical server which of course had a longer uptime ]

People got promoted for continuous deployment

https://engineering.fb.com/2017/08/31/web/rapid-release-at-m...

I think it's fair to say the hardware changed, the deployment strategy changed and the patches were no longer relevant, so we stopped applying them.

When I showed up, there were 100+ patches on top of a 2009 kernel tree. I reduced the size to about 10 or so critical patches, rebased them at a 6 months cadence over 2-3 years. Upstreamed a few.

Didn't go around saying those old patches were bad ideas and I got rid of them. How you say it matters.

alexgartrell1mo ago

The linked article says they decided to do CD in 2016 fwiw so that's not inconsistent with what I said.

1 more reply

1bpp1mo ago

This is why I always read the comments here.

genxy1mo ago

That is, wow, a story.

At what point did you realize how different fb engineering was from what you expected?

hedayet1mo ago

For me it happened around my first week after the bootcamp, so about 6 weeks from joining.

An important nuance - most Facebook engineers don't believe that Facebook/Meta would continue to grow next year; and that disbelief had been there since as early as in 2018 (when I'd joined).

1 more reply

nullpoint4201mo ago

This is why I love hacker news. I learn so much from these moments.

danudey1mo ago

Like "never work at Meta unless you can out-toxic your coworkers".

__turbobrew__1mo ago

baby1mo ago

1 more reply

throwaway20371mo ago

On a more serious note, it seems like any hyper competitive company eventually spirals into an awful, toxic working env.

hedayet1mo ago

Inside Meta, engineers are one of the kindest group of people.

This thread would've been way more fun with a couple of middle managers and product managers in the mix ;-)

lmm1mo ago

Funny, I was thinking what a relief it was to see people making their arguments frankly like on the HN of 10+ years ago.

CamperBob21mo ago

Like "Hey, I wonder if Conway's Law works both ways. Huh. Wow. It looks like that is indeed the case."

integricho1mo ago

I came here for the article, stayed for the drama.

j / k navigate · click thread line to collapse