Jemalloc Postmortem

boulos9mo ago

The Itanic was kind of great :). I'm convinced it helped sink SGI.

kabdib9mo ago

one of the best books on Linux architecture i've read was the one on the Itanium port

i think, because Itanic broke a ton of assumptions

[1] https://google.github.io/tcmalloc/tuning.html#system-level-o...

kstrauser9mo ago

Stuff like this is what keeps me coming back here. Thanks for posting this!

What's hard about using TCMalloc if you're not using bazel? (Not asking to imply that it's not, but because I'm genuinely curious.)

Svetlitski9mo ago

It’s just a huge pain to build and link against. Before the bazel 7.4.0 change your options were basically:

1. Use it as a dynamically linked library. This is not great because you’re taking at a minimum the performance hit of going through the PLT for every call. The forfeited performance is even larger if you compare against statically linking with LTO (i.e. so that you can inline calls to malloc, get the benefit of FDO, etc.). Not to mention all the deployment headaches associated with shared libraries.

2. Painfully manually create a static library. I’ve done this, it’s awful; especially if you want to go the extra mile to capture as much performance as possible and at least get partial LTO (i.e. of TCMalloc independent of your application code, compiling all of TCMalloc’s compilation units together to create a single object file).

When I was at Meta I imported TCMalloc to benchmark against (to highlight areas where we could do better in Jemalloc) by painstakingly hand-translating its bazel BUILD files to buck2 because there was legitimately no better option.

As a consequence of being so hard to use outside of Google, TCMalloc has many more unexpected (sometimes problematic) behaviors than Jemalloc when used as a general purpose allocator in other environments (e.g. it basically assumes that you are using a certain set of Linux configuration options [1] and behaves rather poorly if you’re not)

gazpacho9mo ago

I would love to see these changes - or even some sort of blog post or extended documentation explaining the rationale. As is, the docs are somewhat barren. I feel that there’s a lot of knowledge that folks like you have right now from all of the work that was done internally at Meta that would be best shared now before it is lost.

EnPissant9mo ago

Do you have any opinions on mimalloc?

michaelcampbell9mo ago

> filed an issue because our test suite didn’t pass on Itanium lol

For the non low-level programmers in the bowels of memory allocators among us, why is this a "lol"?

Svetlitski9mo ago

The Itanium ISA was an infamous failure, never seeing widespread usage, hence people often referring to it as “The Itanic” (i.e. the much-touted ship that immediately sank). The fact that anyone would be using it today at all is sort of hilariously niche, and is illustrative of how wide-ranging and obscure the issues filed to the GitHub repo could be. On a similar token I recall seeing an issue (or maybe it was a PR?) to fix our build on GNU Hurd.

klabb39mo ago

> we (i.e. the Jemalloc team) weren’t really in a great place to respond to all the random GitHub issues people would file

Why not? I mean this is a complete drive-by comment, so please correct me, but there was a fully staffed team at Meta that maintained it, but was not in the best place to manage the issues?

anonymoushn9mo ago

Well, to be blunt, the company does not care about this, so it does not get done.

xcrjm9mo ago

They said the team was not in a great place to do it, e.g. they probably had competing priorities that overshadowed triaging issues.

einpoklum9mo ago

> TCMalloc is great, but is an absolute nightmare to use if you’re not using bazel

custom-malloc-newbie question: Why is the choice of build system (generator) significant when evaluating the usability of a library?

fc417fc8029mo ago

Because you need to build it to use it, and you likely already have significant build related infrastructure, and you are going to need to integrate any new dependencies into that. I'm increasingly convinced that the various build systems are elaborate and wildly successful ploys intended only to sap developer time and energy.

CamouflagedKiwi9mo ago

Because you have to build it. If they don't use the same build system as you, you either want to invoke their system, or import it into yours. The former is unappealing if it's 'heavy' or doesn't play well as a subprocess; the latter can take a lot of time if the build process you're replicating is complex.

I've done both before, and seen libraries at various levels of complexity; there is definitely a point where you just want to give up and not use the thing when it's very complex.

https://github.com/libvips/libvips/discussions/3019

Thaxll9mo ago

It's kind of wild that great software is hindered by a complicated build and integration process.

adityapatadia9mo ago

Jason, here is a story about how much your work impacts us. We run a decently sized company that processes hundreds of millions of images/videos per day. When we first started about 5 years ago, we spent countless hours debugging issues related to memory fragmentation.

One fine day, we discovered Jemalloc and put it in our service, which was causing a lot of memory fragmentation. We did not think that those 2 lines of changes in Dockerfile were going to fix all of our woes, but we were pleasantly surprised. Every single issue went away.

Today, our multi-million dollar revenue company is using your memory allocator on every single service and on every single Dockerfile.

Thank you! From the bottom of our hearts!

thewisenerd9mo ago

indeed! most image processing golang services suggest/use jemalloc

the top 3 from https://github.com/topics/resize-images (as of 2025-06-13)

imaginary: https://github.com/h2non/imaginary/blob/1d4e251cfcd58ea66f83...

imgproxy: https://web.archive.org/web/20210412004544/https://docs.imgp... (linked from a discussion in the imaginary repo)

imagor: https://github.com/cshum/imagor/blob/f6673fa6656ee8ef17728f2...

tecleandor9mo ago

Yep, imgproxy seems to use libvips, that recommends jemalloc. I was checking and this is a funny (not) bug report:

jcupitt9mo ago

Those three all use libvips as the image processing engine, fwiw, so it's maybe not a very wide survey.

libvips is fairly highly threaded and does a lot of alloc/free, so it's challenging for most heap implementations.

laszlojamf9mo ago

I really don't mean to be snarky, but honest question: Did you donate? Nothing says thank you like some $$$...

onli9mo ago

It was a Meta project and development ceased. For a regular project that expectation is fine, but here it does not apply IMHO.

adityapatadia9mo ago

We regularly donate to projects via Open Collective. We frankly did not see a way to here, due to FB involvement I think.

masklinn9mo ago

> jemalloc was probably booted from Rust binaries sooner than the natural course of development might have otherwise dictated.

FWIW while it was a factor it was just one of a number: https://github.com/rust-lang/rust/issues/36963#issuecomment-...

And jemalloc was only removed two years after that issue was opened: https://github.com/rust-lang/rust/pull/55238

Aissen9mo ago

Interesting that one of the factors listed in there, the hardcoded page size on arm64, is still an unsolved issue upstream, and that it forces app developers to either ship multiple arm64 Linux binaries, or drop support for some platforms.

I wonder if some kind of dynamic page-size (with dynamic ftrace-style binary patching for performance?) would have been that much slower.

pkhuong9mo ago

You can run jemalloc configured with 16KB pages on a 4KB page system.
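To make the page-size point concrete, here is a minimal C sketch of a build-time page size checked against the runtime one. `ASSUMED_LG_PAGE` and the helper names are hypothetical and not taken from jemalloc; the only real API used is POSIX `sysconf(_SC_PAGESIZE)`. It assumes the rule that a build-time page size is usable whenever it is a multiple of the OS page size, which would explain why a 16KB build can run on a 4KB system but not the other way round.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical build-time choice: 2^14 = 16 KiB pages. */
#define ASSUMED_LG_PAGE 14
#define ASSUMED_PAGE ((size_t)1 << ASSUMED_LG_PAGE)

/* Ask the OS for its real page size; returns 0 if the query fails. */
static size_t runtime_page_size(void) {
    long n = sysconf(_SC_PAGESIZE);
    return n > 0 ? (size_t)n : 0;
}

/* The build-time size works if every "assumed page" boundary is also
 * an OS page boundary, i.e. the assumed size is a multiple of the OS size. */
static int assumed_page_is_usable(size_t os_page) {
    return os_page != 0 && os_page <= ASSUMED_PAGE &&
           ASSUMED_PAGE % os_page == 0;
}

/* Round a request up to a whole number of assumed pages. */
static size_t round_up_to_assumed_page(size_t n) {
    return (n + ASSUMED_PAGE - 1) & ~(ASSUMED_PAGE - 1);
}
```

The cost of the larger assumption is granularity: every page-level operation rounds to 16 KiB even when the OS could work in 4 KiB units.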

dazzawazza9mo ago

I've used jemalloc in every game engine I've written for years. It's just the thing to do. WAY faster on win32 than the default allocator. It's also nice to have the same allocator across all platforms.

I learned of it from its integration in FreeBSD and never looked back.

jemalloc has helped entertain a lot of people :)

Iwan-Zotow9mo ago

windows def allocator is pos. Jemalloc rules

ahartmetz9mo ago

>windows def allocator is pos

Wow, still? I remember allocator benchmarks from 10-15 years ago where there were some notable differences between allocators... and then Windows with like 20% the performance of everything else!

int_19h9mo ago

> windows def allocator

Which one of them? These days it could mean HeapAlloc, or it could mean malloc from uCRT.

https://github.com/facebook/jemalloc

chubot9mo ago

Nice post -- so does Facebook no longer use jemalloc at all? Or is it maintenance mode?

Or I wonder if they could simply use tcmalloc or another allocator these days?

Facebook infrastructure engineering reduced investment in core technology, instead emphasizing return on investment.

Svetlitski9mo ago

As of when I left Meta nearly two years ago (although I would be absolutely shocked if this isn’t still the case) Jemalloc is the allocator, and is statically linked into every single binary running at the company.

> Or I wonder if they could simply use tcmalloc or another allocator these days?

Jemalloc is very deeply integrated there, so this is a lot harder than it sounds. From the telemetry being plumbed through in Strobelight, to applications using every highly Jemalloc-specific extension under the sun (e.g. manually created arenas with custom extent hooks), to the convergent evolution of applications being written in ways such that they perform optimally with respect to Jemalloc’s exact behavior.

charcircuit9mo ago

Meta has a fork that they still are working on, where development is continuing.

nh29mo ago

The point of the blog post is that repo is over-focused on Facebook's needs instead of "general utility":

> as a result of recent changes within Meta we no longer have anyone shepherding long-term jemalloc development with an eye toward general utility

> we reached a sad end for jemalloc in the hands of Facebook/Meta

> Meta’s needs stopped aligning well with those of external uses some time ago, and they are better off doing their own thing.

burnt-resistor9mo ago

They take everything FLOSS and ruin it with bureaucracy, churn, breakage, and inconsideration to external use. They may claim FOSS broadly but it's mostly FOSS-washed, unusable garbage except for a few popular things.

anonymoushn9mo ago

The big recent change is that jemalloc no longer has any of its previous long-term maintainers. But it is receiving more attention from Facebook than it has in a long time, and I am somewhat optimistic that after some recent drama where some of that attention was aimed in a counterproductive direction that the company can aim the rest of it in directions that Qi and Jason would agree with, and that are well aligned with the needs of external users.

schrep9mo ago

Your work was so impactful over a long period from Firefox to Facebook. Honored to have been a small part of it.

lbrandy9mo ago

Suppose this is as good a place to pile-on as any.

Though this was not the post I was expecting to show up today, it was super awesome for me to get to have played my tiny part in this big journey. Thanks for everything @je (and qi + david -- and all the contributors before and after my time!).

liuliu9mo ago

Your leadership in continuing to invest in core technologies at Facebook was as fruitful as it could ever be. GraphQL, PyTorch, and React, to name a few, could not have happened without it.

dao-9mo ago

Hmm, if I had to choose between not having Facebook and having React, I'd pick the former in a heartbeat. Not that this was a real choice, but it was nonetheless bitter to see colleagues join the behemoth that was Facebook.

kstrauser9mo ago

I’ve wondered about this before but never when around people who might know. From my outsider view, jemalloc looked like a strict improvement over glibc’s malloc, according to all the benchmarks I’d seen when the subject came up. So, why isn’t it the default allocator?

toast09mo ago

It is on FreeBSD. :P Change your malloc, change your life? May as well change your libc while you're there and use FreeBSD libc too, and that'll be easier if you also adopt the FreeBSD kernel.

I will say, the Facebook people were very excited to share jemalloc with us when they acquired my employer, but we were using FreeBSD so we already had it and thought it was normal. :)

favorited9mo ago

Disclaimer: I'm not an allocator engineer, this is just an anecdote.

A while back, I had a conversation with an engineer who maintained an OS allocator, and their claim was that custom allocators tend to make one process's memory allocation faster at the expense of the rest of the system. System allocators are less able to make allocation fair holistically, because one process isn't following the same patterns as the rest.

Which is why you see it recommended so frequently with services, where there is generally one process that you want to get preferential treatment over everything else.

mort969mo ago

The only way I can see that this would be true is if a custom allocator is worse about unmapping unused memory than the system allocator. After all, processes aren't sharing one heap, it's not like fragmentation in one process's address space is visible outside of that process... The only aspect of one process's memory allocation that's visible to other processes is, "that process uses N pages worth of resident memory so there's less available for me". But one of the common criticisms against glibc is that it's often really bad at unmapping its pages, so I'd think that most custom allocators are nicer to the system?

I would be interested in hearing their thoughts directly. I'm also not an allocator engineer, and someone who maintains an OS allocator probably knows wayyy more about this stuff than me. I'm sure there's some missing nuance or context which would've made it make sense.

jeffbee9mo ago

I don't think that's really a position that can be defended. Both jemalloc and tcmalloc evolved and were refined in antagonistic multitenant environments without one overwhelming application. They are optimal for that exact thing.

jeffbee9mo ago

These allocators often have higher startup cost. They are designed for high performance in the steady state, but they can be worse in workloads that start a million short-lived processes in the unix style.

kstrauser9mo ago

Oh, interesting. If that's the case, I can see why that'd be a bummer for short-lived command line tools. "Makes ls run 10x slower" would not be well received. OTOH, FreeBSD uses it by default, and it's not known for being a sluggish OS.

o11c9mo ago

For a long time, one of the major problems with alternate allocators is that they would never return free memory back to the OS, just keep the dirty pages in the process. This did eventually change, but it remains a strong indicator of different priorities.

There's also the fact that ... a lot of processes only ever have a single thread, or at most have a few background threads that do very little of interest. So all these "multi-threading-first allocators" aren't actually buying anything of value, and they do have a lot of overhead.

Semi-related: one thing that most people never think about: it is exactly the same amount of work for the kernel to zero a page of memory (in preparation for a future mmap) as for a userland process to zero it out (for its own internal reuse)
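A small C sketch of the two zeroing paths described above, assuming a POSIX system that supports `MAP_ANONYMOUS` (Linux does). The helper names are made up for illustration. It only shows where the zeroing happens, not how much it costs.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Path 1: a fresh anonymous mapping. The kernel hands back
 * zero-filled pages, so the zeroing work is done on its side. */
static unsigned char *fresh_page(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Path 2: the process keeps the dirty page and zeroes it itself
 * before handing it out again (e.g. for a calloc-style request). */
static void recycle_page(unsigned char *p, size_t len) {
    memset(p, 0, len);
}

/* Helper: returns 1 if every byte in [p, p + len) is zero. */
static int all_zero(const unsigned char *p, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0) return 0;
    return 1;
}
```

Either way somebody writes a page worth of zeroes; the difference is only who does it and when.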

senderista9mo ago

> Semi-related: one thing that most people never think about: it is exactly the same amount of work for the kernel to zero a page of memory (in preparation for a future mmap) as for a userland process to zero it out (for its own internal reuse)

Possibly more work since the kernel can't use SIMD

vlovich1239mo ago

That’s actually particularly true of alternate allocators and not true for glibc if I recall correctly (it’s much worse at returning memory).

sanxiyn9mo ago

As far as I know there is no technical reason why jemalloc shouldn't be the default allocator. In fact, as pointed out in the article, it IS the default allocator on FreeBSD. My understanding is it is largely political.

kstrauser9mo ago

Now that I think about it, I could easily imagine it being left out of glibc because it doesn't build on Hurd or something.

meisel9mo ago

I believe there’s no other allocator besides jemalloc that can seamlessly override macOS malloc/free like people do with LD_PRELOAD on Linux (at least as of ~2020). jemalloc has a very nice zone-based way of making itself the default, and manages to accommodate Apple’s odd requirements for an allocator that have tripped other third-party allocators up when trying to override malloc/free.

glandium9mo ago

Note this requires hackery that relies on Apple not changing things in its system allocator, which has happened at least twice IIRC.

adgjlsfhk19mo ago

I believe mimalloc works here (but might be wrong).

wiz21c9mo ago

FTA:

> And people find themselves in impossible situations where the main choices are 1) make poor decisions under extreme pressure, 2) comply under extreme pressure, or 3) get routed around.

It doesn't sound like a work place :-(

bravetraveler9mo ago

Sounds like every workplace I've 'enjoyed' since ~2008

throwaway3141559mo ago

nice username

- fsociety

mrweasel9mo ago

Now I'm not one for victim blaming, but if that's more than three places of employment, maybe you need to rethink the positions you apply for.

Twirrim9mo ago

Oh that's interesting. jemalloc is the memory allocator used by redis, among other projects. Wonder what the performance impact will be if they have to change allocators.

dpe829mo ago

Why would they have to change? Sometimes software development is largely "done" and there isn't much more you need to do to a library.

Analemma_9mo ago

Memory allocators are something I expect to rapidly degrade in the absence of continuous updates as the world changes underneath you. Changing page sizes, new ucode latencies, new security features etc. all either introduce outright breakage or at least change the optimum allocation strategy and make your old profiling obsolete. Not to mention the article already pointed out one instance where a software stack (KDE, in that case) used allocation profiles that broke an earlier version completely. Even though that's fixed now, any language runtime update or new feature could introduce a new allocation style that grinds you down.

As much as it's nice to think software can be done, I think something so closely tied to the kernel and hardware and the application layer, which all change constantly, never can be.

Twirrim9mo ago

While I certainly wish that more software would reach a "done" stage, I don't think jemalloc is necessarily there yet. Unfortunately I'm aware of there being bugs in the current version of jemalloc, when run in certain environment configurations, including memory leaks. I know the folks that found it were looking to report it, but I guess that won't happen now.

Even from a quick look at the open issues, I can see https://github.com/jemalloc/jemalloc/issues/2838, and https://github.com/jemalloc/jemalloc/issues/2815 as two examples, but there's a fair number of issues still open against the repository.

So that'll leave projects like redis & valkey with some decisions to make.

1) Keep jemalloc and accept things like memory leak bugs

2) Fork and maintain their own version of jemalloc.

3) Spend time replacing it entirely.

4) Hope someone else picks it up?

jeffbee9mo ago

For an example of why an allocator is a maintenance treadmill, consider that C++ recently (relatively) added sized delete, and Linux recently gained transparent huge pages.

poorman9mo ago

Jemalloc is used as an easy performance boost probably by every major Ruby on Rails server.

burnt-resistor9mo ago

Some people believe everything must always be constantly tweaked, redone, broken and fixed, and churned for no reason. The only things that need to be fixed in mature, working software are bugs and security issues. It doesn't magically stop working or get "stale" unless dependencies, the OS, or build tools break.

dymk9mo ago

Technology marches on, and in some number of years other allocators will exist that outperform/outfeature jemalloc.

almostgotcaught9mo ago

> Sometimes software development is largely "done"

Lol absolutely not

perbu9mo ago

Back in 2008-2009 I remember the Varnish project struggled with what looked very much like a memory leak. Because of the somewhat complex way memory was used, replacing the Glibc malloc with jemalloc was an immediate improvement and removed the leak-like behavior.

technion9mo ago

I know through years of looking at Ruby on Rails performance a commonly cited quick win was to run with jemalloc.

swinglock9mo ago

Last I checked Redis used their own fork of jemalloc. It may not even be updated to the latest release.

spookie9mo ago

Firefox as well.

jeffbee9mo ago

The article mentioned the influence of large-scale profiling on both jemalloc and tcmalloc, but doesn't mention mimalloc. I consider mimalloc to be on par with these others, and now I am wondering whether Microsoft also used large scale profiling to develop theirs, or if they just did it by dead reckoning.

bch9mo ago

How does mimalloc telemetry compare to jemalloc?

the_mitsuhiko9mo ago

All the allocators have the same issue. They largely work against a shared set of allocation APIs. Many of their users mostly engage via malloc and free.

So the flow is like this: user has an allocation looking issue. Picks up $allocator. If they have an $allocator type problem then they keep using it, otherwise they use something else.

There are tons of users of these allocators but many rarely engage with the developers. Many wouldn’t even notice improvements or regressions on upgrades because after the initial choice they stop looking.

I’m not sure how to fix that, but this is not healthy for such projects.

Cloudef9mo ago

malloc is bad api in general, if you want to go fast you don't rely on general purpose allocator

const_cast9mo ago

This is true, but the unfortunate thing with how C and C++ were developed is that pretty much everything just assumes the existence of malloc/free. So if you’re using third-party libraries then it’s out of your control mostly. Linking a new allocator is a very easy and pretty much free way to improve performance.

mavis9mo ago

Switching to jemalloc instantly fixed an irksome memory leak in an embedded Linux appliance I inherited many moons ago. Thank you je, we salute you!

vlovich1239mo ago

That’s because sane allocators that aren’t glibc will return unused memory periodically to the OS while glibc prefers to permanently retain said memory.

masklinn9mo ago

glibc will return memory to the OS just fine, the problem is that its arena design is extremely prone to fragmentation, so you end up with a bunch of arenas which are almost but not quite empty and can't be released, but can’t really be used either.

In fact, Jason himself (the author of jemalloc and TFA) posted an article on glibc malloc fragmentation 15 years ago: https://web.archive.org/web/20160417080412/http://www.canonw...

And it's an issue to this day: https://blog.arkey.fr/drafts/2021/01/22/native-memory-fragme...
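A toy model in C of why mostly-empty memory can still be unreleasable. It is not glibc's arena design; it assumes a hypothetical allocator that hands out fixed-size slots and can only return a whole page to the OS once every slot on that page is free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES 4
#define SLOTS_PER_PAGE 4

/* in_use[p][s] is true while slot s of page p holds a live allocation. */
static bool in_use[PAGES][SLOTS_PER_PAGE];

/* Record an allocation (used = true) or a free (used = false). */
static void mark(size_t page, size_t slot, bool used) {
    in_use[page][slot] = used;
}

/* Total number of slots that are currently free. */
static size_t free_slots(void) {
    size_t n = 0;
    for (size_t p = 0; p < PAGES; p++)
        for (size_t s = 0; s < SLOTS_PER_PAGE; s++)
            if (!in_use[p][s]) n++;
    return n;
}

/* Pages with no live slot at all: only these could go back to the OS. */
static size_t releasable_pages(void) {
    size_t n = 0;
    for (size_t p = 0; p < PAGES; p++) {
        bool any_live = false;
        for (size_t s = 0; s < SLOTS_PER_PAGE; s++)
            if (in_use[p][s]) any_live = true;
        if (!any_live) n++;
    }
    return n;
}
```

With one survivor left on each page, 12 of 16 slots are free and yet no page can be released, which is the "almost but not quite empty" situation in miniature.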

Crespyl9mo ago

Can you elaborate on this? I don't know much about allocators.

How would the allocator know that some block is unused, short of `free` being called? Does glibc not return all memory after a `free`? Do other allocators do something clever to automatically release things? Is there just a lot of bookkeeping overhead that some allocators are better at handling?

didip9mo ago

Thanks for everything, JE!

jemalloc is always the first thing I installed whenever I had to provision bare servers.

If jemalloc is somehow the default allocator in Linux, I think it will not have a hard time retaining contributors.

mrweasel9mo ago

Looking at all the comments and lightly browsing the source code, I'm amazed. Both at how much impact a memory allocator can make, but also how much code is involved.

I'm not really sure what I expected, but somehow I expected a memory allocator to be ... smaller, simpler perhaps?

ratorx9mo ago

Memory allocators can be simple. In fact it was an assignment for a course in the 2nd year of my CS degree to make an (almost) complete allocator.

However it is typically always more complex to make production quality software, especially in a performance sensitive domain.

burnt-resistor9mo ago

Naive allocators are very easy: just subdivide RAM and defragment only when absolutely necessary (if virtual memory is unavailable). Performant allocators are hard.

I think we lost a great deal of potential when ORCA was too tied to Pony and not extracted to a framework, tool, and/or library useful outside of it such as integrated or working with LLVM.

const_cast9mo ago

It’s the same way with garbage collectors.

You can write a naive mark-and-sweep in an afternoon. You can write a reference counter in even less time. And for some runtimes this is fine.

But writing a generational, concurrent, moving GC takes a lot of time. But if you can achieve it, you can get amazing performance gains. Just look at recent versions of Java.
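For a sense of how little the naive end of that spectrum takes, here is a minimal reference-counting sketch in C. The `rc_int` type and the `rc_*` names are invented for illustration, and `live_objects` exists only so the behavior can be observed. It is single-threaded and does nothing about reference cycles, which is a large part of what production collectors spend their complexity on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A heap object that carries its own reference count. */
typedef struct {
    size_t refcount;
    int value;
} rc_int;

/* Number of objects allocated and not yet freed (for demonstration). */
static size_t live_objects;

/* Create an object owned by the caller (count starts at 1). */
rc_int *rc_new(int value) {
    rc_int *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->value = value;
    live_objects++;
    return o;
}

/* Take an additional reference. */
rc_int *rc_retain(rc_int *o) {
    o->refcount++;
    return o;
}

/* Drop a reference; the last one out frees the object. */
void rc_release(rc_int *o) {
    if (--o->refcount == 0) {
        live_objects--;
        free(o);
    }
}
```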

senderista9mo ago

You can write a simple size-class allocator (even lock-free) in just a couple dozen lines of code. (I've done it both for interviews and for a work presentation.) But an allocator that is fast, scalable, and performs well over diverse workloads--that is HARD.
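As a rough illustration of the "couple dozen lines" claim, here is a sketch of a size-class allocator in C. It is an assumption-laden toy, not the commenter's code: it is single-threaded rather than lock-free, carves blocks out of a fixed static arena, uses power-of-two classes from 16 to 2048 bytes, and asks the caller to pass the size back on free so that no per-block header is needed. All names (`sc_alloc`, `sc_free`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_BYTES (64 * 1024)
#define NUM_CLASSES 8   /* 16, 32, 64, ..., 2048 bytes */
#define MIN_SHIFT 4     /* smallest class is 2^4 = 16 bytes */

/* A freed block stores the link to the next free block in itself. */
typedef struct free_node { struct free_node *next; } free_node;

static _Alignas(16) unsigned char arena[ARENA_BYTES];
static size_t arena_used;                  /* bump pointer into arena */
static free_node *free_lists[NUM_CLASSES]; /* one free list per class */

/* Smallest class whose block size fits n, or -1 if n is too large. */
static int size_class(size_t n) {
    size_t cap = (size_t)1 << MIN_SHIFT;
    for (int c = 0; c < NUM_CLASSES; c++, cap <<= 1)
        if (n <= cap) return c;
    return -1;
}

void *sc_alloc(size_t n) {
    int c = size_class(n);
    if (c < 0) return NULL;
    if (free_lists[c] != NULL) {           /* reuse a freed block first */
        free_node *node = free_lists[c];
        free_lists[c] = node->next;
        return node;
    }
    size_t bytes = (size_t)1 << (MIN_SHIFT + c);
    if (ARENA_BYTES - arena_used < bytes) return NULL; /* arena exhausted */
    void *p = arena + arena_used;          /* otherwise carve a new block */
    arena_used += bytes;
    return p;
}

/* The caller supplies the original request size to select the class. */
void sc_free(void *p, size_t n) {
    int c = size_class(n);
    if (p == NULL || c < 0) return;
    free_node *node = p;
    node->next = free_lists[c];
    free_lists[c] = node;
}
```

Everything that makes a real allocator hard is missing here: thread caches, returning memory to the OS, large allocations, and any defense against fragmentation across classes.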

swinglock9mo ago

mimalloc is cleaner but lacks the very useful profiling features. To be fair it also has not gone through decades of changes as described in the postmortem either.

gdiamos9mo ago

Congrats on the great run and the future. Jemalloc was an inspiration to many memory allocators.

kstrauser9mo ago

I was using FreeBSD back when jemalloc came along, and it blew my mind to imagine swapping out just that one (major) part of its libc. Honestly, it hadn't occurred to me, and made me wonder what else we could wholesale replace.

soulbadguy9mo ago

Maybe add a link to the post on the github repo. I feel like this is important context for people visiting the repo in the future

p0w3n3d9mo ago

Thank you. Jemalloc was recently recommended to me on some presentation about Java optimization.

I wonder if you did get everything you should from the companies that use it. I mean sometimes I feel that big tech firms only use free software, never giving anything to it, so I hope you were the exception here.

jeffbee9mo ago

Imagine being a Java developer and thinking "what have big tech corporations ever done for me?"

keybored9mo ago

That are good for me, the developer.

burnt-resistor9mo ago

Lesson: Don't let one megacorp dominate or take over your FOSS project. Push back somewhat and say "no" to too much help from one source.

igrunert9mo ago

I think the author was happy to be employed by a megacorp, along with a team to push jemalloc forward.

He and the other previous contributors are free to find new employers to continue such an arrangement, if any are willing to make that investment. Alternatively they could cobble together funding from a variety of smaller vendors. I think the author is happy to move on to other projects, after spending a long time in this problem space.

I don’t think that “don’t let one megacorp hire a team of contributors for your FOSS project” is the lesson here. I’d say it’s a lesson in working upstream - the contributions made during their Facebook / Meta investment are available for the community to build upon. They could’ve just as easily been made in a closed source fork inside Facebook, without violating the terms of the license.

Also Mozilla were unable to switch from their fork to the upstream version, and didn’t easily benefit from the Facebook / Meta investment as a result.

ecshafer9mo ago

He worked for like a decade at Facebook it looks like. I would guess at least at a Staff level. How many millions of dollars do you think he got from that? It doesn't sound like the worst trade in the world.

nevon9mo ago

I very recently used jemalloc to resolve a memory fragmentation issue that caused a service to OOM every few days. While jemalloc as it is will continue to work, same as it does today, I wonder what allocator I should reach for in the future. Does anyone have any experiences to share regarding tcmalloc or other allocators that aim to perform better than stock glibc?

beyonddream9mo ago

Try mimalloc. I have prototyped a feature on top of mimalloc and while the effort was a dead end, the code (this was around 2020) was nicely written and well maintained, and it was fun to hack on. When I swapped jemalloc in our system with mimalloc, it was on par if not better in terms of fragmentation growth control and heap usage.

sanxiyn9mo ago

mimalloc is a good choice. CPython recently switched to mimalloc.

kev0099mo ago

snmalloc

poorman9mo ago

How cool would it be to see Doug Lea pick up the torch and create a modern day multi-threaded dlmalloc2!?

ecshafer9mo ago

dl is just an observer on the open jdk governance board now, so he might have enough time.

dikei9mo ago

I still remember the day when I used jemalloc debug features to triage and resolve some nasty memory bloat issues in our code that uses RocksDB.

Good times.

brcmthrowaway9mo ago

What allocator does Apple use?

forty9mo ago

Probably iMalloc ;)

half-kh-hacker9mo ago

you probably want to look at their 'libmalloc'

skeptrune9mo ago

Kind of nuts that he worked on Jemalloc for over a decade while having personal preference for garbage collection. I'm surprised he doesn't have more regret.

kstrauser9mo ago

Why are those two mutually exclusive? I'd think that a high performance allocator would be especially crucial in the implementation of a fast garbage collected language. For example, in Python you can't alloc(n * sizeof(obj)) to reserve that much contiguous space for n objects. Instead, you use the builtins which isolate you from that low-level bookkeeping. Those builtins have to be pretty fast or performance would be terrible.

procaryote9mo ago

Python performance is terrible though...

fermentation9mo ago

A job is a job

userbinator9mo ago

A bad choice of title, as "postmortem" made me think there was some severe outage caused by jemalloc.

stingraycharles9mo ago

I think this implies your understanding of the term “post-mortem” is incorrect, rather than the title.

drysine9mo ago

Or maybe not

chrisweekly9mo ago

Well, that's not the only meaning of "postmortem". The fine article does open with,

"The jemalloc memory allocator was first conceived in early 2004, and has been in public use for about 20 years now. Thanks to the nature of open source software licensing, jemalloc will remain publicly available indefinitely. But active upstream development has come to an end. This post briefly describes jemalloc’s development phases, each with some success/failure highlights, followed by some retrospective commentary."

runevault9mo ago

postmortem is looking back after an event. That can be a security event/outage, it can also be the completion of a project (see: game studios often do postmortems once their game is out to look back on what went wrong and right between preproduction, production, and post launch).

gilgoomesh9mo ago

It's weird that we use "postmortem" in those cases since the word literally means "after death"; kind of implying something bad happened. I get that most of these postmortems are done after major development ceases, so it kind of is "dead" but still.

Surely a "retrospective" would be a better word for a look back. It even means "look back".

bmacho9mo ago

The last part is unfortunate. However, it is a perfectly fine choice of title, as it does not make the majority of us think that there was an outage caused by jemalloc. You should update how you think of the word, and align it with the majority usage.

Omarbev9mo ago

This is a good thing

Svetlitski9mo ago

I’ve been meaning to ask Qi if he’d be open to cutting a final 6.0 release on the repo before re-archiving.

matoro9mo ago

That was me that filed the Itanium test suite failure. :)

apaprocki9mo ago

Ah, porting to HP Superdome servers. It’s like being handed a brochure describing the intricate details of the iceberg the ship you just boarded is about to hit in a few days.

A fellow traveler, ahoy!

boulos9mo ago

The Itanic was kind of great :). I'm convinced it helped sink SGI.

kabdib9mo ago

one of the best books on Linux architecture i've read was the one on the Itanium port

i think, because Itanic broke a ton of assumptions

[1] https://google.github.io/tcmalloc/tuning.html#system-level-o...

kstrauser9mo ago

Stuff like this is what keeps me coming back here. Thanks for posting this!

What's hard about using TCMalloc if you're not using bazel? (Not asking to imply that it's not, but because I'm genuinely curious.)

Svetlitski9mo ago

It’s just a huge pain to build and link against. Before the bazel 7.4.0 change your options were basically:

gazpacho9mo ago

EnPissant9mo ago

Do you have any opinions on mimalloc?

michaelcampbell9mo ago

> filed an issue because our test suite didn’t pass on Itanium lol

For the non low-level programmers in the bowels of memory allocators among us, why is this a "lol"?

Svetlitski9mo ago

klabb39mo ago

> we (i.e. the Jemalloc team) weren’t really in a great place to respond to all the random GitHub issues people would file

Why not? I mean, this is a complete drive-by comment, so please correct me, but there was a fully staffed team at Meta that maintained it, yet it was not in the best place to manage the issues?

anonymoushn9mo ago

Well, to be blunt, the company does not care about this, so it does not get done.

xcrjm9mo ago

They said the team was not in a great place to do it, e.g. they probably had competing priorities that overshadowed triaging issues.

einpoklum9mo ago

> TCMalloc is great, but is an absolute nightmare to use if you’re not using bazel

custom-malloc-newbie question: Why is the choice of build system (generator) significant when evaluating the usability of a library?

fc417fc8029mo ago

CamouflagedKiwi9mo ago

I've done both before, and seen libraries at various levels of complexity; there is definitely a point where you just want to give up and not use the thing when it's very complex.

https://github.com/libvips/libvips/discussions/3019

Thaxll9mo ago

It's kind of wild that great software is hindered by a complicated build and integration process.

adityapatadia9mo ago

Today, our multi-million dollar revenue company is using your memory allocator on every single service and on every single Dockerfile.

Thank you! From the bottom of our hearts!

thewisenerd9mo ago

indeed! most image processing golang services suggest/use jemalloc

the top 3 from https://github.com/topics/resize-images (as of 2025-06-13)

imaginary: https://github.com/h2non/imaginary/blob/1d4e251cfcd58ea66f83...

imgproxy: https://web.archive.org/web/20210412004544/https://docs.imgp... (linked from a discussion in the imaginary repo)

imagor: https://github.com/cshum/imagor/blob/f6673fa6656ee8ef17728f2...

tecleandor9mo ago

Yep, imgproxy seems to use libvips, which recommends jemalloc. I was checking and this is a funny (not) bug report:

jcupitt9mo ago

Those three all use libvips as the image processing engine, fwiw, so it's maybe not a very wide survey.

libvips is fairly highly threaded and does a lot of alloc/free, so it's challenging for most heap implementations.

laszlojamf9mo ago

I really don't mean to be snarky, but honest question: Did you donate? Nothing says thank you like some $$$...

onli9mo ago

It was a Meta project and development ceased. For a regular project that expectation is fine, but here it does not apply IMHO.

adityapatadia9mo ago

We regularly donate to projects via Open Collective. Frankly, we did not see the need here, due to FB involvement I think.

masklinn9mo ago

> jemalloc was probably booted from Rust binaries sooner than the natural course of development might have otherwise dictated.

FWIW while it was a factor it was just one of a number: https://github.com/rust-lang/rust/issues/36963#issuecomment-...

And jemalloc was only removed two years after that issue was opened: https://github.com/rust-lang/rust/pull/55238

Aissen9mo ago

I wonder if some kind of dynamic page-size (with dynamic ftrace-style binary patching for performance?) would have been that much slower.

pkhuong9mo ago

You can run jemalloc configured with 16KB pages on a 4KB page system.

dazzawazza9mo ago

I learned of it from its integration in FreeBSD and never looked back.

jemalloc has helped entertain a lot of people :)

Iwan-Zotow9mo ago

windows def allocator is pos. Jemalloc rules

ahartmetz9mo ago

>windows def allocator is pos

Wow, still? I remember allocator benchmarks from 10-15 years ago where there were some notable differences between allocators... and then Windows with like 20% the performance of everything else!

int_19h9mo ago

> windows def allocator

Which one of them? These days it could mean HeapAlloc, or it could mean malloc from uCRT.

https://github.com/facebook/jemalloc

chubot9mo ago

Nice post -- so does Facebook no longer use jemalloc at all? Or is it maintenance mode?

Or I wonder if they could simply use tcmalloc or another allocator these days?

Facebook infrastructure engineering reduced investment in core technology, instead emphasizing return on investment.

Svetlitski9mo ago

> Or I wonder if they could simply use tcmalloc or another allocator these days?

charcircuit9mo ago

Meta has a fork that they still are working on, where development is continuing.

nh29mo ago

The point of the blog post is that repo is over-focused on Facebook's needs instead of "general utility":

> as a result of recent changes within Meta we no longer have anyone shepherding long-term jemalloc development with an eye toward general utility

> we reached a sad end for jemalloc in the hands of Facebook/Meta

> Meta’s needs stopped aligning well with those of external uses some time ago, and they are better off doing their own thing.

burnt-resistor9mo ago

anonymoushn9mo ago

schrep9mo ago

Your work was so impactful over a long period from Firefox to Facebook. Honored to have been a small part of it.

lbrandy9mo ago

Suppose this is as good a place to pile-on as any.

liuliu9mo ago

Your leadership in continuing to invest in core technologies at Facebook was as fruitful as it could ever be. GraphQL, PyTorch, and React, to name a few, could not have happened without it.

dao-9mo ago

kstrauser9mo ago

toast09mo ago

It is on FreeBSD. :P Change your malloc, change your life? May as well change your libc while you're there and use FreeBSD libc too, and that'll be easier if you also adopt the FreeBSD kernel.

I will say, the Facebook people were very excited to share jemalloc with us when they acquired my employer, but we were using FreeBSD so we already had it and thought it was normal. :)

favorited9mo ago

Disclaimer: I'm not an allocator engineer, this is just an anecdote.

Which is why you see it recommended so frequently with services, where there is generally one process that you want to get preferential treatment over everything else.

mort969mo ago

jeffbee9mo ago

jeffbee9mo ago

kstrauser9mo ago

o11c9mo ago

senderista9mo ago

Possibly more work since the kernel can't use SIMD

vlovich1239mo ago

That’s actually particular to alternate allocators and not true for glibc, if I recall correctly (it’s much worse at returning memory).

sanxiyn9mo ago

kstrauser9mo ago

Now that I think about it, I could easily imagine it being left out of glibc because it doesn't build on Hurd or something.

meisel9mo ago

glandium9mo ago

Note this requires hackery that relies on Apple not changing things in its system allocator, which has happened at least twice IIRC.

adgjlsfhk19mo ago

I believe mimalloc works here (but might be wrong).

wiz21c9mo ago

FTA:

> And people find themselves in impossible situations where the main choices are 1) make poor decisions under extreme pressure, 2) comply under extreme pressure, or 3) get routed around.

It doesn't sound like a work place :-(

bravetraveler9mo ago

Sounds like every workplace I've 'enjoyed' since ~2008

throwaway3141559mo ago

nice username

- fsociety

mrweasel9mo ago

Now I'm not one for victim blaming, but if that's more than three places of employment, maybe you need to rethink the positions you apply for.

Twirrim9mo ago

Oh that's interesting. jemalloc is the memory allocator used by redis, among other projects. Wonder what the performance impact will be if they have to change allocators.

dpe829mo ago

Why would they have to change? Sometimes software development is largely "done" and there isn't much more you need to do to a library.

Analemma_9mo ago

As much as it's nice to think software can be done, I think something so closely tied to the kernel and hardware and the application layer, which all change constantly, never can be.

Twirrim9mo ago

So that'll leave projects like redis & valkey with some decisions to make.

1) Keep jemalloc and accept things like memory leak bugs

2) Fork and maintain their own version of jemalloc.

3) Spend time replacing it entirely.

4) Hope someone else picks it up?

jeffbee9mo ago

For an example of why an allocator is a maintenance treadmill, consider that C++ recently (relatively) added sized delete, and Linux recently gained transparent huge pages.

poorman9mo ago

Jemalloc is used as an easy performance boost probably by every major Ruby on Rails server.

burnt-resistor9mo ago

dymk9mo ago

Technology marches on, and in some number of years other allocators will exist that outperform/outfeature jemalloc.

almostgotcaught9mo ago

> Sometimes software development is largely "done"

Lol absolutely not

perbu9mo ago

technion9mo ago

I know through years of looking at Ruby on Rails performance a commonly cited quick win was to run with jemalloc.

swinglock9mo ago

Last I checked Redis used their own fork of jemalloc. It may not even be updated to the latest release.

spookie9mo ago

Firefox as well.

jeffbee9mo ago

bch9mo ago

How does mimalloc telemetry compare to jemalloc?

the_mitsuhiko9mo ago

All the allocators have the same issue. They largely work against a shared set of allocation APIs. Many of their users mostly engage via malloc and free.

So the flow is like this: a user has an allocation-looking issue and picks up $allocator. If they have an $allocator-type problem then they keep using it, otherwise they use something else.

I’m not sure how to fix that, but this is not healthy for such projects.

Cloudef9mo ago

malloc is a bad API in general; if you want to go fast you don't rely on a general-purpose allocator

const_cast9mo ago

mavis9mo ago

Switching to jemalloc instantly fixed an irksome memory leak in an embedded Linux appliance I inherited many moons ago. Thank you je, we salute you!

vlovich1239mo ago

That’s because sane allocators that aren’t glibc will return unused memory periodically to the OS while glibc prefers to permanently retain said memory.

masklinn9mo ago

In fact, Jason himself (the author of jemalloc and TFA) posted an article on glibc malloc fragmentation 15 years ago: https://web.archive.org/web/20160417080412/http://www.canonw...

And it's an issue to this day: https://blog.arkey.fr/drafts/2021/01/22/native-memory-fragme...

Crespyl9mo ago

Can you elaborate on this? I don't know much about allocators.

didip9mo ago

Thanks for everything, JE!

jemalloc was always the first thing I installed whenever I had to provision bare servers.

If jemalloc were somehow the default allocator on Linux, I think it would not have a hard time retaining contributors.

mrweasel9mo ago

Looking at all the comments and lightly browsing the source code, I'm amazed, both at how much impact a memory allocator can make and at how much code is involved.

I'm not really sure what I expected, but somehow I expected a memory allocator to be ... smaller, simpler perhaps?

ratorx9mo ago

Memory allocators can be simple. In fact it was an assignment for a course in the 2nd year of my CS degree to make an (almost) complete allocator.

However, it is typically more complex to make production-quality software, especially in a performance-sensitive domain.

burnt-resistor9mo ago

Naive allocators are very easy: just subdivide RAM and defragment only when absolutely necessary (if virtual memory is unavailable). Performant allocators are hard.
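A minimal sketch of the "subdivide RAM" idea, written in Python for readability rather than as real allocator code. It assumes a fixed-size arena addressed by integer offsets, uses first-fit placement, and treats "defragment only when absolutely necessary" as merging adjacent free blocks only after an allocation fails (true compaction would also have to move live blocks). The names `NaiveAllocator`, `malloc`, and `free` are illustrative and not taken from any allocator discussed in the thread.

```python
class NaiveAllocator:
    """Toy first-fit allocator over a fixed-size arena (illustrative only)."""

    def __init__(self, size):
        self.size = size
        self.free_list = [(0, size)]  # (offset, length) of each free block
        self.used = {}                # offset -> length of each live allocation

    def _take(self, n):
        # First fit: carve n bytes off the first free block that is big enough.
        for i, (off, length) in enumerate(self.free_list):
            if length >= n:
                if length == n:
                    del self.free_list[i]
                else:
                    self.free_list[i] = (off + n, length - n)
                self.used[off] = n
                return off
        return None

    def _coalesce(self):
        # Merge free blocks that sit next to each other in the arena.
        merged = []
        for off, length in sorted(self.free_list):
            if merged and merged[-1][0] + merged[-1][1] == off:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((off, length))
        self.free_list = merged

    def malloc(self, n):
        if n <= 0:
            raise ValueError("size must be positive")
        off = self._take(n)
        if off is None:
            # Only pay for merging when a request cannot be satisfied.
            self._coalesce()
            off = self._take(n)
        return off  # None means the arena is exhausted or too fragmented

    def free(self, off):
        # Raises KeyError on a double free or an offset that was never returned.
        length = self.used.pop(off)
        self.free_list.append((off, length))
```

Everything that makes a production allocator hard is missing here: thread safety, size classes, per-thread caches, returning memory to the OS, and any defense against fragmentation beyond lazy merging.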

I think we lost a great deal of potential when ORCA was too tied to Pony and not extracted into a framework, tool, and/or library useful outside of it, such as one integrated or working with LLVM.

const_cast9mo ago

It’s the same way with garbage collectors.

You can write a naive mark-and-sweep in an afternoon. You can write a reference counter in even less time. And for some runtimes this is fine.

Writing a generational, concurrent, moving GC takes a lot of time, but if you can achieve it, you can get amazing performance gains. Just look at recent versions of Java.
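To make the "afternoon" version concrete, here is a hedged sketch of a naive mark-and-sweep collector over a toy object graph. It is stop-the-world, non-moving, and non-generational, which is exactly the easy end of the spectrum described above. The `Obj` and `Heap` classes and the explicit `roots` list are assumptions made for illustration, not any real runtime's design.

```python
class Obj:
    """Toy heap object that can reference other objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references
        self.marked = False  # mark bit used during collection


class Heap:
    def __init__(self):
        self.objects = []    # every object ever allocated and not yet swept
        self.roots = []      # objects considered reachable by definition

    def new(self, name):
        obj = Obj(name)
        self.objects.append(obj)
        return obj

    def collect(self):
        # Mark phase: walk the graph from the roots with an explicit stack.
        stack = list(self.roots)
        while stack:
            obj = stack.pop()
            if not obj.marked:
                obj.marked = True
                stack.extend(obj.refs)
        # Sweep phase: keep marked objects, drop the rest, reset mark bits.
        live = [o for o in self.objects if o.marked]
        freed = len(self.objects) - len(live)
        for o in live:
            o.marked = False
        self.objects = live
        return freed
```

Because reachability is computed from the roots, an unreachable cycle (two objects that only reference each other) is collected, which plain reference counting cannot do without extra cycle detection.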

senderista9mo ago

swinglock9mo ago

mimalloc is cleaner but lacks the very useful profiling features. To be fair it also has not gone through decades of changes as described in the postmortem either.

gdiamos9mo ago

Congrats on the great run and the future. Jemalloc was an inspiration to many memory allocators.

kstrauser9mo ago

soulbadguy9mo ago

Maybe add a link to the post on the github repo. I feel like this is important context for people visiting the repo in the future

p0w3n3d9mo ago

Thank you. Jemalloc was recently recommended to me in a presentation about Java optimization.

jeffbee9mo ago

Imagine being a Java developer and thinking "what have big tech corporations ever done for me?"

keybored9mo ago

That are good for me, the developer.

burnt-resistor9mo ago

Lesson: Don't let one megacorp dominate or take over your FOSS project. Push back somewhat and say "no" to too much help from one source.

igrunert9mo ago

I think the author was happy to be employed by a megacorp, along with a team to push jemalloc forward.

Also Mozilla were unable to switch from their fork to the upstream version, and didn’t easily benefit from the Facebook / Meta investment as a result.

ecshafer9mo ago

nevon9mo ago

beyonddream9mo ago

sanxiyn9mo ago

mimalloc is a good choice. CPython recently switched to mimalloc.

kev0099mo ago

snmalloc

poorman9mo ago

How cool would it be to see Doug Lea pick up the torch and create a modern day multi-threaded dlmalloc2!?

ecshafer9mo ago

dl is just an observer on the OpenJDK governance board now, so he might have enough time.

dikei9mo ago

I still remember the day when I used jemalloc debug features to triage and resolve some nasty memory bloat issues in our code that uses RocksDB.

Good times.

brcmthrowaway9mo ago

What allocator does Apple use?

forty9mo ago

Probably iMalloc ;)

half-kh-hacker9mo ago

you probably want to look at their 'libmalloc'

skeptrune9mo ago

Kind of nuts that he worked on Jemalloc for over a decade while having a personal preference for garbage collection. I'm surprised he doesn't have more regret.
