A little offtopic but the default Delphi 7 memory allocator did this, except that it also merged blocks that it obtained from different OS allocation calls.
This worked fine for regular usage, but if that memory was ever used for Bitmaps for UI stuff, it wouldn't work: Since Windows does some of its UI stuff in kernel mode, before doing anything Windows would attempt to lock the entire allocation's VAD entry to prevent you from messing with it in another thread while it was using it. If the Bitmap you were working on happened to belong to two different OS-level allocations, this lock function would fail since it wasn't meant to handle that case.
This would lead to random, extremely cryptic errors such as ERROR_NO_SCROLLBARS "The window does not have scroll bars." since the lock call was deep in the stack and the callers replaced the error with another one as it bubbled up.
In the end we had to replace the allocator to avoid the issue. That was a fun day of debugging.
We just had it restart itself and try again whenever it got one of those errors when printing, but eventually we wanted to add a feature that required the process not to die. By that time I was already 99% sure it wasn't something in our code, and I had already ruled out threading issues.
I ended up putting it in a VM with a kernel debugger attached and having a script make a snapshot and make it print over and over until it errored, then following along in IDA until I saw what was going on.
Having a way to trigger it (by restoring the snapshot) on demand helped a lot, otherwise it would have taken forever to make sense of it as it could sit without crashing for nearly an hour.
Here's a 2003 forum post from someone else having the same problem: http://www.delphigroups.info/2/1/749064.html
(Your application wasn't even a "process" per se. Until Windows 95, everything was just happening in one shared address space, in real mode. In fact, it was only in Windows 3.1 where user applications stopped running in ring 0!)
If you think about it, this "the kernel is a game engine and your application is the game" approach isn't necessarily a bad design... for a single-tasking OS's library exokernel, like the Wii's https://wiibrew.org/wiki/IOS.
But, of course, Windows claimed to be a multitasking OS. But it actually wasn't! And I don't mean the obvious thing about it not having pre-emption. Lots of multitasking OSes didn't have pre-emption.
No, what I mean is that the concurrency primitive for the cooperative scheduling wasn't the "task" (i.e. the process or thread. Which, again, there weren't any of.) Instead, the concurrency primitive was the window. Until Windows 95, Windows was a multi-windowing OS.
Each control was owned by a window. Each window had a WndProc. If your Windows executable (i.e. application delegate module) set up two windows, then each window participated separately in the global Windows event loop, up-to-and-including things like having its own set of loaded fonts, its own clipboard state, and its own interned strings table. In OOP terms†, your application was just a dead "class object", running no logic of its own save for one-time load and unload hooks. It was the windows themselves that were the "instances" of your class.
This might make you realize why MDI (or Multiple Document Interface, where there are multiple small per-document "windows" inside one big window) was so popular back then. The MDI "windows" weren't actually windows — they didn't have their own WndProc. They were just controls, like a tab view is a control. Only the big container window was a real window, and so all the resources within that big window were shared between all the virtual windows. MDI was a memory-saving trick!
---
† The actual more interesting analogy is that Windows was essentially a (single-threaded, cooperatively-scheduled) actor system, where windows were the actors. There is a very close parallel between (Windows 3.1 executables, Windows 3.1 windows) and (Erlang modules, Erlang processes).
E.g. the Linux kernel used (not sure if it's still like this) a buddy allocator to allocate pages and power-of-two blocks of pages and slab allocators to subdivide those pages and allocate data structures.
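The merge step that makes buddy allocators cheap comes down to one XOR: a block's "buddy" sits at its offset XOR its size. A toy sketch of that arithmetic (an illustration of the idea, not the kernel's actual code; `buddy_of` and `round_up_pow2` are made-up names):

```c
#include <assert.h>
#include <stddef.h>

/* Toy buddy-allocator arithmetic: for a block at `offset` of size
 * `size` (a power of two), its buddy lives at offset XOR size.
 * Freeing checks whether the buddy is also free and, if so, merges
 * the pair into one block of double the size. */
static size_t buddy_of(size_t offset, size_t size) {
    return offset ^ size;
}

/* Buddy systems round every request up to the next power of two. */
static size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}
```

The XOR works because splitting a power-of-two block always produces two halves whose offsets differ in exactly one bit.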
Another thing that the article doesn't mention that is important is that most production allocators make use of thread-local storage and either have per-thread caches of free blocks or sometimes whole per-thread memory regions. This is to reduce lock contention and provide memory that is more likely to be in the current core's cache.
I had originally written about threading and locality but it made the post too long and complicated, so I cut it out for the final draft. You can see remnants of it if you check the HTML comments in the post :D
Then Linux has various slab allocators (slub/slob/slab), built on top of the "buddy allocator".
User-level code should use mmap-ed regions that are not virtual-address-stable (slabs + offsets). Legacy "libc" services were built as virtual-address-stable services... which are kind of expensive to manage on a modern paged system. Virtual-address-stable regions should be kept to a minimum (that horrible ELF static TLS). There is a workaround, though (but Linux's default overcommit policy could kill your process): a user process queries the amount of RAM on the system and mmaps a region of roughly the same size (care of the overcommit policy), only once per process lifetime. Then you have a virtual-address-stable region that could use most if not all of the available RAM (excluding hot memory addition...). Should be very easy to manage with lists.
I'm inspired by Bartosz Ciechanowski and Julia Evans. The web is such a powerful toolbox. So many concepts are more complex than text alone can hope to explain. Those two are so creative and full of energy.
And you're right, it's incredibly time intensive to put these together. Thousands of lines of code, plus the text content, plus reaching out to domain experts for reviews (shout out Chris Down, kernel wizard extraordinaire).
Communicating complex subjects through interactive visual displays is very effective -- it's worth the effort. Thank you!
As an old-timer in writing code, I think my generation's most-challenging legacies (=== the things we f**ked up) are the non-robust malloc/free concept and null-terminated text strings. Bugs in code using those conventions have been exploitable to a really damaging extent. I learned to program in C from K&R. And getting data-structure code right, and safe to deploy, in that language and its derivatives is HARD.
The C inventors are (were, in Dennis Ritchie's case) brilliant and Bell Labs was great. Their ideas shaped the stuff the global internet runs on. But these parts of what they invented ..... ouch. (All OSs had the same problems.)
I wish the wonderful article presented here carried a more prominent warning about this. Many will read it as they learn to code. The history of our profession can teach about what NOT to do as well as what to do.
He recently published a similar guide on load balancing that is equally intuitive and insightful: https://samwho.dev/load-balancing/
I can't wait to see what he puts out next!
We never missed good things lol!
If I were to make some suggestions, based on how I know they would receive this:
- I would make explicit reference to heap and stack. Students who are learning this material are learning about the heap/stack dichotomy, and I think it would really improve the exposition to make clear that not all memory is allocated this way in a program.
- I would remove this line: "At the end of this post, you should know everything you need to know to write your own allocator." I can confidently say that my students will not be able to write an allocator after reading this. It's nothing to do with your piece, it's just that the intersection of people who don't understand hex and people who could build an allocator after a short blog post is very, very small. Students who read this post and at the end still feel like they can't build an allocator will end up discouraged, with a feeling that maybe they are missing something.
- Consider rethinking how you handle hex numbers throughout. You introduce them and say they are distinguished by a preceding "0x", but then you immediately omit the 0x to save space in the figure. In my experience, students have a lot of trouble with understanding that hex and dec have the same underlying representation. They will not be able to distinguish between hex and dec bases implicitly, so from a pedagogical standpoint, it's better to be consistent throughout and include the prefix.
- Finally, at the end I would mention that there are other strategies for memory management to encourage further exploration. Other languages do it differently, and for students it's important to know which other avenues are out there. Otherwise they might think heap-based malloc/free is the only way to do things, the same way they often think imperative programming is the only way to program.
Anyway, thank you for creating this, and I'm sure it will really help my students. In a just world, "seeing" the memory like this would ideally be first-class tooling for languages like C.
I tried a couple of different ways to introduce the stack and the heap but it always felt like it made the post too long and complicated. In the end I decided to take a pure, idealistic view of memory in order to focus on the algorithms used to pack allocations effectively. You can see some of my abandoned efforts as HTML comments in the post :D
Introducing the 0x prefix and immediately dropping it hurt me as well, but I didn't have a better way to make the visualisation work on mobile. I completely agree with you that it's not ideal.
I'd like to do a post of this style about garbage collection at some point.
Would definitely like to see more thoughts from those cute corgis.
I wrote a JIT compiler and I didn't bother calling free much, I just let the operating system free up all allocated memory.
I got into this situation often:
return_struct = do_something(mystruct);
return_struct->inner_struct = malloc(sizeof(struct my_inner_struct));
Now, who owns inner_struct? Who is responsible for freeing it? Do I free it when I assign to it?

I feel this ownership complicates cross-language FFI API calls, because who is responsible for freeing structures depends on the application and the platform you're running under. For example, Rust code being called from Erlang. You have to be extra careful when memory is managed by a different language runtime.
I feel I am at the beginning of intuitively understanding what memory really is: memory is just a huge contiguous region of numbered locations. Bump allocators allocate in one direction and free all at once. Arena allocators allocate to a preallocated region, I think.
Memory is a logistical problem of how you arrange and allocate finite resources.
I am thinking of alternative visualizations of understanding memory, for example, I started writing an animation of binary search:
https://replit.com/@Chronological/ProgrammingRTS
The idea is that you see values and memory locations move around with the final goal being able to command memory to move around and be computed, such as real time strategy game.
I think if we could visualise memory as cars on a road, we would see obvious traffic jams.
return_struct does since it is the only thing that knows the address.
> Who is responsible for freeing it?
return_struct is, unless you hand that responsibility over to something else.
> Do I free it when I assign to it?
Yes, unless you want leaks.
> I think if we could visualise memory as cars on a road, we would see obvious traffic jams.
That visualisation is helpful for threads, where the program is the road/map and the cars are the threads. I don't see how it's useful for memory.
There might be an analogy here that could help you reason about your nested structure allocations…
Memory is an array of bytes owned by the OS. While there are all kinds of implementation details about addressing and storage and performance and paging and virtual memory, it’s really just an array. The OS gives you a way to reserve pieces of the array for your own use, and you’re responsible for giving them back if you want to play nice and/or run for a long time, otherwise (as a safety net) the OS will take them back as soon as you exit.
This is, in a sense, very similar to the question you posed. An outer routine owns the outer structure, and an inner routine allocates some inner structure. The simplest, most intuitive, and generally best advice is that whoever allocates is also responsible for freeing memory. In other words, one way to define ownership of memory is by who allocates it. Implicitly and automatically, the responsibility to free that memory belongs to the owner that allocated it. It's okay to explicitly transfer ownership, but that can easily get complicated and unintuitive.

You can also consider letting the parent free your struct to be similar to not calling free() in your JIT compiler: it's a 'lazy' optimization to have the parent clean up. I don't mean that in a judgemental sense; I mean it's valid to let the parent handle it, if you know that it will, and this can be done without getting confused about who owns the memory and who was actually responsible for freeing it.

Note that when you leave the parent to clean up, you are foregoing the ability to re-use the memory. This is true in your JIT compiler, and it's true for malloc() and free() as well. If you let the OS handle it, you're in effect declaring that you believe you do not need to recycle the memory allocated in your program during its execution. (This might be true, and it might stay that way, but it's always worth asking whether it will remain true, since lots of people have been eventually bitten and had to retroactively refactor for memory management when their requirements changed.)
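A minimal sketch of that convention applied to the struct from the question upthread (the type names are hypothetical, borrowed from that snippet): the outer struct's create/destroy pair owns the inner allocation, so callers never have to guess who frees what.

```c
#include <stdlib.h>

/* Hypothetical types mirroring the question above. */
struct my_inner_struct { int x; };
struct my_struct { struct my_inner_struct *inner_struct; };

/* The create/destroy pair makes ownership explicit: my_struct owns
 * its inner struct, so destroying the outer frees the inner too. */
struct my_struct *my_struct_create(void) {
    struct my_struct *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->inner_struct = malloc(sizeof *s->inner_struct);
    if (!s->inner_struct) { free(s); return NULL; }
    return s;
}

void my_struct_destroy(struct my_struct *s) {
    if (!s) return;
    free(s->inner_struct); /* the owner frees what it allocated */
    free(s);
}
```

Whoever calls my_struct_create owns the whole thing and pairs it with my_struct_destroy; the inner pointer is never freed on its own.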
Arena allocators are cool, the idea is you allocate a large-ish region of memory and sub-allocate into it (often with a fast, simple allocator like a bump allocator) and then free the large-ish block when you're done. It's a way to take knowing how much memory you need as a whole and optimise that to a single call to malloc/free.
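A minimal arena along those lines might look like this (a sketch of the idea, not any particular library's API):

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

/* Minimal arena: one malloc up front, bump-allocate inside it,
 * one free at the end. No per-allocation free is possible. */
typedef struct {
    uint8_t *base;
    size_t   used, cap;
} arena;

arena arena_create(size_t cap) {
    arena a = { malloc(cap), 0, cap };
    return a;
}

void *arena_alloc(arena *a, size_t n) {
    n = (n + 7) & ~(size_t)7;          /* keep allocations 8-byte aligned */
    if (!a->base || a->used + n > a->cap) return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

void arena_destroy(arena *a) {
    free(a->base);                     /* frees every sub-allocation at once */
    a->base = NULL;
}
```

Everything allocated from the arena dies together in arena_destroy, which is exactly the "free the large-ish block when you're done" step.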
You may enjoy looking through https://www.cs.usfca.edu/~galles/visualization/Algorithms.ht....
I want an extremely performant deep copy solution, I've been thinking of using an allocator to implement it.
If we have a tree data structure or a nested hashmap, then we want to copy it cheaply, there is copy on write. But most copies of hashmaps are slow because they instantiate every child object in a recursive loop.
So I want to be able to memcpy a complicated data structure for cheap copies.
While it would be nice to have next to no overhead for FFI, it's not always tractable. That's why you have to serialize across boundaries, the same as if you're serializing across processes or the network. At least in a single virtual memory space you can have a caller allocate a buffer and the callee fill it, with the caller being responsible for deserializing and freeing later. That gets you pretty far, and is relatively safe.
The alternative is to be callee managed, and for the callee to return things by handle and not necessarily by pointer, but that is also fraught.
This was on a 386sx with 8M RAM and it was pretty much all the available memory after the OS was loaded and settled down.
A MILLION BYTES!!
Didn't do anything with it, but still, after DOS and EMS/XMS memory and all the other falderol of managing memory. (At the time, it was also the only x86 OS that would format a floppy drive without bringing all the other threads to a crawl. UI was still available-ish, BiModem was still BiModeming...)
> "List allocators used by malloc() and free() are, by necessity, general purpose allocators that aren’t optimized for any particular memory allocation pattern... To understand, let’s examine commonly used list allocation algorithms: first-fit, next-fit, best-fit and quick-fit."
That's from an article on custom memory management (aimed at embedded programming issues) I've found pretty useful, and it picks up where this leaves off:
https://www.embedded-software-engineering.de/dynamic-memory-...
You can practice writing custom memory managers in an application that runs on an operating system by only using the stack (just create a big array of int etc. and call that your memory space):
> "For the safety-critical tasks, the developer can replace the standard allocator with a custom allocator that sets aside a buffer for the exclusive use of that task, and satisfies all memory allocation requests out of that buffer... The remainder of this paper presents four examples of custom allocator algorithms: the block allocator, stack allocator, bitmap allocator and thread-local allocator. Custom memory managers can use any one of these algorithms or a combination of different algorithms."
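The "big array as your memory space" exercise can start as small as this (a hypothetical practice allocator, not code from the paper):

```c
#include <stddef.h>

/* Treat a static array as our entire "memory space" and hand out
 * pieces of it in order -- the simplest possible custom allocator
 * to practice with before adding free lists, bitmaps, etc. */
static unsigned char heap[1024];
static size_t heap_top = 0;

void *my_alloc(size_t n) {
    if (heap_top + n > sizeof heap) return NULL;  /* out of "memory" */
    void *p = &heap[heap_top];
    heap_top += n;
    return p;
}

void my_reset(void) { heap_top = 0; }  /* "free" everything at once */
```

From there you can layer on any of the paper's algorithms (block, stack, bitmap) inside the same array.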
This is not strictly true, it depends on the environment you're using. Some older operating systems and some modern embedded systems have memory mapped at the zero address.
On the other hand, most people expect the usual layout that grows from zero upwards. CPU designers are people, too, and make similar assumptions about the layout and operation of memory (if they have to), which then affect systems that use their processors. Also, with all its portability, C still presumed a certain class of hardware. It was not invented as a language for every microprocessor and microcontroller and each odd memory model; the microcontroller evolution was instead shaped by alignment with coding in C.
This ugly hack (using the same object to hold the address value that can be operated on, and its own validity information) might be the most well known.
> address + <value at address>
> address - <value at address-1>

Shouldn't this be?

address + <value at address> + 3
address - <value at address-1> - 3

> "As a general-purpose memory allocator, though, we can't get away with having no free implementation."
I have a belief that the future of software is short-lived programs that never free memory. Programs allocate and terminate. Short-lived programs communicate with each other via blocking CSP-style channels (see Reppy's Concurrent Programming in ML).
If you could also educate me on why this is a bad idea I would appreciate.
Interesting to use hacker news as the blog's own comment section in this way.
I wrote this:
<div class="memory" bytes="32">
<malloc size="4" addr="0x0"></malloc>
<malloc size="5" addr="0x4"></malloc>
<malloc size="6" addr="0x9"></malloc>
<malloc size="7" addr="0xa"></malloc>
<free addr="0x0"></free>
<free addr="0x4"></free>
<free addr="0x9"></free>
<free addr="0xa"></free>
</div>
Instead of this:

<div class="memory" bytes="32">
<malloc size="4" addr="0x0"></malloc>
<malloc size="5" addr="0x4"></malloc>
<malloc size="6" addr="0x9"></malloc>
<malloc size="7" addr="0xf"></malloc>
<free addr="0x0"></free>
<free addr="0x4"></free>
<free addr="0x9"></free>
<free addr="0xf"></free>
</div>

My high congratulations for creating a most friendly, readable and useful lesson!
How I wish I had something like that when I first learned C.
Admittedly the playground is hard to understand at first, and a touch janky in places. But the workloads are taken from for-real malloc/free traces on programs!
Especially because you can scroll through all the steps.
While aggressively optimizing we replaced malloc in the end with a function we called "cooloc", that traded portability and security for speed. The debug tool here would have been useful.
* https://github.com/DaveJarvis/mandelbrot/blob/master/memory....
I then apply this same principle of "opening" and "closing" structures throughout the application. Generally, I can quickly verify that the calls are balanced:
* https://github.com/DaveJarvis/mandelbrot/blob/master/threads...
What's nice about this pattern is that the underlying implementation of exactly how the memory is maintained for a particular data structure (e.g., whether malloc or gdImageCreateTrueColor is called) becomes an implementation detail:
* https://github.com/DaveJarvis/mandelbrot/blob/master/image.c
The main application opens then closes structures in the reverse order:
* https://github.com/DaveJarvis/mandelbrot/blob/master/main.c
This means there's only one call to malloc and one call to free throughout the entire application (third-party libraries notwithstanding), allowing them to be swapped out with ease.
Aside, logging can follow this same idea by restricting where any text is written to the console to a single location in the code base:
* https://github.com/DaveJarvis/mandelbrot/blob/master/logging...
As a very minor point, calling free(NULL) is well-defined and safe so there is no need for the if-statement in memory_close(). This is very clearly stated in the manual page [1] for instance:
> If ptr is a null pointer, no action shall occur.
Probably showing my age. SunOS 4, PalmOS, and 3BSD reputedly crashed. (There were also double free exploits back in 1996.)
This further illustrates my point, though: Removing the NULL check is a single conditional to remove, as opposed to littering the free + guard everywhere. In effect, by isolating duplicated pieces of logic, it keeps possibilities open.
http://igoro.com/archive/volatile-keyword-in-c-memory-model-...
It's a bit old by now (2010), but I always remembered the mental model Igor created.
My only piece of feedback would be for the "Inline Bookkeeping" section (https://samwho.dev/memory-allocation/#inline-bookkeeping), it took a while for me to grok the numbered list to figure out which block corresponded to address + X. I wonder if there is a better way to visualize the 4 numbered bullet points? Maybe just arrows and text pointing to the visualization?
Thanks again for this wonderful article!
In another universe I'd hook in to the page scroll and highlight each block being referred to as I talk about it in the text. I'm probably not explaining that well here, but imagine something like the way this article works: https://pudding.cool/2017/05/song-repetition/.
It seems to imply that malloc/free works by boundary tag? Which I don't think is the case? (and if not, it leaves the reader confused as to how it then actually works).
I know "some" languages use the tag technique (e.g. julia), but the article seems to imply that this also applies to the c code above? Or am I wrong and c also makes use of such a scheme when you use pointer arithmetic?? (I don't see how that would work with arrays if that's the case though)
The post is intending to talk about the topic of memory allocation in a general sense. The way that malloc works on any given system is implementation dependent, it's not possible to say "malloc works in this way" because Debian can differ from Arch can differ from BSD, etc.
It's not my goal to tell you exactly how modern malloc/free implementations work. I could probably be more clear about this at the start of the post.
I'd love to also see applications of custom memory allocations. I know about usecases in building game engines and the importance of hitting cache there, but I'm not sure where else in the world this would be as useful.
> Couldn't we rearrange the memory to get a block of 6 contiguous bytes? Some sort of defragmentation process?
> Sadly not. Remember earlier we talked about how the return value of malloc is the address of a byte in memory? Moving allocations won't change the pointers we have already returned from malloc. We would change the value those pointers are pointed at, effectively breaking them. This is one of the downsides of the malloc/free API.
But why not? Couldn't we store information about old pointers somewhere and match them with new addresses when defragmenting? Some kind of virtual memory driver that would map old pointers to new addresses transparently for the programs? Or would it be too much overhead for too little benefit?
In OSes without virtual memory, one option is to do the same non-transparently: instead of returning pointers, malloc and friends work with "pointers to pointers" (called handles), so there is one extra level of indirection, and now the OS is free to rearrange this 2nd level as it sees fit. Whenever a program wants to read/write the data behind a handle, it must dereference the handle to get to the real pointer, but it also must let the OS know that it is currently using the real pointer -- this is to avoid the OS moving it around. This is usually called "locking/unlocking" the handle.
Some classic examples are Windows 3.x, Mac OS (toolbox), PalmOS, etc.
https://en.wikipedia.org/wiki/Classic_Mac_OS_memory_manageme...
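A toy version of the handle idea, in the spirit of those systems but not their real APIs (all names here are made up): callers hold an index into a pointer table, so the "OS" can relocate any unlocked block behind their backs.

```c
#include <stdlib.h>

/* Toy handle table: double indirection lets the allocator move
 * blocks around. Locking pins a block while the caller holds the
 * raw pointer, which is why the Win16-era lock call existed. */
#define MAX_HANDLES 16

static void *table[MAX_HANDLES];
static int   locked[MAX_HANDLES];

typedef int handle_t;

handle_t h_alloc(size_t n) {
    for (handle_t h = 0; h < MAX_HANDLES; h++) {
        if (!table[h]) { table[h] = malloc(n); return table[h] ? h : -1; }
    }
    return -1;
}

void *h_lock(handle_t h)   { locked[h] = 1; return table[h]; }
void  h_unlock(handle_t h) { locked[h] = 0; }

/* The allocator may relocate any unlocked block, e.g. to defragment.
 * (realloc stands in for "move the bytes somewhere better".) */
int h_try_move(handle_t h, size_t n) {
    if (locked[h]) return 0;            /* pinned: cannot move */
    void *p = realloc(table[h], n);
    if (!p) return 0;
    table[h] = p;
    return 1;
}
```

The contract is that callers only ever store the handle, never the raw pointer, and re-lock before each use.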
You would need hardware support for this, since the hardware is what decides what gets returned when a program attempts to read from a memory location.
Hardware already does support virtual memory but the granularity is the page (which are a minimum of 4KiB in most OSs).
Once you have it call free() for you, your piece of code is now a compacting GC, like Java's for example.
In languages where memory is managed for you, you can absolutely do this, since the runtime can find every single pointer to an object and rewrite it.
Virtual memory can let you do this, but would require a separate page for each allocation (since virtual memory operates on a page-level). Given that the smallest page on modern architectures is 4k, this would mean using 4k of ram for each allocation (and rounding up each allocation to a 4k page boundary).
On top of that, it's an OS system call to map and unmap pages, which means you incur a system-call on every allocation and deallocation, which is much slower than using a user-space allocator.
Someone else already mentioned, but garbage collected languages do actually do this. Because they’re fully in control of memory (the language exposes no raw pointers), they’re able to create the layer of indirection you’ve suggested and move things around as they please. I know at minimum the JVM does this. The term to search for is “heap compaction.”
There’s also the weird and wonderful work of Emery Berger et al with their “Mesh” malloc implementation, which blows my mind: https://youtu.be/XRAP3lBivYM.
If malloc were to return something like an address that holds the address of the allocated memory, there is nothing preventing the program from reading that address, doing math on it, and storing it somewhere else.
I'm glad you like the playground. If you don't mind me asking, what/where/how do you teach? I was actually hoping to get the attention of educators with this tool to see if it would make sense in, e.g., undergrad CS courses.
I've started writing my next post and have learnt a bit more about PixiJS/GSAP3, and think I know a way to do it that would work nicely but would require changing some of the underlying code. I can't promise I'll revisit this soon, but I would like to in future.
malloc/free: 207294429ns
slab: 74795526ns
A roughly 64% reduction in runtime is pretty nice.
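For context, slab-style allocators get their speed by making alloc/free a pointer push/pop over fixed-size slots. A sketch of that shape (an illustration of the technique, not the code that produced the numbers above):

```c
#include <stdlib.h>

/* Fixed-size free-list ("slab-style") allocator: one malloc carves
 * out N equal slots, then alloc/free are just a push/pop on a list
 * threaded through the free slots themselves -- no searching, no
 * size bookkeeping. slot_size must be >= sizeof(void *). */
typedef struct slot { struct slot *next; } slot;

typedef struct {
    void *mem;
    slot *free_list;
} slab;

slab slab_create(size_t slot_size, size_t nslots) {
    slab s = { malloc(slot_size * nslots), NULL };
    if (!s.mem) return s;
    for (size_t i = 0; i < nslots; i++) {   /* thread all slots onto the list */
        slot *sl = (slot *)((char *)s.mem + i * slot_size);
        sl->next = s.free_list;
        s.free_list = sl;
    }
    return s;
}

void *slab_alloc(slab *s) {
    slot *sl = s->free_list;
    if (sl) s->free_list = sl->next;        /* pop */
    return sl;
}

void slab_free(slab *s, void *p) {
    slot *sl = p;
    sl->next = s->free_list;                /* push */
    s->free_list = sl;
}
```

Both operations are a few instructions with no locks or searches, which is where most of the gap to a general-purpose malloc comes from.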
There's something refreshing about firing up a C (or maybe now Zig, in my case) compiler and allocating a gigabyte of memory, and seeing that your process is using exactly 1GB.
How did you measure that? What was fastest? I'd imagine sbrk is faster than mmap, but I haven't checked. If you're on Linux, then doing a virtual memory allocation will not cause the memory to have physical representation. Did you populate the pages through looping? Or did you use MAP_POPULATE?
I saw that madvise was going to get MADV_POPULATE_(READ|WRITE) in the Linux kernel mailing lists back in 2021, but I haven't seen it on the kernels that I use (5.4, 5.15)
It would make the site much more accessible and clear in the event you didn't realize you had to click forward.
While I also hate the overuse of JavaScript, this is exactly how it's meant to be used – adding small bits of interactivity to documents.
Definitely open to ideas.
Most of the research for this post was done by reading papers for various malloc implementations (phkmalloc, dlmalloc, tcmalloc, mimalloc) and reading their source code.
It instills the idea that you should be asking a question at this point. Some information has been provided that should generate more questions in your head, if you're keeping pace.