Safe-Linking – Eliminating a 20 year-old malloc() exploit primitive

(research.checkpoint.com)

115 points · eyalitki · 6y ago · 28 comments


phkamp · 6y ago · 5 in thread

FreeBSD solved that problem 22 years ago:

https://papers.freebsd.org/1998/phk-malloc/

(And got better malloc(3) performance at the same time.)

kazinator · 6y ago

Nobody has solved "that problem".

No malloc has absolute detection of all corruption and even tools like Valgrind don't catch everything.

Let's not pretend that a few pointer hacks and checks have solved a problem.

phkamp · 6y ago

The particular problem they boast about having solved simply does not exist in the malloc(3) I wrote 23 years ago, because it does not have the silly linked list to begin with.

In addition to all the security problems inherent in mingling metadata directly next to user-data, the linked list gives O(Nalloc) performance on free(3) and realloc(3) whereas my malloc had O(1) performance.

That may not seem like a lot, but when the 'O' is a page-in from disk, and Nalloc is from C++ code, it is nothing to sneeze at.

Not only did that make my malloc faster, but it would (not "could", but "would") detect several classes of malloc-usage errors, such as double-free, unconditionally.

Over the next 10-ish years, that practically eliminated entire classes of malloc-mistakes, making a lot of FOSS software safer, no matter which malloc you used it with.

Given the definition of the malloc(3) family API, there is no way you can detect all corruption without hardware support, but there are people working on that too, notably Robert Watson's CHERI project at Cambridge.

So yeah, nice pointer arithmetic, but how about people solve the real problem instead?

(On big memory and multi-core systems use jemalloc, on small memory and single-core systems use phkmalloc.)


eyalitki (OP) · 6y ago

Too bad we had to wait 22 years for a similar change to find its way into the more widely used glibc / uclibc(-NG).

chc4 · 6y ago

Windows also XORs chunk metadata with a random secret from PEB, at least on some OSes.

jboschpons · 6y ago

Good!

jfindley · 6y ago · 5 in thread

I was initially surprised at their benchmark results, until I read more closely and noticed that they were benchmarking on a cloud VM. I wish people wouldn't do this for this form of test - it adds so many variables it's hard to take their benchmark too seriously.

eyalitki (OP) · 6y ago

We executed the benchmark suite multiple times, and results were consistent. It was way more stable than the same execution on a PC with multiple programs. Anyway, the maintainers of glibc performed the same checks and confirmed our measurements.

kees99 · 6y ago

Thumbs up for reaching out to upstream to mainline it.

Did you talk to the musl team as well?


the_duke · 6y ago

In my experience, cloud VMs are often a more consistent benchmarking platform than your personal machine.

A desktop environment and consumer-grade hardware add a lot of noise.

Ideally you'd always benchmark on completely idle dedicated servers, but those are rarely available.

devit · 6y ago

You can just run it on your local machine and dedicate some of the cores to it with a tool like "cset shield".

If you _really_ want to avoid any contention, then build the benchmark as a Linux kernel module and call stop_machine(), which will prevent anything else including interrupt handlers from running.

arielb1 · 6y ago

Not that surprising: CPUs can perform scalar ops in parallel with other computation, so I would expect a few scalar ops to be free in any code that is not scalar-bound.

kazinator · 6y ago · 3 in thread

This is a crude, unreliable debugging aid being trotted out as a security enhancement.

saagarjha · 6y ago

That being said, everyone attacks the tcache because it’s so poorly protected. It clearly needs something better.

ghostpepper · 6y ago

So you'd be opposed to merging this change upstream?

kazinator · 6y ago

Yes; it wastes machine cycles.

saurik · 6y ago · 2 in thread

Blocking 15 out of 16 exploit attempts sounds great, but you have to ask yourself how many shots you get, and since most people don't think "this program crashed: I should never run it ever again", this isn't anywhere near as helpful as it seems. If you crash my webserver, I spawn a new one. You might even already have 16 webservers running right now! If the user clicks a link and their tab crashes, they might hit reload a few times before giving up. If you receive a text message or an email that crashes a parser on your phone, the retry might seriously happen in a tight loop. Like, I hear about this sort of thing being a good idea constantly, but I have never once heard of it actually blocking an attack against any "normal" target (I say "normal", as some super hardened military target might be configured to go into complete lockdown immediately upon a single unexpected event, with a protocol to call in a forensics expert to analyze logs to see if it is safe to restart the service before ending the outage).

eyalitki (OP) · 6y ago

We detect the corruption 15 out of 16 times. In our tests, when we didn't detect a corruption, the program still crashed, but without a proper error message. One should remember that the pointer is masked, so an attacker without the secret mask won't be able to control the reveal()ed pointer. Using a garbled pointer won't give the attacker anything useful, and it will most probably crash the target program.
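For readers following along, the masking and the 15-out-of-16 detection described above can be sketched roughly as follows. This is only an illustration, not the glibc code: it assumes 4 KiB pages (`PAGE_SHIFT` of 12), 16-byte chunk alignment, and a mask taken from the address of the slot that stores the pointer. The helper names `protect_ptr`, `reveal_ptr`, and `looks_valid` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages and 16-byte aligned chunks. */
#define PAGE_SHIFT 12
#define CHUNK_ALIGN_MASK ((uintptr_t)0xF)

/* Mask a "next" pointer with the page-granular bits of the slot storing it. */
static uintptr_t protect_ptr(uintptr_t slot_addr, uintptr_t next)
{
    /* Legitimate next pointers are always aligned chunks. */
    assert((next & CHUNK_ALIGN_MASK) == 0);
    return (slot_addr >> PAGE_SHIFT) ^ next;
}

/* XOR is its own inverse, so revealing applies the same mask again. */
static uintptr_t reveal_ptr(uintptr_t slot_addr, uintptr_t stored)
{
    return (slot_addr >> PAGE_SHIFT) ^ stored;
}

/* A revealed pointer that is not aligned cannot be a genuine chunk. */
static int looks_valid(uintptr_t revealed)
{
    return (revealed & CHUNK_ALIGN_MASK) == 0;
}
```

Under these assumptions, an attacker who overwrites the stored value without knowing the mask ends up with four effectively random low bits after the reveal step, so the alignment check would pass by chance only about 1 time in 16.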

mywittyname · 6y ago

Are you saying the technique is effective at preventing a successful attack nearly every time, but that pointer modifications are detected (thus logged) 15 out of 16 times?


floatboth · 6y ago · 1 in thread

Does this apply to jemalloc?

eyalitki (OP) · 6y ago

While ptmalloc/dlmalloc/tcmalloc use singly-linked-list metadata that is stored adjacent to the user buffers, in jemalloc there is no such metadata. jemalloc's design stores the sensitive metadata separately, thus removing the need to add a dedicated protection layer for it.

daanx · 6y ago

Interesting article -- thanks! I see they used the `key ^ P` encoding where the `key` is `L >> PAGESHIFT` to use the ASLR randomized bits from the free list position `L`.

In [mimalloc](https://github.com/microsoft/mimalloc) we use a similar strategy to protect the free list in secure mode. However, there are some weaknesses associated with using a plain _xor_: if the attacker can guess `P`, that reveals the key immediately (`P^key^P == key`). Or, if there is a read overflow and the attacker can read multiple encodings, they can xor them with each other, say `(P1^key)^(P2^key)`, which gives `(P1^P2)` and may reveal information about the pointers (like alignment).
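The two weaknesses of a plain xor mentioned above can be made concrete with a small sketch. The function names here (`xor_encode`, `recover_key`, `xor_of_pointers`) are hypothetical and exist only for illustration; they are not part of any allocator.

```c
#include <stdint.h>

/* Plain xor encoding of a free-list pointer P with a secret key. */
static uintptr_t xor_encode(uintptr_t key, uintptr_t p)
{
    return key ^ p;
}

/* Known plaintext: one encoded value plus a correct guess of P gives the key. */
static uintptr_t recover_key(uintptr_t encoded, uintptr_t guessed_p)
{
    return encoded ^ guessed_p;
}

/* Two leaked encodings: the key cancels out, leaving P1 ^ P2. */
static uintptr_t xor_of_pointers(uintptr_t encoded1, uintptr_t encoded2)
{
    return encoded1 ^ encoded2;
}
```

In both cases the attacker never needs to break the key directly: the self-inverse property of xor does the work.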

What we use in _mimalloc_ instead is two keys `key1` and `key2` and encode as `((P^key2)<<<key1)+key1`. Since these operations are not associative, the above approaches do not work so well any more even if the `P` can be guesstimated. For example, for the read case we can subtract two entries to discard the `+key1` term, but that leads to `((P1^key2)<<<key1) - ((P2^key2)<<<key1)` at best. (We include the left-rotation since xor and addition are otherwise linear in the lowest bit).
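As a rough sketch of the two-key encoding `((P^key2)<<<key1)+key1` described above, together with its inverse: the helper names (`rotl`, `rotr`, `encode_ptr`, `decode_ptr`) are hypothetical, and reducing the rotation count modulo the word size is an assumption made here so that any key value is a valid rotation amount. This is not the mimalloc source.

```c
#include <limits.h>
#include <stdint.h>

#define WORD_BITS (sizeof(uintptr_t) * CHAR_BIT)

/* Rotate left by r bits; r is reduced modulo the word size (assumption). */
static uintptr_t rotl(uintptr_t x, uintptr_t r)
{
    unsigned s = (unsigned)(r % WORD_BITS);
    return s == 0 ? x : (x << s) | (x >> (WORD_BITS - s));
}

/* Rotate right by r bits, the inverse of rotl for the same r. */
static uintptr_t rotr(uintptr_t x, uintptr_t r)
{
    unsigned s = (unsigned)(r % WORD_BITS);
    return s == 0 ? x : (x >> s) | (x << (WORD_BITS - s));
}

/* Encode as ((P ^ key2) <<< key1) + key1; unsigned overflow wraps. */
static uintptr_t encode_ptr(uintptr_t p, uintptr_t key1, uintptr_t key2)
{
    return rotl(p ^ key2, key1) + key1;
}

/* Undo the steps in reverse order: subtract, rotate right, xor. */
static uintptr_t decode_ptr(uintptr_t x, uintptr_t key1, uintptr_t key2)
{
    return rotr(x - key1, key1) ^ key2;
}
```

Because the xor, the rotation, and the addition are applied in sequence, xoring or subtracting two encoded values no longer cancels both keys cleanly, which is the property the comment is pointing at.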

Just some thoughts. Of course, this may be too much for the use-case. However, we found only a little extra overhead for the extra operations (as most programs are dominated by memory access), so it may be of benefit.
