CVE-2026-31431: Copy Fail vs. rootless containers

(dragonsreach.it)

165 points · averi · 5d ago · 89 comments

codedokode 5d ago

I think it was a bad idea to put cryptographic APIs or VPNs in the kernel. If userspace is too slow for this, you should either reduce context-switch overhead or create a special kind of process that is isolated but quick to switch into. They are repeating Windows' mistakes.

cormorant 5d ago

It's not faster than userspace, it's much slower normally. On special boards with crypto accelerators it can be faster, and there can be compliance reasons to want it. References: [1] https://www.chronox.de/libkcapi/html/ch01s02.html [2] https://lwn.net/Articles/410763/ [3] https://trac.gateworks.com/wiki/linux/encryption#PerformaceC...

cpach 5d ago

Well at least if it’s crufty stuff like AF_ALG that barely no-one is using and is kind of a forgotten place of the kernel.

I don’t oppose reasonable crypto in the kernel, like WireGuard.

cluckindan 4d ago

>barely no-one is using

Except, you know, many things

cpach 4d ago

Many? No, I don’t agree.

nwallin 4d ago

I like the idea of keeping stuff out of the kernel as much as possible, but in this case, there are good reasons why cryptography has to live in the kernel.

We need on-disk encryption, and we need to be able to boot from an encrypted disk. So we need encryption for that.

We need network filesystems, and we need the traffic over the network to be encrypted. So we need encryption.

IPsec, for better or for worse, is authenticated and partially encrypted at the transport layer, so if we want a Linux machine to speak IPsec, we need encryption.

Fixing/changing this would require a huge restructuring of the kernel; it would basically require switching to a microkernel. Given the fact that nobody's ever written a microkernel that doesn't completely suck ass, I don't know that it would be worth the effort.

ranger_danger 4d ago

What about having a way to run the same crypto code but in userspace? Or perhaps turn it into a library that can be used from userspace.

Anonbrit 3d ago

For encrypted disks, you've now got high-performance data shuffling between userspace and kernel space - a massive new attack surface

cpach 4d ago

Sure. But it would probably still be a good thing if the kernel maintainers could tear out AF_ALG.

ohnei 5d ago

I don't think it was a bad idea. Any idea requires an investment, and the better investment would have been the kernel layer; just ask the history of export control law what the US feared breaking more. Having security in userland means attacks in either the kernel or userland are worthwhile against it. In the kernel it could have been secured better than OpenSSL was, with fewer resources, and could have kept keys unavailable to userland. Instead it got basically no uptake, as everyone hobbled along on slightly more resources spread even thinner across OpenSSL clones.

pjmlp 4d ago

Those Windows mistakes have been sorted out for a long time now.

bawolff 5d ago

It sounds like they are saying the exploit works but the proof-of-concept doesn't due to superficial reasons(?) That hardly seems like something to brag about.

raddan 5d ago

It’s not exactly superficial. It’s defense in depth: make sure that root inside a container is not root outside a container. There is also some good discussion about how the elevated user has access to page caches which can be dangerous when containers share pages (which is common). An attack “not working” for some seemingly trivial structural reason is a common trait of defense in depth. We would all love it if attacks like this were impossible, but absent some evidence of impossibility, why not hedge a little?

bawolff 4d ago

> make sure that root inside a container is not root outside a container.

And it's a great idea in general, it just doesn't stop this exploit.

The proof of concept becomes root as a quick way to prove it has control of your computer. The system in the article isn't blocking the exploit, it's just blocking the mechanism used to prove it worked. It still worked; the test to verify it is just giving a false negative now.

Good defense in depth disables steps that are necessary for the attack, even if they aren't sufficient by themselves. In the context of this exploit (but not in general), this mitigation is more like renaming the su command to mysu and hoping nobody notices.

angry_octet 4d ago

They seem to be in a weird state of denial? Why don't they make it clear that it's just this POC that is blocked? It's like they don't understand.

amluto 5d ago

Sigh.

1. I would hope the default seccomp policy blocks AF_ALG in these containers. I bet it doesn’t. Oh well.

2. The write-to-RO-page-cache primitive STILL WORKED! It’s just that the particular exploit used had no meaningful effect in the already-root-in-a-container context. If you think you are safe, you’re probably wrong. All you need to make a new exploit is an fd representing something that you aren’t supposed to be able to write. This likely includes CoW things where you are supposed to be able to write after CoW but you aren’t supposed to be able to write to the source.

So:

- Are you using these containers with a common image, or even a common layer in an image, to isolate dangerous workloads from each other? Oops, they can modify the image layers and corrupt each other. There goes any sort of cross-tenant isolation.

- What if you get an fd backed by the zero page and write to it? This can’t result in anything that the administrator would approve of.

- What if you ro-bind-mount something in? It’s not ro any more.
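
As a rough illustration of the seccomp rule point 1 hopes for, here is a minimal sketch that builds an OCI-style seccomp profile in which `socket(AF_ALG, ...)` fails with an errno. Assumptions: the profile is denylist-style (`defaultAction` allows everything else), which is much looser than the allowlist profiles container runtimes actually ship; the function name and the choice of `EAFNOSUPPORT` are illustrative and not taken from any project's real default; the numeric constants are the Linux values.

```python
import json

AF_ALG = 38        # Linux address-family number used by the kernel crypto API sockets
EAFNOSUPPORT = 97  # Linux errno value: "Address family not supported by protocol"


def deny_af_alg_profile() -> dict:
    """Build a minimal OCI-style seccomp profile that only blocks socket(AF_ALG, ...).

    Hypothetical helper for illustration; it is not any runtime's default profile.
    """
    return {
        # Assumption: everything not matched below is allowed (denylist style).
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [
            {
                "names": ["socket"],
                "action": "SCMP_ACT_ERRNO",
                "errnoRet": EAFNOSUPPORT,
                # Match only when the first argument (the address family) is AF_ALG.
                "args": [{"index": 0, "value": AF_ALG, "op": "SCMP_CMP_EQ"}],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(deny_af_alg_profile(), indent=2))
```

The printed JSON could be saved to a file and passed with `--security-opt seccomp=<file>`. Note that this only closes the AF_ALG entry point; it does nothing about other routes to the same kind of page-cache write.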

jeroenhd 5d ago

> I would hope the default seccomp policy blocks AF_ALG in these containers. I bet it doesn’t. Oh well.

I see a lot of projects blocking those sockets in containers as a response to this exploit, but it seems rather strange to me. We're disabling a cryptographic performance enhancement feature entirely because there was a security bug in them that one time? It's a rather weird default to use. It's not like we're mass-disabling kernel modules everywhere every time someone discovers an EoP bug, are we? Did we blacklist OpenSSL's binaries after Heartbleed?

I suppose it makes sense as a default on vulnerable kernels (though people running vulnerable kernels should put effort into patching rather than workarounds in my opinion), but these defaults are going to be around ten years from now when copy.fail is a distant memory.

throw0101a 5d ago

> We're disabling a cryptographic performance enhancement feature entirely because there was a security bug in them that one time? It's a rather weird default to use.

The need for this feature/functionality in the first place is questioned by some:

> As someone who works on the Linux kernel's cryptography code, the regularly occurring AF_ALG exploits are really frustrating. AF_ALG, which was added to the kernel many years ago without sufficient review, should not exist. It's very complex, and it exposes a massive attack surface to unprivileged userspace programs. And it's almost completely unnecessary, as userspace already has its own cryptography code to use. The kernel's cryptography code is just for in-kernel users (for example, dm-crypt).

> The algorithm being used in this [specific] exploit, "authencesn", is even an IPsec implementation detail, which never should have been exposed to userspace as a general-purpose en/decryption API. […]

* https://news.ycombinator.com/item?id=47952181#unv_47956312

e12e 5d ago

In fairness, after Heartbleed there was quite a push to move away from OpenSSL, to Google's BoringSSL, OpenBSD's LibreSSL, Mozilla's NSS, or GnuTLS. But the alternative here would be moving to a different kernel, like FreeBSD or OpenSolaris/illumos...

PunchyHamster 5d ago

That's just moving to a kernel that has had 1000x fewer eyes on it. Yeah, sure, it will have fewer exploits, but purely because nobody bothers to look when there are much juicier targets on Linux.

But I am disappointed that we still don't have a clear OpenSSL successor; there is nothing to be salvaged from this mess of a project.

nubinetwork 5d ago

> We're disabling a cryptographic performance enhancement feature entirely because there was a security bug in them that one time?

To my knowledge, not many things were using the in-kernel code anyway; the recommended way is to use userland tools...

It's optional for OpenSSL, and systemd apparently needs it, but deleting the module from one of my systems didn't cause any issues. /shrug

PunchyHamster 5d ago

I haven't seen it loaded on hundreds of servers with kernel versions ranging from 5.10 to 6.14. Usage is just that low.

Retr0id 5d ago

iiuc the AF_ALG interface only offers real performance wins if you have specialized hardware that the kernel can offload computations to. If you're not using that hardware, there's little reason not to do the crypto in userspace.

hlieberman 5d ago

In fact, the authors specifically say on the very first line of their website that the Copy Fail primitive can be used as a container escape. The entire premise of this article is flawed and irresponsible.

eqvinox 5d ago

AIUI they haven't shown a container escape and are just claiming it so far. Or did I miss something?

fguerraz 5d ago

I just contributed this [1] which does what you want for seccomp. Well, not by default, but profiling is now effective against this attack.

Oh, and this [2] just happened.

[1] https://github.com/containers/oci-seccomp-bpf-hook/pull/209
[2] https://github.com/moby/moby/pull/52501

PunchyHamster 5d ago

> I would hope the default seccomp policy blocks AF_ALG in these containers. I bet it doesn’t. Oh well.

There is no reason it would be the default policy. Else you might as well block every socket and just multiplex everything on stdin/out.

staticassertion 5d ago

The reason is that it's very rarely used and has a history of issues.

cduzz 5d ago

I'd have guessed that the default paranoia-first policy would be "drop everything; verify what you need" which would include AF_ALG.

share and enjoy!

SV_BubbleTime 5d ago

>might as well block every socket and just multiplex everything on stdin/out

You may be on to something…

dwroberts 5d ago

There is an addendum at the bottom where they admit the page corruption is still problematic even with rootless podman.

Although using this to justify their migration to micro-VMs is very strange to me. Sure, for this CVE it would have been better, but surely a future attack could hit a component shared across VMs but not containers? Are people really choosing technology based on CVE-of-the-week?

staticassertion 5d ago

These sorts of vulns are extremely common on Linux. This one is making the rounds for various reasons but it's a good justification for a migration away from containers if your threat model is concerned about it.

MicroVMs have much lower attack surface and you can even toss a container into one if you'd like.

Or use gVisor, which mitigates this vulnerability.

anygivnthursday 5d ago

Containers were never a security boundary. VMs have better isolation, which is why people choose them for security. Containers are convenience and usually have better performance.

dwroberts 5d ago

I see the ‘not a security boundary’ thing repeated constantly, and while it makes sense (e.g. they’re sharing the underlying kernel or at least some access to it), if you think about it a little more, VMs are not magically different: they are better isolated, but VMs on the same host still share the host in common. A CVE next week that allows corruption of host state affecting, e.g., every VM under a particular hypervisor will be no less damaging than this CVE is to containers.

graemep 5d ago

They may not provide the same isolation as VMs, but they clearly do limit some attacks. VMs do not provide the same isolation as physically separate hardware either.

I would have thought they provide better isolation than using multiple users, which is the traditional security boundary.

It might depend on what you mean by a container. Are sandboxes such as Bubblewrap and Firejail containers?

ButlerianJihad 5d ago

Containers are a convenience boundary and they increase complexity of your risk assessments.

It is easy for security scanners to scan a Linux system, but will they inspect your containers, and snaps, and flatpaks, and VMs? It is easy for DevOps to ssh into your Linux server, but can they also get logged in to each container, and do useful things? Your patches and all dependencies are up-to-date on your server, but those containers are still dragging around legacy dependencies, by design. Is your backup system aware of containers and capable of creating backup images or files, that are suitable for restoring back to service?

raesene9 5d ago

I've not looked at Podman, but I believe moby/docker does now block this: https://github.com/moby/profiles/commit/7158007a83005b14a24f...

Titan2189 5d ago

> [...] that root was just my unprivileged podman user on the host

Couldn't you then simply re-run the exploit as the unprivileged podman user and gain root on the host?

kelnos 5d ago

No, because you're still in the container, and there's no route to the host's root from there.

If you can orchestrate a container escape from the container's "root", then you're on to something.

tuananh 5d ago

Did anyone try it? It's supposed to work, right?

itvision 5d ago

Exploit download/source: https://github.com/theori-io/copy-fail-CVE-2026-31431/blob/m...

The dedicated website: https://copy.fail

grimblee 5d ago

If I understand correctly, rootful Podman with --userns=auto would also prevent the privilege escalation?

angry_octet 4d ago

No it wouldn't. The exploit is not impacted by namespaces.

cpach 5d ago

How?

grimblee 4d ago

--userns=auto assigns a different user namespace to each container, so if you escape it you end up with a random uid far, far away from root. It also protects other containers from the compromise, since each has its own namespace and uid/gid range. The drawback is that you can't mount a shared volume unless you use a pod, since files from outside your uid/gid range show up as owned by nobody and are inaccessible.

cpach 4d ago

That might make Copy Fail harder to exploit, but I still wouldn’t bet money on CF being impossible to use in that scenario.

netheril96 5d ago

If the goal is just preventing full root privileges, a CapabilityBoundingSet in a systemd unit will do.

However, Copy Fail can be used in many other ways that are not contained by containers or the above settings. For example, it can modify files under /etc/ssl/certs to prepare for MitM attacks. If you have multiple containers based on the same image, then one compromised CA set affects the others.

est 5d ago

I added these

    AmbientCapabilities=CAP_NET_BIND_SERVICE
    CapabilityBoundingSet=CAP_NET_BIND_SERVICE
    NoNewPrivileges=yes

to my .service. Is it good enough?

2bitencryption 5d ago

tl;dr - within the container, the exploit works and elevates to root (uid 0) within the container - BUT because uid 0 in that namespace actually maps to uid 1000 (the user) outside the container, the escalation does not flow up to the host.

But… does this escape the container? If not (the author seems to indicate it does not), then does it matter whether you are in Docker or rootless Podman, since the end result is always that you have elevated to root within the container? If the rest of the container filesystem isolation does its job, the end result is the same? Though I guess another chained exploit to escape the container would be worse in Docker? Do I have that right?
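
The uid mapping in the tl;dr can be sketched as a lookup over `/proc/<pid>/uid_map`-style entries, where each entry is (first uid inside the namespace, first uid outside, length). The two entries below are an assumed, typical rootless layout (container uid 0 maps to host uid 1000, everything else to a subordinate range starting at 100000); the names are hypothetical and real ranges come from `/etc/subuid`.

```python
from typing import Iterable, Optional, Tuple

# (first uid inside the namespace, first uid outside, number of uids in the range)
UidMapEntry = Tuple[int, int, int]

# Assumed example layout for a rootless container run by host uid 1000.
ASSUMED_ROOTLESS_MAP = [(0, 1000, 1), (1, 100000, 65536)]


def to_host_uid(container_uid: int, uid_map: Iterable[UidMapEntry]) -> Optional[int]:
    """Translate a uid inside the user namespace to the uid the host sees.

    Returns None when the uid is not covered by any mapping entry.
    """
    for inside_start, outside_start, length in uid_map:
        if inside_start <= container_uid < inside_start + length:
            return outside_start + (container_uid - inside_start)
    return None


if __name__ == "__main__":
    for uid in (0, 1, 33):
        print(uid, "->", to_host_uid(uid, ASSUMED_ROOTLESS_MAP))
```

This only models credentials. As other commenters note, the page-cache write itself does not depend on which uid the process maps to.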

firesteelrain 5d ago

This is a problem and most people hadn’t considered it before because the caching is done to speed up build pipeline performance:

“While rootless containers prevent the attacker from escalating to host root, the page cache is still shared across the host. Containers that re-use the same base image layers share the same cached pages for those layers — if a malicious CI job corrupts a binary in the page cache, other containers launched from that same image could end up executing the poisoned version.”

eqvinox 5d ago

Running sstrip on an ELF binary is called ELF "golfing"? TIL…

Retr0id 5d ago

It is, although real ELF golfers consider that a little naive.

eqvinox 5d ago

It does feel a little simplistic to get a special name. But lesser things have gotten fancier names...

repelsteeltje 5d ago

Sorry for posting a n00b question, but could you share the etymology of the term "golfing"?

mbreese 5d ago

It’s manipulating the binary to make it as small as possible. In golf, the lowest score wins. So, in this context, the smallest binary that still works wins.

Retr0id 5d ago

In golf, lower scores are better.

walletdrainer 5d ago

This feels LLM-generated: lots of em dashes and even more text around a completely false premise.

cpach 5d ago

What is the false premise in the article?

Retr0id 5d ago

That rootless containers mitigate kernel exploits.

hackeman300 5d ago

It's a shame, this seems like an interesting topic but I can't get past the blatant AI-isms littered throughout.

>This is not raw shellcode — it is a fully formed ELF executable

washbasin 5d ago

Please post a tl;dr at the top or even in the subject. Many of us are scrambling to patch/reboot our **.

donaldjbiden 5d ago

This isn't a new CVE. It's just documenting what happened when this person ran the exploit inside a certain type of container.

PunchyHamster 5d ago

tl;dr (not from article)

    echo -e 'install algif_aead /bin/false\n' > /etc/modprobe.d/disable-algif.conf

That just prevents the faulty module from loading, so you have time to fix it properly (kernel upgrade).

Technically there should be zero impact (the very, very few tools that use it will fall back to userspace); I haven't even found that module loaded in our infrastructure.

Then check whether it is already loaded, and if it is, unload it or reboot.
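
The "check if it is loaded" step can be sketched as a scan of `/proc/modules`, whose lines start with the module name. This is a minimal sketch with a hypothetical helper name. One caveat to keep in mind: a module compiled into the kernel (for example with `CONFIG_CRYPTO_USER_API_AEAD=y`) does not show up in `/proc/modules`, and in that case the modprobe override would not help either.

```python
from pathlib import Path


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` is listed as a loaded module in /proc/modules content."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field on each line is the module name.
        if fields and fields[0] == name:
            return True
    return False


if __name__ == "__main__":
    path = Path("/proc/modules")
    # Fall back to an empty listing on systems without /proc/modules.
    text = path.read_text() if path.exists() else ""
    for mod in ("algif_aead", "af_alg"):
        print(mod, "loaded" if module_loaded(mod, text) else "not loaded")
```

If it reports the module as loaded, `modprobe -r algif_aead` (or a reboot) is still needed, since the `install ... /bin/false` line only affects future load attempts.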

isityettime 5d ago

It already has a table of contents. The heading titled "why rootless containers stopped the escalation" is your tl;dr.
