CVE-2024-6409: OpenSSH: Possible remote code execution in privsep child (opens in new tab)

(openwall.com)

141 pointsandreyv1y ago56 comments

56 comments

34 comments · 8 top-level

Couldn't this entire class of bug be solved by annotating signal handlers in the source code and checking at compile time that anything called from a signal handler is async-signal-safe?

dgrunwald1y ago

There are some static analysis tools that can check this.

Cert's SIG30 rule page has a list: https://wiki.sei.cmu.edu/confluence/display/c/SIG30-C.+Call+...

Also there's https://clang.llvm.org/extra/clang-tidy/checks/bugprone/sign...

klysm1y ago

Sounds reasonable, but since the language layer has no knowledge of signal handlers or what that means, it would be a separation of concerns problem. I'm sure you could get clang to do it, but still a tricky thing to design around.

Ultimately it's an example of an invariant where it's clear that programmers can't be trusted to uphold it. In this case, the consequences can be very significant.

Joker_vD1y ago

> the language layer has no knowledge of signal handlers or what that means

Despite the fact that there is explicit runtime support for signal handlers in the language runtime (i.e. libc).

2 more replies

loeg1y ago

Static analysis tools would go a long way here, yes, and it should be a relatively straightforward analysis. You probably don't even need to explicitly annotate signal handlers, just examine arguments to calls to signal() and sigaction().

22 more replies

immibis1y ago

This entire class of bug could also be solved by avoiding signal handlers. You can still use SIGALRM for a timeout, but don't log it. If you need complex processing, use signalfd to read signals in the event loop.

bregma1y ago

Well, the compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading.

There is no portable way to annotate all functions ever written or ever will be written as being async signal safe.

Which functions are async signal safe varies with the operating system and runtime (eg. an unsafe function in linux-gnu might be safe in linux-musl or linux-bionic).

Other than those insurmountable problems, yeah, good idea.

fanf21y ago

What I want when writing a signal handler is to be able to say, this function must be async-signal-safe and therefore all the functions it calls must be async-signal-safe. That can be done purely at compile time; I don’t need to worry about linking.

The annotation does not need to be portable; if it’s present on one system then other systems still benefit because the code is written to pass the check.

The list of async-signal-safe functions is well documented and quite short, so it would not be much work to add the annotations to the header files. It’s OK if some safe functions are omitted, because signal handlers should be written to do the absolute bare minimum.

1 more reply

adrianmonk1y ago

> compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading

You could check it at runtime.

Just like with array bounds checking, in many cases the compiler could sometimes prove the runtime check isn't necessary and eliminate it.

> Which functions are async signal safe varies with the operating system and runtime

Annotations could enumerate specific platforms where it is safe or unsafe. Or you could annotate based on specific attributes of platforms that make it safe or unsafe.

account421y ago

> Well, the compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading.

That's why GP suggested annotating them. Typically this would be done via a function attribute.

> There is no portable way to annotate all functions ever written or ever will be written as being async signal safe.

This is not a requirement for such an annotation to exist and to be used by projects that care about security or even just correctness.

> Which functions are async signal safe varies with the operating system and runtime (eg. an unsafe function in linux-gnu might be safe in linux-musl or linux-bionic).

And libc implementations already annotate many of their functions to tell the compiler how they work. Compilers are also more than happy to assume behavior of standard function matches the C/C++ standards in non-freestanding environmnets.

> Other than those insurmountable problems, yeah, good idea.

All fairly trivial problems that have already been solved many times for similar issues.

I'd like a more general attribute though to declare that a particular funcion is in some abstract domain and then annotations that certain functions may or may not be called in certain domains. This could come useful in cases where you want some functions to only be called from special threads.

1 more reply

stncls1y ago· 5 in thread

No vulnerability name, no website, concise description, neutral tone, precise list of affected distros (RHEL + derivatives and some EOL Fedoras) and even mention of unaffected distros (current Fedoras), plain admission that no attempt was made to exploit. What a breath of fresh air!

(I am only joking of course. As a recovering academic, I understand that researchers need recognition, and I have no right to throw stones -- glass houses and all. Also, this one is really like regreSSHion's little sibling. Still, easily finding the information I needed made me happy.)

tptacek1y ago

I don't think recognition for researchers is the big win for named vulnerabilities. In the places that matter, they can just describe their findings in a short sentence and get all the recognition that matters. The names are mostly for the benefit of users.

ericpauley1y ago

Security researchers definitely do the naming gimmick for personal brand purposes. This may not be as obvious when it’s successful, but academic papers routinely name vulnerabilities when there is no real benefit to users.

1 more reply

AndyMcConachie1y ago

The author of the mail is Solar Designer, a bit of a legend AFAIC. He has no need to pump up his brand and he really really knows what he's doing.

formerly_proven1y ago

Yeah. He created openwall and the oss-security list.

dimask1y ago

At least they do not name them after themselves.

qalmakka1y ago· 5 in thread

This is why I've always disliked Debian and Red Hat.

1. I hate the fact they have the hubris to think they can be smarter than the upstream developers and patch old versions

2. I hate the fact they don't ship vanilla packages, but instead insist on patching things for features that nobody relies on anyway, __because they're not upstream__.

Maintainers should stick to downloading tarballs, building them and updating them promptly when a new version is out. If there's no LTS available, pay upstream and get an LTS, don't take a random version and patch it forever just to keep the same version numbers, it's nonsensical and it was only a matter of time before people tried to exploit it. Just look at the XZ backdoor for instance, which relied on RedHat and Debian deploying a patched libsystemd.

gruturo1y ago

Enterprises don't go for RHEL because it's free software, yay freedom!

They go for it because it gives a very stable, solid foundation. They don't want a fragile base layer prone to breaking every day of the week.

This involves backporting a lot of stuff (primarily security fixes) because you can't just upgrade any package to its latest version, it will have entirely new dependencies, potentially breaking changes etc.

What should RedHat do, which does not:

1) make them lose their enterprise customers wanting a stable base

2) have unpatched security holes all over their distros

3) not cause them to backport stuff (we are here at the moment) ?

JackSlateur1y ago

Compagnies I've worked for that uses redhat do so because they think paying will prevent them from working. As if, sudenly, running a nearly 10y-old code was no longer stupid, because you paid for it.

3 more replies

qalmakka1y ago

I understand the business logic behind that. The point is, maybe they should consider paying the upstream developers to backport the stuff themselves instead of dabbling with C code they somewhat understand?

2 more replies

rlpb1y ago

Distributions patch to get consistent and integrated behaviour across the platform, including new features that require implementation into multiple places to be even minimally useful, and ports to newer APIs to make everything work against a single version of each library so that they do not conflict.

Upstreams generally don't do this until prompted, and sometimes resist until the path is proven and becomes best practice because distributions pushed it.

This process is mostly invisible to you because distributions have been successful at getting the changes needed for sane and consistent behaviour embedded into tools and expectations by default. All you see are the patches that are in flight or didn't make it.

If you want a distribution that does only minimal patching then there are distributions for that. The fact that the major distributions do patch speaks volumes about which approach results in a better experience for users.

Diederich1y ago

Do you think there is a reason that some distros go through all of this additional trouble?

candiddevmike1y ago· 4 in thread

The risk you take when you use a distribution that modifies upstream. Debian has had similar issues in the past (maybe not CVEs, but certainly packager-created bugs).

2OEH8eoCRo01y ago

It's risks all the way down. There are risks to not patching upstream as well.

klysm1y ago

Debian has a fairly famous one: CVE-2008-0166

meowface1y ago

Ouch, that one's bad: https://github.com/g0tmi1k/debian-ssh#the-bug

>These lines were removed because they caused the Valgrind and Purify tools to produce warnings about the use of uninitialized data in any code that was linked to OpenSSL. Removing this code has the side effect of crippling the seeding process for the OpenSSL PRNG. Instead of mixing in random data for the initial seed, the only "random" value that was used was the current process ID. On the Linux platform, the default maximum process ID is 32,768, resulting in a very small number of seed values being used for all PRNG operations.

rlpb1y ago

In that particular case upstream _was_ consulted and had acked the patch.

1 more reply

hannob1y ago· 2 in thread

For clarification, the bug is in a patch applied by red hat, not in openssh itself.

bonzini1y ago

Technically the bug is in upstream code, but it is latent without the Red Hat patch:

> cleanup_exit() was not meant to be called from a signal handler [...] Fedora 38+ has moved to newer upstream OpenSSH that doesn't have the problematic cleanup_exit() call.

> This extra problematic logic only existed in upstream OpenSSH(-portable) for ~9 months

The fix also doesn't touch the Red Hat-specific code:

     diff -urp openssh-8.7p1-38.el9_4.1-tree.orig/sshd.c openssh-8.7p1-38.el9_4.1-tree/sshd.c
     --- openssh-8.7p1-38.el9_4.1-tree.orig/sshd.c 2024-07-08 03:42:51.431994307 +0200
     +++ openssh-8.7p1-38.el9_4.1-tree/sshd.c 2024-07-08 03:48:13.860316451 +0200
     @@ -384,7 +384,7 @@ grace_alarm_handler(int sig)
      
       /* Log error and exit. */
       if (use_privsep && pmonitor != NULL && pmonitor->m_pid <= 0)
     -  cleanup_exit(255); /* don't log in privsep child */
     +  _exit(1); /* don't log in privsep child */
       else {
        sigdie("Timeout before authentication for %s port %d",
            ssh_remote_ipaddr(the_active_state),

They suggest applying it even on non Red Hat distros.

loeg1y ago

Sort of. The upstream bug isn't thought to be exploitable alone.

ta9881y ago· 1 in thread

My understanding here is that it only impacts Redhat (and maybe derivatives)?

stncls1y ago

Yes, only RHEL 9 (the current version of RHEL) and its upstreams/downstreams (CentOS Stream 9, Rocky Linux 9, Alma Linux 9,...).

Also affected: Fedora 37, 36 and possibly 35, which are all end-of-life (since December 2023 in the case of Fedora 37).

Not affected: Fedora 38 (also EOL), 39 (maintained) and 40 (current).

password43211y ago

Is this in any way related to CVE-2024-6387 "RegreSSHion" discussed last week?

https://news.ycombinator.com/item?id=40843778

Edit: Ok it seems very closely related; I was just surprised no one had linked the previous discussion.

crest1y ago

It's almost as if you should understand security critical C code before you start patching it to death.

1 more reply

j / k navigate · click thread line to collapse

56 comments

34 comments · 8 top-level

londons_explore1y ago· 9 in thread

Couldn't this entire class of bug be solved by annotating signal handlers in the source code and checking at compile time that anything called from a signal handler is async-signal-safe?

dgrunwald1y ago

There are some static analysis tools that can check this.

Cert's SIG30 rule page has a list: https://wiki.sei.cmu.edu/confluence/display/c/SIG30-C.+Call+...

Also there's https://clang.llvm.org/extra/clang-tidy/checks/bugprone/sign...

klysm1y ago

Ultimately it's an example of an invariant where it's clear that programmers can't be trusted to uphold it. In this case, the consequences can be very significant.

Joker_vD1y ago

> the language layer has no knowledge of signal handlers or what that means

Despite the fact that there is explicit runtime support for signal handlers in the language runtime (i.e. libc).

2 more replies

loeg1y ago

22 more replies

immibis1y ago

bregma1y ago

Well, the compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading.

There is no portable way to annotate all functions ever written or ever will be written as being async signal safe.

Which functions are async signal safe varies with the operating system and runtime (eg. an unsafe function in linux-gnu might be safe in linux-musl or linux-bionic).

Other than those insurmountable problems, yeah, good idea.

fanf21y ago

The annotation does not need to be portable; if it’s present on one system then other systems still benefit because the code is written to pass the check.

1 more reply

adrianmonk1y ago

> compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading

You could check it at runtime.

Just like with array bounds checking, in many cases the compiler could sometimes prove the runtime check isn't necessary and eliminate it.

> Which functions are async signal safe varies with the operating system and runtime

Annotations could enumerate specific platforms where it is safe or unsafe. Or you could annotate based on specific attributes of platforms that make it safe or unsafe.

account421y ago

> Well, the compiler has no way of knowing if a function will later be a signal handler after linking, or even dynamic loading.

That's why GP suggested annotating them. Typically this would be done via a function attribute.

> There is no portable way to annotate all functions ever written or ever will be written as being async signal safe.

This is not a requirement for such an annotation to exist and to be used by projects that care about security or even just correctness.

> Which functions are async signal safe varies with the operating system and runtime (eg. an unsafe function in linux-gnu might be safe in linux-musl or linux-bionic).

> Other than those insurmountable problems, yeah, good idea.

All fairly trivial problems that have already been solved many times for similar issues.

1 more reply

stncls1y ago· 5 in thread

tptacek1y ago

ericpauley1y ago

1 more reply

AndyMcConachie1y ago

The author of the mail is Solar Designer, a bit of a legend AFAIC. He has no need to pump up his brand and he really really knows what he's doing.

formerly_proven1y ago

Yeah. He created openwall and the oss-security list.

dimask1y ago

At least they do not name them after themselves.

qalmakka1y ago· 5 in thread

This is why I've always disliked Debian and Red Hat.

1. I hate the fact they have the hubris to think they can be smarter than the upstream developers and patch old versions

2. I hate the fact they don't ship vanilla packages, but instead insist on patching things for features that nobody relies on anyway, __because they're not upstream__.

gruturo1y ago

Enterprises don't go for RHEL because it's free software, yay freedom!

They go for it because it gives a very stable, solid foundation. They don't want a fragile base layer prone to breaking every day of the week.

What should RedHat do, which does not:

1) make them lose their enterprise customers wanting a stable base

2) have unpatched security holes all over their distros

3) not cause them to backport stuff (we are here at the moment) ?

JackSlateur1y ago

3 more replies

qalmakka1y ago

2 more replies

rlpb1y ago

Upstreams generally don't do this until prompted, and sometimes resist until the path is proven and becomes best practice because distributions pushed it.

Diederich1y ago

Do you think there is a reason that some distros go through all of this additional trouble?

candiddevmike1y ago· 4 in thread

The risk you take when you use a distribution that modifies upstream. Debian has had similar issues in the past (maybe not CVEs, but certainly packager-created bugs).

2OEH8eoCRo01y ago

It's risks all the way down. There are risks to not patching upstream as well.

klysm1y ago

Debian has a fairly famous one: CVE-2008-0166

meowface1y ago

Ouch, that one's bad: https://github.com/g0tmi1k/debian-ssh#the-bug

rlpb1y ago

In that particular case upstream _was_ consulted and had acked the patch.

1 more reply

hannob1y ago· 2 in thread

For clarification, the bug is in a patch applied by red hat, not in openssh itself.

bonzini1y ago

Technically the bug is in upstream code, but it is latent without the Red Hat patch:

> cleanup_exit() was not meant to be called from a signal handler [...] Fedora 38+ has moved to newer upstream OpenSSH that doesn't have the problematic cleanup_exit() call.

> This extra problematic logic only existed in upstream OpenSSH(-portable) for ~9 months

The fix also doesn't touch the Red Hat-specific code:

     diff -urp openssh-8.7p1-38.el9_4.1-tree.orig/sshd.c openssh-8.7p1-38.el9_4.1-tree/sshd.c
     --- openssh-8.7p1-38.el9_4.1-tree.orig/sshd.c 2024-07-08 03:42:51.431994307 +0200
     +++ openssh-8.7p1-38.el9_4.1-tree/sshd.c 2024-07-08 03:48:13.860316451 +0200
     @@ -384,7 +384,7 @@ grace_alarm_handler(int sig)
      
       /* Log error and exit. */
       if (use_privsep && pmonitor != NULL && pmonitor->m_pid <= 0)
     -  cleanup_exit(255); /* don't log in privsep child */
     +  _exit(1); /* don't log in privsep child */
       else {
        sigdie("Timeout before authentication for %s port %d",
            ssh_remote_ipaddr(the_active_state),

They suggest applying it even on non Red Hat distros.

loeg1y ago

Sort of. The upstream bug isn't thought to be exploitable alone.

ta9881y ago· 1 in thread

My understanding here is that it only impacts Redhat (and maybe derivatives)?

stncls1y ago

Yes, only RHEL 9 (the current version of RHEL) and its upstreams/downstreams (CentOS Stream 9, Rocky Linux 9, Alma Linux 9,...).

Also affected: Fedora 37, 36 and possibly 35, which are all end-of-life (since December 2023 in the case of Fedora 37).

Not affected: Fedora 38 (also EOL), 39 (maintained) and 40 (current).

password43211y ago

Is this in any way related to CVE-2024-6387 "RegreSSHion" discussed last week?

https://news.ycombinator.com/item?id=40843778

Edit: Ok it seems very closely related; I was just surprised no one had linked the previous discussion.

crest1y ago

It's almost as if you should understand security critical C code before you start patching it to death.

1 more reply

j / k navigate · click thread line to collapse