Technical Details on Today's Outage (opens in new tab)

(crowdstrike.com)

124 pointsxjia1y ago131 comments

131 comments

64 comments · 19 top-level

tail_exchange1y ago· 20 in thread

Can someone who actually understands what CrowdStrike does explain to me why on earth they don't have some kind of gradual rollout for changes? It seems like their updates go out everywhere all at once, and this sounds absolutely insane for a company at this scale.

hatsunearu1y ago

It sounds like Channel files are just basically definition updates in normal antivirus software; it's not actually code, just some stuff on what the software should "look out for".

And it sounds like they shipped some malformed channel file and the software that interprets it can't handle malformed inputs and ate shit. That software happened to be kernel mode, and also marked as boot-critical, so it if falls over, it causes a BSOD and inability to boot.

and it's kind of understandable that channel files might seem safe to update constantly without oversight, but that's just assuming that the file that interprets the channel file isn't a bunch of dogshit code.

omoikane1y ago

Configuration files should be treated like code and follow the same gradual rollout practices. See also:

https://sre.google/workbook/canarying-releases/

Which starts with "a majority of incidents are triggered by binary or configuration pushes". The stats for config related failures is one link away at

https://sre.google/workbook/postmortem-analysis/

Where it says 31% of outages in 2010-2017 are caused by "configuration push".

Murky35151y ago

It's not understandable imo. At the very least they should have tests for the loader component that shows it can handle corrupted input. Amateur hour.

4 more replies

timbelina1y ago

I was reading these two threads:

https://x.com/perpetualmaniac/status/1814376668095754753?s=4...

https://x.com/ananayarora/status/1814269058088304760

The authors explain the coding error and coredump well, but I'm lost: Is the buggy code that they're describing the channel file, or some kernel code that consumes the channel file? Is there a way to tell?

1 more reply

Hizonner1y ago

> it's not actually code, just some stuff on what the software should "look out for"

If it controls the behavior of a computer, then it's code.

> and it's kind of understandable that channel files might seem safe to update constantly without oversight

Yeah, no, it's not. They pushed an update that crashed the majority of their Windows installed base in a way that couldn't be fixed remotely. It doesn't matter what the update was to. It needed to be tested. There is no way that any deployment pipeline that could fail to catch something that blatant could possibly be "understandable".

... and that kernel mode code shouldn't have been parsing anything with any complexity to begin with. And should have been tested into oblivion, and possibly formally verified.

This is amateur-hour nonsense. Which is what you expect from most of these "Enterprise Cyber Security(TM)" vendors.

... AND the users shouldn't have just gone and shoved that kind of thing into every critical path they could think of.

notepad0x901y ago

This "channel file" is equivalent to an AV signature file. Crowdstrike is the company, the product here is "Falcon" which does behavioral monitoring of processes both on the device and using logs collected from the device in the cloud.

I can see your perspective, but you should consider this: They protect these many companies, industries and even countries at such a global scale and you haven't even heard of them in the last 15 years of their operation until this one outage.

You can't take days testing gradual roll outs for this type of content, because that's how long customers are left unprotected by that content. Although the root cause is on the channel files, I feel like the driver that processes them should have been able to handle the "logic bug" in question so we'll find out more over time I guess.

For example, with windows defender which runs on virtually all windows systems, the signature updates on billions of devices are pushed immediately (with exception to enterprise systems, but even then there is usually not much testing on signature files themselves, if at all). As far as the devops process Crowdstrike uses to test the channel files, I think it's best to leave commentary on that to actual insiders but these updates happen several times a day sometimes and get pushed to every Crowdstrike customer.

joaomacp1y ago

> They protect these many companies, industries and even countries at such a global scale and you haven't even heard of them in the last 15 years of their operation

I certainly don't want to know (through disaster news) about the construction company that built the bridge I drive through everyday, not for another 15 years, not ever!

This kind of software simply should not fail, with such a massive install base on so many sensitive industries. We're better than that, the software industry is starting to mature and there are simple and widely-known procedures that could have been used to prevent it.

I have no idea how CrowdStrike stock has only dropped 10% to the values of 2 months ago. Actually, if the financial troubles you get into are only these, take back what I said, software should be failing a lot (why spend money on robustness when you don't lose money on bugs?)

1 more reply

zug_zug1y ago

>> You can't take days testing gradual roll outs for this type of content, because that's how long customers are left unprotected by that content.

If you can't take days to do it then do a gradual rollout in hours. It's not a high bar.

1 more reply

Marsymars1y ago

> They protect these many companies, industries and even countries at such a global scale and you haven't even heard of them in the last 15 years of their operation until this one outage.

They certainly run their software on those many customers' systems, but but based on my experience with them, "protect" isn't a descriptor I'm willing to grant them.

We don't have the counter-factual where Crowdstrike doesn't exist, but I'm not convinced that they've been a net economic or security benefit to the world over the span of their existence.

1 more reply

tail_exchange1y ago

Another commenter said that this change was a malformed configuration that crashed the application. If this is the case, you wouldn't need days to see this problem manifest, but only a few minutes. If they had rolled it out to 1% of their customers and waited for a couple hours before releasing it everywhere, they probably would have caught it.

1 more reply

lotr51y ago

they are dumb enough to process their "channel files" in kernel, this should be only done in usermode

2 more replies

SkyPuncher1y ago

My understanding is they basically deployed a configuration file. It seems like these files might be akin to virus signatures or other frequently updated run-time configuration.

I actually don't think it's outrageous that these files are rolled out globally, simultaneously. I'm guessing they're updated frequently and _should_ be largely benign.

What stands out to me is the fact that a bad config file can crash the system. No rollback mechanism. No safety checks. No safe, failure mode. Just BSOD.

Given the fix is simply deleting the broken file, it's astounding to me that the system's behavior is BSOD. To me, that's more damning that a bad "software update". These files seem to change often and frequently. Given they're critical path, they shouldn't have the ability to completely crash the system.

joshka1y ago

> I actually don't think it's outrageous that these files are rolled out globally, simultaneously.

Anyone competent that manages software at scale should generally hold the opposite opinion to this.

Analemma_1y ago

That’s the danger of running in kernel mode. I’ve seen some people claim this is because the bad file starts a chain of events which concludes in trying to page an unpageable file, which is an application crash in user space but brings down the whole system if it happens in the kernel.

1 more reply

opello1y ago

Entertainingly enough I got to see a similar thing happen, where a configuration file was killing hardware in the field. After the failure and remediation multiple CI jobs were put in place (some months later) to do basic validity checks on the files.

The lesson of "multiple parser implementations for the same thing seems bad" and "sanity checks to prevent breaking things are hard heuristics to define" such that further changes were deferred.

All that to say that I can appreciate circumstances in which satisfying "don't crash the system" in response to configuration data can actually be fairly hard to realize. It can very significantly depend on the design of the pieces in question. But I also agree that it's pretty damning.

userbinator1y ago

I'm more surprised at the fact that they didn't appear to have tested it on themselves first.

FWIW, at least Microsoft still "dogfoods" (and it's what coined that term), and even if the results of that aren't great, I'm sure they would've caught something of this severity... but then again, maybe not[1].

[1] https://news.ycombinator.com/item?id=18189139

Ekaros1y ago

This is what really would concern me too. With this wide spread issue any reasonable testing should have detected it. Having a few dozen machines with different configurations for an few hours should have detected this. This should have been in a smoke test.

Push update to machines, observe, power cycle them, observe...

I could understand error in some rarer setup, but this was so common that it should have been obvious error.

Zamiel_Snawley1y ago

Truly, how the extent the damage was so widespread is my main question at this point.

Everyone has a buggy release at some point, but impacting global customers at this level is damn near unforgivable.

Heads need to roll for this oversight.

Murky35151y ago

Because if release immediately, velocity go up

jefurii1y ago

I have a friend who is a security guard at a bank in Hollywood, CA, who told me the computers at his location started going down between 12:00 and 13:00PDT (19:00-20:00UTC).

I don't understand CrowdStrike's rollout system, but given that people started seeing trouble earlier in the day, surely by that time they could have shut down the servers that were serving the updates, or something??

He also told me that soon after that the street outside the bank (another bank across the street, a hospital several blocks down) was lined with police who started barring entry to the buildings unless people had bank cards. By the time I woke up this morning technical people already knew basically what was going on, but I really underestimated how freaked out the average person must have been today.

PedroBatista1y ago· 8 in thread

Light on technical and light on details.

Putting the actual blast radius aside, this whole thing seems a bit amateurish for a "security company" that pulls the contracts they do.

hi-v-rocknroll1y ago

Yup.

- They don't do enough or the right kind of smoke tests.

- They don't do exponential-canary deployments with an ability to rollback, and instead just YOLO it.

- They don't appear to have a customer-side security / client platform team update approval gating change control process for software updates or for definitions (or whatever they use).

This is fundamentally laziness and/or incompetency.

tootie1y ago

They say they run this process multiple times a day. Must be tens of thousands of deployments. I'd guess complacency set in at some point. Just completely inured to the risks they were taking.

nine_zeros1y ago

> This is fundamentally laziness and/or incompetency.

From management - that couldn't care less about low visibility, low impact projects

mattmaroon1y ago

I have been wondering why they didn’t do canary deployments. Seems like quite an obvious thing to do.

grecy1y ago

You assume the most competent company got the contracts, which is simply not the world we live in.

The company that lobbied the hardest and paid the most in bribes got the contracts.

tootie1y ago

Having been through enough procurement cycles as both a buyer and seller there does not need to be a whit of malfeasance for a bad decision to occur. It's aggressive sales, price wars, poorly informed decision makers, gut instinct, favoritism, familiarity, incumbency, network effects.

You notice how this outage affected hospitals and airlines? There is a strong tendency in software sales for industries to align around one or two leaders. Oh, American chose Crowdstrike? Maybe we at Delta should just do what they did. Or literally Delta hires the VP from American to be their CISO and he just does what he did before.

Vendor selection is hard and buyer's remorse is frequently hard to deal with once you've sunk cost into a migration.

manquer1y ago

Rather point I think is there technical and evaluation gates companies of this nature regularly go through while contracting, part of that is being able to talk the language of the industry properly .

This seems very amateurish for companies who regularly talk professionally to win said contracts , whether the best product or not.

My guess is C-suite, crisis consultants and lawyers are involved heavily so the actual engineering folks have little voice now in any communication and we get stuff like this.

dev-jayson1y ago

Yeah, I think I'm getting more detailed analysis on Social Media from strangers, which I know I should take with a grain of salt. But I guess I'm expecting a lot more than "a filed caused this" from the company that caused this havoc.

hatsunearu1y ago· 6 in thread

I'm not a big expert but honestly this read like a bunch of garbage.

> Although Channel Files end with the SYS extension, they are not kernel drivers.

OK, but I'm pretty sure usermode software can't cause a BSOD. Clearly something running in kernel mode ate shit and that brought the system down. Just because a channel file not in kernel mode ate shit doesn't mean your kernel mode software isn't culpable. This just seems like a sleezy dodge.

gjm111y ago

It doesn't read to me as trying to dodge anything. They aren't saying "they're not kernel drivers, so everything is OK", they're saying "seeing the .sys on the filenames, you might think they're kernel drivers, but as it happens they're something else".

(Maybe there's some subtext that I'm missing, but I don't see how saying "these aren't kernel drivers" makes them look any better, and I do see why they might say it to be informative, so it looks like to me like they're doing the latter.)

hatsunearu1y ago

> It doesn't read to me as trying to dodge anything.

It absolutely reads like this. They are getting blasted online for shipping kernel mode driver updates without proper QA and release engineering. Which just from face value just seems like some insano style engineering. They are saying "it's not actually a kernel mode value" to deflect blame.

I mean, I really don't understand why they would make this statement otherwise. If they are innocently just trying to say "this is just a channel file", there are other ways to say this, and it really isn't relevant enough to underline and emphasize.

2 more replies

SoftTalker1y ago

The kernel driver reads the channel files. It choked on this one, and crashed.

SAI_Peregrinus1y ago

Which implies that any malware capable of replacing these channel files can crash their kernel driver. I wonder if there's a non-crashing way to exploit this & get kernel-space code execution.

4 more replies

teeray1y ago

The kernel driver is an interpreter that executed an HCF instruction from the channel file.

hatsunearu1y ago

That's what I thought. So saying "it's not a kernel mode driver" is technically true, but I don't need to explain why it's a bunch of nonsense to try to damage control their incompetence.

1 more reply

xyst1y ago· 4 in thread

“Technical” detail report reads more like a lawyer generated report. This company is awful.

If I ever get a sales pitch from these shit brains, they will get immediately shut down.

Also fuck MS and their awful operating system that then spawned this god awful product/company known as “CrowdStike Falcon”

robjan1y ago

You are probably not the target market of this product then. The real product CrowdStrike Falcon sells is regulatory compliance and it's a defacto requirement in many regulated industries including banking.

By the way, Falcon can be and is deployed to Linux and MacOS hosts in these organisations too it's just that this particular incident only affected Windows.

hello_moto1y ago

2 things:

1. critical infrastructure around the globe seemed to depend on CrowdStrike

2. "If I ever get a sales pitch from..." suggested you are in an environment that is far from critical infrastructure.

userbinator1y ago

If Windows wasn't as popular, then this might've happened to Linux to macOS instead. Blame CrowdStrike's incompetence, not MS.

acdha1y ago

It couldn't happen on macOS: Apple stopped letting third-parties run code in the kernel after years of failures like this.

It also wouldn't happen on Linux: they use eBPF there which was designed by grownups and validates its inputs.

2 more replies

kyriakos1y ago· 3 in thread

Why is everyone blaming Microsoft? Is this something of an oversight in their side too? Can someone explain?

cyrnel1y ago

I'm no kernel expert, but people are saying Microsoft deserves some blame for not exposing necessary functionality to user space, requiring the use of a very-unsafe kernel driver.

Linux provides eBPF and macOS provides system extensions.

I'll also add that Windows itself heavily prioritizes backwards-compatibility over security, which leads companies to seek out third-party solutions for stopping malware instead of design-based mitigations being built into Windows.

mardifoufs1y ago

I don't agree. I'm glad Microsoft doesn't provide the functionality to do what crowdstrike does to user space. Crowdstrike acts in a similar way to deeply seated malware, except that it is usually installed voluntarily. But the behavior and capabilities that it has are basically what any malware would dream of, and exposing them to user space would imo create a mess (especially on windows). If anything, this is good as it will make people even more weary of kernel mode software.

And I'm not sure epbf actually allows you to do a lot of the stuff crowdstrike-like software does. I know they use it on Linux though so maybe eBPF has evolved a lot since I last looked at it.

1 more reply

sgammon1y ago

For letting a failure of this magnitude be possible, I suspect

patrickthebold1y ago· 2 in thread

>The configuration update triggered a logic error that resulted in an operating system crash.

> We understand how this issue occurred and we are doing a thorough root cause analysis to determine how this logic flaw occurred.

There's always going to be flaws in the logic of the code, the trick is to not have single errors be so catastrophic.

chris_nielsen1y ago

Yeah “how this logic flaw occurred” is the wrong question.

How a common bug was rolled out globally with no controls, testing, or rollback strategy is the right question

YZF1y ago

They're all good questions. The thing that reads the config should have been fuzz tested with something like AFL. Likely should have a lot more tests. Maybe shouldn't run in a device driver. There's almost no doubt there are engineering process and culture issues here.

And then absolutely the release process.

Rollback is hard I guess once your OS can't boot.

1 more reply

jchiu11061y ago· 2 in thread

Where are the technical details?

augustk1y ago

This is proprietary (closed-source) software as far as I understand.

Zamiel_Snawley1y ago

Yeah, this PR statement is pretty much devoid of information other than it is not a cyber attack.

dang1y ago

Related ongoing thread:

CrowdStrike Update: Windows Bluescreen and Boot Loops - https://news.ycombinator.com/item?id=41002195 - July 2024 (3590 comments)

rdtsc1y ago

> The update that occurred at 04:09 UTC was designed to target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks

The obvious joke here is CS runs the malicious C2 framework. So the system worked as designed: it prevented further execution and quarantined the affected machines.

But given they say that’s just a configuration file (then why the hell is it suffixed with .sys?), it’s actually plausible. A smart attacker could disguise themselves and use the same facilities as the CS. CS will try to block them and blocks itself in the process?

nonfamous1y ago

>>> Systems that are not currently impacted will continue to operate as expected, continue to provide protection, and have no risk of experiencing this event in the future.

Given that this incident has now happened twice in the space of months (first on Linux, then on Windows), and that as stated in this very post the root cause analysis is not yet complete, I find that statement of “NO RISK” very hard to believe.

ungreased06751y ago

This seems very unsatisfying. Not sure if I was expecting too much, but that’s a lot of words for very little information.

I’d like more information on how these Channel Files are created, tested, and deployed. What’s the minimum number of people that can do it? How fast can the process go?

pneumonic1y ago

> we are doing a "root cause analysis to determine how this logic flaw occurred"

That's going to find a cause: a programmer made an error. That's not the root of the problem. The root of the problem is allowing such an error to be released (especially obvious because of its widespread impact).

isthisreallife21y ago

So - a malformed configuration is capable of crashing a kernel process. Sounds very exploitable. Very

canistel1y ago

> This issue is not the result of or related to a cyberattack.

Must be corrected to "the issue is not the result of or related to a cyberattack by external agents".

geuis1y ago

Weak.

Very weak and over corporate level of ass covering. And it doesn't even come close to doing that.

They should just let the EM of the team involved provide a public detailed response that I'm sure is floating around internally. Just own the problem and address the questions rather than trying to play at politics, quite poorly.

0nate1y ago

The lower you go in system architecture, the greater the impact when defects occur. In this instance, the Crowdstrike agent is embedded within the Windows Kernel, and registered with the Kernel Filter Engine illustrated in the diagram below.

https://www.nathanhandy.blog/images/blog/OSI%20Model%20in%20...

If the initial root cause analysis is correct, Crowdstrike has pushed out a bug that could have been easily stopped had software engineering best practices been followed: Unit Testing, Code Coverage, Integration Testing, Definition of Done.

automatoney1y ago

To my biased ears it sounds like these configuration-like files are a borderline DSL that maybe isn't being treated as such. I feel like that's a common issue - people assume because you call it a config file, it's not a language, and so it doesn't get treated as actual code that gets interpreted.

bryan_w1y ago

It kinda feels like someone added a watch for c:\COM\COM like we did back in the day on AOL

timbelina1y ago

Can someone aim me at some RTFM that describes the sensor release and patching process, please? I'm lost trying to understand: When a new version 'n' of the sensor is released, we upgrade a selected batch of machines and do some tests (mostly waiting around :-)) to see that all is well. Then we upgrade the rest of the fleet by OU. However, 'cause we're scaredy cats, we leave some critical kit on n-1 for longer. And some really critical kit even on n-2. (Yeah, there's a risk in not applying patches I know but there are other outage-related risks that we balance; forget that for now) Our assumption is that n-1, n-2, etc are old, stable releases, and so when fan and shit collided yesterday, we just hopped on the console and did a policy update to revert to n-2 and assumed we'd dodged the bullet. But of course, that failed... you know what they say about assumptions :-) So in a long-winded way that leads to my three questions: Why did the 'content update' take out not just n but n-whatever sensors equally as effectively? Are the n-whatever versions not actually stable? And if the n-whatever versions are not actually stable and are being patched, what's the point of the versioning? Cheers!

j / k navigate · click thread line to collapse

131 comments

64 comments · 19 top-level

tail_exchange1y ago· 20 in thread

hatsunearu1y ago

It sounds like Channel files are just basically definition updates in normal antivirus software; it's not actually code, just some stuff on what the software should "look out for".

omoikane1y ago

Configuration files should be treated like code and follow the same gradual rollout practices. See also:

https://sre.google/workbook/canarying-releases/

Which starts with "a majority of incidents are triggered by binary or configuration pushes". The stats for config related failures is one link away at

https://sre.google/workbook/postmortem-analysis/

Where it says 31% of outages in 2010-2017 are caused by "configuration push".

Murky35151y ago

It's not understandable imo. At the very least they should have tests for the loader component that shows it can handle corrupted input. Amateur hour.

4 more replies

timbelina1y ago

I was reading these two threads:

https://x.com/perpetualmaniac/status/1814376668095754753?s=4...

https://x.com/ananayarora/status/1814269058088304760

1 more reply

Hizonner1y ago

> it's not actually code, just some stuff on what the software should "look out for"

If it controls the behavior of a computer, then it's code.

> and it's kind of understandable that channel files might seem safe to update constantly without oversight

... and that kernel mode code shouldn't have been parsing anything with any complexity to begin with. And should have been tested into oblivion, and possibly formally verified.

This is amateur-hour nonsense. Which is what you expect from most of these "Enterprise Cyber Security(TM)" vendors.

... AND the users shouldn't have just gone and shoved that kind of thing into every critical path they could think of.

notepad0x901y ago

joaomacp1y ago

> They protect these many companies, industries and even countries at such a global scale and you haven't even heard of them in the last 15 years of their operation

I certainly don't want to know (through disaster news) about the construction company that built the bridge I drive through everyday, not for another 15 years, not ever!

1 more reply

zug_zug1y ago

>> You can't take days testing gradual roll outs for this type of content, because that's how long customers are left unprotected by that content.

If you can't take days to do it then do a gradual rollout in hours. It's not a high bar.

1 more reply

Marsymars1y ago

> They protect these many companies, industries and even countries at such a global scale and you haven't even heard of them in the last 15 years of their operation until this one outage.

They certainly run their software on those many customers' systems, but but based on my experience with them, "protect" isn't a descriptor I'm willing to grant them.

We don't have the counter-factual where Crowdstrike doesn't exist, but I'm not convinced that they've been a net economic or security benefit to the world over the span of their existence.

1 more reply

tail_exchange1y ago

1 more reply

lotr51y ago

they are dumb enough to process their "channel files" in kernel, this should be only done in usermode

2 more replies

SkyPuncher1y ago

My understanding is they basically deployed a configuration file. It seems like these files might be akin to virus signatures or other frequently updated run-time configuration.

I actually don't think it's outrageous that these files are rolled out globally, simultaneously. I'm guessing they're updated frequently and _should_ be largely benign.

What stands out to me is the fact that a bad config file can crash the system. No rollback mechanism. No safety checks. No safe, failure mode. Just BSOD.

joshka1y ago

> I actually don't think it's outrageous that these files are rolled out globally, simultaneously.

Anyone competent that manages software at scale should generally hold the opposite opinion to this.

Analemma_1y ago

1 more reply

opello1y ago

The lesson of "multiple parser implementations for the same thing seems bad" and "sanity checks to prevent breaking things are hard heuristics to define" such that further changes were deferred.

userbinator1y ago

I'm more surprised at the fact that they didn't appear to have tested it on themselves first.

[1] https://news.ycombinator.com/item?id=18189139

Ekaros1y ago

Push update to machines, observe, power cycle them, observe...

I could understand error in some rarer setup, but this was so common that it should have been obvious error.

Zamiel_Snawley1y ago

Truly, how the extent the damage was so widespread is my main question at this point.

Everyone has a buggy release at some point, but impacting global customers at this level is damn near unforgivable.

Heads need to roll for this oversight.

Murky35151y ago

Because if release immediately, velocity go up

jefurii1y ago

I have a friend who is a security guard at a bank in Hollywood, CA, who told me the computers at his location started going down between 12:00 and 13:00PDT (19:00-20:00UTC).

PedroBatista1y ago· 8 in thread

Light on technical and light on details.

Putting the actual blast radius aside, this whole thing seems a bit amateurish for a "security company" that pulls the contracts they do.

hi-v-rocknroll1y ago

Yup.

- They don't do enough or the right kind of smoke tests.

- They don't do exponential-canary deployments with an ability to rollback, and instead just YOLO it.

- They don't appear to have a customer-side security / client platform team update approval gating change control process for software updates or for definitions (or whatever they use).

This is fundamentally laziness and/or incompetency.

tootie1y ago

They say they run this process multiple times a day. Must be tens of thousands of deployments. I'd guess complacency set in at some point. Just completely inured to the risks they were taking.

nine_zeros1y ago

> This is fundamentally laziness and/or incompetency.

From management - that couldn't care less about low visibility, low impact projects

mattmaroon1y ago

I have been wondering why they didn’t do canary deployments. Seems like quite an obvious thing to do.

grecy1y ago

You assume the most competent company got the contracts, which is simply not the world we live in.

The company that lobbied the hardest and paid the most in bribes got the contracts.

tootie1y ago

Vendor selection is hard and buyer's remorse is frequently hard to deal with once you've sunk cost into a migration.

manquer1y ago

This seems very amateurish for companies who regularly talk professionally to win said contracts , whether the best product or not.

My guess is C-suite, crisis consultants and lawyers are involved heavily so the actual engineering folks have little voice now in any communication and we get stuff like this.

dev-jayson1y ago

hatsunearu1y ago· 6 in thread

I'm not a big expert but honestly this read like a bunch of garbage.

> Although Channel Files end with the SYS extension, they are not kernel drivers.

gjm111y ago

hatsunearu1y ago

> It doesn't read to me as trying to dodge anything.

2 more replies

SoftTalker1y ago

The kernel driver reads the channel files. It choked on this one, and crashed.

SAI_Peregrinus1y ago

Which implies that any malware capable of replacing these channel files can crash their kernel driver. I wonder if there's a non-crashing way to exploit this & get kernel-space code execution.

4 more replies

teeray1y ago

The kernel driver is an interpreter that executed an HCF instruction from the channel file.

hatsunearu1y ago

That's what I thought. So saying "it's not a kernel mode driver" is technically true, but I don't need to explain why it's a bunch of nonsense to try to damage control their incompetence.

1 more reply

xyst1y ago· 4 in thread

“Technical” detail report reads more like a lawyer generated report. This company is awful.

If I ever get a sales pitch from these shit brains, they will get immediately shut down.

Also fuck MS and their awful operating system that then spawned this god awful product/company known as “CrowdStike Falcon”

robjan1y ago

By the way, Falcon can be and is deployed to Linux and MacOS hosts in these organisations too it's just that this particular incident only affected Windows.

hello_moto1y ago

2 things:

1. critical infrastructure around the globe seemed to depend on CrowdStrike

2. "If I ever get a sales pitch from..." suggested you are in an environment that is far from critical infrastructure.

userbinator1y ago

If Windows wasn't as popular, then this might've happened to Linux to macOS instead. Blame CrowdStrike's incompetence, not MS.

acdha1y ago

It couldn't happen on macOS: Apple stopped letting third-parties run code in the kernel after years of failures like this.

It also wouldn't happen on Linux: they use eBPF there which was designed by grownups and validates its inputs.

2 more replies

kyriakos1y ago· 3 in thread

Why is everyone blaming Microsoft? Is this something of an oversight in their side too? Can someone explain?

cyrnel1y ago

I'm no kernel expert, but people are saying Microsoft deserves some blame for not exposing necessary functionality to user space, requiring the use of a very-unsafe kernel driver.

Linux provides eBPF and macOS provides system extensions.

mardifoufs1y ago

And I'm not sure epbf actually allows you to do a lot of the stuff crowdstrike-like software does. I know they use it on Linux though so maybe eBPF has evolved a lot since I last looked at it.

1 more reply

sgammon1y ago

For letting a failure of this magnitude be possible, I suspect

patrickthebold1y ago· 2 in thread

>The configuration update triggered a logic error that resulted in an operating system crash.

> We understand how this issue occurred and we are doing a thorough root cause analysis to determine how this logic flaw occurred.

There's always going to be flaws in the logic of the code, the trick is to not have single errors be so catastrophic.

chris_nielsen1y ago

Yeah “how this logic flaw occurred” is the wrong question.

How a common bug was rolled out globally with no controls, testing, or rollback strategy is the right question

YZF1y ago

And then absolutely the release process.

Rollback is hard I guess once your OS can't boot.

1 more reply

jchiu11061y ago· 2 in thread

Where are the technical details?

augustk1y ago

This is proprietary (closed-source) software as far as I understand.

Zamiel_Snawley1y ago

Yeah, this PR statement is pretty much devoid of information other than it is not a cyber attack.

dang1y ago

Related ongoing thread:

CrowdStrike Update: Windows Bluescreen and Boot Loops - https://news.ycombinator.com/item?id=41002195 - July 2024 (3590 comments)

rdtsc1y ago

> The update that occurred at 04:09 UTC was designed to target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks

The obvious joke here is CS runs the malicious C2 framework. So the system worked as designed: it prevented further execution and quarantined the affected machines.

nonfamous1y ago

>>> Systems that are not currently impacted will continue to operate as expected, continue to provide protection, and have no risk of experiencing this event in the future.

ungreased06751y ago

This seems very unsatisfying. Not sure if I was expecting too much, but that’s a lot of words for very little information.

I’d like more information on how these Channel Files are created, tested, and deployed. What’s the minimum number of people that can do it? How fast can the process go?

pneumonic1y ago

> we are doing a "root cause analysis to determine how this logic flaw occurred"

isthisreallife21y ago

So - a malformed configuration is capable of crashing a kernel process. Sounds very exploitable. Very

canistel1y ago

> This issue is not the result of or related to a cyberattack.

Must be corrected to "the issue is not the result of or related to a cyberattack by external agents".

geuis1y ago

Weak.

Very weak and over corporate level of ass covering. And it doesn't even come close to doing that.

0nate1y ago

https://www.nathanhandy.blog/images/blog/OSI%20Model%20in%20...

automatoney1y ago

bryan_w1y ago

It kinda feels like someone added a watch for c:\COM\COM like we did back in the day on AOL

timbelina1y ago

j / k navigate · click thread line to collapse