That's too semantic IMHO. The failure mode was "enforced invariant stopped being true". If they'd written explicit code to fail the request when that happened, the end result would have been exactly the same.
If the `.unwrap()` was replaced with `.expect("Feature config is too large!")` it would certainly make the outage shorter.
It wouldn't, not meaningfully. The outage was caused by change in how they processed the queries. They had no way to observe the changes, nor canaries to see that change is killing them. Plus, they would still need to manually feed and restart services that ingested bad configs.
`expect` would shave a few minutes; you would still spend hours figuring out and fixing it.
Granted, using expect is better, but it's not a silver bullet.
1:
?
2: map_err, or, or_else, etc.
3: match ... {
Ok(..) => {},
Err(..) => {},
}
4: if let ... {
}
Then it would have been idiomatic Rust code and wouldn't have failed at all.The function signature returned a `Result<(), (ErrorFlags, i32)>`
Seems like it should have returned an Err((ErrorFlags, i32)) here. Case 2 or 3 above would have done nicely.
Removing unwrap() from Rust would have forced the proper handling of the function call and would have prevented this.
Unwrap() is Rust's original sin.
> That's too semantic IMHO. The failure mode was "enforced invariant stopped being true". If they'd written explicit code to fail the request when that happened, the end result would have been exactly the same.
Problem is, the enclosing function (`fetch_features`) returns a `Result`, so the `unwrap` on line #82 only serves as a shortcut a developer took due to assuming `features.append_with_names` would never fail. Instead, the routine likely should have worked within `Result`.
But it's a fatal error. It doesn't matter whether it's implicit or explicit, the result is the same.
Maybe you're saying "it's better to be explicit", as a broad generalization I don't disagree with that.
But that has nothing to do with the actual bug here, which was that the invariant failed. How they choose to implement checking and failing the invariant in the semantics of the chosen language is irrelevant.
> But it's a fatal error. It doesn't matter whether it's implicit or explicit, the result is the same.
I agree it is an error, but disagree that it should be a fatal error at that location. The reason being is the method defining the offending `unwrap` construct produces a `Result`, which is fully capable of representing any error `features.append_with_names` could produce.
> But that has nothing to do with the actual bug here, which was that the invariant failed.
The bug is by invoking `unwrap` the process crashed. To the degree that Cloudfare had a massive outage.
Had the logic been such that a `Result` representing this error condition activated an alternate workflow to handle the error (perhaps by logging it, emitting a notification event alerting SRE's, transitioning into a failure mode, or all of these options), then a global outage might have been averted.
Which makes:
> How they choose to implement checking and failing the invariant in the semantics of the chosen language is irrelevant.
Very relevant indeed.