We are now discussing what can be done to improve code correctness beyond memory and thread safety. I am excited for what is to come.
The most useful thing exceptions give you is not static compile time checking, it's the stack trace, error message, causal chain and ability to catch errors at the right level of abstraction. Rust's panics give you none of that.
Look at the error message Cloudflare's engineers were faced with:
thread fl2_worker_thread panicked: called Result::unwrap() on an Err value
That's useless, barely better than "segmentation fault". No wonder it took so long to track down what was happening. A proxy stack written in a managed language with exceptions would have given an error message like this:
com.cloudflare.proxy.botfeatures.TooManyFeaturesException: 200 > 60
at com.cloudflare.proxy.botfeatures.FeatureLoader(FeatureLoader.java:123)
at ...
and so on. It'd have been immediately apparent what went wrong. The bad configs could have been rolled back in minutes instead of hours. In the past I've been able to diagnose production problems based on stack traces so many times that I have been expecting an outage like this ever since the trend away from providing exceptions in new languages in the 2010s. A decade ago I wrote a defense of the feature and I hope we can now have a proper discussion about adding exceptions back to languages that need them (primarily Go and Rust):
https://blog.plan99.net/what-s-wrong-with-exceptions-nothing...
tldr: Capturing a backtrace can be a quite expensive runtime operation, so the environment variables allow either forcibly disabling this runtime performance hit or allow selectively enabling it in some programs.
By default it is disabled, in release builds too, unless `RUST_BACKTRACE` or `RUST_LIB_BACKTRACE` is set.
Similarly, capturing a stack trace in an error type (within a Result for example) is perfectly possible. But this is a choice left to the programmer, because capturing a trace is not cheap.
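For illustration, here is a minimal sketch of what that could look like with only the standard library. The `TooManyFeatures` type, the `load_features` function and the limit of 60 are hypothetical names borrowed from the example upthread, not Cloudflare's actual code. `Backtrace::capture()` only records a trace when `RUST_BACKTRACE` or `RUST_LIB_BACKTRACE` enables it, so the cost is opt-in.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};
use std::fmt;

// Hypothetical limit, taken from the "200 > 60" example in this thread.
const MAX_FEATURES: usize = 60;

// Hypothetical error type: carries the offending values and an optional trace.
#[derive(Debug)]
struct TooManyFeatures {
    got: usize,
    limit: usize,
    // Captured where the error is created, not where it is finally reported.
    backtrace: Backtrace,
}

impl fmt::Display for TooManyFeatures {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "too many features: {} > {}", self.got, self.limit)
    }
}

impl std::error::Error for TooManyFeatures {}

// Hypothetical loader: returns an error value instead of panicking.
fn load_features(count: usize) -> Result<usize, TooManyFeatures> {
    if count > MAX_FEATURES {
        return Err(TooManyFeatures {
            got: count,
            limit: MAX_FEATURES,
            // capture() does nothing expensive unless RUST_BACKTRACE or
            // RUST_LIB_BACKTRACE enables it; Backtrace::force_capture()
            // would always pay the cost.
            backtrace: Backtrace::capture(),
        });
    }
    Ok(count)
}

fn main() {
    match load_features(200) {
        Ok(n) => println!("loaded {n} features"),
        Err(e) => {
            // Always prints the descriptive message.
            eprintln!("error: {e}");
            // Prints the trace only if one was actually captured.
            if e.backtrace.status() == BacktraceStatus::Captured {
                eprintln!("{}", e.backtrace);
            }
        }
    }
}
```

Run as-is, this prints "error: too many features: 200 > 60" to stderr. With `RUST_BACKTRACE=1` set it also prints the stack at the point the error was constructed. Whether the caller handles the `Err` or calls `unwrap()` on it is still up to the caller.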
I am not sure that watching the trendy forefront successfully reach the 1990s and discuss how unwrapping Option is potentially dangerous really warms my heart. I can’t wait for the complete meltdown when they discover effect systems in 2040.
To be more serious, this kind of incident is yet another reminder that software development remains miles away from proper engineering and even key providers like Cloudflare utterly fail at proper risk management.
Celebrating because there is now one popular language using static analysis for memory safety feels to me like being happy we now teach people to swim before a transatlantic boat crossing while we refuse to actually install life boats.
To me the situation has barely changed. The industry has been refusing to put in place strong reliability practices for decades. It keeps significantly underinvesting in tools that mitigate errors, outside of a few fields where safety was already taken seriously before software was a thing, and it keeps hiding behind the excuse that we need to move fast and that safety is too complex and costly, while regulation remains extremely lenient.
I mean this Cloudflare outage probably cost millions of dollars of damage in aggregate between lost revenue and lost productivity. How much of that will they actually have to pay?
> I mean this Cloudflare outage probably cost millions of dollars of damage in aggregate between lost revenue and lost productivity. How much of that will they actually have to pay?
Probably nothing, because most paying customers of Cloudflare are probably signing away their rights to sue Cloudflare for damages caused by it being down for a while when they purchase Cloudflare's services (maybe some customers have SLAs with monetary values attached, I dunno). I honestly have a hard time suggesting that those customers are individually wrong to do so - Cloudflare isn't down that often, and whatever amount it cost any individual customer by being down today might be more than offset by the DDoS protection they're buying.
Anyway if you want Cloudflare regulated to prevent this, name the specific regulations you want to see. Should it be illegal under US law to use `unwrap` in Rust code? Should it be illegal for any single internet services company to have more than X number of customers? A lot of the internet also breaks when AWS goes down because many people like to use AWS, so maybe they should be included in this regulatory framework too.
We have collectively agreed to a world where software service providers have no incentive to be reliable, as they are shielded from the consequences of their mistakes, and somehow we see it as acceptable that software has a ton of issues and defects. The side effect is that research on actually lowering the cost of safety has little return on investment. It doesn't have to be so.
> Anyway if you want Cloudflare regulated to prevent this, name the specific regulations you want to see.
I want software providers to be liable for the damage they cause, and minimum quality regulation on par with an actual engineering discipline. I have always been astounded that nearly all software licences start with extremely broad limitation of liability provisions and people somehow feel fine with it. Try to extend that to any other product you regularly use in your life and see how that makes you feel.
How to do proper testing, formal methods and resilient design has been known for decades. I would personally be more than okay with "let's move less fast and stop breaking things".
So do you want to make it illegal to publish GNU GPL licensed software because that license has a warranty disclaimer? Do you want to make it illegal for a company like Cloudflare to use open source licensed software with similar warranty disclaimers, or for the SLA agreements and penalties for violating them that they make with their own paying customers to be legally unenforceable? What if I just have a personal website and I break the javascript on it because I was careless, how should that be legally treated?
I'm not against research into more reliable software or using better engineering techniques that result in more reliable software. What I'm concerned about is the regulatory regime - in other words, what software it is or is not legal to write or sell for money - and how to properly incentivize software service providers to use techniques that result in more reliable software without causing a bunch of bad second order effects.
That we’re even having this discussion is a major step forward. That we’re still having this discussion is a depressing testament to how slowly the mainstream has adopted better ideas.
But yes, I wish I had learned more, and somehow stumbled upon all the good stuff, or been taught at university at least about what Rust achieves today.
I think it has to be noted Rust still allows performance with the safety it provides. So that's something maybe.
Zig is undergoing this meltdown. Shame it's not memory safe. You can only get so far in developing programming wisdom before Eternal September kicks in and we're back to re-learning all the lessons of history as punishment for the youthful hubris that plagues this profession.