Modern CPUs add complications to arguments like this. Branches stall the execution pipeline, so branch prediction was invented to keep the pipeline flowing. Return instructions that pair up with a preceding call are predicted almost perfectly, which makes them nearly free. At the very least, any alternate return scheme has to pay for a full misprediction. That can be expensive.
CALL
jump-without-RET
and the calls and the rets don’t line up. This defeats the return prediction on the next return.
First explanation I found on Google. Haven't read it:
As long as you don't overwrite the return address on the stack.
MOV SP, [installed_handler]
JMP [installed_handler+WORD]
but it only works if you don't need to run the defers/Drops/destructors/etc. for stuff that's on the stack between the current frame and the handler's frame. Which you do, most of the time.
Indeed. And the per-frame cleanup is also language-agnostic, which adds overhead; it must also support both SJLJ and DWARF frames[1]; and it is done in two phases: destructors are only run if an actual catch is found, so an unhandled exception doesn't run destructors, which preserves state in the core file. This two-phase unwinding again slows things down.
Another big bottleneck that might not be captured in OP's test is that the unwinder has to take (or used to, things got better recently) a global lock to prevent races with dlclose, which greatly limits the scalability of exception handling.
Still very nice improvements from OP.
[1] Although I'm not sure whether you can mix them in the same program or whether it is a platform-wide decision.
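The per-frame cleanup mentioned above can be observed directly in Rust. This is a minimal sketch, assuming the default `panic = "unwind"` strategy; `Guard`, `DROPPED`, `inner`, `middle` and `run_and_count_drops` are hypothetical names made up for illustration. It shows that every frame between the panic and the handler gets its `Drop` run, which is exactly the work a bare "reset SP and jump" scheme would skip.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many guards were dropped (hypothetical, for illustration only).
static DROPPED: AtomicUsize = AtomicUsize::new(0);

// A value with a destructor, standing in for any resource held by a frame.
struct Guard(&'static str);

impl Drop for Guard {
    fn drop(&mut self) {
        eprintln!("dropping guard in {}", self.0);
        DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

fn inner() {
    let _g = Guard("inner");
    panic!("boom"); // unwinding starts here
}

fn middle() {
    let _g = Guard("middle");
    inner();
}

/// Runs `middle` under a handler and reports how many guards were dropped
/// while unwinding from the panic site up to the handler.
fn run_and_count_drops() -> usize {
    DROPPED.store(0, Ordering::SeqCst);
    let result = panic::catch_unwind(middle);
    assert!(result.is_err());
    DROPPED.load(Ordering::SeqCst)
}

fn main() {
    println!("guards dropped during unwind: {}", run_and_count_drops());
}
```

Both guards are dropped before `catch_unwind` returns. Built with `panic = "abort"`, the process would terminate at the `panic!` instead and no cleanup would run.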
If someone from another thread decided to unload a library whose code is still being executed in this thread then this thread would normally crash anyhow, and do so irrecoverably, right?
> Now imagine that calls could specify alternate return points, letting the callee decide the statement to return to:
// Dreamed-up syntax
fn f() {
    g() alternate |x| {
        dbg!(x); // x = 123
    };
}
fn g() -> () alternate i32 {
    return_alternate 123;
}
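One way to approximate the dreamed-up syntax above in today's Rust is to pass both return points to the callee as closures. This is only a sketch: the `take_alternate` flag and the closure parameters `normal` and `alternate` are assumptions added for illustration, and since Rust does not guarantee tail calls, each "return" here is an ordinary call rather than a jump.

```rust
// Hypothetical stand-in for `fn g() -> () alternate i32`: instead of
// returning, `g` hands control to one of two caller-supplied return points.
fn g<R>(
    take_alternate: bool, // stands in for whatever decides the outcome
    normal: impl FnOnce(()) -> R,
    alternate: impl FnOnce(i32) -> R,
) -> R {
    if take_alternate {
        alternate(123) // plays the role of `return_alternate 123`
    } else {
        normal(()) // plays the role of a plain `return`
    }
}

fn f() -> String {
    g(
        true,
        |()| String::from("normal return"),
        |x| format!("alternate return with x = {x}"),
    )
}

fn main() {
    println!("{}", f());
}
```

The caller decides what each return point does, and the callee decides which one is taken, which is the division of labor the quoted syntax describes.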
This sort of nonlocal control flow immediately calls to mind an implementation in terms of continuation-passing style, where the return point is given as a function which is tail-called. Nonlocal returns and multiple returns are easy to implement in this style.
Does there exist a language where some function
fn foo() -> T throws U
is syntactic sugar for something more like this?
fn foo(ifOk: fn(T) -> !, ifExcept: fn(U) -> !) -> !
And one thing that could be needed is the ability to throw within a catch, and if you do that you can corrupt the TLS (i.e. break memory safety) unless you’re careful and follow the guidelines. In other words, you personally can have written 100% safe code that is still unsound unless you follow the high-level rules. This is closer to a C API than anything that would be “allowed” as a traditional Rust API, where the guarantee of a safe API is that no unsoundness can happen no matter how you hold it. That’s a lot of safety to sacrifice for something tried and true. Use it if you really need it, but I think following the advice that error states should be rare in the first place is probably better: return an error for any fallible operation and panic on unwind. Trying to catch unwind panics is a landmine approach to getting things to work, and I know that from experience, having tried it. It doesn’t play well with things like async either. And then you have to bubble them up across threads?
This approach would fail there. These aren’t unfixable design flaws, thankfully. You’d need a sum type whose underlying memory can be detached to the heap, and somehow guarantee it’s always detached safely and soundly before being overwritten (e.g. having a counter in the TLS header that is copied to the struct being unwound, to guarantee that the TLS values you think you are accessing have indeed not been overwritten, or having a TLS pointer to the stack value containing the unwound value somehow be written through to detach whenever someone doesn’t call the right catch mechanism).
So I think this work is super valuable, and the ideas should be refined and mainlined because inefficiencies like this aren’t great. But simultaneously, no one should be writing error handling by catching unwinds except in very limited situations that you can clearly articulate as necessary for the goal you are trying to achieve. For example: I spawned a background thread, and if the computation fails I can report the failure gracefully to the human operator of the machine in a non-debug context. But in those cases you want a supervisor process, forked off and responsible only for handling process death, rather than compiling it all into one binary. I wish Rust made that part easier, i.e. start the process in a different mode but then switch to panic so that you carry the performance gains (this crate is built using optimized panic with unwind, this other crate uses a different unwind mode, and you could spawn the other crate through a guaranteed fork to fix the potential unsoundness, with the panic information serialized across the wire to the parent process via a private pipe and unwound that way). That would provide an easier API that indicates more clearly the delegation of responsibility you should have when catching unwinds, and how to structure your code operationally.
However, it can’t be the only way, because you might have something like an HTTP framework, where you want to “guarantee” that you deliver an HTTP response to a socket and log metrics before crashing, and you want the next request to be handled immediately with minimal additional CPU work. You can’t just do it across a fork barrier, because that’s an expensive thing to dispatch to a new thread in the happy path, and you have a thread pool you need to keep healthy and alive in tokio; you can’t fork or spawn a thread on every new inbound connection. So there are cases where you want to catch unwind, namely to have consistent behavior in a framework even if the user’s code or our code has a bug (though in those cases you would probably use the panic method during debug builds to notice such failures before you release to production, where you prevent bugs from becoming a mechanism/gadget for attackers to DoS your service).
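The framework case described above might look roughly like the following sketch. `Response`, `serve` and `buggy_handler` are hypothetical names, not any real framework's API; the only real library call is `std::panic::catch_unwind`, and the code assumes the binary is built with `panic = "unwind"`.

```rust
use std::panic::{self, AssertUnwindSafe};

// Hypothetical response type; a real framework would have its own.
#[derive(Debug, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// Runs a user-supplied handler and turns a panic into a 500 response
/// instead of letting it tear down the worker thread.
fn serve(handler: impl FnOnce(&str) -> Response, path: &str) -> Response {
    // AssertUnwindSafe is a promise that no broken state is observed after a
    // panic. That is an assumption about the handler; nothing checks it.
    match panic::catch_unwind(AssertUnwindSafe(|| handler(path))) {
        Ok(response) => response,
        Err(_payload) => Response {
            status: 500,
            body: String::from("internal server error"),
        },
    }
}

// A handler with a deliberate bug on one path.
fn buggy_handler(path: &str) -> Response {
    if path == "/boom" {
        panic!("bug in user code");
    }
    Response {
        status: 200,
        body: format!("hello from {path}"),
    }
}

fn main() {
    println!("{:?}", serve(buggy_handler, "/boom"));
    println!("{:?}", serve(buggy_handler, "/ok"));
}
```

The worker survives the first request and serves the second one on the same thread. Under `panic = "abort"` the first request would kill the whole process, which is why a framework that relies on this pattern cannot simply opt out of unwinding.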
Wouldn't bypassing unwinding entirely be faster than improving unwinding? You already seem to have control over the code, as the thrown and caught exceptions have to be the same, so you might as well just write the code as a `Result<T, Exception>` in the first place, no?
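For comparison, the `Result`-based style suggested here could be sketched as follows. The `Exception` enum and the `parse` and `divide` functions are hypothetical and exist only to show the shape; errors travel through ordinary return values and `?`, so no unwinder is involved.

```rust
// Hypothetical error type standing in for the `Exception` mentioned above.
#[derive(Debug, PartialEq)]
enum Exception {
    NotANumber(String),
    DivisionFailed,
}

fn parse(input: &str) -> Result<i64, Exception> {
    input
        .trim()
        .parse::<i64>()
        .map_err(|_| Exception::NotANumber(input.to_string()))
}

fn divide(a: &str, b: &str) -> Result<i64, Exception> {
    // `?` returns early with the error, the job unwinding would otherwise do.
    let (a, b) = (parse(a)?, parse(b)?);
    // checked_div yields None for a zero divisor (and for overflow).
    a.checked_div(b).ok_or(Exception::DivisionFailed)
}

fn main() {
    println!("{:?}", divide("10", "2"));
    println!("{:?}", divide("10", "0"));
    println!("{:?}", divide("ten", "2"));
}
```

The trade-off is that every call on the path pays for checking the return value, whereas unwinding costs nothing on the happy path and a lot on the error path.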