There are no errors, just unnecessary abstractions and control flow hacks. You try to open a file; either you can or you cannot, and both possibilities are equally likely and must be handled by normal control flow in your program. Forcing an artificial asymmetry in the treatment of both cases (as championed by the error handling people) adds ugly complexity to any language that tries to do so.
data = open("/var/foo")
Most programmers are going to expect 'data', the result returned by 'open', to be a type that allows reading/writing the file. The programmer expects /var/foo to exist, or they might have checked before calling 'open', but even that's not foolproof.
Historically, a failure might just set 'data' to an invalid value (like 0 or null) but that ended up being a bad idea. And we needed some way to return more information about the error. So we started doing this:
error = open(data, "/var/foo")
But this mainly just complicated things. Is 'data' input or output? The function doesn't return its actual output. And it's still possible to ignore 'error', so 'data' is still potentially undefined.
Then exceptions were invented so we could use proper function call styles again, and the program wouldn't go into an undefined state. Instead, the error could be handled with separate logic, or the program would halt if it was ignored. This was far from a perfect solution, though.
Then sum types entered the mainstream, so 'data' had well-defined ways of returning something other than the expected result. But that resulted in a lot of competing conventions and stylistic decisions for what to do when 'data' is an error type, that haven't quite been settled yet.
data, err := open("/var/foo")
if err != nil....
In fact, the compiler will make you deal with both values after assignment unless you explicitly ignore one of the return values.
Each library/system call you do can result in a set of possible consequences. We usually don't care about them equally, though: in fact, in the file example, 99% of the time we care about whether the file was opened or not - and in the latter case, we don't need to know why. So the asymmetry is already introduced by the intent of the program - there's usually only one path of execution we want; other ones are distractions. Error handling exists to express that asymmetry of caring at the tool level.
Disagreeing is fine, but you don't really disagree here, do you? I can translate the paragraph you wrote into pseudocode:
> If that operation requires the contents of the file, then the program cannot continue unless it successfully reads the contents of the file. (...) If you cannot open the file, there isn't any more to do.
if (file opening fails):
    stop doing things
else:
    continue with your operations
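For comparison, here is the same pseudocode as a minimal Go sketch. The function name describeOpen and the path in main are made up for illustration; the point is only that both outcomes of os.Open go through an ordinary if/else.

```go
package main

import (
	"fmt"
	"os"
)

// describeOpen tries to open path and reports which branch was taken.
// Both outcomes are handled by a plain if/else on the returned error.
func describeOpen(path string) string {
	f, err := os.Open(path)
	if err != nil {
		// "stop doing things": there is nothing more to do with this file.
		return fmt.Sprintf("stopped: cannot open %s", path)
	}
	defer f.Close()
	// "continue with your operations"
	return fmt.Sprintf("continuing with %s", path)
}

func main() {
	// Hypothetical path, assumed not to exist on the machine running this.
	fmt.Println(describeOpen("/nonexistent/dir/foo"))
}
```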
This is just a regular "if-else" that can be done with any programming language. The behavior of your program when the file cannot be opened is part of the specification, just as its behavior when it can be opened. I agree with you on that, and I add that the desired behavior can always be implemented using regular control flow constructs. You do not need a specific language construct for "errors", as you have proven by the algorithm that you have described in your text.
Of course the error handling story can still be improved, but so far it's already one of the best I know.
Two popular crates for handling these situations are thiserror [1] and anyhow [2], for handling errors by type and handling all errors, respectively.
There are additional ways, like just returning a Box wrapper around the stdlib error type [3], or just unwrapping everything. It depends on what your program needs.
[1] https://crates.io/crates/thiserror
[2] https://crates.io/crates/anyhow
[3] https://play.rust-lang.org/?version=stable&mode=debug&editio...
See https://news.ycombinator.com/item?id=23843525 for a recent long discussion about my upcoming book on the topic. (Disclosure: this is a shameless plug.)
Broadly speaking, conditions can be implemented as a library feature, so you can reintroduce such things in cases where the extra functionality is compelling (though now users won’t be familiar with it, so it’ll be much harder to justify).
Other programming languages have been tending in the direction of implementing generators and async/await, which can be used to more smoothly implement some of the key concepts of conditions. (They’re not the same by any means, but related.)
I'll take the time to read up on it properly, but from a quick glance it seems to me to be in the category of non-local transfer of control with a co-routine flavour.
It looks powerful, but I get the sense that a lot of language designers are purposely trying to restrict the powers of error handling. So returning sum types or error codes is simpler than throwing exceptions, which, again, looks to me to be simpler than allowing transfer of control to be decided at run time as in the condition system.
Again, very interesting. And thank you for making me aware of its existence.
Kind-of-but-not-exactly. There are no coroutines whatsoever; the main technical defining point is that the stack is not unwound when the error happens, but it is wound further. Some code is executed that then searches the dynamic environment for matching error handlers, which are executed sequentially; these are then capable of executing arbitrary code provided from earlier in the stack in form of invoking so-called restarts; both handlers and restarts are also capable of performing transfers of control to any point on the stack that was properly annotated as being available for such.
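A very rough Go sketch of the idea described above, not of Common Lisp's actual machinery: handlers are looked up and run while the signalling frame is still live, and they pick one of the restarts offered by the low-level code. Every name here (Restart, Handler, withHandler, signal, parseEntry, sumWithDefault) is invented for illustration, the global handler stack is assumed to be used from a single goroutine, and the sketch leaves out non-local transfers of control entirely.

```go
package main

import (
	"fmt"
	"strconv"
)

// Restart names a recovery strategy offered by the code that signalled.
type Restart struct {
	Name  string
	Value int
}

// Handler inspects a condition and either picks a restart or declines (ok == false).
type Handler func(cond error) (r Restart, ok bool)

// handlers stands in for the dynamic environment: a stack of active handlers.
var handlers []Handler

// withHandler installs h for the duration of body, then removes it.
func withHandler(h Handler, body func()) {
	handlers = append(handlers, h)
	defer func() { handlers = handlers[:len(handlers)-1] }()
	body()
}

// signal runs handlers, newest first, without unwinding the caller's frame.
func signal(cond error) (Restart, bool) {
	for i := len(handlers) - 1; i >= 0; i-- {
		if r, ok := handlers[i](cond); ok {
			return r, true
		}
	}
	return Restart{}, false
}

// parseEntry detects the problem and offers two restarts, but leaves the
// choice of policy to whichever handler is installed higher up the stack.
func parseEntry(s string) int {
	v, err := strconv.Atoi(s)
	if err == nil {
		return v
	}
	if r, ok := signal(err); ok {
		switch r.Name {
		case "use-value":
			return r.Value
		case "skip":
			return 0
		}
	}
	panic(fmt.Errorf("unhandled condition: %w", err))
}

// sumWithDefault is high-level code: it decides the policy (substitute def
// for bad entries) without knowing how parseEntry works internally.
func sumWithDefault(entries []string, def int) int {
	sum := 0
	withHandler(func(error) (Restart, bool) {
		return Restart{Name: "use-value", Value: def}, true
	}, func() {
		for _, e := range entries {
			sum += parseEntry(e)
		}
	})
	return sum
}

func main() {
	fmt.Println(sumWithDefault([]string{"1", "oops", "3"}, 100))
}
```

The loop in sumWithDefault keeps going after the bad entry because nothing was unwound: parseEntry simply returns the value chosen by the handler.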
- errors, i.e. bugs made by the programmer
- logical "error" conditions that the program is expected to handle, for example a failed network connection or invalid user input
- unexpected error conditions that typically boil down to resource allocation errors: a socket could not be allocated, memory allocation failed, etc.
In my experience all of these are best handled with a different tool.
For bugs use an assert that dumps the core and leaves a clear stack trace. For conditions that the program needs to handle use error codes. And finally for the truly unexpected case use exceptions.
And even for conditions that the program is expected to handle, 99.9% of the time all it can do is notify the user and ask for guidance, which means that the error must be bubbled up from a networking or storage layer all the way to the presentation layer - a perfect task for exceptions or something like an error monad.
The only problem with exceptions or error monads is that they get tricky in the presence of resources that need to be released, and even that is well handled with patterns like RAII.
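As an illustration of bubbling an error from a storage layer up to the presentation layer, here is a small Go sketch that uses wrapped error values rather than exceptions or a monad. ErrDiskFull, writeBlock, saveDocument and userMessage are hypothetical names; only fmt.Errorf with %w and errors.Is are real standard-library calls.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDiskFull is a hypothetical storage-layer failure used for illustration.
var ErrDiskFull = errors.New("disk full")

// writeBlock stands in for the storage layer; in this sketch it always fails.
func writeBlock() error { return ErrDiskFull }

// saveDocument is a middle layer: it adds context and passes the error up.
func saveDocument(name string) error {
	if err := writeBlock(); err != nil {
		return fmt.Errorf("saving %q: %w", name, err)
	}
	return nil
}

// userMessage is the presentation layer, which decides what to tell the user.
func userMessage(err error) string {
	if errors.Is(err, ErrDiskFull) {
		return "Could not save: the disk is full. Free some space and retry."
	}
	return "Could not save: " + err.Error()
}

func main() {
	if err := saveDocument("notes.txt"); err != nil {
		fmt.Println(userMessage(err))
	}
}
```

The middle layer does nothing but add context, which is the part that exceptions or an error monad would do implicitly.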
I see from your replies what you're trying to say. If an error occurs, most likely you want the entire operation to abort -- that doesn't necessarily mean the whole program depending on the program.
For example, if I have a GUI app and the "save" operation fails, I typically roll that back right to the event loop of the application; the user gets an error and can retry the save.
For other types of applications, killing the whole process is ending the operation.
If that one obscure code path corrupted my state, I want to limit the incorrect actions that the software takes based on that state.
This "want" of mine is to be balanced with all the other things I want out of the program, and the relative weights will vary by context... but it is often the case that continuing erroneously risks more harm than simply falling over.
Related, I always find it funny when C programmers write `if (malloced == NULL) return NULL;` Either you’re going to forget that this can happen and dereference null (in which case it’s just better to abort the program immediately) or the caller will check this and then close the program. If it doesn’t, the next malloc will be null anyways, and the problem repeats. Just call abort().
I.e. I don’t care about restoring the program to a known state when handling an error (memory deallocation is just one case of processing unwind blocks; locks need releasing, file handles need returning to the kernel, etc). This really only makes sense when your error “handling” is merely printing a user friendly error message and exiting.
When I use setjmp/longjmp error handling I almost always want abort semantics but at the library level rather than at the OS process level. [1] Where applicable it's the simplest, most robust model I know. You have a context object that owns all your resources (memory blocks, file handles, etc) which is what lets you do simple and unified clean-up rather than fine-grained scoped clean-up in the manner of RAII or defer. You can see an example in tcc here:
https://github.com/LuaDist/tcc/blob/255ba0e8e34f999ee840407c...
https://github.com/LuaDist/tcc/blob/255ba0e8e34f999ee840407c...
[1] It goes without saying that a well-written library intended for general use is never allowed to kill the process. This presents a conundrum in writing systems-level C libraries. What do you do if something like malloc fails in a deep call stack within the library? Systems-level libraries need to support user-provided allocation functions which often work out of fixed-size buffers so failure isn't a fatal error from the application's point of view. You'd also want to use this kind of thing for non-debug assert failures for your library's internal invariants.
This style of setjmp/longjmp error handling works well for such cases since you can basically write the equivalent of xmalloc but scoped to the library boundary; you don't have to add hand-written error propagation to all your library functions just because a downstream function might have such a failure. I'm not doing this as a work-around for a lack of finally blocks, RAII or defer statements. It's fundamentally about solving the problem at a different granularity by erecting a process-like boundary around a library.
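The same shape can be sketched in Go, with the caveat that this swaps in panic/recover for setjmp/longjmp and is not how tcc or any particular C library does it. A context object owns all resources, deep functions carry no error-propagation code, and only the public entry point catches the abort and cleans up in one sweep. All names (Context, alloc, deep, Compile) and the byte limit are assumptions made for the example.

```go
package main

import "fmt"

// abort is a private panic payload, so unrelated panics are not swallowed.
type abort struct{ err error }

// Context owns every resource the hypothetical library acquires.
type Context struct {
	limit    int // bytes a pretend user-provided allocator can hand out
	used     int
	released int // count of resources released, kept for demonstration
	cleanups []func()
}

// alloc registers a buffer with the context, or aborts the whole library call.
func (c *Context) alloc(n int) []byte {
	if c.used+n > c.limit {
		panic(abort{fmt.Errorf("allocation of %d bytes failed", n)})
	}
	c.used += n
	c.cleanups = append(c.cleanups, func() { c.released++ })
	return make([]byte, n)
}

// releaseAll frees everything the context owns in one sweep.
func (c *Context) releaseAll() {
	for _, f := range c.cleanups {
		f()
	}
	c.cleanups = nil
	c.used = 0
}

// deep runs far down the call stack and has no error-propagation code at all.
func deep(c *Context) int {
	a := c.alloc(64)
	b := c.alloc(64) // aborts when the limit is too small
	return len(a) + len(b)
}

// Compile is the public entry point and the only place that recovers.
func Compile(c *Context) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			a, ok := r.(abort)
			if !ok {
				panic(r) // not ours: re-raise
			}
			c.releaseAll()
			err = a.err
		}
	}()
	return deep(c), nil
}

func main() {
	c := &Context{limit: 100}
	n, err := Compile(c)
	fmt.Println(n, err, c.released)
}
```

On success the context keeps its resources until the caller releases them; on failure the caller just sees an ordinary error return and the process is never killed.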
I can see some minor corner cases where it could be worthwhile but the mental overhead isn't worth it.
I've written plenty of realtime code but spending a lot of time on the code running in the interrupt handlers is mentally exhausting and error prone; I do that when I have no choice. Likewise I've written a lot of assembly code but it's been decades since I wrote a whole program that way -- I don't have enough fingers to keep track of all the labels and call paths.
E.g. just because C++ has pointers doesn't mean I use them very often. >90% of the cases can be references instead.
I have never used setjmp/longjmp myself. And I agree with you that my first instinct would be to use it in a similar manner as in many GUI programs: they have a catch statement in the message loop that shows a dialog box of the thrown exception. You just jump to a point where you print a user friendly error message and exit.
But I still can imagine use cases where you've isolated all other side effects (locks, shared memory, open file handles) and are just dealing with a buffer that you parse. Has anyone used setjmp/longjmp for that around here?
Given your many years in the field and Cygnus background I guess you've used it a few times? Do you happen to have any horror stories related to it? :-)
Think about how it works: it copies the CPU state (basically the registers: program counter, stack pointer, etc). When you longjmp back the CPU is set back to the call state, but any side effects in memory etc are unchanged. You go back in time yet the consequences of prior execution are still lying around and need to be cleaned up. It's as if you woke up, drove to work, then longjmped yourself back home -- but your car was still at work, your laptop open etc.
Sure, if you're super careful you can make sure you handle the side effects of what happened while the code was running, but if you forget one you have a problem. Why not use the language features designed to take care of those problems for you?
This sort of works in a pool-based memory allocator.
The failures happen three ways: one is you forget something and so you have a leak. The second is that you haven't registered usage properly so have a dangling pointer. Third is by going back in time you lose access to and the value of prior and/or partial computation.
If you use this for a library, and everything between the setjmp and the longjmp happens within a single invocation, you can sometimes get away with it. But in a thing like a memory allocator where the user makes successive calls, unless you force the user to do extra work you can't be sure what dependencies on the memory might exist. If your library uses callbacks you can be in a world of hurt.
Trying to keep track of all those fiddly details is hard. C++ does it automatically, at the risk of potentially being more careful (e.g. deallocating two blocks individually rather than in one swoop -- oh, but that language has an allocator mechanism precisely to avoid this problem). The point is the programmer doesn't have to remember anything to make it work.
Swift does this by default! You have to annotate (via @discardableResult) those functions which should not warn.
But of course try/catch is used in Swift more often.
While I have your attention: A big thank you for Fish shell!
And related to the current subject: How does fish handle errors? A quick skim found some constants that are returned upon failure, such as this case for disown: https://github.com/fish-shell/fish-shell/blob/master/src/bui...
What trade-offs did you face when designing error handling for your shell?
Shells are rarely CPU bound, so some perf overhead is acceptable. But shells may be used to recover badly broken systems. If fork or pipe fails, most programs are OK to abort, but a shell may be the user's last hope, so has to keep going.
For example, if pipe() fails, it's probably due to fd exhaustion. If your system is in that state, the best thing to do is immediately unwind whatever is executing, and put the user back at the prompt. fish uses ad-hoc error codes (reflecting its C legacy) instead of exceptions, though it uses RAII for cleanup. Your question made me realize that fish needs a better abstraction here; at least use `nodiscard`.
The story is different for script errors [1]. If the user forgets to (say) close a quote in a config file, fish will print the line, a caret, and a backtrace to the executing script. A lot of effort has gone into providing good error messages with many special cases detected. The parser also knows how to recover and keep going; I think Fabien would approve.
1: https://github.com/fish-shell/fish-shell/blob/225470493b3cd1...
FWIW I find actual exception usage rare aside from automatic error out parameter to exception conversion by the Clang importer when bridging to Objective-C code.
* The exceptional path is slow (00:10:23). Facebook was using exceptions to signal parsing errors, which turned out to be too slow when dealing with loosely formatted input. Facebook found that using exceptions in this way increased the cost of parsing a file by 50x (00:10:42). No real surprise here, this is also a common pattern in the Java world and clearly the wrong way to do it. Exceptions are for the exceptional.
* Exceptions require immediate and exclusive attention (00:11:28). To me, this is a killer argument for errors over exceptions. With exceptions, you can be in your normal control flow, or the exceptional control flow, not both. You have to deal with the exception at the point it occurs, even if that exception is truly exceptional. You cannot easily stash the first exception and do some cleanup if that may itself throw an exception.
You can stash/rethrow exceptions since C++11 with an exception pointer (std::exception_ptr) if you really need to.
Maybe I'm naive to how exceptions are actually implemented, but it seems to me that both a checked exception and Sum Type would incur the same overhead, a single branch to make sure things haven't exploded.
Essentially, I would argue that error signaling is important enough and common enough that it deserves attention by the compiler in the same sense that many of the other things we provide syntax for (such as traits or inheritance) are things which developers can type naive manual implementations of with basic tools (such as switch statements or dictionaries or dragging around lots of function pointers), but if you can abstract it in a way such that the semantics are available to the compiler you can come up with much better ways to handle the problem (such as vtables or polymorphic dispatch caches) for a given set of tradeoffs (such as low memory usage, low construction cost, consistent latency, etc.). If everyone is implementing the feature themselves manually in the code then you have lost any real ability to make great optimizations.
(Note that you don't necessarily have to have it be syntax to do this: you can also have a language such as Haskell--where notably these Either-style errors are usually cited as being from--where they do it in the language but abstracted everything an additional level higher, letting you define a lot of these flow control concepts in terms of a monad, so then downstream users use "do" notation to feel like custom syntax and the monad's bind operator provides a central chokepoint on what was otherwise a bunch of boilerplate. You sometimes--not always--can then do optimizations across the entire program of that shared abstraction. The way languages like Rust and Go are handling this, without support for monads, simply precludes anything other than attempts at reverse engineering semantics from the code, which is ridiculous.)
> The Erlang Approach - Let it Crash
That is only a superficial level of composition, if one can call it that at all, that doesn't account for actual composition of errors of different types. The example provided is just encapsulation and therefore orthogonal to the issue of error handling approaches. i.e. in the example, the error handling code is only centralized, not composed.
Do you mean the fact that there must be some if-statement within the API that reacts to the different errors and sets a flag used by the Err() method?
Is your opinion that "composition of errors" always requires special syntactic elements such as the match statement?
The code from the blog section:
scanner := bufio.NewScanner(input)
for scanner.Scan() {
    token := scanner.Text()
    // process token
}
if err := scanner.Err(); err != nil {
    // process the error
}
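To make the question concrete, here is a sketch of how an API with this shape might be built. It is not the actual bufio.Scanner implementation; IntReader and its methods are hypothetical. The iterator remembers the first error it hits, stops iterating, and hands the error back from Err() after the loop.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// IntReader is a hypothetical iterator in the style of bufio.Scanner.
type IntReader struct {
	fields []string
	pos    int
	cur    int
	err    error // first error seen; once set, Next always returns false
}

// NewIntReader returns a reader over the given text fields.
func NewIntReader(fields []string) *IntReader { return &IntReader{fields: fields} }

// Next advances to the next value. It returns false at the end of the
// input or on the first error, which is then available from Err.
func (r *IntReader) Next() bool {
	if r.err != nil || r.pos >= len(r.fields) {
		return false
	}
	v, err := strconv.Atoi(r.fields[r.pos])
	if err != nil {
		r.err = fmt.Errorf("field %d: %w", r.pos, err)
		return false
	}
	r.cur = v
	r.pos++
	return true
}

// Value returns the most recent value produced by Next.
func (r *IntReader) Value() int { return r.cur }

// Err returns the error that stopped iteration, or nil at a clean end.
func (r *IntReader) Err() error { return r.err }

// sum is caller code: one loop for the normal path, one check afterwards.
func sum(fields []string) (int, error) {
	r := NewIntReader(fields)
	total := 0
	for r.Next() {
		total += r.Value()
	}
	return total, r.Err()
}

// isSyntaxError shows that the caller can still tell error kinds apart.
func isSyntaxError(err error) bool { return errors.Is(err, strconv.ErrSyntax) }

func main() {
	total, err := sum([]string{"1", "2", "x", "4"})
	fmt.Println(total, err, isSyntaxError(err))
}
```

So yes, there is an ordinary if statement inside Next that records the error; the caller-side check is centralized after the loop rather than repeated per item.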