e.g. - you are trying to open a file for reading. The file does not exist. Is this exceptional? That depends on context, but the function that opens the file, being in an independent library, is usually designed without this context.
If it does throw an exception, some people complain that "of course it's expected that the file won't be there sometimes! That's not exceptional".
If it doesn't throw an exception, some people complain that "we tried to open a file, but didn't succeed, of course that's an exception". But if you want to avoid an exception in this case, you'll need to check for existence before opening (LBYL, "look before you leap"), and that introduces a race condition (TOCTOU, "time of check to time of use"), which is really bad.
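A minimal sketch of the two styles in Python, to make the race concrete. The function names and the `default` parameter are made up for illustration; only `os.path.exists`, `open` and `FileNotFoundError` are real.

```python
import os


def read_config_lbyl(path, default=""):
    # Look before you leap: the file can vanish between the check and the open,
    # so this version can still raise FileNotFoundError (the TOCTOU window).
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return default


def read_config_eafp(path, default=""):
    # Ask forgiveness: the open itself is the check, so there is no window
    # between "check" and "use" for another process to slip into.
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return default
```

Both return the same thing when nothing interferes; the difference only shows up when the file is removed at the wrong moment.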
So it very often happens that you are forced by your ecosystem to use exceptions for normal control flow. Claims that you can only use them for "exceptional / unexpected" conditions tend to be incompatible with a project in which you do not develop/control all of the libraries you use to your strict standard of exceptionalness.
The semantic function of exceptions is just a way of summing additional values onto the return type of a function because it has results that are not contained within the primary type. In this way, they are a more general and better typed version of NULL (which has its places - contrary to the modern dogma, these sorts of features are needed due to inherent complexity). The standard ways of attempting to avoid this are to either use sentinel values that exist in your standard return type like fd == -1 for an error (thereby making your program less typed), or to create a top-level sum type for every aggregated function return type (cluttering your program with nominal types). Multiple values make the most sense, but those are ad-hoc product types, so you're eschewing the type system in favor of informal invariants.
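To put the three options side by side, here is a rough Python sketch. Everything in it (the `Ok`/`Err` classes, the `parse_port_*` functions) is invented for illustration, and Python's `Union` is only a loose stand-in for a real sum type.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Ok:
    value: int


@dataclass
class Err:
    reason: str


# A nominal "sum type" declared just for this one function's result.
Result = Union[Ok, Err]


def _is_decimal(text: str) -> bool:
    # ASCII digits only, so that int(text) below cannot fail.
    return text.isascii() and text.isdigit()


def parse_port_sentinel(text: str) -> int:
    # Sentinel style: -1 is an int like any other, so nothing forces a check.
    return int(text) if _is_decimal(text) else -1


def parse_port_sum(text: str) -> Result:
    # Sum-type style: failure is a separate branch the caller has to unpack.
    if _is_decimal(text):
        return Ok(int(text))
    return Err(f"not a port number: {text!r}")


def parse_port_exc(text: str) -> int:
    # Exception style: the extra branch travels outside the declared return type.
    if not _is_decimal(text):
        raise ValueError(f"not a port number: {text!r}")
    return int(text)
```

The sentinel version is the shortest, and it is also the only one where forgetting the error case silently produces a usable-looking value.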
The syntactic function of exceptions is to avoid constantly repeating (check for error, return error), which often leads to the poor practices of ignoring errors or calling a global exit(). One goal of programming languages is to automate, so it makes sense to capture this oft-repeated pattern. But problems arise when people end up forgetting that every function can have a possible return immediately following it.
It seems some syntactic middleground is needed to signal the complete return type of a function definition, and the possibility that a given function call may quick-return. Honestly (and I hate to say it), but Java probably started down the right track with checked exceptions, but being a B&D language it ended up being waaaay too verbose. And lacking a way to aggregate types along anything but the baked-in hierarchy, people fell into using generic and uninformative 'throws Exception'. And open types make it so there's little point trying to enumerate exhaustive causes. But that doesn't mean that one can't start with the idea of non-silent but syntactically lightweight exceptions and come up with something that avoids a lot of the pitfalls.
edit: Oh. http://c2.com/cgi/wiki?BondageAndDisciplineLanguage ? Makes sense.
> There's still room for improvement on working with sum types in Haskell (e.g. nested sum types can be annoying)
What I dismissed is requiring sum types to be nominal. For instance, imagine that Haskell used the Scheme/Java representation of objects (ie everything is basically a member of one sum type, discriminated on a machine word in the header). We could then do things like:
type Foo = Bar | Baz
type Foo2 = Bar | Baz
Where Bar and Baz could be any type. Now both Foo and Foo2 are just different names for the exact same thing, and in fact the names Foo/Foo2 are irrelevant when pattern matching the result of a function that's been declared to return (Bar | Baz). This philosophy is a bit different from Haskell in that it assumes that "everything is an object" (rather than the zero-overhead structs of Haskell), and it implies that every sum type defined this way can only contain one branch for each included type (without names, there's no way to differentiate them), but a merging of the semantics could definitely be hammered out.
I think Rust (being not quite formed yet) could benefit from taking a stab at this, having every sum discriminator be globally unique, and every non-sum type having an associated global tag that only gets prepended when it is promoted to being an anonymous branch of a sum. The immediate use I envision is being able to create ad-hoc type hierarchies that are descriptive rather than prescriptive.
[0] http://www.scala-lang.org/api/current/index.html#scala.util.... [1] http://www.scala-lang.org/api/current/index.html#scala.util....
if (ret == -1) return -1;
after every function invocation, which is why I was calling that aspect a syntactic feature.
It's this implicit return after every function that trips people up (when prematurely exiting from stateful computation). So what I was saying is that making the call of an exception-throwing function (slightly) more verbose would alleviate that and pay for the complexity where it was used.
Maybe the right approach would be to enrich the type system so that it can express those invariants.
Related discussions:
- http://stackoverflow.com/questions/8805238/run-time-penalty-...
- http://stackoverflow.com/questions/299068/how-slow-are-java-...
As with much of the exceptions debate, it’s important not to over-generalise here.
For example, if you’re writing in a compiled language like C++ and your compiler uses a table-driven implementation for the exception mechanism (as most modern ones did, the last time I checked) then there isn’t necessarily any direct runtime overhead at all when no exception is thrown. In fact, the non-exceptional code path can even run a little faster than equivalent code with manual error handling via return codes, if conditional logic for propagating error codes can be omitted at all the intermediate levels between the one(s) where exceptions are thrown and the one(s) where they are caught.
On the other hand, the possibility of an exception being thrown might interfere with some optimisations. Also, the jump tables can be huge: I once saw the compiled output for a moderately large code base drop in size by 1/3 just from compiling it with exceptions disabled.
In short, there are a lot of factors at play, but anyone who parrots the line that using exceptions always slows things down has never spent much time looking at what actually happens with real compilers. And of course, this is only in one type of compiled language, which doesn’t necessarily imply anything about the performance characteristics of other languages (which vary widely).
It's actually handy to have an API that includes both situations. For example, Parse and TryParse in C#. Parse will throw an exception if it fails but TryParse will not. When used in your code, this actually documents what the programmer expects. If open and tryOpen operations existed, you would know whether or not the programmer expects the file to exist.
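The same pairing is easy to sketch in Python. The names `parse_int` and `try_parse_int` are hypothetical, and returning a default is just one possible shape (C#'s TryParse reports success through a bool and an out parameter instead).

```python
def parse_int(text):
    # Raising variant: the caller is saying "this input is supposed to be valid".
    return int(text)


def try_parse_int(text, default=None):
    # Non-raising variant: the caller is saying "bad input is routine here".
    try:
        return int(text)
    except ValueError:
        return default
```

Reading a call site then tells you which expectation the author had, for example `parse_int(row_id)` versus `try_parse_int(user_input, default=0)`.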
However, an important point is that a "valid result" may be a status indicating what went wrong. For example, a function that reads an HTTP URL may return (OK, 200), (NOT_FOUND, 404), (REDIRECT, 301) and so on. But in addition to that, the function may also throw an exception if, for example, a DNS lookup error occurred.
Opening files, on the other hand, can fail unexpectedly for a billion different reasons, most of which an opening function can't detect or do anything about, so it can't return a valid result if they occur. Therefore it must throw an exception.
What's wrong with (DNS Lookup Failure, -17), as long as it is documented? Why doesn't 404 NOT_FOUND merit an exception, yet a DNS lookup failure does?
(I can argue both ways, I'm just trying to point out that there's no clear criterion)
> therefore can't return a valid result if they occur. Therefore it must throw an exception.
But open() in C does return a valid result on any of those billion reasons: (fd -1, errno reason). Yes, errno is returned in a global variable, but it is still a return from the open() call. Therefore, exceptions are never needed according to your logic?
Because then every function that could potentially cause a dns lookup failure needs to have that error code documented. Then ask yourself this, what's wrong with having (Out of Memory Failure, -1234) in addition to DNS Lookup Failure and HTTP Status codes?
Think of yourself as a http function. You know about the http protocol and therefore from your perspective a 404 NOT_FOUND is a valid result. However, you do not know about DNS lookups or memory allocation therefore if a problem occurs in those areas it is exceptional -> exception.
On the other hand, if you were a memory allocator function then returning out of memory instead of an address to the allocated memory would be fine. Because memory handling is your job.
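A toy sketch of that layering, with the network replaced by dictionaries so it runs anywhere. `DnsLookupError`, `resolve`, `http_get` and both tables are invented for the example and do not correspond to any real library.

```python
class DnsLookupError(Exception):
    """Raised by the (hypothetical) resolver layer, not by the HTTP layer."""


def resolve(host, dns_table):
    # Stand-in for a DNS lookup. Name resolution is this function's job,
    # but from HTTP's point of view a failure here has no status code.
    try:
        return dns_table[host]
    except KeyError:
        raise DnsLookupError(host) from None


def http_get(host, path, dns_table, sites):
    # HTTP knows about status codes, so those come back as ordinary values.
    address = resolve(host, dns_table)  # may raise DnsLookupError
    pages = sites[address]
    if path in pages:
        return ("OK", 200, pages[path])
    return ("NOT_FOUND", 404, None)
```

Whether this is the right split is exactly what the thread is arguing about; the sketch only shows what "404 is a result, DNS failure is an exception" looks like in code.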
> But open() in C does return a valid result on any of those billion reasons: (fd -1, errno reason).
Syntactically valid, but not semantically as -1 isn't a valid file descriptor.
I don't think anyone reasonably expects a library to know the context of the file being opened and its importance in the logic of the rest of the program.
a = some_dict['nonexistent_key']
would raise a KeyError, while
a = some_dict.get('nonexistent_key', 'DEFAULT')
would return 'DEFAULT' instead.
I doubt anyone would say end-of-file is unexpected, but I am not sure I would say it was exceptional.
1) something like this (it has been 20yrs)
init data structure S
open file A
    loop
        read line from A
        process line and add to S
    next
catch EOF
    close A
    return S
catch file-not-found
    return empty S
any mistakes are my memory not my old professor
The only exception to this that I've found is that error is something you may need to deal with in the IO monad, where in Python it would still be handled via exceptions.
For example, I used to work on a system where there were various threads each controlling a different piece of hardware. But the hardware was hot-swappable, so it was possible that suddenly a thread would find itself trying to talk to hardware that was no longer there. We would throw an exception at the low-level hardware access point, and catch it at (nearly) the highest level of the thread. This allowed us to reclaim memory and other resources used by the thread in an orderly fashion, and return to a "ready for hardware insert" mode. This was key since the other threads were doing just fine and the overall program needed to continue.
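The original system isn't shown, so this is only a guess at the shape of it, in Python. `HardwareRemoved`, `read_register`, `control_loop` and the dict standing in for a device are all made up.

```python
import threading


class HardwareRemoved(Exception):
    """Hypothetical error raised when the device disappears."""


def read_register(device):
    # Low-level access point: the only place that detects the missing hardware.
    if not device["present"]:
        raise HardwareRemoved(device["name"])
    return device["value"]


def control_loop(device, log):
    # Near-top-level handler for this one thread; other threads are unaffected.
    try:
        while True:
            log.append(read_register(device))
            if len(log) >= 3:
                device["present"] = False  # simulate a hot-swap removal
    except HardwareRemoved:
        log.append("waiting for insert")
    finally:
        log.append("resources released")
```

Run under `threading.Thread(target=control_loop, args=(device, log))`, the thread reads three values, hits the exception on the fourth read, and unwinds through a single handler instead of checking a status after every hardware call.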
Here's an example. Say I'm writing a function and I have a dictionary I need to access values from often. Now say that 85% of the time the key I need is in the dictionary, but 15% of the time it is not. I could do this:
if key in my_dict:
    execute_operation(my_dict[key])
else:
    pass
So that if the key is not in the dictionary I do nothing. But this can be expressed more clearly like this:
try:
    execute_operation(my_dict[key])
except KeyError:
    pass
This is considered to be clearer because it expresses the fact that the key should be in the dictionary, but in some cases is not. And interestingly, performance reflects this. The first method is more efficient in cases where the key is usually not in the dictionary. The second method, on the other hand, is more efficient when the key is usually present in the dictionary.
So, in summary, exceptions are used in Python to represent cases where successfully executing the code within the try block is normal, but exceptional conditions could once in a while occur. Almost all uses of exceptions occur for conditions where the program should not crash. When the program should crash we just let the exception bubble up to the top level where we gracefully shut down the program and call an exit function.
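For anyone who wants to time the two dictionary lookup styles above, here they are as functions (the names `run_lbyl` and `run_eafp` are mine). One deliberate difference from the snippet in the comment: the try block covers only the lookup, so a KeyError raised inside execute_operation is not silently swallowed.

```python
def run_lbyl(my_dict, key, execute_operation):
    # Check first: two lookups on a hit, one membership test on a miss.
    if key in my_dict:
        execute_operation(my_dict[key])


def run_eafp(my_dict, key, execute_operation):
    # Try first: one lookup on a hit, exception handling on a miss.
    try:
        value = my_dict[key]
    except KeyError:
        return
    execute_operation(value)
```

Wrapping each in `timeit.timeit` with different hit rates is a quick way to check the performance claim on your own interpreter rather than taking it on faith.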
An exception should mean "normal operation cannot continue", and should signify a bug in your code. As such, it is an exceptionally good software development tool.
data = ""
socket.settimeout(1.5)
while 1:
    try:
        data += socket.recv(1024)
    except Exception,e:
        store_data(data)
        break
If this is not okay, what would you change? (And yes, I know there are edge'ish cases where I'd miss some data here)
The fix is to have a specific nonambiguous name for your exception, so that other error conditions still work properly. As examples consider the StopIteration and GeneratorExit exceptions from Python's standard library. (See http://docs.python.org/2/library/exceptions.html for a list of built-in exceptions.)
This is on the basis that code like this is hard to debug, more so than any glorious exception-free master style. You aren't guaranteed to have received your entire transmission in your exception block because you are catching any receive exception. You do want exception handling here, but not as control flow; you want it as damage control if you get an unexpected early termination, not when you get desired behavior.
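One possible narrower version of that receive loop, written for Python 3 rather than the Python 2 of the original. `recv_until_quiet` and its parameters are hypothetical names; `socket.timeout` and `settimeout` are the real standard-library ones.

```python
import socket


def recv_until_quiet(sock, quiet_seconds=1.5, chunk_size=1024):
    # Collect bytes until the peer goes quiet or closes the connection.
    sock.settimeout(quiet_seconds)
    chunks = []
    while True:
        try:
            chunk = sock.recv(chunk_size)
        except socket.timeout:
            break  # the one condition we deliberately treat as "done"
        if not chunk:
            break  # orderly close by the peer
        chunks.append(chunk)
    return b"".join(chunks)
```

Any other error (connection reset, bad file descriptor, a typo inside the loop) now propagates instead of being mistaken for the end of the data. A timeout still can't tell "finished" from "slow", so the edge cases the original poster mentions remain.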
I think his point of there being easy syntax for multiple returns is critically important to make this sort of error handling non annoying - which Go does remarkably well.
I think this factor has a lot to contribute to the fact that you get the warm fuzzy feeling after your code compiles. You feel confident that you have already handled all the error cases (that you care about) in your code.
EDIT - Obligatory nitpick accepted. ;)
After writing a fair amount of Go recently, I've come to believe that usage of explicit error returns only appears to increase code density. In reality, it exposes the complexity of correct error handling and forces you to factor your error handling logic accordingly, rather than letting you sweep it all into a few top level handlers.
What about the errors you don't care about at the time, but realize later that you should care about? Do you go back and add them? This process seems error prone to me.
From the Go FAQ: "[Exceptions] also tends to encourage programmers to label too many ordinary errors, such as failing to open a file, as exceptional."
So the problem is not with exceptions, but how some programmers use them.
That said, this was, for me, the single weirdest thing to get used to when starting to program in Go, coming from a mostly Python/Java exception-style background. (I imagine it's easier if you're a C programmer).
However, once I got into the swing of it, I realized I really, really like Go's error handling, and I can't imagine going back to Python's exceptions voluntarily. Returning errors makes code much more readable, and it also reduces the risk of code suddenly failing because of an exception that got thrown somewhere that you can't even find easily.
Go also gets a more subtle point correct: errors are interfaces, not types, which means that you can use any type as an error, as long as it supports the "Error() string" method. This is irrelevant 99% of the time, but I've seen a few pieces of code which utilize this feature very effectively.
Let’s say there is a function that’s 20 lines long, and if you did a thorough analysis of possible error conditions, regardless of likelihood, you might come up with 50 or more.
We are not going to write code to address all 50 possibilities from the start. Instead, we are going de facto to wait and see what fails in the real world, and address them as we discover them, because we value our time. We make an economic distinction between a 1-in-1,000 problem, and a 1-in-1,000,000,000 problem.
Is this un-robust? Yes. Are we allowing exceptions to be a control-flow catch-all? Yes. Would a Go-like approach of simply returning error conditions reduce bugs? Probably. But it’s important to recognize programmers’ ‘revealed preference’ for exceptions.
So if we're talking about 5 error-checks in 20 lines, then yes, we absolutely should write code to address them from the start.
I mean, I can understand not dealing with errors from memory allocation failing, or even possibly failure to write bytes to disk, depending on the situation (e.g., if those fail, you've got bigger problems to worry about than your error handling -- and it's not like exceptions are probably helping you to recover anyways).
But for stuff like network communication, writing to databases, etc., you had definitely better be addressing all error possibilities from the start, because these things fail all the time.
Is the network up?
Is the connection to SQL up?
Did someone just turn off the SQL machine half way through the query?
Can we find the server?
Are there any rows?
Did the SQL compile?
Do I have rights on this table?
Have you just terminated me as a result of a deadlock?
Did you return a Null when I was expecting a value?
Did you return a float when I was expecting an int?
Did my value just overflow?
Did you just return 0 and I tried to use it in division?
Did I just try to access the session but some other idiot cleared it?
Did I just try to call a method on an object that is in fact null?
And that's all possible in a three liner off the top of my head. I'm sure there's plenty more than that that are possible! I didn't even start on the file ones...
Is the network up? / Is the connection to SQL up? / Did someone just turn off the SQL machine half way through the query? / Can we find the server?
These are not errors, just normal business logic that has to be handled: Are there any rows? / Did you return a Null when I was expecting a value?
These are not runtime errors, they're just debugging during development (with possible exception of overflow, depending on context): Did the SQL compile? / Do I have rights on this table? / Did you return a float when I was expecting an int? / Did my value just overflow? / Did you just return 0 and I tried to use it in division?
And likewise, these all just have to do with the design of your program, which you either know you have to deal with or not: Have you just terminated me as a result of a deadlock? / Did I just try to access the session but some other idiot cleared it? / Did I just try to call a method on an object that is in fact null?
I already said that you may not have to worry about things like memory allocation errors, depending on your needs. Most of the stuff listed above is either redundant, or has more to do with the design of your program. I stand by my point that, in most programming (say, back-end web stuff), you're handling more like 5 errors per 20 lines, not 50 per 20. And that, yes, those 5 (or however many) errors should be planned for from the start.
Using your example, we have a function that queries a database. It will be given a valid database connection, and return the result of the query.
There is no "valid database connection" logic, since that is handled elsewhere. There is no result validation logic, since that is handled upstream. This function only cares about A) querying and B) returning a value (possibly null). The only exceptions that it handles are those raised when something specific to its domain goes wrong - for example, unauthorized access to a table. That is an error that is above networking (the connection worked fine) but clearly not a data validation problem (no data), so we handle the exception here.
If you find yourself throwing exceptions "across problem domains", that's a good indicator that your functions are doing too much.
Resource availability (Is the network up? Is the connection to SQL up? Can we find the server? Did I just try to access the session but some other idiot cleared it?)
Incomplete response (Did someone just turn off the SQL machine half way through the query? Have you just terminated me as a result of a deadlock?)
Result semantics and types (Are there any rows? Did you return a Null when I was expecting a value? Did you return a float when I was expecting an int? Did my value just overflow? Did you just return 0 and I tried to use it in division?)
Translation to intermediate language (Did the SQL compile?)
Assumptions about remote state (Do I have rights on this table? Did I just try to call a method on an object that is in fact null?)
Smart code handles these all separately. It's crazy to try to bundle them all into one function.
When used that way, they give you very useful information as to what went wrong and where, while making your program more robust and resilient to errors. We've found this to be the case time after time at https://starthq.com, which runs on Node but uses fibers via https://github.com/olegp/common-node.
Maybe in Java and C++, but in Common Lisp we have restarts that allow you to recover from an exception. I like to use the example of attempting to write to a file when the disk is full, because:
1. It is possible to recover from the exception (e.g. ask the user to delete some files)
2. It makes no sense for the I/O library to do all the things needed to recover
3. It is a maintenance headache for client code to do all the things needed to recover
With restarts things would look like this: the I/O library would set up a restart for write that would retry the operation, the client code would catch the exception, prompt the user to free some space, and when the user indicates the space is free the restart is invoked. The I/O library knows the right way to restart the operation, and client code knows whether or not that should happen, and you get code that does not just quit over a disk being full.
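Python has no restarts, so the following is only a rough approximation of the division of labour described above. Unlike a real Common Lisp restart, the stack is unwound back to the retry point before the client's policy is consulted. All names (`DiskFull`, `write_with_restart`, `demo`) are invented.

```python
class DiskFull(Exception):
    """Hypothetical condition signalled by the I/O layer."""


def write_with_restart(do_write, on_disk_full):
    # I/O library side: it knows how to retry, but not whether it should.
    while True:
        try:
            return do_write()
        except DiskFull as condition:
            if on_disk_full(condition) != "retry":
                raise


def demo():
    disk = {"free": 0, "written": []}

    def do_write():
        if disk["free"] < 1:
            raise DiskFull("no space left")
        disk["written"].append("record")
        return len(disk["written"])

    def ask_user_to_free_space(condition):
        # Client side: it decides the policy. Here the "user" frees some space.
        disk["free"] = 10
        return "retry"

    return write_with_restart(do_write, ask_user_to_free_space), disk
```

The first write fails, the client-supplied handler frees space and asks for a retry, and the second attempt succeeds, all without the client knowing how the write is actually retried.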
Exceptions are meant to be thrown frequently and caught very infrequently. Catch in the few places where recovery is possible and where you can log the error. That's it.
That is a subjective view, and certainly not a universal one. In Python, for example, exceptions are routinely used for flow control purposes; see StopIteration.
Was he being serious? I'd rather not wade through all that error checking cruft to see normal flow. Exceptions (or goto) don't create buggy hard to read programs, people do.
Wishing it wasn't, or just laziness, leads to bugs. It's not cruft, it's integral. That's the whole point.
P.S. You can do this in pretty much any language. Just define pair, triplet, etc.
What I do not like about the "check return values" approach is:
1. It means that client code must understand how to handle error conditions. No disk space? Well whoever called the top-level function that invoked write needs to figure out what to do if there is any chance of recovery. It is a maintenance headache that can quickly accumulate bugs.
2. In both Java and C++ there are functions that cannot return values: constructors, and in C++ destructors. No, it is not acceptable for a program to crash just because a constructor call failed. No, it is not any better to have every class have a special variable that indicates that the constructor failed. No, having empty constructors is not the answer, and it is certainly not going to help with destructor calls (the C++ standard library actually requires some destructors to silently fail because of the issues with error reporting).
I really wish more people knew about this option. Maybe it would even get into mainstream languages.
[0] http://nedbatchelder.com/blog/200310/joel_on_exceptions.html [1] http://nedbatchelder.com/text/exceptions-vs-status.html [2] http://damienkatz.net/2004/08/crash-only-software.html [3] http://damienkatz.net/2004/09/crash-only-software-revisited....
1. Avoid action-at-a-distance and side-effects that are hard to reason about.
2. Use immutable objects and values rather than references and pointers.
3. Avoid intricate control flow with many branches.
4. Greater isolation of processes/threads (actor model).
5. Use systems and platforms with simple and strong guarantees (e.g. ACID) that are easy to reason about. Special cases and nuances to the underlying platform should be avoided.
6. Use tools with good support for static analysis (e.g. a good type system with a compiler that can understand it).
Exceptions seem to violate #1, #3, and #6.
These rules only apply to software engineering; that is, the reliability, robustness, long-term maintainability, and total project development cost (including maintenance and support).
Of course, there are other considerations, such as: performance, the time to achieve the minimum viable product, how much developers like the tools, how "hackable" it is, or utility for research purposes. These other concerns may be a good reason to violate the above rules.
While it is true that exceptions create innumerable code paths through a function, RAII makes that manageable. If you're not taking advantage of RAII in your C++ code, you might as well be writing C code.
Is there a way to have both styles, cleanly?
I have been thinking about what a language might look like where objects, like UNIX pipes, have stdout and stderr: if stderr is consumed (as with return codes) the exception counts as handled, but if stderr is not consumed, it gets raised?
Exceptions are arguably the one feature of C++ that isn't just syntax sugar over C, the one feature that makes the language fundamentally more expressive: they reduce a certain kind of code complexity from O(N) to O(1).
The proper analogy would be not GOTO, but COMEFROM, no? A catch block is basically a COMEFROM. :)