How much do you know about modern testing, abstract interpretation, SAT/SMT solving? In any case, as of Feb 2017, a lot of this technology is not yet economical for non-safety critical mainstream programming. Peter O'Hearn's talk at the Turing Institute https://www.youtube.com/watch?v=lcVx3g3SmmY might be of interest.
Why isn't it economical yet?
There are some ways in which these tools are not economical. There is currently a big gap. On one side of the gap, you have SMT solvers, which have encoded in them decades of institutional knowledge about generating solutions to formula. An SMT solver is filled with tons of "strategies" and "tactics" which are also known has "heuristics" and "hacks" to everyone else. It applies those heuristics, and a few core algorithms, to formula to automatically come up with a solution. This means that the behavior is somewhat unpredictable, sometimes you can have a formula with 60,000 free variables solved in under half a second, sometimes you can have a formula with 10 that takes an hour.
It sucks when that's in your type system, because then your compilation speeds become variable. Additionally, it's difficult to debug why compiling something would be slow (and by slow, I mean sometimes it will "time out" because otherwise it would run forever) because you have to trace through your programming language's variables into the solvers variables. If a solver can say "no, this definitely isn't safe" most tools are smart enough to pull the reasoning for "definitively not safe" out into a path through the program that the programmer can study.
On the other end of the spectrum are tools like coq and why3. They do very little automatically and require you, the programmer, to specify in painstaking detail why your program is okay. For an example of what I mean by "painstaking" the theorem prover coq could say to you "okay, I know that x = y, and that x and y are natural numbers, but what I don't know is if y = x." You have to tell coq what axiom, from already established axioms, will show that x = y implies y = x.
Surely there's room for some compromise, right? Well, this is an active area of research. I am working on projects that try to strike a balance between these two design points, as are many others, but unlike the GP I don't think there's anything to be that excited about yet.
There's a lot of problems with existing work and applying it to the real world. Tools that reason about C programs in coq have a very mature set of libraries/theorems to reason about memory and integer arithmetic but the libraries they use to turn C code into coq data structures can't deal with C code constructs like "switch." Tools that do verification at the SMT level are frequently totally new languages, with limited/no interoperability with existing libraries, and selling those in the real world is hard.
It's unlikely that any of this will change in the near term because the community of people that care enough about their software reliability is very small and modestly funded. Additionally, making usable developer tools is an orthogonal skill from doing original research, and as a researcher, I both am not rewarded for making usable developer tools, and think that making usable developer tools is much, much harder than doing original research.
It sadly also depends a lot on the solver used and the way the problem was encoded in SMT. For a class in college I once tried to solve Fillomino puzzles using SMT. I programmed two solutions, one used a SAT encoding of Warshall's algorithm and another constructed spanning trees. One some puzzles the first solver required multiple hours whereas the second only needed a few seconds, while on other puzzles it was the complete opposite. My second encoding needed on hours for a puzzle which I could solve by hand in literally a few seconds. SAT and SMT solvers are extremely cool, but way incredibly unpredictable.
I had heard about Dafny but hadn't seen the tutorial!
> Additionally, making usable developer tools is an orthogonal skill from doing original research, and as a researcher, I both am not rewarded for making usable developer tools, and think that making usable developer tools is much, much harder than doing original research.
When you're saying they're orthogonal, are you effectively saying that researchers generally don't have 'strong programming skills' (as far as actually whacking out code). If so, how feasible would it be for someone who is not a researcher, but a good general software engineer, to work on the developer tools side of things?
Compilers, in particular optimising compilers are notoriously buggy, see John Regehr's blog. An old dream was to verify them. The great Robin Milner, who pioneered formal verification (like so much else), said in his 1972 paper Proving compiler correctness in a mechanized logic about all the proofs they left out "More than half of the complete proof has been machine checked, and we anticipte no difficulty with the reminder". Took a while before X. Leroy filled in the gaps. I though it would take a lot longer before we would get something as comprehensive as CakeML, indeed I had predicted this would only appear around 2025.
It sucks when that's in your type system
Agreed, and that is one of the reasons why type-based approaches to
program verification (beyond simplistic things like
Damas-Hindley-Milner) is not the right approach. Speedy dev tools are
vital. A better approach towards program verification is to go for
division of labour: use lightweight tools like type-inference and
automated testing in early and mid development and do full
verification only when the software and specifications are really
stable in an external system (= program logic with associated tools). making usable developer tools is much,
much harder than doing original research.
I don't really agree that the main remaining problems are of an UI/UX nature. The problem in program verification is
that ultimately almost everything you want to automate is NP-complete
or worse: typically (and simplifying a great deal) you need to show A
-> B where A is something you get from the program (e.g. weakest
pre-condition, or characteristic formula), and B is the specification.In the best case, deciding formulae like A -> B is NP-complete, but typically much worse. Moreover, program verification of non-trivial programs seems to trigger the hard cases of those NP-complete (or worse) problems naturally. Add to that the large size of the involved formulae (due to large programs), you have a major theoretical problem at hand, e.g. solve SAT in n^4, or find a really fast approximation algorithm. That's unlikely to happen any time soon.
We don't even know how effectively to parallelise SAT, or to make SAT fast on GPUs. Which is embarrassing, given how much of deep learning's recent successes boil down to gigantic parallel computation at Google scale. Showing that SAT is intrinsically not parallelisable, or even just GPUable (should either be true), looks like a difficult theoretical problem .
as a researcher, I both
am not rewarded for
I agree. But for the reasons outlined above, that is right: polishing UI/UX is something the commercial space can and should do. he community of people that care enough
about their software reliability is very
small and modestly funded.
This is really the key problem.
Industry doesn't really care about program correctness that much
(safe for some niche markets). The VCs of my acquaintance always tell me: "we'll fund your verification ideas when you can point to somebody who's already making money with verification". For most applications type-checking and solid testing can
get you to "sufficiently correct".You can think of program correctness like the speed of light. You can get arbitrarily close but the closer you get the more energy (cost) you have to expend. Type-checking and a good test suite already catch most of the low-handing fruit that the likes of Facebook and Twitter need to worry about . As of 2017, for all but the most safety critical programs, the cost of dealing with the remaining problems does is disproportionate in comparison with the benefits. Instagram or Whatsapp or Gmail are good enough already despite not been formally verified.
Cost/benefit will change only if the cost of formal verification is brought down, or the legal frameworks (liability laws) are changed so that software producers have to pay for faulty software (even when it's not an Airbus A350 autopilot). I know that some verification companies are lobbying governments for such legislative changes. Whether that's a good think, regulatory capture or something in-between, is an interesting question.
Another dimension of the problem is developer education. Most (>99%) of contemporary programmers lack the necessary background in logic even to think properly about program correctness. Just ask the average programmer about loop invariants and termination order. They won't be able to do this even for 3-line programs like GCD. This is not a surprise as there is no industry demand for this kind of knowledge, and will probably change with a change in demand.
Don't know much about it, but verum.com claims 50% reduction in development costs.