Compilers, in particular optimising compilers are notoriously buggy, see John Regehr's blog. An old dream was to verify them. The great Robin Milner, who pioneered formal verification (like so much else), said in his 1972 paper Proving compiler correctness in a mechanized logic about all the proofs they left out "More than half of the complete proof has been machine checked, and we anticipte no difficulty with the reminder". Took a while before X. Leroy filled in the gaps. I though it would take a lot longer before we would get something as comprehensive as CakeML, indeed I had predicted this would only appear around 2025.
It sucks when that's in your type system
Agreed, and that is one of the reasons why type-based approaches to
program verification (beyond simplistic things like
Damas-Hindley-Milner) is not the right approach. Speedy dev tools are
vital. A better approach towards program verification is to go for
division of labour: use lightweight tools like type-inference and
automated testing in early and mid development and do full
verification only when the software and specifications are really
stable in an external system (= program logic with associated tools). making usable developer tools is much,
much harder than doing original research.
I don't really agree that the main remaining problems are of an UI/UX nature. The problem in program verification is
that ultimately almost everything you want to automate is NP-complete
or worse: typically (and simplifying a great deal) you need to show A
-> B where A is something you get from the program (e.g. weakest
pre-condition, or characteristic formula), and B is the specification.In the best case, deciding formulae like A -> B is NP-complete, but typically much worse. Moreover, program verification of non-trivial programs seems to trigger the hard cases of those NP-complete (or worse) problems naturally. Add to that the large size of the involved formulae (due to large programs), you have a major theoretical problem at hand, e.g. solve SAT in n^4, or find a really fast approximation algorithm. That's unlikely to happen any time soon.
We don't even know how effectively to parallelise SAT, or to make SAT fast on GPUs. Which is embarrassing, given how much of deep learning's recent successes boil down to gigantic parallel computation at Google scale. Showing that SAT is intrinsically not parallelisable, or even just GPUable (should either be true), looks like a difficult theoretical problem .
as a researcher, I both
am not rewarded for
I agree. But for the reasons outlined above, that is right: polishing UI/UX is something the commercial space can and should do. he community of people that care enough
about their software reliability is very
small and modestly funded.
This is really the key problem.
Industry doesn't really care about program correctness that much
(safe for some niche markets). The VCs of my acquaintance always tell me: "we'll fund your verification ideas when you can point to somebody who's already making money with verification". For most applications type-checking and solid testing can
get you to "sufficiently correct".You can think of program correctness like the speed of light. You can get arbitrarily close but the closer you get the more energy (cost) you have to expend. Type-checking and a good test suite already catch most of the low-handing fruit that the likes of Facebook and Twitter need to worry about . As of 2017, for all but the most safety critical programs, the cost of dealing with the remaining problems does is disproportionate in comparison with the benefits. Instagram or Whatsapp or Gmail are good enough already despite not been formally verified.
Cost/benefit will change only if the cost of formal verification is brought down, or the legal frameworks (liability laws) are changed so that software producers have to pay for faulty software (even when it's not an Airbus A350 autopilot). I know that some verification companies are lobbying governments for such legislative changes. Whether that's a good think, regulatory capture or something in-between, is an interesting question.
Another dimension of the problem is developer education. Most (>99%) of contemporary programmers lack the necessary background in logic even to think properly about program correctness. Just ask the average programmer about loop invariants and termination order. They won't be able to do this even for 3-line programs like GCD. This is not a surprise as there is no industry demand for this kind of knowledge, and will probably change with a change in demand.