undefined | Better HN

0 pointsmafribe9y ago0 comments

Maybe I'm particularly easy to make happy.

Compilers, in particular optimising compilers are notoriously buggy, see John Regehr's blog. An old dream was to verify them. The great Robin Milner, who pioneered formal verification (like so much else), said in his 1972 paper Proving compiler correctness in a mechanized logic about all the proofs they left out "More than half of the complete proof has been machine checked, and we anticipte no difficulty with the reminder". Took a while before X. Leroy filled in the gaps. I though it would take a lot longer before we would get something as comprehensive as CakeML, indeed I had predicted this would only appear around 2025.

   It sucks when that's in your type system

Agreed, and that is one of the reasons why type-based approaches to program verification (beyond simplistic things like Damas-Hindley-Milner) is not the right approach. Speedy dev tools are vital. A better approach towards program verification is to go for division of labour: use lightweight tools like type-inference and automated testing in early and mid development and do full verification only when the software and specifications are really stable in an external system (= program logic with associated tools).

   making usable developer tools is much, 
   much harder than doing original research.

I don't really agree that the main remaining problems are of an UI/UX nature. The problem in program verification is that ultimately almost everything you want to automate is NP-complete or worse: typically (and simplifying a great deal) you need to show A -> B where A is something you get from the program (e.g. weakest pre-condition, or characteristic formula), and B is the specification.

In the best case, deciding formulae like A -> B is NP-complete, but typically much worse. Moreover, program verification of non-trivial programs seems to trigger the hard cases of those NP-complete (or worse) problems naturally. Add to that the large size of the involved formulae (due to large programs), you have a major theoretical problem at hand, e.g. solve SAT in n^4, or find a really fast approximation algorithm. That's unlikely to happen any time soon.

We don't even know how effectively to parallelise SAT, or to make SAT fast on GPUs. Which is embarrassing, given how much of deep learning's recent successes boil down to gigantic parallel computation at Google scale. Showing that SAT is intrinsically not parallelisable, or even just GPUable (should either be true), looks like a difficult theoretical problem .

    as a researcher, I both 
    am not rewarded for

I agree. But for the reasons outlined above, that is right: polishing UI/UX is something the commercial space can and should do.

   he community of people that care enough 
   about their software reliability is very 
   small and modestly funded.

This is really the key problem. Industry doesn't really care about program correctness that much (safe for some niche markets). The VCs of my acquaintance always tell me: "we'll fund your verification ideas when you can point to somebody who's already making money with verification". For most applications type-checking and solid testing can get you to "sufficiently correct".

You can think of program correctness like the speed of light. You can get arbitrarily close but the closer you get the more energy (cost) you have to expend. Type-checking and a good test suite already catch most of the low-handing fruit that the likes of Facebook and Twitter need to worry about . As of 2017, for all but the most safety critical programs, the cost of dealing with the remaining problems does is disproportionate in comparison with the benefits. Instagram or Whatsapp or Gmail are good enough already despite not been formally verified.

Cost/benefit will change only if the cost of formal verification is brought down, or the legal frameworks (liability laws) are changed so that software producers have to pay for faulty software (even when it's not an Airbus A350 autopilot). I know that some verification companies are lobbying governments for such legislative changes. Whether that's a good think, regulatory capture or something in-between, is an interesting question.

Another dimension of the problem is developer education. Most (>99%) of contemporary programmers lack the necessary background in logic even to think properly about program correctness. Just ask the average programmer about loop invariants and termination order. They won't be able to do this even for 3-line programs like GCD. This is not a surprise as there is no industry demand for this kind of knowledge, and will probably change with a change in demand.

0 comments

2 comments · 1 top-level

munin9y ago· 1 in thread

Thanks for the long reply! I don't have the time continue this. I mostly agree with what you say.

I do think that making verification tools easier is something that researchers could and should be thinking about. Probably not verification and logic researchers directly, but someone should be carefully thinking about it and systematically exploring how we can look at our programs and decide they do what we want them to do. I have some hope for the DeepSpec project to at least start us down that path.

I also have hope for type-level approaches where the typechecking algorithms are predictable enough to avoid the "Z3 in my type system" problem but expressive enough that you can get useful properties out of them. I think this is also a careful design space and another place that researchers lose because they don't think about usability. They just say "oh, well I'll generate these complicated SMT statements and discharge them with Z3, all it needs to work on are the programs for my paper and if it doesn't work on one of them, I'll find one it does work on and swap it out for that one." Why would you make a less expressive system if usability wasn't one of your goals?

mafribeOP9y ago

I'd be interested in your (brief if you have not time) suggestions what kind of novel interfaces you have in mind. I have spent too much time on the command line to have any creative thoughts about radically different interfaces for verification.

j / k navigate · click thread line to collapse