Even unit tests which don't involve proofs can be a problem sometimes because they lock down API inputs and outputs. It's already a major problem when a developer wants to change some code and they have to spend hours updating unit tests every time. When in small teams, heavy 100% coverage unit tests slow down productivity, possibly by 5x or 10x.
I imagine that adding proofs as part of the unit test suite would further increase the productivity cost overhead by a massive factor.
In software development, you should avoid treating the code like if it's special or perfect because it's not and it will need to be changed in the near future when requirements change (and they will always keep changing). So locking it down into a specific state with proofs all around it is a bad idea for the vast, vast majority of cases.
We need more wisdom in software development and that means taking a step back, looking at the big picture and asking questions like what are the negative side effects of each new methodology that we are introducing into our projects? There are always downsides, and we're crazy to pretend that every fancy new methodology is a silver bullet.
Generally don't test against internals much, except if an area is shown to be buggy and hard to exercise externally.
Prefer (well tested) libraries for generic functions/algorithms. If you have to implement one, put tests on it as if it was external API. Consider publishing it independently to make it external. If needs change, just drop the dependency.
My unit tests are basically "comments" for my code, that show me how the code is supposed to work. When I make changes, the tests will tell me if I broke the intentions of the initial developer.
Sure, when doing big changes, just updating and maintaining the tests is a lot of work, but so would maintaining documentation and comments be. So we have less documentation and fewer comments, and more tests.
If the code itself is written properly then the individual classes and objects used throughout should be high cohesion and easy to read without needing any comments or tests.
There are always cases where unit tests are important though like the parts of the code that deal with money.
But there is no benefit in adding unit tests if you're building a standard chat system for example; integration tests are sufficient in that case.
Also getting 100% test coverage is a complete waste of time if you don't actually test the right kinds of input and output parameters.
I mostly develop backends of web apps. In my experience not every web app has some code that needs to be explained. Picking the correct names is usually enough.
That said, unit tests really do what you write: provide examples, use cases and guidance.
So there is a possibility that changing the code only forces you to change theorem statements (that is, the spec), allowing reuse of the proof script itself. Now you are doing software engineering on proof scripts, and you face the same generality/flexibility problems you would face when writing unit tests. A well-written proof script would be flexible enough, and a hacky one would lock things down too much.
We have a lot of experience building programs that help us program (e.g. compilers, IDEs, build systems) We can move forward without falling into traps (low performance, high programmer knowledge required, overly rigid, more work required than gained in productivity)
There are no silver bullets but we can continue to improve.
It depends what kind of math and what they're doing though. Systems like Coq give you much more trustable code to put in your system, and higher degrees of trust for clients. But systems like Idris actually let you write code and use the proof engine to write trustworthy code.
Most of that "math", from the perspective of a programmer, is a small subset of graph theory, formal logic, and category theory useful for proving relationships and reflexivity so that the machine can help you.
> Even unit tests which don't involve proofs can be a problem sometimes because they lock down API inputs and outputs.
This is a much bigger problem than "hand written mocks are bad" (which they are, I agree). But it's quite out of scope to complain that because API endpoints are often sloppily deployed without versioning or introspectability, that all formal methods are wrong.
What's more, and I really want to push this point so I'll emphasize it: quickly producing broken code helps almost no one. If you're an artist playing with making a synthesizer whine? Maybe okay. If you're writing software intended to face the world and do real work? We can't just keep pretending we're all smart enough to do this without mechanical aid; we're NOT smart enough to do this without mechanical aid anymore. Programs integrate too many libraries and APIs.
> I imagine that adding proofs as part of the unit test suite would further increase the productivity cost overhead by a massive factor.
With Idris, it makes it so the program can write code for you. I've been learning it over the last 2 weeks and I'm totally fascinated by how much drudgery it takes out of total programming.
> In software development, you should avoid treating the code like if it's special or perfect
This is precisely what formal methods do! You challenge every piece of code you can and say, "Prove to me that what you've written doesn't have an obvious flaw!" You rewrite until both the human and the proof engine cannot see anymore. It is not sacred at that point (no one believes this), but at least you've acknowledged that you can write errors and started to mitigate them.
> There are always downsides, and we're crazy to pretend that every fancy new methodology is a silver bullet.
Actually, the entire industry has started to demand that developers do a lot more computation on their ends to make better software. The "prove more of your program and write total code" is just one side of the "do more work on your end" school. Another is, "Your software is in fact the output of this algorithm training from a dataset on 4 GPUs." We probably need a similar step forward for APIs (where APIs are invalid & unusable without a spec that governs them).
It is a change, but most commercial software engineers are handling user data that can be leveraged in disastrous every day. As you say, we need the wisdom to realize that it's difficult to guess the use case or security impact of the code we write on a daily basis, and we need to start taking that fact seriously. These heady days of being able to deny responsibility and shove out any software we want without consequence need to end.
We're starting to see a change in the economics. There's quite a way to go -- I hope we get there intelligently and not as some kind of thrashing reaction to giant disasters.
It's not so much that they're "all wrong", but when the majority of bugs you face are not logical or algorithmic in nature but driven by incorrect specifications or problems dealing with the outside world (including but not limited to sloppy APIs), then formal methods will impose a stupidly high cost for almost no benefit.
To take the famous Mars Climate Orbiter bug as an example - had the software been formally proven then it still would have gone down in a ball of fire because the software was built upon a faulty presumption. An integration test that exercised both the data and the code in a realistic environment would have saved it though, as would better spec communication between the two teams (e.g. some form of BDD). The chief investigator, Arthur Stephenson, even explicitly called out the lack of sufficient end to end testing.
I'm sure formal methods are great for proving that, e.g. a particular sorting algorithm always works, but frankly, sorting algos are the type of thing I import and just assume works and I've yet to see a bug in the wild that invalidates that assumption, whereas a faulty interaction between two poorly specified modules is a problem I see in teams on a near daily basis.
Right now the benefits of static verifiers is rather binary. Either your program (or code unit) is sucessfully verified or it isn't. And every time you make a change to verified code you risk losing its verified status.
Imagine instead writing code that is made "safe" using run-time asserts everywhere. Unoptimized, this code might be slow. And unreliable in the sense that a run-time assert could fail, requiring some kind of recovery or abort action. But it would still be "safe".
The run-time asserts are effectively the "program specifications", and the discarding of those asserts at the optimization stage is basically equivalent to traditional static verification. The difference being that not quite fully optimized "safe" code is generally more valuable than not quite fully verified (and therefore not fully "safe") code.
> These heady days of being able to deny responsibility and shove out any software we want without consequence need to end.
If the static verification community is imploring the development community to, with some effort, change its practices to improve code safety, would that community be willing to, with some effort, move the application of their verification logic to the optimization stage where it would be more practical for a wider range of applications?
It's worth noting that Ethereum's "fuel" model as a 1:1 correspondence with how Idris's mostly-total IO monad works. The "infinite fuel" bit can just be turned off and some (substantial and tricky, I admit) changes written to keep top-level state around and bam; timeboxed resume-able IO
Would the resources in the original post be worth looking into to build secure contracts?
How would such a proof checker know the intent of the code of not having backdoor/insecurity? How do you define security in a logical sense?
I wrote up what this could look like: https://pastebin.com/raw/ZAys0mYp
- Fundamental Proof Methods in Computer Science
- Handbook of Practical Logic and Automated Reasoning
- Software Abstractions
- The Little Prover
Yes, currently I have Athena, OCaml, Alloy, ACL2 (via Dr Racket/Dracula) all up and running to work through the exercises in these books. I'm very skeptical anything will come of this, but it interests me and I can see the value.
Hopefully the speaker's next book Formal Reasoning About Programs (https://github.com/achlipala/frap) will be a nice next step, I have Certified Programming with Dependent Types but have not cracked it yet. Also keeping an eye on Verified Functional Algorithms (https://softwarefoundations.cis.upenn.edu/vfa-current/index....).
Do you mean The Little Prover?
That's where I thought we were in 1983 - ten years away.
> That's where I thought we were in 1983 - ten years away.
Obligatory xkcd:
God I cringe every time someone posts that. It's like the "Big Bang Theory" of internet comics.
It has been used by a "mainstream programmer" to build a RSS feed reader.
See Lean.
A bit of self promotion, I touched on this in a recent blog post if you're interested in this sort of stuff: https://performancejs.com/post/hde6a33/JavaScript-in-2017:-Y...
The future is here, but not evenly distributed. ;)
So, one question: Adam gives one example of a proof where the initial automation fails because it does not consider a theorem that was previously proved. He also gives an example of some complex code where the proof is too tedious to do without lots of automation. What happens when your complex code changes in such a way that you need some extra lemmas but don't know what they are?
First, if some new features are added with associated properties, then some theorems/lemmas related to these properties will have to be added and proven. I don't think that part is problematic, because it's a "local" decision. This work is focused on the change.
The possible problems come from the impact of these changes to existing results. If you need to go and make other changes all over the place it wouldn't be scalable. But then automation and coq hint database can help. In the demo example you referred to Adam adds the lemma as a rewrite hint, and then it's picked by coq automation elsewhere. That's how local changes can automatically be leveraged globally by proof automation in existing proofs, with no changes (ok, not always. But that limits the impact).
The problem, it turns out that they included a new version of OCaml which was incompatible with the older version of GTK that was being used. Result: crash due to seg fault.
We really need both proofs and pragmatics. Proofs can give us confidence in the fundamentals when used appropriately for the processor, or for consensus algorithms like Raft, or key parts of the OS. Tests and statistical analysis can give us confidence that we haven't overlooked something crucial.
Some systems obviously aren't very appropriate for formal methods (take machine learning, for instance, there can be no formal spec for what a fraud looks like). There statistics and testing will have to suffice because we have clear value on average even if we cannot formally guarantee the system.
Other systems are much more appropriate for formal methods. I should hope that a heart monitor, CPU, voting machine or aircraft flight management system will have substantial amounts of formally proved code (I know that I am dreaming with the voting machines; let's settle for getting it down on paper). I don't want to collect a statistical sampling of whether an airplane works as we iteratively work out the bugs and specs in an agile fashion.