> But how do you get from untrusted, potentially invalid data into the world of trusted, guaranteed-valid data?
The main point is that you can write provably correct validation functions. Then your program can branch on the result of that function, and in the "success" branch you can process data further, as usual, assuming (justifiably) the validity of the data. It doesn't make any difference whether some particular piece of data was validated runtime or compile time, so far as the validation does what you want it to do and the types reflect the result in sufficient detail. Verified programming is mostly not about trying to check everything compile time, but rather checking that when our program eventually runs, it does the right thing. Take the following pseudo-Idris-Agda function:
boundsCheck : Nat -> (bound : Nat) -> Maybe (n : Nat ** n < bound)
boundsCheck n b with n <? b
| yes proof = Just (n , proof)
| no disproof = Nothing
This either returns the input natural number (if the input is within bounds) along with a proof of being in bound, or returns failure. The type of the function makes it
impossible to write a function body such that the returned number is not within the supplied bounds. Any subsequent code in the program may work with the output of this function and be utterly sure that the "n" in "Just (n, p)" observes the bound, and the may also safely omit any further checks for the same bound or greater bounds. Or here's a standard Idris function:
filter : (a -> Bool) -> Vect n a -> (n : Nat ** Vect n a)
filter p [] = (_ ** [])
filter p (x :: xs) with (filter p xs)
| (_ ** xs') = if p x then (_ ** x :: xs') else (_ ** xs')
This is a plain old filter, except that it works on a length-indexed list type "Vect". We have no idea (compile time) how long the resulting filtered Vect will be. So we return a dependent pair containing some Nat and a Vect such that its length is equal to that Nat.
You might think that this is rather boring, since we could have used a list type not indexed by its length, and then just computed the length whenever we needed it runtime. Except there is a big difference. It is impossible to return a value of type
(n : Nat ** Vect n a)
from
any function, such that in the value the Nat index does not equal the length of the vector! You could slap together a plain list and a Nat in a tuple, and then say: "This a list coupled with its length. Let's try to preserve this invariant by writing code carefully". You could also trying hiding the possibly unsafe internals behind some OOP class firewall, but even that doesn't protect you from writing the internals incorrectly. Dependent types can protect you even then; they only fail to prevent you from writing the wrong types in the first place.