Yeah, having running software forces you to prove a certain level of correctness and lets people test against it.
You can declare a spec done at any arbitrary point, and it's much harder to test that spec against edge cases. It's also harder to see whether two specs get along than to check "did this API call to this other place succeed?" "It doesn't compile," "it throws an error," and "it doesn't do what it should" are all much more concrete, and they force you to confront "we might not understand the problem as well as we thought."