1. It does redo the minimization phase, but the actual execution is extremely fast, so this cost is minimal. Storing test outputs gets pretty expensive when you are running millions of tests, and since there are very few failures, recomputing them is worthwhile.
2. Yes and no! The article talks about this a little, but the "heirloom" system does essentially this, and the "native" filesystem variant of Trinity runs the same Trinity tester code against a real filesystem. The "no" is due to the issues with randomized testing: since any operation you do can affect the RNG, the exact operation sequence produced by a particular seed can change if you swap out any part of the system. For regression tests, the operation sequence can be captured in a separate, non-randomized test.
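To make that fragility concrete, here's a minimal Python sketch (not Trinity's actual code; the helper name and operation names are made up) of how a single extra RNG draw by a swapped-in component shifts every subsequent operation generated from the same seed:

```python
import random

def op_sequence(seed, extra_draw=False):
    """Generate a toy sequence of filesystem 'operations' from a seeded RNG.

    If any component consumes an extra random value (extra_draw=True),
    the RNG stream shifts, so every subsequent operation chosen for the
    same seed will very likely differ from the original run.
    """
    rng = random.Random(seed)
    ops = []
    for _ in range(5):
        if extra_draw:
            rng.random()  # hypothetical swapped-in component consuming randomness
        ops.append(rng.choice(["create", "delete", "move", "write"]))
    return ops

# Same seed, same code path: fully reproducible.
baseline = op_sequence(42)
assert baseline == op_sequence(42)

# Same seed, but one component now draws from the RNG:
# the generated sequence almost certainly diverges.
perturbed = op_sequence(42, extra_draw=True)
print(baseline, perturbed)
```

This is why pinning a seed reproduces a failure only as long as the system under test consumes randomness in exactly the same order, and why a found failure is worth freezing into a non-randomized regression test.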
3. Both.
4. Testing in Nucleus is a sliding scale from "unit-test-like" to "integration-test-like": Trinity mocks plenty of functionality, while CanopyCheck simultaneously tests many different components. It would probably be more accurate to say that CanopyCheck tests a smaller subset of components with much greater fidelity, and Trinity tests as much of the sync engine as is practical.