I have personally shipped "untested" models in production in situations where a "secret test set" does not exist. (Train on subset of data -> evaluate on different subset of data -> train again on entire dataset).
I do not consider myself to be insane.