> If I'm given this clean dataset with all of the features properly set in columns and data types labeled, I could spin up Azure or Google Cloud's ML capabilities and have them run gridsearch and optimize my model.
That's the fault of a test design that allows such techniques without scrutiny, not of the Notebook format.