The conclusion is pretty wrong though.
> Regardless of how complicated your program’s behavior is, if you write it as a neural network, the program remains interpretable. To know what your neural network actually does, just read the dataset.
This is not true at all? That’s like saying “all binaries are open source, just read the machine code”. Most datasets contain a reasonable amount of pollution (even the author ran into this). And if you let the model train on its own output it’s pretty easy for it to cheat: https://techcrunch.com/2018/12/31/this-clever-ai-hid-data-fr...
Moreover, current ML techniques are pretty sample-inefficient, so your dataset is likely to be much, much larger than the equivalent program. Right now we haven’t developed much tooling to help you map a sample bad input or behavior back to the training data that might be relevant. So “just reading the data” seems like a lot more work than debugging a modern program.
I do think we’ll get better at debugging this stuff in time, but I don’t think it’s currently true that ML systems are simpler or more interpretable than a corresponding imperative program.
My mental model (mostly == Karpathy's "software 2.0" one) is:
source code :: training data (and scoring function, if it's not simple MSE/MAE)
compiler :: training loop (and network arch :: compiler flags, maybe?)
compiled binary :: trained network (like the onnx file used in this demo)
So, trying to reverse-engineer a trained model would be analogous to trying to reverse-engineer a stripped binary, and reading the training dataset would be analogous to reading the source code.
The interpretability advantage I'm claiming only applies to complicated programs: for sufficiently (and equivalently) complicated programs, changing input->output behaviors specified explicitly in a training dataset feels much easier than trying to change the implicit input->output behavior of the corresponding zillion lines of source code.
--- Specific points:
> Most datasets have a reasonable amount of wrong input->output pairs
Definitely true! My counterpoint would be: most code contains a reasonable amount of bugs.
> it’s pretty easy for it to cheat
Hmm, I don't find that CycleGAN article worrying: researchers trained a network to pass information through a constrained channel, and it succeeded (just not in the way they wanted). Quoting the article itself:
> this occurrence, far from illustrating some kind of malign intelligence inherent to AI, simply reveals a problem with computers that has existed since they were invented: they do exactly what you tell them to do.
> your dataset is likely to be much, much larger than the equivalent program.
I agree! But (again, for complicated programs) I am comfortable sweeping over a large dataset for errors (or asking a team of people to do so, or writing code to find errors, or even training tiny networks to find errors). I am not confident I could debug the equivalent zillion lines of source code just by reading it. The effect on input->output behavior from a single dataset example feels obvious, whereas the effect on input->output behavior from a single line of source code may depend on me understanding the other zillion lines...
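As a rough sketch of the "writing code to find errors" option: the snippet below flags inputs that show up with more than one output in a toy dataset. The function name `find_conflicting_labels` and the toy data are made up for illustration, and a real dataset would need fuzzier matching than exact equality.

```python
from collections import defaultdict

def find_conflicting_labels(dataset):
    """Return inputs that appear with more than one distinct output."""
    seen = defaultdict(set)
    for x, y in dataset:
        seen[x].add(y)
    # Keep only the inputs whose recorded outputs disagree.
    return {x: sorted(ys) for x, ys in seen.items() if len(ys) > 1}

# Hypothetical toy dataset of (input, output) pairs.
toy_dataset = [
    ("2+2", "4"),
    ("3+1", "4"),
    ("2+2", "5"),  # deliberately wrong pair
    ("1+1", "2"),
]

conflicts = find_conflicting_labels(toy_dataset)
print(conflicts)
```

This only catches one narrow kind of error (exact-duplicate inputs with disagreeing outputs), but each flagged pair can be inspected on its own, without reading the rest of the dataset.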
> Right now we haven’t developed much tooling to help you map a sample bad input or behavior back to training data that might be relevant
Agreed (sadly)... though I think root-causing misbehaviors of complicated codebases is also painful.
> I don’t think it’s currently true that ML systems are simpler or more interpretable than a corresponding imperative program.
I agree for simple programs, but disagree for complicated ones; for sufficiently-complicated programs, I would much rather deal with the dataset representation.
Sure, but you have no idea how much weight any given sample contributes to the model. Maybe the behavior is included. Maybe not. The weight assigned could even vary drastically between runs with different random seeds!
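One way to put a number on this is leave-one-out influence: train with and without a sample under the same seed and compare predictions. The sketch below does that for a toy one-parameter model. The names `train` and `loo_influence`, the data, and the hyperparameters are all assumptions for illustration, and a real network would be far messier than this.

```python
import random

def train(data, seed, epochs=3, lr=0.1):
    """Fit y ~ w * x with plain SGD; the seed sets the initial weight and shuffle order."""
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            w += lr * (y - w * x) * x  # gradient step on squared error
    return w

def loo_influence(data, index, probe_x, seed):
    """Change in the prediction at probe_x when sample `index` is left out."""
    full = train(data, seed)
    without = train(data[:index] + data[index + 1:], seed)
    return (full - without) * probe_x

# Hypothetical toy data; the last pair is an intentional outlier.
toy_data = [(0.2, 0.4), (0.5, 1.0), (0.9, 1.8), (0.7, 3.0)]

# Influence of the outlier on the prediction at x = 1.0, for several seeds.
influences = [loo_influence(toy_data, index=3, probe_x=1.0, seed=s) for s in range(5)]
print(influences)
```

Comparing the printed values across seeds gives a crude sense of how stable a single sample's contribution is. Retraining once per sample is obviously impractical at scale, which is part of the tooling gap.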
I probably picked a bad paper with CycleGAN. My main point is that the mapping from data to behavior is non-trivial, and there are lots of examples of this in the literature: for instance, medical models that learn that the output variable is encoded in the text in the upper right of the image, rather than attempting to analyze the image.
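A toy version of that failure mode, as a sketch: a "model" that keeps whichever single feature best matches the training labels will happily pick a leaked marker over the real signal. The data and the names `feature_accuracy` and `pick_best_feature` are invented for illustration; this is not taken from any of the papers.

```python
def feature_accuracy(rows, labels, i):
    """Fraction of rows where feature i equals the label."""
    return sum(r[i] == y for r, y in zip(rows, labels)) / len(labels)

def pick_best_feature(rows, labels):
    """'Train' a one-feature model: keep whichever feature best matches the labels."""
    return max(range(len(rows[0])), key=lambda i: feature_accuracy(rows, labels, i))

# Feature 0: noisy real signal. Feature 1: leaked marker that equals the label in training.
train_rows = [(1, 1), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0), (0, 0), (0, 0)]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# At deployment the marker is absent (always 0).
deploy_rows = [(1, 0), (1, 0), (0, 0), (0, 0)]
deploy_labels = [1, 1, 0, 0]

chosen = pick_best_feature(train_rows, train_labels)
train_acc = feature_accuracy(train_rows, train_labels, chosen)
deploy_acc = feature_accuracy(deploy_rows, deploy_labels, chosen)
print(chosen, train_acc, deploy_acc)
```

Every training row here looks reasonable on its own, so reading the dataset row by row would not obviously reveal that the model learned the marker instead of the signal.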
Awesome idea and solid article!
Needs more recurrency?