And what it accidentally showed was that NCD (normalized compression distance) between individual digits in the training set is a really terrible distance metric for classification.
You can do classification with KNN, which is obvious. You can also do classification with compression, which is less obvious, and neat. This approach tries to combine them in a way which doesn't work.
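
For anyone who hasn't seen the combination, here is a rough sketch of what "KNN with NCD as the distance" looks like. It is not the code from the approach being discussed: the choice of `zlib` as the compressor, the function names `clen`, `ncd`, and `knn_predict`, and the idea of passing each sample as raw bytes are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def clen(data: bytes) -> int:
    """Compressed length in bytes (zlib is an assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = clen(x), clen(y)
    return (clen(x + y) - min(cx, cy)) / max(cx, cy)


def knn_predict(query: bytes, train: list, k: int = 3):
    """Majority vote over the k training samples closest to query under NCD.

    train is a list of (sample_bytes, label) pairs.
    """
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

One thing worth noticing in the sketch: a single digit image is a tiny input, so the compressor's fixed overhead is a large share of every compressed length, which may be part of why the per-digit distances end up so uninformative.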