0 points · IKantRead · 2y ago · 0 comments

It's a good benchmark because it's so trivial.

Sure, it's not great at differentiating between SotA techniques, but it's very useful for sanity checks like this one.

Even for SotA models, it's still useful to verify that you can get greater than 98% accuracy on MNIST before exploring larger, more complex benchmarks.

It certainly shouldn't be the only benchmark, but it's a great place to start iterating on ideas.

6 comments · 2 top-level

TimPC · 2y ago · 4 in thread

It's a bad benchmark because it's artificially clean. It's effectively a 2D dataset with no occlusions. So nearly everything you try on it will work, and many things you try on it won't scale to typical image problems. There are good 3D datasets with more realistic examples that are still fairly simplistic compared to the state-of-the-art large datasets, but at least give you signal that your technique is robust to common problems in vision. MNIST is so simplistic that you encounter none of the typical problems in computer vision settings, so it doesn't give you a good prediction of how good your technique is.

ghshephard · 2y ago

Can you imagine a model that performs very poorly on MNIST that would perform well on real-world computer vision problems? If you can't, then MNIST is a nice simple smoke test when assessing models.

tysam_and · 2y ago

That's what makes it a good benchmark.

It's a benchmark.

Not a real world problem.

That's why the traditional path has been MNIST -> CIFAR10 (optionally -> CIFAR100) -> ImageNet -> ????!?.

Because it gets gradually more complicated.

Your iteration time is the constraint to development progress.

Keep that down, and the bugs from your initial implementation will be significantly less impactful.
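A rough sketch of that ladder as a cheapest-first gate, for anyone who wants it spelled out. Everything here is hypothetical: `climb`, `LADDER`, and the `evaluate` callable are made-up names, and the thresholds are placeholders apart from the 98% MNIST figure mentioned upthread. The idea is just that you don't pay for a slow benchmark until the fast one passes.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (benchmark name, accuracy the model must exceed before moving on).
# Order follows the comment above. Only the MNIST threshold comes from the
# thread; the other numbers are illustrative assumptions.
LADDER: Sequence[Tuple[str, float]] = (
    ("MNIST", 0.98),
    ("CIFAR10", 0.90),
    ("CIFAR100", 0.70),
    ("ImageNet", 0.70),
)


def climb(
    evaluate: Callable[[str], float],
    ladder: Sequence[Tuple[str, float]] = LADDER,
) -> Tuple[List[str], Optional[str]]:
    """Run benchmarks cheapest-first and stop at the first failure.

    `evaluate` is a hypothetical callable that trains/evaluates your model on
    the named benchmark and returns accuracy in [0, 1].
    Returns (names of benchmarks passed, name of the failing one or None).
    """
    passed: List[str] = []
    for name, threshold in ladder:
        accuracy = evaluate(name)
        if not accuracy > threshold:
            # Fail fast: the more expensive benchmarks are never run.
            return passed, name
        passed.append(name)
    return passed, None
```

With a stub like `climb(lambda name: 0.5)`, the loop stops right after MNIST, which is the point: a bug in the initial implementation costs one short run instead of an ImageNet run.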

d3w4s9 · 2y ago

What's wrong with "artificially clean"? The goal of benchmarks is to compare and know whether one model is better than the other. There is never a "perfect" or "objective" benchmark. Different benchmarks may highlight advantages in certain models, which is a good thing, but there is absolutely nothing wrong with using MNIST as a dataset to give you a basic idea of how models perform.

TimPC · 2y ago

Artificially clean gives you too many false positives. Most researchers I know these days start on CIFAR10, which is much closer to real-world data and has far fewer false positive signals. A portion of the hype in CV comes from papers showing that a technique works on MNIST, which then fails to go anywhere on any other dataset.

tbrownaw · 2y ago

So MNIST for ML models is kind of like FizzBuzz for humans doing software development.
