One key exception is when the data is richly and hierarchically structured, as text, speech, and visual data are. In many of those cases, variants of neural networks (deep neural nets, CNNs, RNNs, etc.) provide very dramatic improvements.
This study does have a couple of limitations. The datasets used are very small and form a biased selection of real-world machine-learning applications, and it doesn't consider ensembles of different model types (which I'd expect to provide a consistent but marginal improvement over the results here).
Other approaches always have a place, and the skill is in knowing when they should be deployed.
In practice the solution is often to use a combination of multiple methods: trees, support vector machines, multilayer perceptrons, Gaussian kernels, bagging, and boosting. In most applications you don't have to choose. Combining the results of all of them with a weighted average will outperform any of them individually; the whole is greater than the sum of its parts. Each classifier fits a given data set differently and provides its own perspective on the prediction problem. The goal isn't to choose the best one, but to find an ensemble of methods that best explains the patterns and relationships in the data.
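A minimal sketch of that weighted-average idea, using scikit-learn's soft voting (the model choices, dataset, and weights here are illustrative assumptions, not from the study being discussed):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier

# Toy dataset standing in for a real prediction problem.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Three different model families, each giving its own "perspective".
models = [
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
]

# voting="soft" averages predicted class probabilities; `weights`
# controls each model's contribution to that weighted average.
ensemble = VotingClassifier(models, voting="soft", weights=[1, 2, 2])
ensemble.fit(X_tr, y_tr)
print(round(ensemble.score(X_te, y_te), 3))
```

The same pattern extends to bagged and boosted models: anything exposing `predict_proba` can be dropped into the list.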
There are many cases where resource and speed limitations dictate that only one classifier can be tuned and implemented, and in those situations it's good to know which one is 'best'. But when it's possible to build an ensemble out of many different methods it's almost always the best way to go.
Isn't that called the No Free Lunch Theorem?