That doesn't really diminish the results in my book. If you're trying to publish something, it's basically assumed that you'll try out several methods and report the one with the best performance, even if the difference isn't statistically significant.
As for the number of features and data points: in this field you generally don't have a ton of subjects to sample from, and the high-dimensional features are a natural result of the array-based recordings. It would be possible to perform dimensionality reduction on the data, but the ML methods are already implicitly doing that step, so it's not necessarily that important.
My usual gripe is when the tested subjects have some data in the training fold and some in the testing fold (even if the individual data points are separate). In those cases the ML method can fit the statistics of a particular subject rather than the true target class (e.g. target movement of a cursor in a BCI). In this paper they are explicitly testing on a subject that was never trained on. So even though the data and the particular supervised layer will budge the results around some, it shouldn't be a night-and-day difference from what's expected in reality.
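To make the distinction concrete, here is a minimal sketch of a leave-one-subject-out split in plain Python. All names (`leave_one_subject_out`, the `"subject"` tag, the toy data) are hypothetical and just illustrate the idea: each test fold holds one subject's data in its entirety, so the model can never exploit subject-specific statistics it saw during training.

```python
def leave_one_subject_out(samples):
    """Yield (held_out, train, test) splits where the test fold holds
    exactly one subject's data and the train fold holds everyone else's."""
    subjects = sorted({s["subject"] for s in samples})
    for held_out in subjects:
        train = [s for s in samples if s["subject"] != held_out]
        test = [s for s in samples if s["subject"] == held_out]
        yield held_out, train, test

# Toy data: two recordings per subject (purely illustrative).
data = [{"subject": sid, "x": i} for sid in ("A", "B", "C") for i in range(2)]

for held_out, train, test in leave_one_subject_out(data):
    # No subject contributes to both folds, which is the property
    # the problematic "mixed" splits violate.
    assert not {s["subject"] for s in train} & {s["subject"] for s in test}
```

Contrast this with a random record-wise split, where subject A's morning session could land in training and their afternoon session in testing; the classifier can then score well by recognizing subject A rather than decoding the intended movement.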