It's actually fine to compare an ensemble method (built from weak base learners) with a single strong learner. That way, you weigh the benefit of combining weak learners against the benefit of using one stronger classifier. I see where you're going with that, but the comparison is often a useful one.
That's true when the experiments are designed to show a gain in performance due to some aggregation technique. The mentioned article achieved that only for DT, and the body of the article doesn't seem to focus on the effects of ensemble methods.
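To make "gain due to aggregation" concrete, here is a minimal sketch of what such an experiment isolates. It is not taken from the article. It assumes binary classification, a plain majority vote, and weak learners whose errors are independent, which real ensembles only approximate. The accuracy values (0.6 and 0.8) and the function name are hypothetical, chosen for illustration only.

```python
from math import comb


def majority_vote_accuracy(n_learners: int, p: float) -> float:
    """Accuracy of a majority vote over n independent binary learners.

    Assumption: every learner is correct with the same probability p and
    their errors are independent. n_learners must be odd to avoid ties.
    """
    if n_learners % 2 == 0:
        raise ValueError("use an odd number of learners to avoid ties")
    k_min = n_learners // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_learners, k) * p**k * (1 - p) ** (n_learners - k)
        for k in range(k_min, n_learners + 1)
    )


weak_acc = 0.6    # hypothetical accuracy of one weak base learner
strong_acc = 0.8  # hypothetical accuracy of a single strong learner

for n in (1, 3, 11, 51):
    ens = majority_vote_accuracy(n, weak_acc)
    # Aggregation gain: ensemble vs. the same weak learner on its own.
    # Practical comparison: ensemble vs. the single strong learner.
    print(f"n={n:2d}  ensemble={ens:.3f}  "
          f"gain_over_weak={ens - weak_acc:+.3f}  "
          f"vs_strong={ens - strong_acc:+.3f}")
```

The two printed differences answer different questions: `gain_over_weak` measures the effect of aggregation itself, while `vs_strong` is the ensemble-versus-single-classifier comparison discussed above. An experiment that reports only the second cannot attribute the difference to aggregation.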