It's not just that Stockfish on modern hardware can evaluate far fewer nodes than Deep Blue and still beat it handily; back in 1995, Fritz running on a Pentium beat Deep Thought II at the World Computer Chess Championship. Deep Blue and its ancestors, with their custom hardware, were perhaps the "most brute force" of all chess engines.
The number of nodes searched is not the key metric for gauging how "smart" an algorithm is. You search fewer nodes, but only because you spend far more processing on each node up front.
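To illustrate why raw node counts are a poor proxy for "smartness": by the classic Knuth–Moore result, plain minimax visits roughly b^d nodes while alpha-beta with ideal move ordering visits roughly b^(d/2) to reach the same depth. A back-of-envelope sketch (the branching factor of 35 is a commonly assumed ballpark for chess, not a measured value):

```python
# Back-of-envelope node counts for a depth-8 search.
# b = 35 is an assumed average branching factor for chess.
b, d = 35, 8
minimax_nodes = b ** d            # plain minimax: ~b^d nodes
alphabeta_nodes = b ** (d // 2)   # alpha-beta, ideal ordering: ~b^(d/2) nodes
print(f"minimax:    ~{minimax_nodes:.2e} nodes")
print(f"alpha-beta: ~{alphabeta_nodes:.2e} nodes")
print(f"reduction:  ~{minimax_nodes // alphabeta_nodes:,}x")
```

Same depth, same answer, orders of magnitude fewer nodes, purely from a smarter algorithm rather than a faster machine.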
We need some baseline before we can call anything "brute force".
"I mean, it's cool that computers are getting even better at chess and all"
> direct comparison only makes sense with equivalent performance level
This makes no sense to me. A 50% increase in performance can be compared against a 50% increase in processing power to gauge the degree of brute-force-ness.
Computational complexity theory taught us that the fundamental difficulty of solving certain types of problems does not scale linearly with problem size. I'd guess the same logic applies to the quality of the output?
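A concrete sketch of that nonlinearity in the chess case: each extra ply of search multiplies the node count by the effective branching factor, so a linear-looking gain in depth costs exponentially more nodes. (The factor of 2 below is an assumed ballpark for a heavily pruned modern engine, not a measured figure.)

```python
# Nodes needed to search to a given depth, assuming an effective
# branching factor of ~2 (assumed ballpark for a pruned engine).
ebf = 2
for depth in (10, 20, 30):
    print(f"depth {depth}: ~{ebf ** depth:,} nodes")
```

Going from depth 10 to depth 30 triples the depth but multiplies the work by about a million, which is why equating "50% more output quality" with "50% more compute" breaks down.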