Edit: having read the page for the nth time, I think I finally understand your point. The code using the cache instructions had an explicit barrier already, but it would be executed on the wrong thread.
I know nothing about the arm memory model, but likely the dsb sy barrier is a stronger barrier than needed for intercore communication, and it is needed for IO serialisation, for example with an mapped PCI device.
So yes, the article is clear and likely correct, I just failed to understand it fully originally.