Now you’re asking me for technical details from more than a decade ago. My recollection is that you could map one of the caches between cores: there were uncached-write-through instructions. By reverse engineering the cache’s hash, you could write to a specific cache line; the uc-write would push it up into the correct line and the “other core” could snoop that line from its side with a lazy read-and-clear. The whole thing was janky-AF, but way the hell faster than sending a message around the ring. (As I recall, the three interlocking rings could make the longest-range message take hundreds of cycles.)
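
For anyone who wants the shape of the "write a line, other side does a read-and-clear" idea without the hardware archaeology, here's a rough portable analogy in C++. To be clear, this is not the actual mechanism: the real thing relied on uncached-write instructions and a reverse-engineered cache hash, neither of which appears here. `Mailbox`, `post`, and `take` are names I made up for illustration, and the 64-byte alignment is an assumption about cache-line size.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical single-slot mailbox; the value 0 is reserved to mean "empty".
// alignas(64) ASSUMES 64-byte cache lines, so the slot gets a line to itself.
struct alignas(64) Mailbox {
    std::atomic<std::uint64_t> slot{0};
};

// Writer core: drop a nonzero message into the shared line.
// (Stand-in for the uc-write; an ordinary release store, not an uncached one.)
inline void post(Mailbox& m, std::uint64_t msg) {
    assert(msg != 0 && "0 is the empty marker");
    m.slot.store(msg, std::memory_order_release);
}

// Reader core: read-and-clear in a single atomic step.
// Returns 0 if nothing was waiting.
inline std::uint64_t take(Mailbox& m) {
    return m.slot.exchange(0, std::memory_order_acq_rel);
}
```

The reader can call `take` whenever it gets around to it (the "lazy" part); a second `post` before the reader shows up just overwrites the first, which matches the one-line, no-queue flavor of the original trick.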