"Shared memory" is really a description of the memory model exposed to the programmer, not of the hardware.
Under the hood, there are caches. Sometimes a memory address lives in a cache near your core because you put it there, sometimes it lives there because a neighboring core that shares that cache put it there, sometimes it lives only in RAM, and sometimes it lives in another cache on your chip and you have to ask for it through the on-chip network. The advice I have been given (as a non-HFT guy) is to try not to mess around too much with temporal locality, pin threads to cores, and let the hardware handle the rest.
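For what "pin threads to cores" can look like in practice, here is a minimal sketch, assuming Linux and Python's `os.sched_setaffinity`. The helper name `pin_to_cpu` is made up for illustration, and picking the first allowed CPU is an arbitrary choice, not a recommendation.

```python
import os

def pin_to_cpu(cpu):
    """Restrict the caller to a single CPU and return the resulting affinity set.

    Hypothetical helper for illustration; relies on the Linux-only
    os.sched_setaffinity / os.sched_getaffinity calls.
    """
    os.sched_setaffinity(0, {cpu})  # 0 means "the caller"
    return os.sched_getaffinity(0)

if hasattr(os, "sched_setaffinity"):
    # Only choose from CPUs we are already allowed to run on
    # (containers and cgroups may hide some of them).
    allowed = sorted(os.sched_getaffinity(0))
    pinned = pin_to_cpu(allowed[0])
    print("pinned to CPU(s):", sorted(pinned))
else:
    print("CPU affinity API not available on this platform")
```

Once pinned, the scheduler stops migrating the work between cores, so whatever it has pulled into that core's private caches tends to stay useful. In C the rough equivalents would be `sched_setaffinity` or `pthread_setaffinity_np`.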