1) fully agreed but most HFT apps with the exception of really simple ones like market data feed handlers which can easily fit their working set into L2 anyway will be the only thing running on a host
2) mutual cache eviction by hash collisions is solvable with a number of tricks (although those methods are not easy and often wasteful). The "DDIO slice" issue used to be a problem back when Intel used ring topology for LLC. These days they are built as a mesh thus minimizing this effect.
3) CAT doesn't recognize threads or processes. COS (class of service) uses CPU cores for way-of-cache assignments
Recent micro-architectures like SKX or CLX have 11 ways of L3 and what often happens is 1-2 ways get assigned to cpu0 for non latency-critical workloads while the rest are assigned to latency-sensitive, isolated cores usually running a single user space thread each.