No, that is always the case. Attention is only about one third of the ops, and the QK^T product is only a fraction of that. Outside of truly massive sequence lengths it doesn’t matter a whole lot, even though it’s nominally quadratic. It’s trivial to run the numbers on this: you only need to do it for one layer.
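For example, here is a rough per-layer FLOP count you can play with. It is a back-of-the-envelope sketch, not a profiler, and it assumes a plain GPT-style block: standard multi-head attention with no grouped-query sharing, a dense MLP with a 4x hidden expansion, one multiply-add counted as 2 FLOPs, no causal-mask savings, and softmax, norms and biases ignored. The function name `layer_flops` and the `mlp_ratio` parameter are made up for illustration.

```python
def layer_flops(d_model: int, seq_len: int, mlp_ratio: int = 4) -> dict:
    """Approximate forward-pass FLOPs for one transformer layer over a full sequence.

    Assumptions (illustrative only): multi-head attention without GQA, a dense MLP
    with hidden size mlp_ratio * d_model, 2 FLOPs per multiply-add, no causal-mask
    savings, and softmax/layernorm/bias costs ignored.
    """
    n, d = seq_len, d_model
    # Q, K, V and output projections: four (d x d) matmuls applied to n tokens.
    attn_proj = 2 * 4 * n * d * d
    # Attention scores Q @ K^T: (n x d) @ (d x n), the nominally quadratic part.
    qk = 2 * n * n * d
    # Weighted sum of values: (n x n) @ (n x d), also quadratic in n.
    av = 2 * n * n * d
    # MLP up projection d -> r*d and down projection r*d -> d.
    mlp = 2 * 2 * mlp_ratio * n * d * d
    total = attn_proj + qk + av + mlp
    return {"attn_proj": attn_proj, "qk": qk, "av": av, "mlp": mlp, "total": total}


if __name__ == "__main__":
    d = 4096  # hypothetical model width
    for n in (1024, 4096, 32768, 131072):
        f = layer_flops(d, n)
        print(f"n={n:>6}: QK^T is {f['qk'] / f['total']:.1%} of the layer's FLOPs")
```

Under these assumptions the linear work is 24·n·d² FLOPs per layer, with the attention projections taking 8 of those 24 (the "one third"), while QK^T plus the value mixing adds 4·n²·d. So the quadratic terms only catch up with all the linear work once n reaches about 6·d_model, e.g. roughly 24k tokens for d_model = 4096.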