> Also in this case I admit I failed to have a theory on why your number is so off because giving out prefill numbers and claiming it's decode isn't in my book.
Maybe it's because it is not off? It's not terribly difficult to sum up all the matmul calculations and the number of bytes one needs to load and store per layer in self-attention. My number could be off by a bit, but it is certainly not terribly off.
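For anyone who wants to redo the sum, here is a rough sketch of that accounting for a single decode step of one self-attention layer. Everything in it is my assumption for illustration, not a measurement: plain multi-head attention (no GQA/MQA), a KV cache, fp16 weights and activations, and a hypothetical `attention_decode_cost` helper with made-up model dimensions. It ignores softmax, norms, and intermediate activations, so treat the result as a ballpark.

```python
def attention_decode_cost(d_model, n_heads, context_len, batch=1, bytes_per_elem=2):
    """Rough FLOPs and bytes moved for one decode step of one self-attention layer.

    Assumes standard multi-head attention with a KV cache (hypothetical setup).
    """
    d_head = d_model // n_heads

    # Q, K, V and output projections: four (d_model x d_model) matmuls per new token.
    # A matmul costs 2 FLOPs (multiply + add) per weight element per token.
    proj_flops = 4 * 2 * batch * d_model * d_model
    proj_weight_bytes = 4 * d_model * d_model * bytes_per_elem

    # q @ K^T and probs @ V: two matmuls over the cached context, per head.
    attn_flops = 2 * 2 * batch * n_heads * context_len * d_head

    # Read cached K and V for the whole context, write the new K and V entries.
    kv_read_bytes = 2 * batch * context_len * d_model * bytes_per_elem
    kv_write_bytes = 2 * batch * d_model * bytes_per_elem

    # Read the layer input and write the layer output for the new token.
    act_bytes = 2 * batch * d_model * bytes_per_elem

    flops = proj_flops + attn_flops
    bytes_moved = proj_weight_bytes + kv_read_bytes + kv_write_bytes + act_bytes
    return flops, bytes_moved


if __name__ == "__main__":
    # Made-up dimensions, only to show the shape of the calculation.
    flops, bytes_moved = attention_decode_cost(d_model=4096, n_heads=32, context_len=2048)
    print(f"FLOPs: {flops:,}")
    print(f"Bytes: {bytes_moved:,}")
    print(f"Arithmetic intensity: {flops / bytes_moved:.2f} FLOPs/byte")
```

With batch size 1 the intensity comes out at roughly one FLOP per byte, which is why decode ends up bound by memory bandwidth and not by compute. Raising `batch` amortizes the weight loads but not the KV cache reads.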