It's a good callout re: tokens vs. letters, but I think you might have misunderstood my point: you can't do it a token at a time unless the intermediate KV cache is stored after each token is generated.
This won't be the case in any non-toy implementation, as it would be unnecessary and slow.
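
To put rough numbers on the cost, here is a toy sketch in plain Python. It is not taken from any real inference library: `generate` and `snapshot_each_step` are names made up for illustration, and each list entry just stands in for one token's keys and values. It only counts how many entries end up held in memory when a copy of the cache is kept after every token, compared with keeping the single live cache.

```python
import copy

def generate(n_tokens, snapshot_each_step):
    """Toy decode loop; returns the total number of cache entries held in memory.

    Each entry is a placeholder for one token's K/V tensors (an assumption
    for illustration, not a real tensor layout).
    """
    cache = []       # live KV cache, grows by one entry per generated token
    snapshots = []   # hypothetical per-token copies of the whole cache
    for t in range(n_tokens):
        cache.append(("k", t, "v", t))  # placeholder K/V for token t
        if snapshot_each_step:
            # Copy the full cache as it stands after this token.
            snapshots.append(copy.copy(cache))
    return len(cache) + sum(len(s) for s in snapshots)

print(generate(1000, snapshot_each_step=False))  # 1000
print(generate(1000, snapshot_each_step=True))   # 501500
```

Under these assumptions the snapshotting version holds n + n(n+1)/2 entries instead of n, so the overhead grows quadratically with sequence length, on top of the time spent copying.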