The keyboard animation happens on the touchdown event, whereas the letter is entered into the text box on the touchup event.
Between the two, more information about the touch can emerge, for example the exact shape of the touched area and any movement during the touch.
I would guess the keyboard sees a down in one spot and an up in a slightly different spot that falls on a different letter, so the key that animates is not the key that gets typed.
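
To make that guess concrete, here is a rough sketch in Python. It is not how any real keyboard is implemented. The key sizes, the `Key` class and the `simulate_tap` helper are all made up for illustration. It only shows how hit-testing the down point and the up point separately can give two different keys when the finger lands near a key boundary.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Key:
    """A rectangular key; coordinates and sizes are hypothetical."""
    label: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        # Half-open bounds so a point on a shared edge belongs to one key only.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def key_at(keys: Sequence[Key], point: Point) -> Optional[Key]:
    """Return the key under the point, or None if the point misses every key."""
    px, py = point
    for key in keys:
        if key.contains(px, py):
            return key
    return None


def simulate_tap(keys: Sequence[Key], down: Point, up: Point):
    """Return (animated label, typed label) for one tap.

    Assumption: the animation is chosen from the down point and the
    character from the up point, as guessed in the text above.
    """
    animated = key_at(keys, down)
    typed = key_at(keys, up)
    return (animated.label if animated else None,
            typed.label if typed else None)


# A made-up row of three 32x40 keys laid side by side.
ROW = [
    Key("q", 0, 0, 32, 40),
    Key("w", 32, 0, 32, 40),
    Key("e", 64, 0, 32, 40),
]

if __name__ == "__main__":
    # Finger lands just inside "q" and lifts 3.5 points to the right, inside "w".
    print(simulate_tap(ROW, down=(30.0, 20.0), up=(33.5, 20.0)))  # ('q', 'w')
```

With these made-up numbers a drift of only 3.5 points is enough to animate "q" but type "w". A tap in the middle of a key gives the same key for both events.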