The model is trained with a fixed number of tokens, I don't remember if the models I trained have sinusoidal embeddings or learnable positional embeddings, in the latter there would be no embedding to encode the position, in the former I think it would cause problems with the sinusoidal embedding layer as the sine and cosine would wrap around.