dx is changing in response to the user making inputs.
Imagine you are showing a different player's character. At first the server tells us he is moving to the right by 5 pixels per frame, so we start moving him, and in the absence of any more information we keep moving him. Then we get a message from the server saying 'actually, several frames ago the player decided to move at 4 pixels per frame'. We have to rewind our predictions back to the frame when that happened and then replay from there with the new movement speed.
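Here is a rough sketch of that rewind-and-replay idea, not a full implementation. The class and method names (PredictedPlayer, step, correct) are made up for illustration, and it assumes the correction refers to a frame we still have in our history.

```python
class PredictedPlayer:
    """Hypothetical helper: predicts a remote player's x position frame by frame."""

    def __init__(self, x, dx, frame):
        self.frame = frame          # latest frame we have simulated
        self.dx = dx                # speed in pixels per frame, as last reported
        self.history = {frame: x}   # frame number -> predicted x at that frame

    @property
    def x(self):
        return self.history[self.frame]

    def step(self):
        """Predict one frame ahead using the last known dx."""
        self.history[self.frame + 1] = self.x + self.dx
        self.frame += 1

    def correct(self, changed_at, new_dx):
        """Server says dx changed at frame `changed_at`: rewind, then replay.

        Assumption: `changed_at` is a frame that is still in `history`.
        """
        target = self.frame
        # Rewind: throw away every prediction made after the change.
        for f in range(changed_at + 1, target + 1):
            del self.history[f]
        self.frame = changed_at
        self.dx = new_dx
        # Replay: re-simulate up to the frame we had already reached.
        while self.frame < target:
            self.step()


player = PredictedPlayer(x=0, dx=5, frame=100)
for _ in range(10):
    player.step()              # frame 110, predicted x = 50
player.correct(changed_at=106, new_dx=4)
print(player.frame, player.x)  # 110 46
```

In the example the first six frames still use 5 pixels per frame (x = 30 at frame 106) and only the last four are replayed at 4 pixels per frame.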
For the current frame I just took the first packet received from the server as the baseline frame.
So if you get a packet from the server saying "the game is currently at frame 100" that's your baseline frame, start counting from there.
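Roughly, the baseline idea could look like this. FrameClock and its methods are names I've invented for the sketch, and it assumes your game loop calls tick() once per local frame.

```python
class FrameClock:
    """Hypothetical frame counter anchored to the first server packet."""

    def __init__(self):
        self.baseline = None    # server frame number from the first packet
        self.local_ticks = 0    # local frames simulated since the baseline

    def on_packet(self, server_frame):
        # Only the first packet sets the baseline; later ones leave it alone.
        if self.baseline is None:
            self.baseline = server_frame

    def tick(self):
        # Start counting only once we have a baseline.
        if self.baseline is not None:
            self.local_ticks += 1

    def current_frame(self):
        if self.baseline is None:
            return None
        return self.baseline + self.local_ticks


clock = FrameClock()
clock.on_packet(100)   # "the game is currently at frame 100"
for _ in range(3):
    clock.tick()
print(clock.current_frame())  # 103
```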
You're right, if you're using TCP then you don't need to worry about out of order frames.
One of the issues with TCP is that players on a slow connection will fall behind the gameplay: TCP resends lost data and delivers everything in order, so old state queues up in front of new state. With UDP, packets that a slow connection can't deliver are just thrown away, so those players should keep up with the latest game state.
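With UDP you do have to deal with out-of-order packets yourself. One simple way (a sketch only, assuming every packet carries the server's frame number, with a made-up class name) is to ignore anything older than the newest frame you have already seen:

```python
class LatestStateFilter:
    """Hypothetical filter: accept a packet only if it is newer than any seen so far."""

    def __init__(self):
        self.latest_frame = -1

    def accept(self, frame):
        # Stale or duplicate packet: drop it and keep the newer state.
        if frame <= self.latest_frame:
            return False
        self.latest_frame = frame
        return True


f = LatestStateFilter()
# Frame 101 arrives late, after 102, and 103 arrives twice.
print([f.accept(n) for n in [100, 102, 101, 103, 103]])
# [True, True, False, True, False]
```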
If you do a search "udp vs tcp games" you'll find a whole host of discussions.
e.g. http://gafferongames.com/networking-for-game-programmers/udp...
Personally I prefer UDP, but then you have to be careful about packet sizes: keep each packet comfortably under the usual 1500-byte Ethernet MTU so it doesn't get fragmented.
This is a pretty good library that lets you mostly forget about udp or tcp - http://enet.bespin.org/