When a packet is lost in a TCP stream, it must be retransmitted, and until the client receives it, all further packets are stuck in limbo (received by the client, but cannot be seen). For this reason, most games use UDP instead, and are designed so some packet loss is acceptable (if I received where you are at frame N, it no longer matters if I receive where you were at frame N-1).
Minecraft uses TCP, that's why in very populated servers occasionally you can see the whole world freeze for a second.
In the browser we have UDP with WebRTC data, the problem is that not all browsers support it, it may not work in some places and for now it's more difficult to work with.
An easy compromise is to use several TCP connections, so if one packet is lost, instead of having N packets stuck for a moment, you have 1/N where N is the number of connections. And if important packets are sent through all channels, the chance that they arrive too late is much lower.
Again, I'm not OP but I did similar things and explained what probably happens with this game.
There's also another important detail. Due to delayed ACKs on Windows and battery-preserving TCP stacks on most mobile devices/laptops, some of your packets will be delayed. Even with TCP_NODELAY. Desktop OSX/Linux do not display this behavior.
The only solution I have found, and it's a rather crude one, is to blast the server every 50ms with a 1-byte packet. It is ugly but it works well.
I tried to create a game similar to this (I spent 6 months on it). Very basic Geometry. Circles. I got the concept working well locally, but I ran into problems on how to deal with the networking. I got things running in Node with SocketIO and so on but there could be 400 players, each made of upto 8 moving pieces, and each can fire several times a second.
I spent a lot of time on a book, "Real time collision detection" and generally I have fond memories of the whole project and trying different partition and slicing up space.
However, how to package and deal with this snapshot of world "over the network" or for example "how to smooth any lag relative to the client"... this middle region was largely undocumented or ill accessible to me.
I will look forward to when/if you write any thoughts.
A lot of trial and error.