You can't just tell the computer that you're now interested in the display data of the other display - that would incur latency.
You have to be receiving the signal for the second display the entire time and storing it in what is commonly called a framebuffer, i.e. a bitmap held in memory somewhere.
Most types of display have such a thing backing them anyway, one way or another, since they need to remember what they're supposed to display.
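
To make the idea concrete, here is a minimal sketch in Python. It is only an illustration: real hardware does this in silicon, and the names `Framebuffer` and `InputSwitcher` are made up for this example. It assumes each input delivers whole frames as flat lists of pixel values.

```python
class Framebuffer:
    """A bitmap in memory: one pixel value per (x, y) position."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.pixels = [0] * (width * height)

    def write_frame(self, frame):
        # Overwrite the stored bitmap with the newest frame from the source.
        if len(frame) != len(self.pixels):
            raise ValueError("frame size does not match framebuffer size")
        self.pixels[:] = frame


class InputSwitcher:
    """Hypothetical switcher: one framebuffer per input, one selected for output."""

    def __init__(self, num_inputs, width, height):
        self.buffers = [Framebuffer(width, height) for _ in range(num_inputs)]
        self.selected = 0

    def receive(self, input_index, frame):
        # Every input is received all the time, whether it is selected or not.
        self.buffers[input_index].write_frame(frame)

    def select(self, input_index):
        # Switching only changes which buffer is read out.
        if not 0 <= input_index < len(self.buffers):
            raise IndexError("no such input")
        self.selected = input_index

    def scan_out(self):
        # Return a copy of the currently selected bitmap.
        return list(self.buffers[self.selected].pixels)


switcher = InputSwitcher(num_inputs=2, width=2, height=1)
switcher.receive(0, [1, 1])   # frame from the first computer
switcher.receive(1, [2, 2])   # frame from the second computer, stored even while unselected
switcher.select(1)            # the switch itself is just an index change
frame = switcher.scan_out()
```

Because the second input's latest frame is already sitting in memory, `select` has nothing to wait for; the next `scan_out` simply reads the other buffer.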
I think you're concerned that because there are "buffers" there would be extra latency? There doesn't have to be.
It's much more likely that the delay comes from handshaking that was implemented by an incompetent organization that doesn't care about quality.