Using the flip model only eliminates the latency if DWM promotes your window to a hardware overlay. On Nvidia systems this is simply not supported, so the latency is always there and there's no way to get rid of it. DWM may support overlays on Intel or AMD, but I'm not sure. It would be interesting for someone to test this.
> There's still going to be 1 frame of latency from the vsync though.
Vsync does not inherently require any extra latency. You can start rendering as close to vsync as you like, which reduces the latency by an arbitrary amount, limited only by how long the frame takes to render. That's what VR compositors do. All you need to do is ensure that a flip can never happen during scanout, and then you can't get tearing.
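To put rough numbers on that, here's a minimal sketch of the timing arithmetic. Everything in it is an assumption made for illustration: the function names, the 2 ms render cost, the 0.5 ms safety margin, and the 60 Hz refresh period. A real application would have to get the upcoming vsync timestamp from the platform and measure its own render cost.

```python
# Illustrative only: all names and numbers here are hypothetical.
# Times are integer microseconds so the arithmetic is exact.

def latest_render_start_us(next_vsync_us: int, render_cost_us: int, margin_us: int) -> int:
    """Latest time rendering can begin and still finish before the next vsync."""
    return next_vsync_us - render_cost_us - margin_us


def input_to_scanout_latency_us(sample_time_us: int, next_vsync_us: int) -> int:
    """Time between sampling input (at render start) and the vsync that shows the frame."""
    return next_vsync_us - sample_time_us


REFRESH_PERIOD_US = 16_667  # roughly 60 Hz (assumed)
previous_vsync_us = 1_000_000
next_vsync_us = previous_vsync_us + REFRESH_PERIOD_US

# Naive approach: start rendering right after the previous vsync.
naive_latency_us = input_to_scanout_latency_us(previous_vsync_us, next_vsync_us)

# Late start: wait until just before the deadline, then sample input and render.
late_start_us = latest_render_start_us(next_vsync_us, render_cost_us=2_000, margin_us=500)
late_latency_us = input_to_scanout_latency_us(late_start_us, next_vsync_us)

# naive_latency_us is 16667 (a full frame); late_latency_us is 2500.
```

The trade-off is the margin: make it too small and a slow frame misses the vsync, so the frame arrives a whole refresh late instead.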