You would need a latency of something like 0.01 ms or even less (in 0.01 ms, sound only travels about 3.4 mm).
If I remember right, such a low latency is basically impossible, so noise-canceling headphones have to try to predict the sound before it happens.
The speed of sound alone is a big problem: you barely have time to emit the new sound, and if you move the mic farther from the speaker to buy more time, the sound it picks up no longer accurately matches what reaches your ear, so the "cancellation" can make things worse.
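To put rough numbers on the speed-of-sound problem, here is a small sketch. It assumes sound travels at about 343 m/s (air at roughly 20 °C), a hypothetical 1 cm gap between mic and speaker, and an idealized anti-noise wave that is perfect in every way except that it arrives late. The residual formula 2|sin(πfτ)| comes from adding a tone to an inverted copy of itself delayed by τ. The function names are made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def travel_time_us(distance_m: float) -> float:
    """Microseconds for sound to travel distance_m."""
    return distance_m / SPEED_OF_SOUND * 1e6


def residual_ratio(freq_hz: float, delay_s: float) -> float:
    """Amplitude left over (relative to the original tone) when an otherwise
    perfect inverted copy arrives delay_s late:
    |1 - e^{-j*2*pi*f*delay}| = 2*|sin(pi*f*delay)|.
    0.0 = perfect cancellation, 1.0 = no reduction, above 1.0 = louder."""
    return 2.0 * abs(math.sin(math.pi * freq_hz * delay_s))


if __name__ == "__main__":
    # Hypothetical 1 cm gap between an outer mic and the speaker.
    print(f"1 cm of travel: {travel_time_us(0.01):.1f} us")
    for delay_us in (10, 100):
        for f in (100, 1000, 5000):
            r = residual_ratio(f, delay_us * 1e-6)
            print(f"delay {delay_us:>3} us, {f:>4} Hz tone: residual {r:.2f}")
```

Under these assumptions, 1 cm of travel takes about 29 µs. A 100 µs delay still removes most of a 100 Hz hum (residual about 0.06), but a 5 kHz tone comes out with a residual of 2.0, i.e. twice as loud. Whenever the delay falls between 1/6 and 5/6 of the tone's period, the residual is above 1, which is the "can make things worse" case.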
That's why the headphones use fancy algorithms to try to guess what the sound will be, and why good ones cost so much instead of being a simple analog circuit.
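As a toy illustration of what "guessing what the sound will be" can mean, here is a sketch of an LMS adaptive linear predictor that learns to forecast the next sample from the last few. This is my own choice of technique for illustration, not a description of what any particular headphone actually runs; the sample rate, tone frequency, tap count and step size are all hypothetical.

```python
import math
import random


def lms_predict(signal, taps=8, mu=0.05):
    """One-step-ahead LMS predictor (illustrative assumption).
    Predicts signal[n] from the previous `taps` samples and returns
    the list of prediction errors."""
    w = [0.0] * taps  # filter weights, start at zero
    errors = []
    for n in range(taps, len(signal)):
        history = signal[n - taps:n][::-1]  # most recent sample first
        guess = sum(wi * xi for wi, xi in zip(w, history))
        err = signal[n] - guess
        # LMS update: nudge each weight in the direction that shrinks the error
        for i in range(taps):
            w[i] += mu * err * history[i]
        errors.append(err)
    return errors


if __name__ == "__main__":
    fs, f = 8000, 200  # hypothetical sample rate (Hz) and steady hum (Hz)
    tone = [math.sin(2 * math.pi * f * n / fs) for n in range(2000)]
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(2000)]
    for name, sig in (("200 Hz tone", tone), ("white noise", noise)):
        tail = lms_predict(sig)[-500:]
        rms = math.sqrt(sum(e * e for e in tail) / len(tail))
        print(f"{name}: RMS prediction error over last 500 samples = {rms:.4f}")
```

The steady tone becomes almost perfectly predictable once the weights settle, while the random noise stays essentially unpredictable, which is one way to see why this kind of guessing works best on steady, repetitive sounds.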