I tried implementing it, and the samples generated by the Teacher seem to suffer from mode collapse (as if the generator ignores the random vector z while still respecting the label condition). Do you recall running into that issue at some point?
I should mention that I'm using a simpler generator than the one in the paper, and I'm not changing the learner architecture at each batch, only its weights.
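To make the symptom concrete, here's a minimal sketch of one way to measure it: fix a single label, draw many z vectors, and compute the mean pairwise distance between the resulting samples. A value near zero means the output barely depends on z. This assumes a hypothetical `generator(z, labels)` callable that returns one sample per row. The `collapsed` and `healthy` toy generators are also hypothetical stand-ins, not the Teacher from the paper.

```python
import numpy as np

def z_sensitivity(generator, label, n_samples=64, z_dim=32, seed=0):
    """Mean pairwise L2 distance between samples that share one label.

    `generator(z, labels)` is a hypothetical callable returning an array
    with n_samples rows; a result near 0 suggests z is being ignored.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, z_dim))
    labels = np.full(n_samples, label)
    x = np.asarray(generator(z, labels), dtype=float).reshape(n_samples, -1)
    # Pairwise differences via broadcasting: (n, 1, d) - (1, n, d)
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Diagonal entries are zero, so divide by the number of off-diagonal pairs
    return dists.sum() / (n_samples * (n_samples - 1))

# Hypothetical toy generators for illustration only
W = np.random.default_rng(1).standard_normal((32, 10))

def collapsed(z, labels):
    # Output depends only on the label: the failure mode described above
    return np.eye(10)[labels]

def healthy(z, labels):
    # Output depends on both the label and z
    return np.eye(10)[labels] + 0.1 * z @ W

print(z_sensitivity(collapsed, 3), z_sensitivity(healthy, 3))
```

Running the same check across several labels also shows whether the label is still respected: samples from different labels should stay far apart even when within-label diversity collapses.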
Thanks!