I guess I think of SMT as the logical successor to barrel processors, in a way: as long as you can store more thread state and have independent instruction streams, you get a full(er) pipeline.
You're right, though, that GPU schedulers are more advanced than plain round robin, since they work around stalls and issue instructions from multiple threads per cycle when the backend can take it.
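To make the difference concrete, here's a toy model I'd sketch it with (not how any real core or GPU scheduler is implemented). It assumes each hardware thread is a simple dependency chain, so a thread can't issue again until its previous instruction finishes. The `simulate` function, its parameters, and the workload are all made up for illustration.

```python
def simulate(latencies, stall_aware, width=1):
    """Return the number of cycles needed to issue every instruction.

    latencies[t] lists the instruction latencies (in cycles) of hardware
    thread t. Assumption: each instruction depends on the previous one in
    its own thread, so a thread may not issue until that one has finished.
    """
    n = len(latencies)
    pc = [0] * n        # index of the next instruction for each thread
    ready_at = [0] * n  # first cycle in which each thread may issue again
    remaining = sum(len(stream) for stream in latencies)
    cycle = 0
    start = 0           # rotating priority pointer for the stall-aware case

    while remaining:
        if stall_aware:
            # Scan all threads in rotating order, skip finished or stalled
            # ones, and issue from up to `width` ready threads this cycle.
            order = [(start + i) % n for i in range(n)]
            picked = [t for t in order
                      if pc[t] < len(latencies[t]) and ready_at[t] <= cycle]
            picked = picked[:width]
            start = (start + 1) % n
        else:
            # Strict barrel: the slot belongs to exactly one thread; if that
            # thread is stalled or finished, the slot is a bubble.
            t = cycle % n
            ok = pc[t] < len(latencies[t]) and ready_at[t] <= cycle
            picked = [t] if ok else []

        for t in picked:
            ready_at[t] = cycle + latencies[t][pc[t]]
            pc[t] += 1
            remaining -= 1
        cycle += 1

    return cycle


# Hypothetical workload: thread 0 has two slow ops, thread 1 has four fast ones.
workload = [[3, 3], [1, 1, 1, 1]]
print(simulate(workload, stall_aware=False))          # strict round robin
print(simulate(workload, stall_aware=True))           # skips stalled threads
print(simulate(workload, stall_aware=True, width=2))  # dual issue across threads
```

On that workload the strict barrel takes 8 cycles, skipping stalled threads gets it down to 6, and issuing from two threads per cycle gets it to 4. That's the "full(er) pipeline" effect in miniature.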
I have no experience with KNC/KNL, but I wish I did :-)