> You can't really schedule 100% load better
Sure you can. One hardware thread can do math while the other is waiting on memory. Sometimes the two threads can even use different ALUs in the same cycle, or one can do integer work while the other does floating point.
It's actually under high multithreaded load that HT shines, especially if that load is heterogeneous or memory-latency bound.
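
To make the memory-stall point concrete, here's a toy sketch. It's my own simplification, not how any real core schedules: one issue slot per cycle, each thread is a string of hypothetical ops where `C` is a 1-cycle compute op and `M` is a load that stalls only its own thread for `latency` cycles. The name `run` and the latency of 4 are made up for illustration.

```python
def run(threads, latency=4):
    """Return total cycles for threads sharing one issue slot per cycle."""
    pc = [0] * len(threads)        # next op index for each thread
    ready_at = [0] * len(threads)  # cycle at which each thread can issue again
    cycle = 0
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        for i, t in enumerate(threads):
            if pc[i] < len(t) and ready_at[i] <= cycle:
                op = t[pc[i]]
                pc[i] += 1
                # A load blocks only the thread that issued it.
                ready_at[i] = cycle + (latency if op == "M" else 1)
                break  # the single issue slot is used up this cycle
        cycle += 1
    return max(ready_at)

work = "MCMCMC"                             # memory-latency-bound toy workload
one_at_a_time = run([work]) + run([work])   # run the two threads back to back
interleaved = run([work, work])             # two hardware threads, one core
print(one_at_a_time, interleaved)
```

In this model the core is "100% loaded" either way in the sense that there's always a runnable thread, but the interleaved run finishes in fewer cycles because one thread's stall cycles get filled with the other thread's ops.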