Think of it this way: the original Niagara (T1) was an in-order CPU. That is, instructions were executed in the order they occur in the program code. This is simple and power efficient but doesn't produce very good single-thread performance, since the processor stalls if an instruction takes longer than expected. Say a load instruction misses L1 cache and has to fetch the data from L2/L3/Lwhatever/memory. Now, one way to drive up the utilization of the CPU core is to add hardware threads. And the simplest way to do that? Well, just run an instruction from another available thread every cycle (that is, if a thread is blocked, e.g. waiting for memory, skip it). So now you have a CPU that is still pretty small, simple and power efficient, but can still exploit memory-level parallelism (i.e. have multiple outstanding memory ops in flight).
Now, the other approach is a CPU with out-of-order (OoO) execution. Meaning that the CPU contains a scheduler that handles a queue of instructions, and any instruction that has all its dependencies satisfied can be submitted for execution. And then later on a bunch of magic happens so that, externally to the CPU, it still looks like everything was executed in order as the program code specified. This is pretty good for getting good single-thread performance, and can exploit some amount of MLP as well: e.g. if a bunch of instructions are waiting for a memory operation to complete, some other instructions can still proceed (perhaps executing a memory op themselves). So in this model the amount of MLP is limited by the inherent serial dependencies in the code, and by the length of the instruction queues that the scheduler maintains. The downside is that the OoO logic takes up quite a bit of chip area (making it more expensive), and also tends to be one of the more power-hungry parts of the chip. But if you want good single-thread performance, that's the price you have to pay.

Anyway, now that you have this OoO CPU, what about adding hardware threads? Well, since you already have all this scheduling logic, it turns out to be relatively easy: just "tag" each instruction with a thread ID, and let the scheduler sort it all out. This is what is called Simultaneous Multi-Threading (SMT). So in a way it's a pretty different way of doing threading compared to the Niagara-style in-order processor. Also, since you already have all this OoO logic that is able to exploit some MLP within each thread, you don't need as many threads as the Niagara-style CPU to saturate the memory subsystem. This SMT style of threading is what you see in contemporary Intel x86 processors (they call it hyperthreading (HT)), IBM POWER, and now also AMD Zen cores.
As for benchmarks, I'm too lazy to search, but I'm sure you can find e.g. some SPEC CPU results for Niagara.