
0 points · CoolGuySteve · 7y ago · 0 comments

If you care about latency, a modern x86 with 8 or more cores, with its L1/L2 cache segmentation and penalized-but-shared L3 cache, is almost as complex. It becomes even more complex if you use the CPU topology to make inferences about hyperthreading and shared caches, or need to deal with the shared FPU on older AMD processors.

My understanding is that the largest difference is that some of the Cell cores had different opcodes, which meant you could schedule some threads only on some cores, rather than any thread on any core.
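
To make the topology point concrete, here is a minimal sketch assuming a Linux system, where sysfs exposes the hardware threads sharing a core in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` as a list such as `0,4` or `0-3`. The helper names `parse_cpu_list` and `thread_siblings` are made up for illustration, not part of any library.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a sysfs CPU list such as "0,4" or "0-3,8"
// into individual CPU ids. Assumes well-formed input.
std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream stream(text);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (item.empty()) continue;
        const auto dash = item.find('-');
        if (dash == std::string::npos) {
            cpus.push_back(std::stoi(item));  // single id, e.g. "4"
        } else {
            // inclusive range, e.g. "0-3"
            const int first = std::stoi(item.substr(0, dash));
            const int last = std::stoi(item.substr(dash + 1));
            for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
        }
    }
    return cpus;
}

// Hypothetical helper: hardware threads that share a core with `cpu`.
// Returns an empty vector if the sysfs file cannot be read.
std::vector<int> thread_siblings(int cpu) {
    std::ifstream file("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                       "/topology/thread_siblings_list");
    std::string line;
    if (!file || !std::getline(file, line)) return {};
    return parse_cpu_list(line);
}
```

A latency-sensitive program could use the result to avoid pinning two hot threads onto siblings of the same core. The same list format appears in `cache/indexM/shared_cpu_list` for shared caches.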


2 comments · 1 top-level

petermcneeley · 7y ago · 1 in thread

I have written quite a bit of SPE code. The primary issue is that the SPE processor could only read/write its 256 kB of local memory (without doing a DMA). So object-oriented code literally doesn't even work (because of vtables). The C/C++ model is not designed for this type of architecture. Yes, there were also limitations like vector-only registers and memory alignment, but the biggest issue was the local memory.
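
A rough way to picture that programming model in ordinary C++ is below. This is only a simulation under stated assumptions: `LocalStore`, `dma_get`, and `sum_via_local_store` are hypothetical names, `memcpy` stands in for a real DMA transfer, and the chunk size is arbitrary. None of it is the actual Cell SDK API.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstring>
#include <vector>

// 256 kB, the local memory size cited in the comment above.
constexpr std::size_t kLocalStoreBytes = 256 * 1024;

// Hypothetical stand-in for the only memory the core can touch directly.
struct LocalStore {
    alignas(16) std::array<unsigned char, kLocalStoreBytes> bytes{};
};

// Hypothetical stand-in for a DMA "get": main memory -> local store.
// Assumes ls_offset + size fits within the local store.
void dma_get(LocalStore& ls, std::size_t ls_offset,
             const void* main_memory, std::size_t size) {
    std::memcpy(ls.bytes.data() + ls_offset, main_memory, size);
}

// Sum a large array that lives in "main memory" by staging it through the
// local store one chunk at a time; the compute loop never reads main memory.
float sum_via_local_store(const std::vector<float>& main_memory) {
    static LocalStore ls;                       // static to keep it off the stack
    constexpr std::size_t kChunkFloats = 4096;  // arbitrary chunk (16 kB)
    float total = 0.0f;
    for (std::size_t start = 0; start < main_memory.size(); start += kChunkFloats) {
        const std::size_t count = std::min(kChunkFloats, main_memory.size() - start);
        dma_get(ls, 0, main_memory.data() + start, count * sizeof(float));
        for (std::size_t i = 0; i < count; ++i) {
            float value;
            std::memcpy(&value, ls.bytes.data() + i * sizeof(float), sizeof(float));
            total += value;
        }
    }
    return total;
}
```

The vtable problem follows from the same picture: an object copied in by `dma_get` carries pointers that refer to main-memory addresses, which the core cannot dereference without another transfer.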

justrobert · 7y ago

Yep, with the SPU you end up spending so much time managing memory.

No cache; they are just dumb processors. I find it funny they thought they could take the PS2's VU0/VU1 and make it a processor.
