That's super helpful, thanks for the details. Makes sense now that PSNT is more of a transport/runtime format for the PS2 constraints than a quality hack.
Very cool that it supports BitNet too, even if results are rough right now. Feels like there's a lot of room to tune there over time.
When you do fix tok/sec, are you planning to post per-stage timings too (tokenizer, weight stream, matmul, sampling)? Would be awesome to see where the biggest bottleneck is on real hw.