So this reads to me as a programmer who isn't familiar with SIMD discovering one small part of why it exists and then writing an article about that as if it was a new idea outside the scope of normal SIMD usage.
It's nice that this may give some exposure to low-level details for those unfamiliar with them, but it isn't an innovation.
Downvote me as much as you want for saying so but it is the literal reason why the registers and instructions exist.
You do understand that toys like this are meant to stimulate the mind towards different ways of thinking, right? Great things are born from kernels of innovative thought.
HN has really gone downhill...
Not sure which "sourpuss comments" you mean. Perhaps a direct reply to one of those comments would be more helpful than this general one, which lumps together all fellow commenters.
I only saw comments from people who appreciate the effort, have lots of experience in similar areas, and share their knowledge about the pitfalls they see. All very helpful and polite, as far as I can see.
>The reason why we won't be considering other operating systems is because the System V ABI doesn't preserve any of the XMM registers between calls and puts the burden on the caller to save them on the stack. If you think about it, this sort of defeats the purpose of using a register buffer if we're always going to be pushing our bytes to memory in user space.
as opposed to windows? regardless of whether it's the caller/callee's job to preserve registers, the result is the same.
Nope, because SSE operations are explicitly defined as setting the upper bits of the ymm/zmm registers to zero.
> as opposed to windows? regardless of whether it's the caller/callee's job to preserve registers, the result is the same.
There is a huge difference. With caller-saved registers, the caller must save every register it's using to the stack before it calls a function, whether or not the callee touches them. With callee-saved registers, a callee that doesn't use the registers doesn't need to save them, and a bunch of extra push/pops are avoided.
It is optimal to have a mixture of caller- and callee-saved registers, so the compiler can pick which type it uses for each function/variable.
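To put rough numbers on that, here is a toy cost model. It is only a sketch, not a real ABI: the function names and register sets are made up for illustration, and it simply counts save/restore pairs, assuming one pair per register per call.

```python
# Toy cost model (hypothetical, not a real ABI): count save/restore pairs.

def caller_saved_spills(live_across_call, calls):
    """Caller-saved: the caller spills every register it has live across
    the call, because it cannot know what the callee will clobber."""
    return len(live_across_call) * calls


def callee_saved_spills(clobbered_by_callee, calls):
    """Callee-saved: the callee spills only the registers it overwrites,
    but it pays that cost even if the caller had nothing live in them."""
    return len(clobbered_by_callee) * calls


# Case 1: caller keeps 8 registers live, callee touches none of them.
live = {f"xmm{i}" for i in range(6, 14)}
print(caller_saved_spills(live, 1000))   # 8000
print(callee_saved_spills(set(), 1000))  # 0

# Case 2: caller has nothing live, callee uses 4 registers as scratch.
scratch = {"xmm6", "xmm7", "xmm8", "xmm9"}
print(caller_saved_spills(set(), 1000))     # 0
print(callee_saved_spills(scratch, 1000))   # 4000
```

Case 1 is where callee-saved wins, case 2 is where caller-saved wins, which is why a mix of both kinds gives the compiler something to choose from.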
There have been cases where compiler autovectorization has produced good SIMD versions of code (which is still considered a hard thing to do in some cases) that beat the non-vectorized version in less loaded situations, but were still slower in certain benchmarks with high thread counts (i.e., absurd amounts of CPU time wasted on context switching).
What immediately came to mind was that perhaps you want to keep a secret key from ever entering memory. If this was done in kernel mode, you would be able to disable interrupts/task switching, execute the "secret" stuff, and continue on your merry way...