That talk has 3194 views at the moment. That's pretty obscure.
> Using your same link, you can turn off the optimizer and see that it follows C calling conventions--mainly so the debugger can follow each step.
If you change -O3 to -O0 you will see a lot of stack traffic, that's true. But this has nothing to do with calling conventions -- there are no calls generated for the intrinsics. You see stack traffic because GCC doesn't try to do proper register allocation at -O0: it keeps every value in a stack slot between statements. It would do the same for plain scalar code, so it's not specific to intrinsics.
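If you want to check this yourself, here is a minimal sketch to paste into a compiler explorer. The function name `add4` is made up for illustration, and I'm assuming an x86 target with SSE; the scalar fallback is only there so the file still builds elsewhere.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Hypothetical example function: adds four floats lane-wise. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);            /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb)); /* lane-wise add, then store */
#else
    /* Scalar fallback for targets without SSE. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Compile it with `gcc -O0 -S` and again with `gcc -O3 -S` and compare the output. At -O3 I'd expect just the loads, one `addps`, and a store. At -O0 I'd expect the same instructions surrounded by spills and reloads of `va` and `vb`, but no `call` for any of the intrinsics.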