You've got input, output, and calculations.
What else do you need?
In hardware, the CPU reads instructions from memory one after another. These instructions are pretty basic: add two numbers, load a data word from a given memory address, jump to a new address if two numbers are equal, etc. Modern processors do have some pretty fancy instructions, but what I said is still basically true. So those instructions are our primitives. From the perspective of assembly programming, the only way to build an abstract procedure out of those primitives is to lay out a sequence of them starting at a known address, then branch to that address, execute the instructions in order, and branch back when done.
When you stack many layers of abstraction on top of each other, as is common in functional programming, there starts to be a lot to keep track of. Evaluating a very abstract function looks, in hardware, like a whole lot of branching to and returning from different memory addresses. That's not necessarily bad, but the code starts to look pretty different from what the hardware actually does. And if you branch to locations that aren't in the instruction cache ("unexpected" ones), you add latency while the instructions are fetched from RAM.
You also have to keep track of any data that a higher level of the function still needs. In functional languages that usually means automated memory management, including garbage collection. All of this can add a lot of overhead, especially with things like deep recursion.
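To make the branch-and-return picture a bit more concrete, here's a rough sketch in Python of a toy machine. Everything in it is made up for illustration: the opcode names (`loadi`, `add`, `beq`, `call`, `ret`, `halt`), the `run` function, and the `PROGRAM` listing don't correspond to any real instruction set. The point is just that even "call a procedure that calls another procedure" needs mutable state: a program counter, registers, and a stack of return addresses.

```python
def run(program, registers=None):
    """Run a list of (op, *args) tuples on a toy machine (hypothetical ISA).

    Returns the final registers and the deepest the return stack ever got.
    """
    regs = dict(registers or {})
    stack = []      # return addresses: the state a procedure call needs
    pc = 0          # program counter: index of the next instruction to read
    max_depth = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1                      # default: fall through to the next instruction
        if op == "loadi":            # loadi dst, value  (load an immediate value)
            dst, value = args
            regs[dst] = value
        elif op == "add":            # add dst, a, b
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "beq":            # beq a, b, target  (branch if equal)
            a, b, target = args
            if regs[a] == regs[b]:
                pc = target
        elif op == "call":           # call target: remember where to come back to
            (target,) = args
            stack.append(pc)
            max_depth = max(max_depth, len(stack))
            pc = target
        elif op == "ret":            # ret: branch back to the saved address
            pc = stack.pop()
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    return regs, max_depth


# quadruple(x) is built from double(x), which is built from add.
PROGRAM = [
    ("loadi", "x", 5),        # 0: x = 5
    ("call", 3),              # 1: call quadruple
    ("halt",),                # 2
    ("call", 6),              # 3: quadruple: call double...
    ("call", 6),              # 4: ...twice
    ("ret",),                 # 5
    ("add", "x", "x", "x"),   # 6: double: x = x + x
    ("ret",),                 # 7
]

regs, depth = run(PROGRAM)
print(regs["x"], depth)
```

Two layers of abstraction already mean two return addresses on the stack at once; every extra layer (or every level of recursion) adds more of this bookkeeping.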
tl;dr: There's no such thing as stateless assembly programming.