Generational GCs surely do not do a single bump allocation for all local variables. If they did, how could the GC possibly know where each object starts and ends? Instead, each object is an individual allocation within an allocation buffer, which means bumping the pointer separately for each one. Yes, the objects will then get copied out if they survive long enough, but that’s not the same thing as avoiding the 10+ instructions per allocation.
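To make the per-object bump concrete, here is a toy model in Python. It is a sketch under assumptions, not how any particular VM lays out its nursery: the `Nursery` class, the 4-byte size header, and the `bumps` counter are all hypothetical. It only illustrates that if the collector is to find object boundaries by walking the buffer, each object needs its own header and its own pointer bump.

```python
import struct

HEADER = struct.Struct("<I")  # hypothetical 4-byte header holding the payload size


class Nursery:
    """Toy allocation buffer: every object gets its own bump and its own header."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.top = 0    # bump pointer (offset of the next free byte)
        self.bumps = 0  # how many times the pointer was advanced

    def alloc(self, payload_size):
        total = HEADER.size + payload_size
        if self.top + total > len(self.buf):
            raise MemoryError("nursery full; a real GC would run a minor collection here")
        start = self.top
        HEADER.pack_into(self.buf, start, payload_size)  # record where the object ends
        self.top += total
        self.bumps += 1
        return start

    def walk(self):
        """Yield (start, payload_size) for each object by hopping header to header."""
        pos = 0
        while pos < self.top:
            (size,) = HEADER.unpack_from(self.buf, pos)
            yield pos, size
            pos += HEADER.size + size


nursery = Nursery(256)
for size in (8, 16, 8):  # three "locals" boxed as separate heap objects
    nursery.alloc(size)

objects = list(nursery.walk())
```

Three objects cost three bumps and three header writes here. A single combined bump would leave `walk` with nothing to tell it where one object ends and the next begins.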
It’s entirely possible I’m wrong when it comes to Truffle, but at a minimum it seems like you would need an arena for each size class, and then you’d have to bump each arena by the number of local variables of that size class. The stack can do better than that: one pointer adjustment covers the whole frame.
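A rough way to compare the two schemes is to count pointer updates. The sketch below is only a counting model: the function names and the list of local sizes are made up for illustration, alignment is ignored, and it assumes that objects in a size-class arena need no per-object header because every slot in that arena has the same size.

```python
from collections import Counter


def arena_bumps(local_sizes):
    """One bump per distinct size class: each arena advances by count * size."""
    per_class = Counter(local_sizes)
    advances = {size: size * count for size, count in per_class.items()}
    return len(advances), advances


def stack_bumps(local_sizes):
    """One stack-pointer adjustment covering the whole frame."""
    return 1, sum(local_sizes)


locals_in_frame = [8, 8, 16, 8, 32]  # hypothetical local sizes in bytes
arena_count, arena_advances = arena_bumps(locals_in_frame)
stack_count, frame_size = stack_bumps(locals_in_frame)
```

With these made-up sizes the arena scheme needs three pointer updates (one per size class), while the stack reserves all 72 bytes with one.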