Copying data is sometimes faster than allocating it (and the TCMalloc-inspired Go allocator is quite quick). Many small allocations are inefficient in any language... There are other interesting points.
And anyway, a bump allocator with a nursery will spend more time scanning the nursery and copying long-lived objects out of it. In Go this would not pay off, since the objects that would fill a nursery mostly stay on the stack, without special tricks to please the escape analysis.
If you had argued that Rust's lifetime system makes it easier to reason the way escape analysis does, I would agree, but you preferred to drop a tweet-size rant...
You're denying that the generational hypothesis holds for Go's heap objects. I see no reason why this would be the case. In fact, Go's situation is very close to that of .NET, where the generational hypothesis certainly holds, which is why .NET has a generational garbage collector. This very article is evidence in favor of the generational hypothesis, since many of the optimizations boil down to reducing short-lived heap allocations.
Reducing allocations is about reducing the time needed to perform garbage collection.
That time is zero for stack-based allocation and non-zero for any heap-based allocation. Stack-based allocation, memory pools, etc. reduce the pressure on the GC. More GC runs mean more time spent in the GC, and with generational GCs they can also mean premature aging of objects: while collecting the nursery is very fast, collecting older generations has a larger cost.
So, with any GC and language, the fewer heap allocations, the faster your code will run. (Unless, of course, your program code gets much worse from trying to avoid heap allocations.)
But yes, there is some benefit to reducing allocation. The biggest benefit of escape analysis doesn't really have anything to do with allocation at all. Rather, it's that promoting objects to the stack enables SROA (scalar replacement of aggregates), which unlocks a lot more optimizations.
This article wasn't about that, though. These were short-lived benchmarks that are dominated by throughput. In those types of benchmarks, allocation speed matters much more than mark and sweep time. So, the best way to optimize for these sorts of workloads is to add generational garbage collection with bump allocation in the nursery.
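For readers unfamiliar with the term, bump allocation just means handing out memory by advancing an offset into a preallocated region. The toy arena below is a sketch of that idea only; `bumpArena` and its methods are hypothetical names, and this is not how the Go runtime or any production nursery is implemented (a real nursery would run a minor collection and copy survivors instead of simply resetting).

```go
package main

import "fmt"

// bumpArena is a hypothetical toy arena: one backing buffer plus an
// offset that only ever moves forward until reset.
type bumpArena struct {
	buf []byte
	off int
}

func newBumpArena(size int) *bumpArena {
	return &bumpArena{buf: make([]byte, size)}
}

// alloc hands out the next n bytes by bumping the offset. It returns
// nil when the arena is full; a generational GC would collect the
// nursery at that point.
func (a *bumpArena) alloc(n int) []byte {
	if n < 0 || n > len(a.buf)-a.off {
		return nil
	}
	// The three-index slice caps the result so callers cannot grow
	// into a neighbouring allocation.
	p := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	return p
}

// reset frees everything at once by rewinding the offset.
func (a *bumpArena) reset() { a.off = 0 }

func main() {
	a := newBumpArena(64)
	x := a.alloc(16)
	y := a.alloc(48)
	z := a.alloc(1) // arena is full
	fmt.Println(len(x), len(y), z == nil)

	a.reset()
	fmt.Println(len(a.alloc(64)))
}
```

The allocation path is a bounds check plus an addition, which is why it is so cheap compared with a free-list allocator; the cost moves to deciding what survives when the region fills up.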
Good question! Because this is on a hot code path, we were allocating new values faster than the garbage collector was able to remove them. Over extended periods of time, the server would trend toward running out of memory.
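One common way to relieve this kind of pressure on a hot path is to reuse values instead of allocating fresh ones, for example with `sync.Pool` from the standard library. This is only a sketch of the general technique, not necessarily what was done here; `bufPool` and `render` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers. New is only called when the
// pool has nothing to offer.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render is a hypothetical hot-path function. It borrows a buffer,
// uses it, and returns it to the pool instead of leaving it for the GC.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold old contents
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // copies the bytes, so reuse of buf is safe
}

func main() {
	fmt.Println(render("gopher"))
}
```

Pooled objects can still be dropped by the garbage collector at any time, so a pool reduces allocation rate but is not a cache with guaranteed contents.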