> However, a GC is a lot slower than manual memory management, which contrasts with the fact that most compiler activities are actually pretty low in overhead (now - it didn't use to be this way). Really, the only cost overhead left is the abstraction mismatch, and that is not too bad when you compare it to how bad humans are at writing assembly.
There is a triangle of GC performance trade-offs: throughput, latency (i.e. pause length), and memory overhead. Manual memory management will often be slower (in the throughput sense) than a throughput-tuned GC because:
1. Manual memory management typically precludes moving live data
2. Manual memory management often frees data as soon as it is dead
GC will often have faster allocation than manual memory management because #1 makes it possible to allocate with a single pointer increment. GC will often free data faster because of #2; in particular, using a nursery with Cheney's algorithm makes it O(1) to free an arbitrary amount of dead data, since the collector only touches the live objects it copies out.
Where a throughput-optimized GC falls down is that any code that allocates may incur an unpredictable delay.
Also note that for video games, both a typical GC and malloc/free are often too slow for per-frame data, so arena allocators are used instead: they sidestep #2 and allow pointer-increment allocation without needing #1. This works specifically because there are a lot of objects with exactly the same bound on their lifetime. Special-purpose algorithms will almost always trump general-purpose algorithms when run on the workload they are optimized for.