A Guide to the Go Garbage Collector

(tip.golang.org)

233 points · ibraheemdev · 3y ago · 21 comments

13 comments · 4 top-level

cube2222 · 3y ago · 5 in thread

This is a really great guide! Nice to have something official and in-depth.

I have two tips I can share based on my experience optimizing OctoSQL[0].

First, some applications might have a fairly constant live heap size at any given point in time, but do a lot of allocations (like OctoSQL, where each processed record is a new allocation, but they might be consumed by a very-slowly-growing group by). In that case the GC threshold (which is based on the last live heap size) can be low and result in very frequent garbage collection runs, even though your application is using just megabytes of memory. In that case, using debug.SetGCPercent to modify that threshold at startup to be closer to 10x the live heap size will yield enormous performance benefits, while sacrificing very little memory.
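A minimal sketch of the first tip, assuming the heap-target relation described in the guide (target heap ≈ live heap × (1 + GOGC/100), ignoring GC roots). The helper names `gogcForHeapMultiple` and `applyHeapMultiple` are hypothetical; only `debug.SetGCPercent` is a real standard-library call.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// gogcForHeapMultiple returns the GOGC percentage at which the heap target is
// roughly `multiple` times the live heap. Assumption: GC roots are ignored,
// so target = live * (1 + GOGC/100).
func gogcForHeapMultiple(multiple int) int {
	if multiple < 1 {
		multiple = 1
	}
	return (multiple - 1) * 100
}

// applyHeapMultiple sets the GC percent for the requested multiple and
// returns the previous setting, as reported by debug.SetGCPercent.
func applyHeapMultiple(multiple int) int {
	return debug.SetGCPercent(gogcForHeapMultiple(multiple))
}

func main() {
	// Call once at startup, before the allocation-heavy work begins.
	previous := applyHeapMultiple(10)
	fmt.Printf("GC percent changed from %d to %d\n", previous, gogcForHeapMultiple(10))
}
```

Setting the `GOGC` environment variable to the same value has the same effect without a code change; the in-code version is handy when the right value depends on the workload.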

Second, even if the CPU profiler tells you the GC is consuming a lot of time, that doesn't mean it's taking it away from your app, if it's single-threaded. `go tool trace` can give you a much better overview of how computationally intensive and problematic the GC really is, even though reading it takes some getting used to.
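To get something `go tool trace` can open, the program has to record an execution trace first. A rough sketch using the standard `runtime/trace` package; the helper `traceToFile` and the file name `trace.out` are illustrative assumptions, not anything from OctoSQL.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime/trace"
)

var sink []byte // keeps the sample allocations from being optimized away

// traceToFile records an execution trace of work into path and returns the
// size of the resulting file in bytes.
func traceToFile(path string, work func()) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		return 0, err
	}
	work()
	trace.Stop() // flushes buffered trace data to the file

	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := traceToFile("trace.out", func() {
		// Stand-in workload: many short-lived allocations.
		for i := 0; i < 100000; i++ {
			sink = make([]byte, 1024)
		}
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("wrote %d bytes; inspect with: go tool trace trace.out\n", size)
}
```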

[0]: https://github.com/cube2222/octosql

kccqzy · 3y ago

> Second, even if the CPU profiler tells you the GC is consuming a lot of time, that doesn't mean it's taking it away from your app

I have experienced the same issue here. Our load balancer used CPU usage as a proxy for deciding how much traffic should be assigned when performing load balancing. When the app was written in Go, we consistently found that the GC is consuming a lot of CPU time even though all other metrics like request latency were very good, even in the microseconds range. This was the case even when the app was massively parallel with lots of goroutines. But the load balancer kept sloshing traffic around unnecessarily based on its observation that GC is consuming a lot of CPU time.

cube2222 · 3y ago

That does actually sound like it could be scenario one too.

If you have a lot of small requests, with only a few requests active at the same time, but many requests per second overall, with each making a few allocations, you will have a small live heap size, while quickly reaching the threshold for another GC.

This way you get a lot of GC runs. Latency isn't affected too much because Go is quite good at keeping the stop-the-world pauses short. You might have the application and stop-the-world phases interleaving in a 50/50 ratio of computation time (that's something you can diagnose very easily with `go tool trace`, btw).

Having a higher GOGC threshold might help a lot there, since it will make stop-the-world pauses less frequent, while keeping their duration mostly unchanged (as that scales proportionally to live heap size).
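One way to check this guess on a given workload is to count completed GC cycles under different settings. The sketch below is an assumption-laden micro-experiment, not a benchmark: `gcCyclesWhileAllocating` is a hypothetical helper, and the 1 KiB garbage buffers only imitate a small-live-heap, allocation-heavy service.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

var sink []byte // forces each buffer onto the heap

// gcCyclesWhileAllocating sets the GC percent, allocates totalBytes of
// short-lived 1 KiB buffers, and returns how many GC cycles completed
// during the allocation loop. The previous GC percent is restored on return.
func gcCyclesWhileAllocating(percent, totalBytes int) uint32 {
	old := debug.SetGCPercent(percent)
	defer debug.SetGCPercent(old)

	runtime.GC() // start from a freshly collected heap
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for allocated := 0; allocated < totalBytes; allocated += 1024 {
		sink = make([]byte, 1024)
	}
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	const total = 128 << 20 // 128 MiB of garbage, illustrative
	for _, percent := range []int{50, 100, 1000} {
		fmt.Printf("GOGC=%d: %d GC cycles\n", percent, gcCyclesWhileAllocating(percent, total))
	}
}
```

The exact counts depend on the Go version and machine, so no expected output is given; the point is only to compare the numbers across settings.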

That's obviously just a guess based on the limited info I have though.

tdudzik · 3y ago

> When the app was written in Go

Did you rewrite it to something else?

eatonphil · 3y ago

I'd love to read more about your experience profiling, how your techniques work.

cube2222 · 3y ago

Thanks, I'll try to whip up an article about it in the not-too-distant future.

Though I can tell you that the biggest improvement to my profiling flow was adding a `--profile` flag to OctoSQL itself. This way I can easily create CPU/memory/trace profiles of whole OctoSQL command invocations, which makes experiments and debugging on weird inputs much quicker.
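A rough sketch of what such a flag could look like, using only the standard `flag` and `runtime/pprof` packages. This is not OctoSQL's actual implementation: the `startProfile` helper, the accepted values (`cpu`, `memory`), the output file names, and the stand-in `work` function are all assumptions made for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// startProfile begins the requested profile and returns a function that
// finalizes it. An empty kind disables profiling.
func startProfile(kind, path string) (func() error, error) {
	switch kind {
	case "":
		return func() error { return nil }, nil
	case "cpu":
		f, err := os.Create(path)
		if err != nil {
			return nil, err
		}
		if err := pprof.StartCPUProfile(f); err != nil {
			f.Close()
			return nil, err
		}
		return func() error {
			pprof.StopCPUProfile()
			return f.Close()
		}, nil
	case "memory":
		// The heap profile is written at the end of the run.
		return func() error {
			f, err := os.Create(path)
			if err != nil {
				return err
			}
			defer f.Close()
			runtime.GC() // bring allocation statistics up to date
			return pprof.WriteHeapProfile(f)
		}, nil
	default:
		return nil, fmt.Errorf("unknown profile kind %q", kind)
	}
}

// work is a stand-in for the real command invocation.
func work() int {
	sum := 0
	for i := 0; i < 10000000; i++ {
		sum += i % 7
	}
	return sum
}

func main() {
	kind := flag.String("profile", "", "profile to record: cpu or memory")
	flag.Parse()

	stop, err := startProfile(*kind, *kind+".prof")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("result:", work())
	if err := stop(); err != nil {
		log.Fatal(err)
	}
}
```

The resulting files can be opened with `go tool pprof`, for example `go tool pprof cpu.prof`.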

hsn915 · 3y ago · 3 in thread

I wonder if generics would allow custom allocators.

I haven't tried it yet but it seems like an Arena/bump allocator for example should be possible now.

tapirl · 3y ago

Generics are of no help at all for runtime things.

It would be good for the official runtime to be designed in a plugin way, so that third parties may experiment with their own implementations of some aspects of the runtime.

hsn915 · 3y ago

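    import "unsafe"

    // Arena is hypothetical; Bump(size, align uintptr) is assumed to return an unsafe.Pointer.
    func Allocate[T any](arena *Arena) *T {
        var zero T
        p := arena.Bump(unsafe.Sizeof(zero), unsafe.Alignof(zero))
        return (*T)(p)
    }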

Any reason why this would not work?

Maybe you need to cast to unsafe.Pointer or something before returning, but in theory this _should_ work.
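For what it's worth, here is a self-contained sketch of that idea that compiles with generics. Everything about `Arena` is an assumption made for illustration (fixed-size `[]byte` backing, no growth, no reset). One important caveat: the garbage collector does not scan a `[]byte` for pointers, so this version is only safe for types `T` that contain no Go pointers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Arena is a hypothetical fixed-size bump allocator backed by a byte slice.
type Arena struct {
	buf []byte
	off uintptr // offset of the first free byte
}

func NewArena(size int) *Arena { return &Arena{buf: make([]byte, size)} }

// Bump reserves size bytes aligned to align (a power of two) and returns a
// pointer to them, or nil when the arena has no room left.
func (a *Arena) Bump(size, align uintptr) unsafe.Pointer {
	n := uintptr(len(a.buf))
	if n == 0 || align == 0 {
		return nil
	}
	base := uintptr(unsafe.Pointer(&a.buf[0]))
	// Round the next free address up to the alignment, then convert it
	// back to an offset into buf.
	start := ((base + a.off + align - 1) &^ (align - 1)) - base
	if start >= n || size > n-start {
		return nil
	}
	a.off = start + size
	return unsafe.Pointer(&a.buf[start])
}

// Allocate returns a zeroed *T carved out of the arena, or nil if it is full.
// T must not contain Go pointers (see the caveat above).
func Allocate[T any](a *Arena) *T {
	var zero T
	return (*T)(a.Bump(unsafe.Sizeof(zero), unsafe.Alignof(zero)))
}

type point struct{ X, Y int64 }

func main() {
	arena := NewArena(1024)
	p := Allocate[point](arena)
	p.X, p.Y = 3, 4
	fmt.Println(*p, "bytes used:", arena.off)
}
```

The arena itself is still an ordinary heap object, so the GC frees the whole block once nothing references it or any pointer into it.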

philosopher1234 · 3y ago

Why can’t they just fork the runtime to experiment?

erik_seaberg · 3y ago · 1 in thread

Hm, I was hoping for a roadmap that would talk about supporting generations and more tuning options.

morelisp · 3y ago

Would generational support improve anything given a) 99% of the nursery is probably already on the stack, and b) using generations to inform any kind of compaction / relocation still seems out of the question?

`GOMEMLIMIT` described in the document is a new tuning option.
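Besides the environment variable, the same soft limit can be set from code with `debug.SetMemoryLimit` (available since Go 1.19). A small sketch; the wrapper names and the 512 MiB value are arbitrary choices for illustration.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setSoftMemoryLimit sets the runtime's soft memory limit in bytes and
// returns the previously configured limit.
func setSoftMemoryLimit(bytes int64) int64 {
	return debug.SetMemoryLimit(bytes)
}

// currentMemoryLimit reads the limit without changing it: a negative
// argument to debug.SetMemoryLimit only queries the current value.
func currentMemoryLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	const limit = 512 << 20 // 512 MiB, illustrative
	previous := setSoftMemoryLimit(limit)
	fmt.Printf("previous limit: %d bytes, current limit: %d bytes\n", previous, currentMemoryLimit())
}
```

Running with `GOMEMLIMIT=512MiB` in the environment should be equivalent to the call above.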

omginternets · 3y ago

Has anyone tried "gc_details": true in VSCode? I've just gone through the configuration steps, but I'm not seeing anything obvious. What should I be looking for?

EDIT: found it at the top of the file.
