Maps and Memory Leaks in Go (opens in new tab)

(teivah.medium.com)

133 pointsminaandrawos3y ago68 comments

68 comments

49 comments · 10 top-level

mjpa863y ago· 11 in thread

what happened to a memory leak being some memory that was allocated but had no reference to it so couldn't be freed? If you can copy the map and release it and the memory usage drops, there is no leak?

fnord1233y ago

That's the most used example of a memory leak but it is not the definition of a memory leak.

If you put data into a hash map and forget the key, you leaked.

saghm3y ago

This is also why valgrind classifies the leaks it reports with stuff like "still reachable" or "possibly still in use" (I might be remembering the exact phrasing incorrectly). It would be pretty hard to programmatically determine whether the memory that's still kept around was intended to be kept around or not, which is why valgrind supports generating "suppressions" (and specifying them in subsequent runs to be ignored).

titzer3y ago

This is the use case for weak maps, which both Java and JavaScript have. In the latter case, the map is not iterable, so one cannot observe JavaScript GC (through WeakMap at least).

simiones3y ago

A memory leak has always meant "this program keeps allocating more memory as it runs, even though it's not being asked to store anything new". That is equivalent to saying that a program has a memory leak when it fails to free memory that is no longer needed, not just memory that is no longer reachable.

For example, a common example of a memory leak is adding items to a "cache" without any mechanism that evicts items from the cache in any scenario. The "cache" is thus not a cache, but a memory leak (a common implementation of this leaking scenario is that items are put in a map, but never removed from the map).

Memory leak has never, as far as I know, referred to the specific case of memory that is no longer accessible from program code to be freed. In fact, this definition doesn't even make sense from a runtime system perspective, since even in C, the memory is actually always still reachable - from malloc()'s internal data structures.

twic3y ago

Those pretty much can't happen in garbage collected languages, so the usage of the term has been widened to include things like this. I agree it's a shame.

Roedy Green coined the name "packratting" for this modern kind of memory leak: https://www.mindprod.com/jgloss/packratting.html

simiones3y ago

Memory leaks have always meant "failing to free memory that is no longer needed".

Garbage collection literature often stresses the difference between "no longer needed" and "not reachable", noting that the former is not automatically enforceable (it amounts to solving the halting problem), but the latter is only a heuristic. So, the fact that garbage collectors can't prevent all memory leaks is always stressed by the literature.

1 more reply

ufo3y ago

I've also seen the term "memory bloat".

rob743y ago

Yeah, that makes the title pretty much clickbait, because a memory leak in a memory-safe language would really be a big deal...

TheDong3y ago

> a memory leak in a memory-safe language would really be a big deal...

It is not.

Let me show you a memory leak in the memory safe language, rust:

    let vec: Vec<u8> = Vec::with_capacity(1024);
    std::mem::forget(vec);

Let me show you a memory leak in the memory safe language, go:

    _ = time.Tick(1 * time.Second)

See the docs for time.Tick in the stdlib, which documents that calling it is a memory leak: https://pkg.go.dev/time@go1.19.3#Tick

You can also, if you want to leak memory in go, set the environment variable GOGC=off, and there you go, instant memory leak.

Practically any language, memory safe or otherwise, will let you create a memory leak.

EdwardDiego3y ago

Plenty easy to leak memory in memory-safe languages. I'm assuming we're including GC-ed languages in that set.

sigg33y ago

Go is just memory safe until you have a race, or so I have heard.

3 more replies

tapirl3y ago· 8 in thread

Yes, Go maps never shrink. This is good for most use cases in practice. Because in practice, map entry deletions happen seldom. And when map entry deletions are needed, users often hope maps don't shrink, to avoid potential later unnecessary memory allocations and entry moves. For example, I only do map entry deletions in one of my projects, In the project, I clear all entries of a map and re-use the map to avoid making new allocations. The current design satisfies my need well.

This is more an optimization than a memory leak.

To avoid the kind-of memory leak, just make a new map and discard the old one.

esprehn3y ago

That's easily resolved with an ensureCapacity(size) method, or the other way around with shrinkToFit(size).

maps being magic without a real interface means exposing new API surface is difficult though.

morelisp3y ago

A few times I have wished for a way to grow maps, but generally to shrink maps I don't think there's any big advantage to a built-in rather than your own function making a new one. (It would only cover the case where you know you want to shrink but don't know if you need to shrink because you don't know how many elements you removed or overestimated by, which I think is pretty unusual.)

tapirl3y ago

Personally, I am not against the proposal to add a "shrink(aMapOrSlice)` builtin function. In fact, there are more builtin functions on my wish list, such as "concat(slices)" and "decap(aSlice)".

It is just that the Go core team have their own philosophy.

jimbokun3y ago

The article gives a common place example where this could be an issue, so I don’t know what you mean by “in practice” here.

throwaway8943453y ago

The parent said “most” so both can be true: most maps don’t need to shrink and the example in the article is a valid case where shrinking a map would be desirable. This seems obvious so I’m confused by your confusion. :)

1 more reply

slantedview3y ago

> Yes, Go maps never shrink. This is good for most use cases in practice.

Why is it that every shortcoming of Go is spun by its advocates into something that is actually a good thing?

morelisp3y ago

In what languages do maps shrink? You may be surprised.

1 more reply

philosopher12343y ago

Why is it that haters of go refuse to engage with the actual argument?

Edit: rephrasing your comment “why is it that every time I say something misleading about go people disagree with me?”

twic3y ago· 7 in thread

FWIW i am pretty sure Java's HashMap has the same behaviour - it grows the table, but never shrinks it. Even if you call .clear(), it just clears out the table, rather than throwing the table away.

I imagine there are lots of scenarios in which this is what you want, because after emptying the map, you're going to re-fill it, and it saves reallocating the table. But it would be frustrating in a scenario when that isn't what you want.

If a map has this behaviour, i would say that the most important thing is that it should be clearly documented (Java's isn't). The second most important thing is that there should be a way to get round it - either a .clearHarder() method which throws away the table, or a .compact() method which downsizes it while retaining the content.

Gwypaas3y ago

Rust uses "shrink_to_fit()". Personally never had to use it, but you always end up scrolling by the backing allocation management for all the standard collections when looking through the docs.

> Shrinks the capacity of the map as much as possible. It will drop down as much as possible while maintaining the internal rules and possibly leaving some space in accordance with the resize policy.

https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

And the docs makes it clear that "clear()" only removes the elements. Giving a hint of where to go next if you stumble upon the issue in the OP.

> "Clears the map, removing all key-value pairs. Keeps the allocated memory for reuse."

https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

eptcyka3y ago

Wouldn't reallocating the map be very cheap when running in a JVM? Yes, it isn't something to do in the hot path, but surely it should be faster than a direct malloc(), right? I'm being very naive and ready to be proven wrong.

_old_dude_3y ago

The main difference with Go is that the table of a HashMap in Java is a table of pointers, not a table of structs, so the overhead to not shrink the table is less an issue.

EdwardDiego3y ago

You're right that Java's Hashmap only ever resizes upwards - the most common use case. However, if there's a need to remove that memory after clearing() then the typical use case is to do something that allows the GC to sort it.

E.g.,

    var map = new HashMap<Bla, Bla>();
    // Many things added to map
    // Many but not all things removed from map
    // to shrink it to size of remaining items
    map = new HashMap<>(map);

Yeah, it's not nice, but the people who wrote the JDK are far smarter than me, so I figure they optimised for the 99% use-case, not the 1%.

mariusor3y ago

I think that's exactly what TFA is proposing by recreating the map

    m := make(map[Bla]Bla)
    // add to map
    // mark map as ready for GC
    m = make(map[Bla]Bla)

foldr3y ago

>The second most important thing is that there should be a way to get round it - either a .clearHarder() method which throws away the table, or a .compact() method which downsizes it while retaining the content.

Wouldn't just creating a new map be a pretty good way of doing it? If the reduction in size is significant, then the cost of the new allocation will be small compared to the cost of all the allocations you've done to build the original map.

creata3y ago

> If a map has this behaviour, i would say that the most important thing is that it should be clearly documented

Iirc, most hash table implementations don't automatically shrink. More documentation is always nice, but isn't not automatically shrinking kind of the expected "default" behavior?

eschneider3y ago· 3 in thread

This all looks like reasonable implementation behavior which will give optimal runtime performance in most "common" cases. If one really wants or needs a map that'll free memory as it shrinks (and sure, for some folks, that'd be super useful) one's always free to just implement your own.

eptcyka3y ago

Go doesn't really allow you to create a hashmap with the same generic possibilities as the built in one.

ironick093y ago

Why is it? Or rather what limitations are there to enforced by go compiler that won’t allow someone to implement their own hash map that can free memory with the same generic possibilities, even with the recent introduction of generics?

1 more reply

jimbokun3y ago

Is that still true with the support Go added for generics?

1 more reply

masklinn3y ago· 3 in thread

> Also, in this case, the amount of required memory is less significant during peak times due to some optimizations to reduce the memory consumed.

Pretty sure it’s the same: because of collisions, maps always have a certain fraction of empty slots (much more so than vectors). I’ve not heard of Go maps being very high load factors, so they probably resize around 50% or so full. I didn’t check any of the numbers, I’ll assume 50% load factor, doubling on resize, and entries of {hash, key, value} with a word-sized (8 bytes) hash, but some and likely all of those are likely off.

Anyway with a load factor of 50% and doubling at any point you’d have 2n to 4n slots in the map total (depending whether you’re nearing resize or just went through one). And thus if you make entries smaller, you make all those overhead slots smaller as well.

hash + int + blob would be 8+8+128 144 bytes, a million of those is 144MB, x2 is 288 and x4 is 576, so the observed bounds are pretty close: IIRC Go maps use some form of separate chaining (though not “trivial” linked lists), I assume those do get reclaimed as entries are removed even if the map itself does not get shrunk, which would explain why memory consumption goes down as elements are removed from the initial map.

With a pointer each “entry” is 24 bytes instead, for a map overhead of 48~96MB (so clearly Go maps use either a higher load factor or a more efficient storage than the trivial scheme being assumed here, possibly both).

The actual data accounts for 128M plus some, there’s a 144M difference between the full and the emptied pointer-maps, which would account for some chain links being removed, and maybe some object headers / allocator overhead for the on-heap arrays.

LukeShu3y ago

> I've not heard of Go maps being very high load factors, so they probably resize around 50% or so full.

Go uses a load factor of 6.5 (expressed as average load of a bucket; where buckets have a maximum load of 8; i.e. 81% full) (ref: https://github.com/golang/go/blob/go1.19.3/src/runtime/map.g... )

> I’ll assume … doubling on resize

Indeed (ref: https://github.com/golang/go/blob/go1.19.3/src/runtime/map.g...)

morelisp3y ago

50% load factor would be incredibly low for any hash table. Go uses 6.5/8 per bucket right now I believe, or 81% if you want to be crass about it.

masklinn3y ago

> 50% load factor would be incredibly low for any hash table.

Might be that I’m more used to open addressing maps? IIRC circa 50 is a pretty good starting point for open addressing unless the collision resolution is specifically designed for that (e.g. swisstable). Not to say it can’t get higher, but for some the that’s where the performance starts going down (e.g. non-bucketed cuckoo hashing).

1 more reply

SeanLuke3y ago· 3 in thread

< 293 MB <-- After we remove 1 million elements

Let's be kind and assume that prior to removal, the map has just trebled in size and the extra space wasn't used. Doesn't this imply that a map has an overhead of about 100 bytes per key/value pair? How can this be so?

skywhopper3y ago

There's some some waste involved in not pre-allocating the map to begin with. Check the output of this variation (https://go.dev/play/p/vQwg3GajzXx -- n shrunk to 1,000 to stay within the playground's memory constraints) which fills the map again after the GC. You'll see the map doesn't grow back to the max size.

And if you specify the size of the map up front in the `make()` function, it never grows or shrinks (or not significantly): https://go.dev/play/p/AGW35kMMOc5

IIRC, what's happening is that whenever the map hits what was pre-allocated, it expands by a certain growth factor of the current size. So at some point between 1 and 1,000,000 a big chunk was allocated that went over what was necessary for 1,000,000 entries. In my testing it happened around i == 851_500:

   i == 851000: 258507 KB
   i == 852000: 479212 KB

SeanLuke3y ago

Hashtables often treble when they reach 50% utilization. I think I'm already factoring that in. It's still 100 bytes overhead!

EdiX3y ago

If you are referring to the change from 461 MB to 293 MB that's because he didn't call runtime.GC right after the insertion loop.

hknmtt3y ago· 3 in thread

i don't see any issue here. same behavior can be achieved with slices: foo := make([]int, 0, 1000) for { for k := range bar { foo = append(foo, k) } foo = foo[:0] }

the slice will grow as much as the largest dataset. map will be the same. you need to let go of it to be GCd and create a new one.

olliej3y ago

The point is that you can remove the entries from the map, and the map won't ever shrink. If you're using large value type in the map[1], the map's dead storage will be large - by a functionally unbound amount. Most sane collection libraries shrink their backing store after some sufficiently large portion becomes dead.

[1] I would argue a general purpose hash table/map should really switch to using a hash code=>index mapping automatically if the value type size is sufficiently large.

morelisp3y ago

> Most sane collection libraries shrink their backing store after some sufficiently large portion becomes dead.

If you exclude Java, and C++ and C# I think, stdlibs from being sane, sure.

1 more reply

rob743y ago

The map will shrink, just not by as much as you might expect. You could also argue that they have picked a pathological case for their example because they use 128 byte entries ("If a key or a value is over 128 bytes, Go won’t store it directly in the map bucket. Instead, Go stores a pointer to reference the key or the value" - which probably leads to less memory consumption).

sys_647383y ago· 1 in thread

I thought GO had garbage collection or is that just trash talk?

avianlyric3y ago

It does, and as the article mentions, it’ll collect the map elements. But it doesn’t collect the map infrastructure that grew to accommodate the elements, and doesn’t shrink when the elements are removed, because the map never stops referencing them.

cglong3y ago

I was going to complain that a bug should be filed for this, but one of the blog comments says this issue could be fixed by [1].

[1]: https://github.com/golang/go/issues/54766

1 more reply

ilyt3y ago

Huh, I didn't knew that... that might explain one slow "leak" I've been noticing in one of my apps.

Shame that quirk wasn't documented

j / k navigate · click thread line to collapse

68 comments

49 comments · 10 top-level

mjpa863y ago· 11 in thread

fnord1233y ago

That's the most used example of a memory leak but it is not the definition of a memory leak.

If you put data into a hash map and forget the key, you leaked.

saghm3y ago

titzer3y ago

This is the use case for weak maps, which both Java and JavaScript have. In the latter case, the map is not iterable, so one cannot observe JavaScript GC (through WeakMap at least).

simiones3y ago

twic3y ago

Those pretty much can't happen in garbage collected languages, so the usage of the term has been widened to include things like this. I agree it's a shame.

Roedy Green coined the name "packratting" for this modern kind of memory leak: https://www.mindprod.com/jgloss/packratting.html

simiones3y ago

Memory leaks have always meant "failing to free memory that is no longer needed".

1 more reply

ufo3y ago

I've also seen the term "memory bloat".

rob743y ago

Yeah, that makes the title pretty much clickbait, because a memory leak in a memory-safe language would really be a big deal...

TheDong3y ago

> a memory leak in a memory-safe language would really be a big deal...

It is not.

Let me show you a memory leak in the memory safe language, rust:

    let vec: Vec<u8> = Vec::with_capacity(1024);
    std::mem::forget(vec);

Let me show you a memory leak in the memory safe language, go:

    _ = time.Tick(1 * time.Second)

See the docs for time.Tick in the stdlib, which documents that calling it is a memory leak: https://pkg.go.dev/time@go1.19.3#Tick

You can also, if you want to leak memory in go, set the environment variable GOGC=off, and there you go, instant memory leak.

Practically any language, memory safe or otherwise, will let you create a memory leak.

EdwardDiego3y ago

Plenty easy to leak memory in memory-safe languages. I'm assuming we're including GC-ed languages in that set.

sigg33y ago

Go is just memory safe until you have a race, or so I have heard.

3 more replies

tapirl3y ago· 8 in thread

This is more an optimization than a memory leak.

To avoid the kind-of memory leak, just make a new map and discard the old one.

esprehn3y ago

That's easily resolved with an ensureCapacity(size) method, or the other way around with shrinkToFit(size).

maps being magic without a real interface means exposing new API surface is difficult though.

morelisp3y ago

tapirl3y ago

Personally, I am not against the proposal to add a "shrink(aMapOrSlice)` builtin function. In fact, there are more builtin functions on my wish list, such as "concat(slices)" and "decap(aSlice)".

It is just that the Go core team have their own philosophy.

jimbokun3y ago

The article gives a common place example where this could be an issue, so I don’t know what you mean by “in practice” here.

throwaway8943453y ago

1 more reply

slantedview3y ago

> Yes, Go maps never shrink. This is good for most use cases in practice.

Why is it that every shortcoming of Go is spun by its advocates into something that is actually a good thing?

morelisp3y ago

In what languages do maps shrink? You may be surprised.

1 more reply

philosopher12343y ago

Why is it that haters of go refuse to engage with the actual argument?

Edit: rephrasing your comment “why is it that every time I say something misleading about go people disagree with me?”

twic3y ago· 7 in thread

FWIW i am pretty sure Java's HashMap has the same behaviour - it grows the table, but never shrinks it. Even if you call .clear(), it just clears out the table, rather than throwing the table away.

Gwypaas3y ago

Rust uses "shrink_to_fit()". Personally never had to use it, but you always end up scrolling by the backing allocation management for all the standard collections when looking through the docs.

> Shrinks the capacity of the map as much as possible. It will drop down as much as possible while maintaining the internal rules and possibly leaving some space in accordance with the resize policy.

https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

And the docs makes it clear that "clear()" only removes the elements. Giving a hint of where to go next if you stumble upon the issue in the OP.

> "Clears the map, removing all key-value pairs. Keeps the allocated memory for reuse."

https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

eptcyka3y ago

_old_dude_3y ago

The main difference with Go is that the table of a HashMap in Java is a table of pointers, not a table of structs, so the overhead to not shrink the table is less an issue.

EdwardDiego3y ago

E.g.,

    var map = new HashMap<Bla, Bla>();
    // Many things added to map
    // Many but not all things removed from map
    // to shrink it to size of remaining items
    map = new HashMap<>(map);

Yeah, it's not nice, but the people who wrote the JDK are far smarter than me, so I figure they optimised for the 99% use-case, not the 1%.

mariusor3y ago

I think that's exactly what TFA is proposing by recreating the map

    m := make(map[Bla]Bla)
    // add to map
    // mark map as ready for GC
    m = make(map[Bla]Bla)

foldr3y ago

creata3y ago

> If a map has this behaviour, i would say that the most important thing is that it should be clearly documented

Iirc, most hash table implementations don't automatically shrink. More documentation is always nice, but isn't not automatically shrinking kind of the expected "default" behavior?

eschneider3y ago· 3 in thread

eptcyka3y ago

Go doesn't really allow you to create a hashmap with the same generic possibilities as the built in one.

ironick093y ago

1 more reply

jimbokun3y ago

Is that still true with the support Go added for generics?

1 more reply

masklinn3y ago· 3 in thread

> Also, in this case, the amount of required memory is less significant during peak times due to some optimizations to reduce the memory consumed.

LukeShu3y ago

> I've not heard of Go maps being very high load factors, so they probably resize around 50% or so full.

Go uses a load factor of 6.5 (expressed as average load of a bucket; where buckets have a maximum load of 8; i.e. 81% full) (ref: https://github.com/golang/go/blob/go1.19.3/src/runtime/map.g... )

> I’ll assume … doubling on resize

Indeed (ref: https://github.com/golang/go/blob/go1.19.3/src/runtime/map.g...)

morelisp3y ago

50% load factor would be incredibly low for any hash table. Go uses 6.5/8 per bucket right now I believe, or 81% if you want to be crass about it.

masklinn3y ago

> 50% load factor would be incredibly low for any hash table.

1 more reply

SeanLuke3y ago· 3 in thread

< 293 MB <-- After we remove 1 million elements

skywhopper3y ago

And if you specify the size of the map up front in the `make()` function, it never grows or shrinks (or not significantly): https://go.dev/play/p/AGW35kMMOc5

   i == 851000: 258507 KB
   i == 852000: 479212 KB

SeanLuke3y ago

Hashtables often treble when they reach 50% utilization. I think I'm already factoring that in. It's still 100 bytes overhead!

EdiX3y ago

If you are referring to the change from 461 MB to 293 MB that's because he didn't call runtime.GC right after the insertion loop.

hknmtt3y ago· 3 in thread

i don't see any issue here. same behavior can be achieved with slices: foo := make([]int, 0, 1000) for { for k := range bar { foo = append(foo, k) } foo = foo[:0] }

the slice will grow as much as the largest dataset. map will be the same. you need to let go of it to be GCd and create a new one.

olliej3y ago

[1] I would argue a general purpose hash table/map should really switch to using a hash code=>index mapping automatically if the value type size is sufficiently large.

morelisp3y ago

> Most sane collection libraries shrink their backing store after some sufficiently large portion becomes dead.

If you exclude Java, and C++ and C# I think, stdlibs from being sane, sure.

1 more reply

rob743y ago

sys_647383y ago· 1 in thread

I thought GO had garbage collection or is that just trash talk?

avianlyric3y ago

cglong3y ago

I was going to complain that a bug should be filed for this, but one of the blog comments says this issue could be fixed by [1].

[1]: https://github.com/golang/go/issues/54766

1 more reply

ilyt3y ago

Huh, I didn't knew that... that might explain one slow "leak" I've been noticing in one of my apps.

Shame that quirk wasn't documented

j / k navigate · click thread line to collapse