If you put data into a hash map and forget the key, you leaked.
For example, a common example of a memory leak is adding items to a "cache" without any mechanism that evicts items from the cache in any scenario. The "cache" is thus not a cache, but a memory leak (a common implementation of this leaking scenario is that items are put in a map, but never removed from the map).
Memory leak has never, as far as I know, referred to the specific case of memory that is no longer accessible from program code to be freed. In fact, this definition doesn't even make sense from a runtime system perspective, since even in C, the memory is actually always still reachable - from malloc()'s internal data structures.
Roedy Green coined the name "packratting" for this modern kind of memory leak: https://www.mindprod.com/jgloss/packratting.html
Garbage collection literature often stresses the difference between "no longer needed" and "not reachable", noting that the former is not automatically enforceable (it amounts to solving the halting problem), but the latter is only a heuristic. So, the fact that garbage collectors can't prevent all memory leaks is always stressed by the literature.
It is not.
Let me show you a memory leak in the memory safe language, rust:
let vec: Vec<u8> = Vec::with_capacity(1024);
std::mem::forget(vec);
Let me show you a memory leak in the memory safe language, go: _ = time.Tick(1 * time.Second)
See the docs for time.Tick in the stdlib, which documents that calling it is a memory leak: https://pkg.go.dev/time@go1.19.3#TickYou can also, if you want to leak memory in go, set the environment variable GOGC=off, and there you go, instant memory leak.
Practically any language, memory safe or otherwise, will let you create a memory leak.
This is more an optimization than a memory leak.
To avoid the kind-of memory leak, just make a new map and discard the old one.
maps being magic without a real interface means exposing new API surface is difficult though.
It is just that the Go core team have their own philosophy.
Why is it that every shortcoming of Go is spun by its advocates into something that is actually a good thing?
Edit: rephrasing your comment “why is it that every time I say something misleading about go people disagree with me?”
I imagine there are lots of scenarios in which this is what you want, because after emptying the map, you're going to re-fill it, and it saves reallocating the table. But it would be frustrating in a scenario when that isn't what you want.
If a map has this behaviour, i would say that the most important thing is that it should be clearly documented (Java's isn't). The second most important thing is that there should be a way to get round it - either a .clearHarder() method which throws away the table, or a .compact() method which downsizes it while retaining the content.
> Shrinks the capacity of the map as much as possible. It will drop down as much as possible while maintaining the internal rules and possibly leaving some space in accordance with the resize policy.
https://doc.rust-lang.org/std/collections/struct.HashMap.htm...
And the docs makes it clear that "clear()" only removes the elements. Giving a hint of where to go next if you stumble upon the issue in the OP.
> "Clears the map, removing all key-value pairs. Keeps the allocated memory for reuse."
https://doc.rust-lang.org/std/collections/struct.HashMap.htm...
E.g.,
var map = new HashMap<Bla, Bla>();
// Many things added to map
// Many but not all things removed from map
// to shrink it to size of remaining items
map = new HashMap<>(map);
Yeah, it's not nice, but the people who wrote the JDK are far smarter than me, so I figure they optimised for the 99% use-case, not the 1%. m := make(map[Bla]Bla)
// add to map
// mark map as ready for GC
m = make(map[Bla]Bla)Wouldn't just creating a new map be a pretty good way of doing it? If the reduction in size is significant, then the cost of the new allocation will be small compared to the cost of all the allocations you've done to build the original map.
Iirc, most hash table implementations don't automatically shrink. More documentation is always nice, but isn't not automatically shrinking kind of the expected "default" behavior?
Pretty sure it’s the same: because of collisions, maps always have a certain fraction of empty slots (much more so than vectors). I’ve not heard of Go maps being very high load factors, so they probably resize around 50% or so full. I didn’t check any of the numbers, I’ll assume 50% load factor, doubling on resize, and entries of {hash, key, value} with a word-sized (8 bytes) hash, but some and likely all of those are likely off.
Anyway with a load factor of 50% and doubling at any point you’d have 2n to 4n slots in the map total (depending whether you’re nearing resize or just went through one). And thus if you make entries smaller, you make all those overhead slots smaller as well.
hash + int + blob would be 8+8+128 144 bytes, a million of those is 144MB, x2 is 288 and x4 is 576, so the observed bounds are pretty close: IIRC Go maps use some form of separate chaining (though not “trivial” linked lists), I assume those do get reclaimed as entries are removed even if the map itself does not get shrunk, which would explain why memory consumption goes down as elements are removed from the initial map.
With a pointer each “entry” is 24 bytes instead, for a map overhead of 48~96MB (so clearly Go maps use either a higher load factor or a more efficient storage than the trivial scheme being assumed here, possibly both).
The actual data accounts for 128M plus some, there’s a 144M difference between the full and the emptied pointer-maps, which would account for some chain links being removed, and maybe some object headers / allocator overhead for the on-heap arrays.
Go uses a load factor of 6.5 (expressed as average load of a bucket; where buckets have a maximum load of 8; i.e. 81% full) (ref: https://github.com/golang/go/blob/go1.19.3/src/runtime/map.g... )
> I’ll assume … doubling on resize
Indeed (ref: https://github.com/golang/go/blob/go1.19.3/src/runtime/map.g...)
Might be that I’m more used to open addressing maps? IIRC circa 50 is a pretty good starting point for open addressing unless the collision resolution is specifically designed for that (e.g. swisstable). Not to say it can’t get higher, but for some the that’s where the performance starts going down (e.g. non-bucketed cuckoo hashing).
Let's be kind and assume that prior to removal, the map has just trebled in size and the extra space wasn't used. Doesn't this imply that a map has an overhead of about 100 bytes per key/value pair? How can this be so?
And if you specify the size of the map up front in the `make()` function, it never grows or shrinks (or not significantly): https://go.dev/play/p/AGW35kMMOc5
IIRC, what's happening is that whenever the map hits what was pre-allocated, it expands by a certain growth factor of the current size. So at some point between 1 and 1,000,000 a big chunk was allocated that went over what was necessary for 1,000,000 entries. In my testing it happened around i == 851_500:
i == 851000: 258507 KB
i == 852000: 479212 KBthe slice will grow as much as the largest dataset. map will be the same. you need to let go of it to be GCd and create a new one.
[1] I would argue a general purpose hash table/map should really switch to using a hash code=>index mapping automatically if the value type size is sufficiently large.
If you exclude Java, and C++ and C# I think, stdlibs from being sane, sure.
Shame that quirk wasn't documented