There is no Rust GC crate in widespread use because tracing GC isn't needed there, and because the absurd behavior of scanning most of the heap at arbitrary times makes it massively worse than reference counting (assuming you only change reference counts when essential and use borrowing the rest of the time).
The problems with GC are threefold, and why you might not want it in a systems language:
1. GC requires more memory than strictly necessary to be efficient (usually about 1.5x - 2x the amount you absolutely need). You're basically trading memory for runtime efficiency.
2. GC performance is harder to predict and reason about than certain other allocation strategies.
3. GC languages tend to encourage excessive heap allocation for various reasons, so programs end up with much more junk than a typical Rust or C program with a similar number of objects.
Note that item 2 is the one that's least understood. The best part about GCs is that they make heap allocation trivial, and they make de-allocation a no-op. In contrast, both malloc() and free() are extremely complex and costly operations. The GC does impose a cost on every pointer write, similar to (but typically less than) the overhead of Arc<T> over a T*, but that has a very uniform and predictable cost. The problem of unpredictability only comes in the collection phase, and is mostly related to (a) when the collection happens, (b) how much data actually has to be scanned (how many live objects are present on the heap and stack), and (c) what type of collection needs to happen (is it enough to collect from this thread's young generation, or do you need to collect all generations from all threads).
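To make that per-pointer-write cost concrete, here is a minimal sketch of a card-marking write barrier, one common way generational GCs track pointer writes so they can rescan only the touched regions. Every name here (`CardTable`, `CARD_SIZE`, `record_write`) is hypothetical, not the API of any real collector:

```rust
// A card table divides the heap into fixed-size "cards" and keeps one
// dirty bit per card. The write barrier is just an index plus a flag set:
// a small, uniform, predictable cost on every pointer store.

const CARD_SIZE: usize = 512; // bytes of heap covered per card (made up)

struct CardTable {
    cards: Vec<bool>, // one dirty bit per card
}

impl CardTable {
    fn new(heap_bytes: usize) -> Self {
        CardTable { cards: vec![false; heap_bytes / CARD_SIZE + 1] }
    }

    // Invoked on every pointer write into the heap: constant time.
    fn record_write(&mut self, heap_offset: usize) {
        self.cards[heap_offset / CARD_SIZE] = true;
    }

    // At collection time, only dirty cards need rescanning.
    fn dirty_cards(&self) -> impl Iterator<Item = usize> + '_ {
        self.cards.iter().enumerate().filter(|(_, d)| **d).map(|(i, _)| i)
    }
}

fn main() {
    let mut table = CardTable::new(4096);
    table.record_write(100);  // falls in card 0
    table.record_write(1000); // falls in card 1
    let dirty: Vec<usize> = table.dirty_cards().collect();
    println!("{:?}", dirty); // prints [0, 1]
}
```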
Note that many of these problems are in fact solvable, and there actually exist GCs with constant, predictable collection times, suitable even for realtime applications (which malloc/free don't support). They rely on very sophisticated technology that no one is distributing for free, though (e.g. you have to pay Azul for a realtime-compatible Java).
I've worked on tiny embedded systems (.NET Micro Framework) where, for a given usage pattern, the GC was perfectly predictable, as it should be.
Assuming you use a set of slabs of fixed size objects and keep free objects in a linked list, both malloc and free are trivial O(1) operations.
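A minimal sketch of that scheme in Rust (index-based to stay in safe code; a real slab allocator would thread the free list through the slots themselves and hand out raw pointers, and all names here are made up):

```rust
// Fixed-size-object slab: every slot holds the same type, and free slots
// are kept on a list (here a stack of indices). Both alloc and free are
// a single push/pop: O(1), no searching, no coalescing.

struct Slab<T> {
    slots: Vec<Option<T>>,
    free: Vec<usize>, // stack of free slot indices (plays the free-list role)
}

impl<T> Slab<T> {
    fn with_capacity(n: usize) -> Self {
        Slab {
            slots: (0..n).map(|_| None).collect(),
            free: (0..n).rev().collect(),
        }
    }

    // O(1): pop a free slot and move the value in.
    fn alloc(&mut self, value: T) -> Option<usize> {
        let i = self.free.pop()?;
        self.slots[i] = Some(value);
        Some(i)
    }

    // O(1): take the value out and push the slot back on the free list.
    fn free(&mut self, i: usize) -> Option<T> {
        let v = self.slots[i].take();
        if v.is_some() {
            self.free.push(i);
        }
        v
    }
}

fn main() {
    let mut slab = Slab::with_capacity(2);
    let a = slab.alloc(10).unwrap();
    let _b = slab.alloc(20).unwrap();
    assert!(slab.alloc(30).is_none()); // slab full
    assert_eq!(slab.free(a), Some(10));
    assert!(slab.alloc(30).is_some()); // freed slot is reused
    println!("ok");
}
```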
Destructors with cascading deletions can take time bounded only by memory allocation, but you can solve that for instance by destroying them on a separate thread, or having a linked list of objects to be destroyed and destroying a constant number/memory size of them on each allocation.
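The second variant, retiring a constant number of doomed objects on each allocation, can be sketched like this (illustrative names, not a real allocator; the point is just the amortisation):

```rust
use std::collections::VecDeque;

// Instead of running a whole destructor cascade at free time, doomed
// objects go on a queue, and each allocation pays for a bounded amount
// of destruction work. Latency per operation stays constant.

const RETIRE_PER_ALLOC: usize = 2; // tuning knob, value made up

struct Pool<T> {
    doomed: VecDeque<T>,
}

impl<T> Pool<T> {
    fn new() -> Self {
        Pool { doomed: VecDeque::new() }
    }

    // "free" just enqueues; no cascade happens on this call.
    fn schedule_drop(&mut self, value: T) {
        self.doomed.push_back(value);
    }

    // Each allocation retires at most RETIRE_PER_ALLOC queued objects.
    fn alloc(&mut self, value: T) -> Box<T> {
        for _ in 0..RETIRE_PER_ALLOC {
            if self.doomed.pop_front().is_none() {
                break; // queue drained; the actual drop happened in pop_front
            }
        }
        Box::new(value)
    }
}

fn main() {
    let mut pool = Pool::new();
    for i in 0..10 {
        pool.schedule_drop(i);
    }
    let _x = pool.alloc(99); // retires 2 of the 10 queued values
    println!("{}", pool.doomed.len()); // prints 8
}
```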
I'm not sure what you mean by this being the least understood. It seems like it's very well understood: GC introduces latency if you want good throughput, or it reduces throughput if you want excellent latency (sub-100us is possible).
Of course, that doesn't mean you can predict what specific throughput or latency properties your specific program will have, except for GCs that have maximum bounded latency guarantees.
What do you mean? Aren't GCs both less efficient at run time and use more memory?
Take the extreme case of a program where every allocation is permanent. The GC allocation will be much much faster than malloc() (GC allocation is normally just bumping a pointer to the end of the heap, while malloc() typically does a lot more work to segregate objects by size etc), and collection will never run. So, overall, the program will be significantly faster.
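The bump-allocation fast path being compared here really is just a bounds check plus a pointer increment; a sketch (offsets into a byte buffer stand in for real pointers, and the names are made up):

```rust
// The entire GC allocation fast path: check that space remains, bump the
// top-of-heap marker, return the old position. No size classes, no free
// lists, no searching.

struct BumpArena {
    heap: Vec<u8>,
    top: usize, // next free offset
}

impl BumpArena {
    fn new(size: usize) -> Self {
        BumpArena { heap: vec![0; size], top: 0 }
    }

    fn alloc(&mut self, bytes: usize) -> Option<usize> {
        if self.top + bytes > self.heap.len() {
            return None; // a real GC would trigger a collection here
        }
        let at = self.top;
        self.top += bytes;
        Some(at)
    }
}

fn main() {
    let mut arena = BumpArena::new(64);
    assert_eq!(arena.alloc(16), Some(0));
    assert_eq!(arena.alloc(16), Some(16));
    assert_eq!(arena.alloc(64), None); // out of space
    println!("ok");
}
```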
Edit: More realistically, a program that typically produces a 1:1 ratio of live memory to junk, but occasionally spikes, will consume zero GC time in normal operation (the GC will just let the junk sit there), while free() needs to run on all that junk as soon as it is created, otherwise it leaks forever.
Also, the fact that GCs can defer collection to later than the object going out of scope can often make them more efficient than a traditional free(). For example, if you have a large graph data structure that is now junk, free() needs to do work that's O(size of graph) to free all the nodes. A copying collector, the most popular design, never even looks at the graph, so it frees it in O(1). Edit: of course, the converse is also true: if the graph survives N collections, the copying GC has to do O(N * size) work, where malloc()/free() do nothing at all.
At this point I must mention that paged memory can get you just enough movability that catastrophic fragmentation of this kind is avoided[2] with high probability. Paging tricks can be seriously expensive on a modern multicore processor, though, so I’m not sure this is the way forward. (The paper reports only a 1% perf overhead for Firefox; I don’t know if that means Firefox is less multithreaded than it could be or what.)
Manual memory management as a solution to the fragmentation problem makes the opposite trade: knowing nothing about when free might be called, it has to lean towards optimising space and immediate time rather than throughput. But there’s still a memory manager behind the scenes that has to deal with fragmentation as well; there’s no get-out-of-jail-free card for that, and that complexity is still hidden.
(Helpful memory usage disciplines like arenas/pools have their desirable properties for the same reasons: it’s a discipline on when you free memory in order to avoid fragmentation.)
So, there’s a performance hit of some kind, with optional tuning that takes no expertise. Many people would make those tradeoffs for some applications, especially if they got to keep using safe, no-GC code in their libraries, with its efficiency benefits.
I curate a list of what kinds of ownership people actually want: https://gist.github.com/o11c/dee52f11428b3d70914c4ed5652d43f...
It's been 6 years since I first posted it publicly, and neither I nor anyone giving suggestions has ever actually found a use for GC.
Rust uses plenty of reference counting. You could replace most of that with a GC; whether that's a win depends on your GC implementation, your reference counting implementation, and your requirements in terms of latency, throughput, and ability to handle cycles.
> Since it's not needed and it's massively worse than reference counting (assuming you only change reference counts when essential and use borrowing normally) due to the absurd behavior of scanning most of the heap at arbitrary times, there is no Rust GC crate in widespread use.
There are choices. You can have real time garbage collectors. And if you want to handle cycles, you need some kind of tracing anyway.
Also, if you only care about throughput and not about latency, garbage collection can easily be faster than reference counting.
If you don't allow cycles, you can also take advantage of that in your garbage collection. See how Erlang does it. (In Erlang, the garbage collector relocates your objects so that they are in topological order, so all references only point forwards in memory. You can combine that with generational gc, too, of course.)
Lol, what? Maybe don’t go asserting stuff you clearly know little about. Reference counting is a fine tradeoff for manually memory-managed languages, but it is absolutely blown out of the water by a tracing GC on most counts. It’s almost like JVM, V8, etc. engineers know a thing about the topic and avoid RC for a good reason.
Tracing GC doesn’t burden the mutator threads with additional work; almost everything can be done in parallel, resulting in vastly better throughput. Imagine dropping the last reference to a huge graph: you can actually observe this when exiting a C++ program, which might hang for a few seconds before returning control to you, as all the destructors are called recursively, serially, on the program thread, jumping across the heap literally pointer by pointer, the very thing you are so afraid of. And I didn’t even get to the atomic part: bumping a number up or down with synchronization between CPUs is one of the slowest operations you can do on a modern machine. Tracing GCs elegantly avoid all of these problems at the price of some memory overhead. None of these GC algorithms (yes, RC is a GC) is a silver bullet, but let’s not kid ourselves.
If you use reference counting properly in a well-designed language, it's obviously better than GC: refcount updates are rare, fast, simple, and local, and need no arbitrary heuristics.
The destructor cascades are only a problem for latency and potential stack overflow, and can be solved by giving recursive structures custom destructors that queue nodes for destruction, or by using arena allocators where applicable.
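In Rust, the custom-destructor fix for recursive structures is usually written as an iterative Drop; a minimal sketch for a singly linked list:

```rust
// The default drop glue for a Box chain recurses once per node, so a long
// enough list can overflow the stack. This hand-written Drop unlinks nodes
// in a loop instead: O(n) time, O(1) stack.

struct Node {
    _value: u64,
    next: Option<Box<Node>>,
}

struct List {
    head: Option<Box<Node>>,
}

impl Drop for List {
    fn drop(&mut self) {
        let mut cur = self.head.take();
        while let Some(mut node) = cur {
            cur = node.next.take(); // detach before `node` itself is dropped
        }
    }
}

fn main() {
    // Deep enough that naive recursive drop glue would risk stack overflow.
    let mut list = List { head: None };
    for i in 0..1_000_000u64 {
        list.head = Some(Box::new(Node { _value: i, next: list.head.take() }));
    }
    drop(list); // returns normally thanks to the iterative Drop
    println!("dropped");
}
```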
Like, as I explicitly wrote, it is probably the correct choice for low-level languages close to the metal that want easy compatibility with other languages through FFI. But the method itself still has much lower throughput than a tracing GC when used in a similar manner. Anything else is a useless comparison, like asking whether a bicycle is better than a tank.