Rare Are GC Talks (opens in new tab)

(furious-waterfall-55.heroku.com)

138 pointspat_shaughnessy13y ago39 comments

39 comments

34 comments · 12 top-level

kmm13y ago· 9 in thread

I don't see how a Copying GC can be faster than Mark and Sweep. Transferring a book may be easy, but copying an object can be quite costly. I can't imagine copying almost all objects everytime the GC runs.

waterhouse13y ago

Suppose that the program only needs about 1 MB of objects, but it will fill up 100 MB before performing a copying GC. In mark and sweep, you need to perform an operation for every garbage object (of which there are 99 MB), whereas with copying GC, you need only perform operations for every live object (of which there is 1 MB). Even if copying an object takes ten times as long as returning a chunk of memory to a free-list, the copying algorithm is faster. (Also, popping a free cell off a free-list is more expensive than incrementing an allocation pointer.) Of course, either way, the total work done for allocating objects and freeing them is O([number of objects allocated]).

Something like this is described in detail here: "Garbage Collection Can Be Faster Than Stack Allocation", Andrew Appel: http://www.cs.princeton.edu/~appel/papers/45.ps

silentbicycle13y ago

Not necessarily -- lazy sweeping only needs to do work proportional to live data (like copying GC), rather than proportional to dead data.

I have a pretty straightforward implementation of a lazy mark-sweep collector here (in C): https://github.com/silentbicycle/oscar

Basically, instead of marking live data, then sweeping all unmarked data, you just mark live data, then the next time a slot is requested, you step the allocator over the heap until it finds the next unmarked cell and re-use that. If most cells are dead, this happens very quickly. If you have to sweep over too many live cells (left deliberately vague), you grow the heap.

1 more reply

cwzwarich13y ago

The problem with applying Appel's argument in practice is that it only shows that copying garbage collection is better if you are willing to waste RAM. In practice, if you had a 1 MB working set and 100 MB of RAM, you would probably want to use most of the extra 99 MB for caching even slower levels of your storage hierarchy.

Moving garbage collection does have some other potential benefits; it can provide guarantees regarding fragmentation, and it can improve spatial locality of data.

1 more reply

skybrian13y ago

Copying is faster if the memory to be collected is mostly garbage. You only copy the good stuff and don't even look at the garbage. Many programs (particularly in scripting languages) allocate memory but only use it for a short time, so they create lots of garbage.

Often this is combined with generational garbage collection - new objects live for a short time in a temporary area, and then if they are still in use (not garbage), they get copied to longer-term storage.

jeffdavis13y ago

Copying also means moving to new addresses. What strategies are used to update the referencing pointers to the new location?

1 more reply

tomp13y ago

in OCaml, copying GC is used only in the young generation. functional languages allocate lots of shortly lived objects, so copying gc is a good fit, since allocation is the cheapest possible, and as only a few objecta survive a gc, the cost of copying them is low. Since the GC is generational, objecta get copied at most n times, for some n.

riffraff13y ago

the JVM (HotSpot) also uses this approach, the young generation is handled by a copying GC while the older one(s) use mark & sweep.

(to be fair, I believe this _was_ the default GC option, but it's not anymore, and there were at least 3 different young generation GC algorithms that used copying, so it's a bit more nuanced)

tikhonj13y ago

As others have pointed out, a copying GC only needs to access live objects and does not need to go through the garbage.

Another thing to keep in mind is that a copying GC maintains some favorable memory properties. In particular, it reduces fragmentation each time it is run: since you're only copying over live objects, all your objects remaining objects get put next to each other in memory. This is especially useful if you have relatively few objects interspersed with lots of garbage.

maximilianburke13y ago

Copying GC's may be slower to collect if you compare the speed of identifying live/dead objects versus object identification and copying of live objects but they have a significant improvement in the speed of allocation.

Copying GC's allow you to perform allocation by incrementing a pointer, whereas non-copying collectors force you to traverse all the free blocks in the heap looking for a suitable allocation spot, similar to what malloc(). As the heap fragments over time this cost becomes greater and greater.

Copying GCs can also improve memory locality during collection which a non-copying GC cannot do, giving a general performance improvement for the application.

ctide13y ago· 4 in thread

At the December 2008 Kyushu Ruby 01 Conference, I asked the audience "How many of you here have some interest in garbage collection?" Out of 200 people, only 3 raised their hands

Can you imagine how many people would raise their hands if you asked that question NOW at a ruby conference? Crazy how times change.

notatoad13y ago

can you explain a bit why that has changed? I don't really know anything about ruby, but I would have thought that being able to ignore abstractions like garbage collection was the reason that people used high-level languages. Does programming in ruby involve a lot of interaction with the garbage collector, or are you saying that a lot of the ruby community is actually becoming involved with writing and improving low-level language features?

suresk13y ago

My experience in this regard comes from Java, not Ruby, but I think the basic concepts are the same: You are correct in that, for a lot of uses, ignoring abstractions like GC is just fine.

If you are building something that uses a lot of memory, requires high throughput or low latency, or are getting into situations where GC is taking up a decent chunk of your CPU time, then it can pay to have a better understanding of the GC concepts so you can make beneficial changes to your application or to the GC parameters.

Plus, stuff like GC is kind of interesting by itself for some people. Some of my favorite sessions at JavaOne were always the GC and JVM internals ones, despite the fact that their immediate utility to me wasn't always obvious.

bri3d13y ago

The MRI Ruby 1.8 garbage collector was sub-optimal for the large web applications it ended up powering, due to both blocking / performance issues and a lack of predictability and instrumentation. In 2009-2011, many high-profile Ruby 1.8 implementation enhancements focused on garbage collection, as the MRI 1.8 GC became a major pain point for a lot of growing high-profile Ruby + Rails shops. Both Ruby Enterprise Edition and Twitter's Kiji were very popular MRI 1.8 enhancements that focused on improving the performance and visibility / instrumentation / debuggability of the garbage collector.

Now that JRuby is much more mature and YARV / Ruby 1.9 has enjoyed wide adoption, more of the Ruby community are able to ignore GC (or only tune it once) - but that simply wasn't possible with MRI 1.8 and a large-scale web app.

ctide13y ago

One of the biggest problems with the 1.8.x versions of MRI (or CRuby as it's referred to in this article) is that the GC is pretty slow and tends to get called a lot. A good explanation of the reasons behind that can be found here:

http://www.engineyard.com/blog/2010/mri-memory-allocation-a-...

As Rails got more popular and sites built on Rails got bigger, people started running into performance issues that could often be rooted down to GC being called over and over during a single web request (due to instantiation of lots of AR objects, causing lots of memory allocations which lead to GC being called.) This lead to people exploring other implementations of Ruby (REE, Rubinius, etc.) and more importantly, becoming a lot more familiar with how GC works in MRI and how to program around it's shortcomings.

alinajaf13y ago· 4 in thread

Interesting that there's no mention of generational GC in this article. Anyone with a bit more expertise care to comment on why not?

maximilianburke13y ago

I'm guessing it's either because of scope or because generational collectors are a type of copying collector. Instead of having just from/to spaces that the objects ping-pong between as they fill up, a generational collector will copy objects from one heap generation to the next as they fill. Once the oldest generation (tenured space) reaches capacity it cannot trickle down the live objects to another generation so some other method, like a mark+sweep+compacting method, is used for collecting it. You could probably have a half space tenured generation but half space collectors are not really space-efficient without resorting to a bunch of virtual memory mapping trickery, and the tenured space tends to be large.

msbarnett13y ago

It isn't mentioned because the author chose not to include it.

Whether that's because this isn't meant to be a complete survey of the field and the author didn't think it fit the introductory nature of the piece, or because it didn't fit within the time limits of the talk this article is based on, or because Ruby doesn't do generational optimizations and the author saw no reason to wander down that path, isn't a question that expertise in gc algorithms gives much insight into.

jimm13y ago

Yeah, I noticed that. I first heard of generational GC in association with Smalltalk VMs and I think that's where they made their debut.

cwzwarich13y ago

The first generational garbage collector was for Lisp:

http://web.media.mit.edu/~lieber/Lieberary/GC/Realtime/Realt...

gue5t13y ago· 3 in thread

The reference counting metaphor leaves out a critical aspect of reference-counting: cyclical structures and how to collect them. Books never write their names in other books so this doesn't appear in his discussion, though it's one of the more important complications for implementations.

electrograv13y ago

Honestly, this article is really bad and probably harmful to new programmers. I recommend against taking anything in it to be accurate, much less complete.

The author misses vitally key issues (like the inherent dangers of circular references in reference counting "GC"). I wouldn't even call myself slightly knowledgeable in GC, yet blatant holes in the clumsy analogies of this article are obvious to me. As someone who is not even slightly knowledgable in GC, it scares me that people are learning from someone who has massive information holes in the explanation -- holes in topics that every programmer on earth should know.

On top of all that, the only "Cons" that are mentioned of RC are absurd: "It's annoying if you forget to release your references." Wow. Profound stuff there - bugs and incorrect code is annoying. This isn't even a problem if your language supports RAII, in which it's impossible to "forget" in the first place.

Groxx13y ago

I sincerely doubt people are going to read this, and then immediately write and ship a garbage collector. For a square-zero general-idea overview to someone who has zero concept of what a GC does, it's more than accurate enough.

It could probably be about 1/2 the size, as it shoots off in weirdly-complex directions for such an introductory text. And it could probably introduce e.g. reference cycles, because they're a real problem with reference counting. But it's a clearer and simpler text by far than I generally see on this topic.

marblar13y ago

Extend the metaphor: Mr. B also has to write his name on people who he thinks he will need in the future, lest they be fired.

silentbicycle13y ago· 1 in thread

Wilson's highly readable "Uniprocessor Garbage Collection Techniques" is a far better overview: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.... You could write a decent GC with just the info in that paper.

It doesn't cover parallel or real-time garbage collection (which both get far more complicated); for those, you want Jones et. al's _The Garbage Collection Handbook_ (http://www.amazon.com/The-Garbage-Collection-Handbook-Manage...) and plenty of time to explore its bibliography. (His older book is also good, but doesn't cover those topics.)

mattyb13y ago

That is a good paper. Thanks for the pointer!

adamnemecek13y ago· 1 in thread

Articles like this are exactly why I come to HN.

pat_shaughnessyOP13y ago

Agreed! Also, beyond the great tech content there's something charming about this one.

drostie13y ago

One of the most fun discussions of the complications of reference counting and how nuanced you have to be comes from the Python docs:

http://docs.python.org/extending/extending.html#reference-co...

It relates "a true story. An older version of Python contained variants of this bug and someone spent a considerable amount of time in a C debugger to figure out why his __del__() methods would fail" in a considerable amount of detail.

kenmazy13y ago

Another great GC discussion for LuaJit: http://wiki.luajit.org/New-Garbage-Collector

cronin10113y ago

In the article he states "[for Copying] Conservative GC is unable to determine the difference between pointer and integer values. If an integer value was overwritten the result would be disastrous. It is likely that the result of 1+1 would change occasionally."

However reading through the RHG, it is shown that Ruby allocates memory for objects in 20 byte blocks and that this results in all pointers therefore being multiples of four. This is handy as it allows the direct usage of literals such as FixNum (an always odd 'pointer' that you shift right by one bit to access the value) and Symbol (a 'pointer' that always has 0xff as the trailing byte that you shift right 1 byte to access the unique ID) without requiring object creation.

With this in mind, can someone enlighten me as to why Copying could not be used inside Ruby? It seems as though it would be trivial for the GC to differentiate between literals and pointers as otherwise they would not be much use as literals.

davnola13y ago

I recommend Elise Huard's recent talk on Ruby GC: http://skillsmatter.com/podcast/home/ruby-bin-men

(Also, I'd love to see the Hungarian Folk Dance interpretation of different GC approaches http://t.co/l8ADbEQR)

grimlck13y ago

nice introduction, but it seems to be missing some really important concepts

- generational GC

- concurrent GC, the parts of GC which can be concurrent (which depends on your algorithm), and the tradeoffs needed to let your program run concurrently with the GC without stopping the world.

zem13y ago

very nice article. given its likely target audience, though, he really needs to explain explicitly why mark-and-sweep can't compact as it goes, rather than leaving a fragmented heap.

j / k navigate · click thread line to collapse

39 comments

34 comments · 12 top-level

kmm13y ago· 9 in thread

waterhouse13y ago

Something like this is described in detail here: "Garbage Collection Can Be Faster Than Stack Allocation", Andrew Appel: http://www.cs.princeton.edu/~appel/papers/45.ps

silentbicycle13y ago

Not necessarily -- lazy sweeping only needs to do work proportional to live data (like copying GC), rather than proportional to dead data.

I have a pretty straightforward implementation of a lazy mark-sweep collector here (in C): https://github.com/silentbicycle/oscar

1 more reply

cwzwarich13y ago

Moving garbage collection does have some other potential benefits; it can provide guarantees regarding fragmentation, and it can improve spatial locality of data.

1 more reply

skybrian13y ago

jeffdavis13y ago

Copying also means moving to new addresses. What strategies are used to update the referencing pointers to the new location?

1 more reply

tomp13y ago

riffraff13y ago

the JVM (HotSpot) also uses this approach, the young generation is handled by a copying GC while the older one(s) use mark & sweep.

(to be fair, I believe this _was_ the default GC option, but it's not anymore, and there were at least 3 different young generation GC algorithms that used copying, so it's a bit more nuanced)

tikhonj13y ago

As others have pointed out, a copying GC only needs to access live objects and does not need to go through the garbage.

maximilianburke13y ago

Copying GCs can also improve memory locality during collection which a non-copying GC cannot do, giving a general performance improvement for the application.

ctide13y ago· 4 in thread

At the December 2008 Kyushu Ruby 01 Conference, I asked the audience "How many of you here have some interest in garbage collection?" Out of 200 people, only 3 raised their hands

Can you imagine how many people would raise their hands if you asked that question NOW at a ruby conference? Crazy how times change.

notatoad13y ago

suresk13y ago

My experience in this regard comes from Java, not Ruby, but I think the basic concepts are the same: You are correct in that, for a lot of uses, ignoring abstractions like GC is just fine.

bri3d13y ago

ctide13y ago

http://www.engineyard.com/blog/2010/mri-memory-allocation-a-...

alinajaf13y ago· 4 in thread

Interesting that there's no mention of generational GC in this article. Anyone with a bit more expertise care to comment on why not?

maximilianburke13y ago

msbarnett13y ago

It isn't mentioned because the author chose not to include it.

jimm13y ago

Yeah, I noticed that. I first heard of generational GC in association with Smalltalk VMs and I think that's where they made their debut.

cwzwarich13y ago

The first generational garbage collector was for Lisp:

http://web.media.mit.edu/~lieber/Lieberary/GC/Realtime/Realt...

gue5t13y ago· 3 in thread

electrograv13y ago

Honestly, this article is really bad and probably harmful to new programmers. I recommend against taking anything in it to be accurate, much less complete.

Groxx13y ago

marblar13y ago

Extend the metaphor: Mr. B also has to write his name on people who he thinks he will need in the future, lest they be fired.

silentbicycle13y ago· 1 in thread

mattyb13y ago

That is a good paper. Thanks for the pointer!

adamnemecek13y ago· 1 in thread

Articles like this are exactly why I come to HN.

pat_shaughnessyOP13y ago

Agreed! Also, beyond the great tech content there's something charming about this one.

drostie13y ago

One of the most fun discussions of the complications of reference counting and how nuanced you have to be comes from the Python docs:

http://docs.python.org/extending/extending.html#reference-co...

kenmazy13y ago

Another great GC discussion for LuaJit: http://wiki.luajit.org/New-Garbage-Collector

cronin10113y ago

davnola13y ago

I recommend Elise Huard's recent talk on Ruby GC: http://skillsmatter.com/podcast/home/ruby-bin-men

(Also, I'd love to see the Hungarian Folk Dance interpretation of different GC approaches http://t.co/l8ADbEQR)

grimlck13y ago

nice introduction, but it seems to be missing some really important concepts

- generational GC

- concurrent GC, the parts of GC which can be concurrent (which depends on your algorithm), and the tradeoffs needed to let your program run concurrently with the GC without stopping the world.

zem13y ago

very nice article. given its likely target audience, though, he really needs to explain explicitly why mark-and-sweep can't compact as it goes, rather than leaving a fragmented heap.

j / k navigate · click thread line to collapse