Something like this is described in detail here: "Garbage Collection Can Be Faster Than Stack Allocation", Andrew Appel: http://www.cs.princeton.edu/~appel/papers/45.ps
I have a pretty straightforward implementation of a lazy mark-sweep collector here (in C): https://github.com/silentbicycle/oscar
Basically, instead of marking live data, then sweeping all unmarked data, you just mark live data, then the next time a slot is requested, you step the allocator over the heap until it finds the next unmarked cell and re-use that. If most cells are dead, this happens very quickly. If you have to sweep over too many live cells (left deliberately vague), you grow the heap.
Moving garbage collection does have some other potential benefits; it can provide guarantees regarding fragmentation, and it can improve spatial locality of data.
Often this is combined with generational garbage collection - new objects live for a short time in a temporary area, and then if they are still in use (not garbage), they get copied to longer-term storage.
(to be fair, I believe this _was_ the default GC option, but it's not anymore, and there were at least 3 different young generation GC algorithms that used copying, so it's a bit more nuanced)
Another thing to keep in mind is that a copying GC maintains some favorable memory properties. In particular, it reduces fragmentation each time it is run: since you're only copying over live objects, all your objects remaining objects get put next to each other in memory. This is especially useful if you have relatively few objects interspersed with lots of garbage.
Copying GC's allow you to perform allocation by incrementing a pointer, whereas non-copying collectors force you to traverse all the free blocks in the heap looking for a suitable allocation spot, similar to what malloc(). As the heap fragments over time this cost becomes greater and greater.
Copying GCs can also improve memory locality during collection which a non-copying GC cannot do, giving a general performance improvement for the application.
Can you imagine how many people would raise their hands if you asked that question NOW at a ruby conference? Crazy how times change.
If you are building something that uses a lot of memory, requires high throughput or low latency, or are getting into situations where GC is taking up a decent chunk of your CPU time, then it can pay to have a better understanding of the GC concepts so you can make beneficial changes to your application or to the GC parameters.
Plus, stuff like GC is kind of interesting by itself for some people. Some of my favorite sessions at JavaOne were always the GC and JVM internals ones, despite the fact that their immediate utility to me wasn't always obvious.
Now that JRuby is much more mature and YARV / Ruby 1.9 has enjoyed wide adoption, more of the Ruby community are able to ignore GC (or only tune it once) - but that simply wasn't possible with MRI 1.8 and a large-scale web app.
http://www.engineyard.com/blog/2010/mri-memory-allocation-a-...
As Rails got more popular and sites built on Rails got bigger, people started running into performance issues that could often be rooted down to GC being called over and over during a single web request (due to instantiation of lots of AR objects, causing lots of memory allocations which lead to GC being called.) This lead to people exploring other implementations of Ruby (REE, Rubinius, etc.) and more importantly, becoming a lot more familiar with how GC works in MRI and how to program around it's shortcomings.
Whether that's because this isn't meant to be a complete survey of the field and the author didn't think it fit the introductory nature of the piece, or because it didn't fit within the time limits of the talk this article is based on, or because Ruby doesn't do generational optimizations and the author saw no reason to wander down that path, isn't a question that expertise in gc algorithms gives much insight into.
http://web.media.mit.edu/~lieber/Lieberary/GC/Realtime/Realt...
The author misses vitally key issues (like the inherent dangers of circular references in reference counting "GC"). I wouldn't even call myself slightly knowledgeable in GC, yet blatant holes in the clumsy analogies of this article are obvious to me. As someone who is not even slightly knowledgable in GC, it scares me that people are learning from someone who has massive information holes in the explanation -- holes in topics that every programmer on earth should know.
On top of all that, the only "Cons" that are mentioned of RC are absurd: "It's annoying if you forget to release your references." Wow. Profound stuff there - bugs and incorrect code is annoying. This isn't even a problem if your language supports RAII, in which it's impossible to "forget" in the first place.
It could probably be about 1/2 the size, as it shoots off in weirdly-complex directions for such an introductory text. And it could probably introduce e.g. reference cycles, because they're a real problem with reference counting. But it's a clearer and simpler text by far than I generally see on this topic.
It doesn't cover parallel or real-time garbage collection (which both get far more complicated); for those, you want Jones et. al's _The Garbage Collection Handbook_ (http://www.amazon.com/The-Garbage-Collection-Handbook-Manage...) and plenty of time to explore its bibliography. (His older book is also good, but doesn't cover those topics.)
http://docs.python.org/extending/extending.html#reference-co...
It relates "a true story. An older version of Python contained variants of this bug and someone spent a considerable amount of time in a C debugger to figure out why his __del__() methods would fail" in a considerable amount of detail.
However reading through the RHG, it is shown that Ruby allocates memory for objects in 20 byte blocks and that this results in all pointers therefore being multiples of four. This is handy as it allows the direct usage of literals such as FixNum (an always odd 'pointer' that you shift right by one bit to access the value) and Symbol (a 'pointer' that always has 0xff as the trailing byte that you shift right 1 byte to access the unique ID) without requiring object creation.
With this in mind, can someone enlighten me as to why Copying could not be used inside Ruby? It seems as though it would be trivial for the GC to differentiate between literals and pointers as otherwise they would not be much use as literals.
(Also, I'd love to see the Hungarian Folk Dance interpretation of different GC approaches http://t.co/l8ADbEQR)
- generational GC
- concurrent GC, the parts of GC which can be concurrent (which depends on your algorithm), and the tradeoffs needed to let your program run concurrently with the GC without stopping the world.