<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div>Apologies in advance for contributing to an obviously and increasingly off-topic thread, but this kind of FUD about GC is a pet peeve of mine.</div><div><br></div><div>On May 6, 2011, at 10:04 AM, Neal Becker wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div><a href="http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html">http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html</a><br></div></blockquote></div><br><div>Counterpoint: &lt;<a href="http://lwn.net/Articles/268783/">http://lwn.net/Articles/268783/</a>&gt;. &nbsp;Sorry, Linus, sometimes correctness matters more than performance.</div><div><br></div><div>But even the performance argument is kind of bogus. &nbsp;See, for example, this paper on real-time garbage collection: &lt;<a href="http://domino.research.ibm.com/comm/research_people.nsf/pages/dgrove.ecoop07.html">http://domino.research.ibm.com/comm/research_people.nsf/pages/dgrove.ecoop07.html</a>&gt;. &nbsp;That's just one example of an easy-to-find solution to a problem that Linus holds up as unsolved or unsolvable. &nbsp;There are solutions to pretty much all of the problems that Linus brings up. &nbsp;One of these solutions is even famously implemented by CPython! &nbsp;The CPython "string +=" idiom optimization fixes at least one case of the "you tend to always copy the node" antipattern Linus describes, and lots of languages (especially Scheme and derivatives, IIRC) have very nice optimizations in this area. &nbsp;One could argue that any functional language without large pools of mutable state (e.g.
Erlang) is a massive optimization for this case.</div><div><br></div><div>Another example: the "dirty cache" problem Linus talks about can be addressed by having a GC that cooperates with the virtual memory manager (VMM): &lt;<a href="http://www.cs.umass.edu/~emery/pubs/f034-hertz.pdf">http://www.cs.umass.edu/~emery/pubs/f034-hertz.pdf</a>&gt;.</div><div><br></div><div>And the "re-using stuff as fast as possible" thing is exactly the kind of problem that generational GCs address. &nbsp;When you run out of space in cache, you reap your first generation before you start copying stuff. &nbsp;One of the key insights of generational GC is that you'll usually reclaim enough (in this case, cache-local) memory that you can keep going for a little while. &nbsp;You don't have to read a super fancy modern paper on this; Wikipedia explains it nicely: &lt;<a href="http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Generational_GC_.28ephemeral_GC.29">http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Generational_GC_.28ephemeral_GC.29</a>&gt;. &nbsp;Of course, if you don't tune your GC at all for your machine-specific cache size, you won't see this performance benefit play out.</div><div><br></div><div>I don't know if a programming language and runtime with a real-time, virtual-memory-cooperating garbage collector exists today with all the bells and whistles required to implement an OS kernel, so I wouldn't give the Linux kernel folks <i>too</i>&nbsp;much of a hard time for still using C;&nbsp;but there's nothing wrong with the idea in the abstract. &nbsp;The performance differences between automatic GC and manual memory management are dubious at best, and with a really good GC and a language that supports it, GC tends to win big.
&nbsp;When it loses, it loses in ways which can be fixed in one area of the code (the GC) rather than requiring millions of tiny fixes across your whole codebase, as is the case with manual memory management.</div><div><br></div><div>The assertion that "modern hardware" is not designed for big data-structure pointer-chasing is also a bit silly. &nbsp;On the contrary, modern hardware has evolved staggeringly massive caches, specifically <i>because</i> large programs (whether they're GC'd or not) tend to do lots of this kind of thing; there's a certain level of complexity beyond which one can no longer avoid it. &nbsp;It's <i>old</i>&nbsp;hardware, with tiny caches (that were, by virtue of their tininess, closer to the main instruction-processing silicon), that was optimized for the "carefully stack-allocating everything in the world to conserve cache" approach.</div><div><br></div><div>You can see this pretty clearly by running your favorite Python benchmark on machines which are similar except for cache size. &nbsp;The newer machine, with the bigger cache, will run Python considerably faster, but the extra cache doesn't help the average trivial C benchmark that much, or, for that matter, Linux benchmarks.</div><div><br></div><div>-glyph</div><div><br></div></body></html>