Queue cleanup
Lawrence D'Oliveiro
ldo at geek-central.gen.new_zealand
Sat Sep 4 22:33:03 EDT 2010
In message <7x7hj2kyd6.fsf at ruckus.brouhaha.com>, Paul Rubin wrote:
> Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> writes:
>
>> In message <7xmxs2uez1.fsf at ruckus.brouhaha.com>, Paul Rubin wrote:
>>
>>> GC's for large systems generally don't free (or even examine) individual
>>> garbage objects. They copy the live objects to a new contiguous heap
>>> without ever touching the garbage, and then they release the old heap.
>>
>> And suddenly you’ve doubled the memory requirements. And on top of that,
>> since you’re moving the valid objects into different memory, you’re
>> forcing cache misses on all of them as well.
>
> A minimal naive implementation indeed doubles the memory requirements,
> but from a Python perspective where every integer takes something like
> 24 bytes already, even that doesn't seem so terrible.
Doubling 24 is less terrible than doubling 4 or 8?? You’re kidding, right?
> More sophisticated implementations use multiple small heaps or other
> tricks.
More and more complications to patch up the idea. At some point, you have to
admit there is something fundamentally flawed about the whole concept.
> The new heap is filled sequentially so accesses to it will have good
> locality.
Unfortunately, that‘s not how locality of reference works. It doesn’t matter
whether the objects being accessed are close together in memory or far apart
(not with modern fully-associative caches, anyway), what does matter is the
frequency distribution of references, namely that the vast majority of
references are to a tiny minority of objects.
Your generational garbage collector completely breaks this assumption, by
regularly forcing an access to every single object in the heap. Cache-
thrashing, anyone?
> It's also the case that programs with very large memory consumption tend
> to use most of the memory for large arrays that don't contain pointers
> (think of a database server with a huge cache). That means the gc
> doesn't really have to think about all that much of the memory.
But your generational garbage collector still has to copy those very large
objects to the new heap, with all the cache-hostile consequences therefrom.
By the way, isn’t this the opposite of the array-of-pointers example you
were using earlier to try to cast reference-counting in a bad light? It
seems to me a reference count would work very well for such a large, simple
object.
>> This is the continuing problem with garbage collection: all the attempts
>> to make it cheaper just end up moving the costs somewhere else.
>
> Same thing with manual allocation. That moves the costs off the
> computer and onto the programmer. Not good, most of the time.
Unfortunately, your argument falls down. It is a truism that hardware costs
continue to come down, while programmers remain expensive. As I said before,
computing performance has improved by something like five orders of
magnitude over the last half-century. This has rendered all kinds of
techniques, like high-level languages, dynamic memory allocation,
stacks, hardware floating-point, memory protection and so on, which were
once considered controversial because of their expense, cheap enough to
become commonplace.
But not garbage collection. This is because of the asymmetric way in which
hardware has become faster: the biggest improvements have been in the parts
that were already the fastest to begin with (the CPU), while RAM speeds have
improved much less, and backing-store speeds least of all. Hence the need
for intermediate layers of cache to bridge the gap. But the effectiveness of
that caching depends crucially on certain assumptions about the runtime
behaviour of the programs: and garbage collection breaks those assumptions.
> Really, I'm no gc expert, but the stuff you're saying about gc is quite
> ill-informed. You might want to check out some current literature.
You may want to enlighten yourself by meditating on this seeming paradox of
modern computing hardware: memory is cheap, but accessing memory is
expensive.
More information about the Python-list
mailing list