a huge shared read-only data in parallel accesses -- How? multithreading? multiprocessing?

Klauss klaussfreire at gmail.com
Thu Jan 7 17:18:34 EST 2010


On Dec 31 2009, 6:36 pm, garyrob <gary... at mac.com> wrote:
> One thing I'm not clear on regarding Klauss' patch. He says it's
> applicable where the data is primarily non-numeric. In trying to
> understand why that would be the case, I'm thinking that the increased
> per-object memory overhead for reference-counting would outweigh the
> space gains from the shared memory.
>
> Klauss's test code stores a large number of dictionaries which each
> contain just 3 items. The stored items are strings, but short ones...
> it looks like they take up less space than double floats(?).
>
> So my understanding is that the point is that the overhead for the
> dictionaries is big enough that the patch is very helpful even though
> the stored items are small. And that the patch would be less and less
> effective as the number of items stored in each dictionary became
> greater and greater, until eventually the patch might use more space
> for reference counting than it saved through shared memory.

Not really.
The real difference is that numbers (ints and floats) are allocated
out of small contiguous pools. So even if a large percentage of those
objects remains read-only, there are probably holes in those pools
left by the irregular access pattern during initialization, and those
holes will eventually be written to as the pools get reused.

In essence, those pools fail to stay read-only for reasons other than
reference counting.
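
A rough way to see this in CPython (illustrative only -- it depends on
the current allocator and free lists, nothing the language guarantees):

    floats = [float(i) for i in range(1000)]
    hole = id(floats[500])        # address of one float in the pool
    del floats[500]               # drop it, leaving a free slot behind
    newcomer = float("2.5")       # a freshly allocated float...
    print(id(newcomer) == hole)   # ...typically reuses that exact slot

Because the freed slot sits in the middle of an otherwise untouched
pool, refilling it dirties the whole page as far as copy-on-write is
concerned.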

Dictionaries, tuples and lists (and many other types) don't exhibit
that behavior.
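
For concreteness, the kind of workload being discussed looks roughly
like this (hypothetical data shape and sizes, not the actual test code
from the patch):

    import multiprocessing

    # A large, effectively read-only structure built before the workers
    # start: many small dicts holding a few short strings each.
    DATA = [{"a": "x%d" % i, "b": "y%d" % i, "c": "z%d" % i}
            for i in range(100000)]

    def worker(step):
        # Read-only access; DATA is never mutated here.
        return sum(len(DATA[i]["a"]) for i in range(0, len(DATA), step))

    if __name__ == "__main__":
        # With the fork start method (the default on Linux), the workers
        # inherit DATA via copy-on-write instead of pickling it over.
        pool = multiprocessing.Pool(4)
        print(pool.map(worker, [2, 3, 5, 7]))
        pool.close()
        pool.join()

In stock CPython the reference-count updates from merely reading DATA
dirty the shared pages one by one; container types at least give the
patch a chance to keep those pages clean, whereas the number pools get
written to anyway for the reasons above.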


