[Python-Dev] Int FutureWarnings and other 2.4 TODOs
Tim Peters
tim.one at comcast.net
Fri Dec 5 13:05:39 EST 2003
[Tim]
>> BTW, using pymalloc instead wouldn't change the fact that allocated
>> memory is never returned to the OS. pymalloc grabs large "arenas" from
>> malloc(), and never free()s them. The fundamental difference is
>> that pymalloc is type-agnostic: space gotten from pymalloc for,
>> say, a dict, can later be reused for, say, a string. pymalloc's
>> freelists are segregated by raw memory size rather than by type, and
>> a given chunk of pymalloc memory may even move from one size list to
>> another (any number of times).
[Guido]
> But there aren't a lot of other objects that fit in the size of an
> int, right?
That's true, but doesn't matter <wink>.
> I know of no other types that are <= 12 bytes on a 32-bit machine, and
> assuming pymalloc rounds up, only one that is <= 16 (float). There
> may be some obscure ones (super may be only 16 bytes too) but they'll
> never need that many instances to fill the void left behind by
> range(10000000).
>
> (I'm assuming that pymalloc doesn't join adjacent small blocks to form
> a larger block
That's the part that's missing -- it does join them, but cleverly and
efficiently.
> -- that way lies a full malloc reimplementation, and what's the
> point of that...)
Large "arenas" are cut into many equal-sized "pools", and pools are in turn
cut into equal-sized objects. All objects within a given pool have the same
size. When all the objects in a given pool are freed, the entire pool is
available for reuse by objects of any size. There's a little pressure
favoring reusing an entirely-free pool for objects of the same size as it
was last used for, because that saves some overhead (for example, some
size-dependent info is computed and stored in the pool header, and that part
can be skipped if the pool is used for objects of the same size as last
time). pymalloc will change the size associated with an entirely-free pool
before it will ask for another arena, though.
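To make the bookkeeping concrete, here's a rough C sketch of a pool header;
the names and layout are simplified and do not match Objects/obmalloc.c
exactly, it just illustrates the arena -> pool -> block hierarchy described
above:

    /* Illustrative sketch only -- simplified, not the real obmalloc code. */
    #define ARENA_SIZE  (256 * 1024)  /* big chunk grabbed from malloc(), never free()d */
    #define POOL_SIZE   4096          /* each arena is carved into fixed-size pools */
    #define ALIGNMENT   8             /* request sizes are rounded up to a multiple of this */

    /* Per-pool bookkeeping, kept at the start of each pool. */
    typedef struct pool_header {
        unsigned int count;            /* number of blocks currently allocated in this pool */
        unsigned int szidx;            /* size class of the blocks this pool serves */
        unsigned int blocksize;        /* szidx in bytes, cached so it needn't be recomputed
                                          while the pool keeps serving the same size */
        void *freeblock;               /* head of this pool's singly-linked free-block list */
        struct pool_header *nextpool;  /* pools serving the same size class are linked */
        struct pool_header *prevpool;
    } pool_header;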
So, e.g., if ints used pymalloc, range(10000000) would carve out enough
pools to hold 10 million ints, but when range(10000000) went away, almost all
of those pools would become entirely free, and so available for reuse by
objects of any size. The coalescing here is *not* done on an
object-by-object basis. Instead the pool header keeps a count of the number
of slots in that pool currently allocated. When an object in the pool is
freed, that count is decremented. If the count drops to 0, the entire
pool can be recycled without further ado, in constant time. If the count
does not drop to 0, the newly-freed object is put at the head of a
singly-linked list of free objects within the pool, which is also a
constant-time operation (IOW, the objects within a pool act a lot like the
dedicated int freelist; if the pool is recycled to hold objects of a
different size, the freelist links within it are simply ignored and
overwritten).
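Continuing that sketch, the free path looks roughly like this (again
simplified, not the real obmalloc code; freepools here is a made-up stand-in
for pymalloc's actual bookkeeping of entirely-free pools):

    #include <stddef.h>   /* NULL */

    static pool_header *freepools = NULL;  /* entirely-free pools, reusable for ANY size */

    static void
    sketch_pymalloc_free(pool_header *pool, void *block)
    {
        if (--pool->count == 0) {
            /* Every block in the pool is now free:  recycle the whole pool,
             * in constant time, with no neighbor-merging of any kind.  A
             * later request of any size class may claim it. */
            pool->nextpool = freepools;
            freepools = pool;
        }
        else {
            /* Push the block onto the pool's own free list -- also constant
             * time, much like the dedicated int freelist.  The link lives in
             * the first bytes of the freed block itself, so it's simply
             * overwritten if the pool later changes size class. */
            *(void **)block = pool->freeblock;
            pool->freeblock = block;
        }
    }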
A more-general malloc has to muck around asking whether the objects on
either side of a newly-freed object are also free, so that it can coalesce
them. pymalloc never does that. OTOH, pymalloc is emphatically not a
general-purpose allocator: it doesn't even try to deal with requests for
more than about 256 bytes (those get passed on to the system malloc). It
relies (for performance) on the fact that the vast bulk of memory requests
Python makes are for small chunks of memory. Vladimir did a great job of
designing this! Ignoring cache effects, the dedicated int freelist is faster
than pymalloc. I'm
not sure it's always faster if cache effects are considered too (it's
complex, and pymalloc is very cache-conscious).
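To round out the sketch, the allocation entry point with the small-request
cutoff looks roughly like this; allocate_from_size_class() is a made-up
placeholder for the "find or set up a pool of this size class" machinery
above, not a real pymalloc function:

    #include <stdlib.h>   /* malloc(), size_t */

    #define SMALL_REQUEST_THRESHOLD 256   /* the ~256-byte cutoff mentioned above */

    void *allocate_from_size_class(size_t rounded_size);  /* placeholder only */

    void *
    sketch_pymalloc_alloc(size_t nbytes)
    {
        if (nbytes == 0 || nbytes > SMALL_REQUEST_THRESHOLD)
            return malloc(nbytes ? nbytes : 1);   /* punt to the system allocator */

        /* Round the request up to its size class and serve it from a pool of
         * that class:  first the pool's free list, else a fresh block carved
         * from the pool, else an entirely-free pool (possibly last used for a
         * different size), else a brand-new arena. */
        return allocate_from_size_class((nbytes + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1));
    }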