memory leak with large list??

Sun Jan 26 12:05:01 EST 2003

[Bengt Richter]
> Why do you need to duplicate the type pointer for all those? I.e., if
> you allocated space in decent-size arrays of object representations
> without type pointer, and just reserved a header slot in front of the
> array, ...

[John Machin]
> You'll have to pardon me if my brain appears fried (possible; 40
> degrees (Celsius) here yesterday) but I don't understand Bengt's
> question, nor Tim's answer.
>
> Shouldn't the answer be: (a) You can't do that with the Python "list"
> container, which can contain objects of *any* type (b) You can do that
> with a different container, if you restrict all the objects in the
> container to being of the one type (c) That has been done, for many
> types; see the "array" module.

Bengt is describing a space-saving trick that relies on address calculation
to deduce the type of an object.  Suppose instead of having a pointer to a
type object at the start of every Python object, we had no per-object type
pointers at all.  Instead all of memory is carved into (say) 128KB chunks,
each aligned (this part is crucial) at an address evenly divisible by 128KB.
Only the first 4 bytes of a chunk contain a type pointer.  Each chunk can
hold objects of only a single type, and all objects in a chunk share the
4-byte type pointer at the start of the chunk.  To get from an address p to
its type, then, you mask off the low 17 bits (128KB is 2**17) to get to the
closest-preceding 128KB boundary, and read the 4 bytes at that address to
get a pointer to the type object.
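
For the curious, here is a minimal sketch of that masking trick in C.  It is
an illustration only, not anything Python does:  toy_type, chunk_header,
chunk_new, type_of and chunk_demo are all made-up names, the header holds a
native-width pointer rather than exactly 4 bytes, and it assumes a C11
library with aligned_alloc plus a flat address space where converting
between pointers and uintptr_t round-trips.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE ((size_t)128 * 1024)            /* must be a power of two */
#define CHUNK_MASK (~((uintptr_t)CHUNK_SIZE - 1))  /* clears the low 17 bits */

/* Hypothetical stand-in for a type object. */
typedef struct { const char *name; } toy_type;

/* The only per-chunk overhead: one type pointer at the chunk's start. */
typedef struct { const toy_type *type; } chunk_header;

/* Allocate one chunk aligned at a multiple of CHUNK_SIZE and stamp its type. */
void *chunk_new(const toy_type *type)
{
    chunk_header *header = aligned_alloc(CHUNK_SIZE, CHUNK_SIZE);
    if (header != NULL)
        header->type = type;
    return header;
}

/* Recover the type of any object inside a chunk from its address alone.
 * The integer/pointer round trip is implementation-defined, which is
 * exactly why this cannot be done in strictly standard C. */
const toy_type *type_of(const void *p)
{
    uintptr_t base = (uintptr_t)p & CHUNK_MASK;
    return ((const chunk_header *)base)->type;
}

/* Returns 0 if an address inside the chunk maps back to the chunk's type. */
int chunk_demo(void)
{
    static const toy_type int_type = { "int" };
    char *chunk = chunk_new(&int_type);
    if (chunk == NULL)
        return -1;
    void *obj = chunk + 1000;   /* pretend an object lives here */
    int ok = (type_of(obj) == &int_type);
    free(chunk);
    return ok ? 0 : 1;
}
```

Calling chunk_demo() should return 0 on a typical byte-addressed platform.
Note that the object at chunk + 1000 carries no type pointer of its own.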

Python doesn't do this, and it can't be done sticking to standard C:  the
address calculations required rely on architecture assumptions that aren't
always true (for example, word-addressed machines screw it up a little, and
tagged architectures using low-order address bits for metadata screw it
royally).  Python's small-object allocator nevertheless does "something like
it", carving large chunks of storage into aligned "pools", and storing
common pool info at pool-aligned addresses.  The pool info for an object is
later gotten via address calculation.
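
A rough sketch of the pool idea follows, again in C.  This is not the
allocator's actual code:  the 4096-byte pool size, the pool_info fields, and
the names arena_new, pool_alloc, pool_of and pool_demo are assumptions made
for illustration, and aligned_alloc (C11) stands in for whatever the real
allocator does to obtain aligned storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE   ((size_t)4096)                  /* assumed; power of two */
#define POOL_MASK   (~((uintptr_t)POOL_SIZE - 1))
#define ARENA_POOLS 8                               /* pools per big chunk */

/* Common info stored once per pool, at the pool-aligned address. */
typedef struct {
    size_t block_size;    /* every block in this pool has this size */
    size_t next_offset;   /* offset of the next never-used block */
} pool_info;

/* Carve one large chunk into aligned pools; pool i serves (i+1)*16 bytes. */
void *arena_new(void)
{
    char *arena = aligned_alloc(POOL_SIZE, POOL_SIZE * ARENA_POOLS);
    if (arena == NULL)
        return NULL;
    for (size_t i = 0; i < ARENA_POOLS; i++) {
        pool_info *pool = (pool_info *)(arena + i * POOL_SIZE);
        pool->block_size = (i + 1) * 16;
        pool->next_offset = sizeof(pool_info);   /* blocks follow the header */
    }
    return arena;
}

/* Hand out the next unused block from the given pool, or NULL if full. */
void *pool_alloc(void *arena, size_t pool_index)
{
    pool_info *pool = (pool_info *)((char *)arena + pool_index * POOL_SIZE);
    if (pool->next_offset + pool->block_size > POOL_SIZE)
        return NULL;
    void *block = (char *)pool + pool->next_offset;
    pool->next_offset += pool->block_size;
    return block;
}

/* Get a block's pool info by address calculation alone. */
pool_info *pool_of(const void *block)
{
    return (pool_info *)((uintptr_t)block & POOL_MASK);
}

/* Returns 0 if blocks from two pools map back to the right pool info. */
int pool_demo(void)
{
    void *arena = arena_new();
    if (arena == NULL)
        return -1;
    void *a = pool_alloc(arena, 0);   /* from the 16-byte pool */
    void *b = pool_alloc(arena, 2);   /* from the 48-byte pool */
    int ok = a != NULL && b != NULL
          && pool_of(a)->block_size == 16
          && pool_of(b)->block_size == 48;
    free(arena);
    return ok ? 0 : 1;
}
```

The difference from the type-pointer scheme is what gets looked up:  the
blocks here keep whatever per-object header they like, and only allocator
bookkeeping is found by masking the address.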

If Python is ported to a box where that fails, the Python small-object
allocator can be disabled via config fiddling.  It's also the case that,
even when it's enabled, Python doesn't insist that anyone (meaning 3rd-party
extension modules) use Python's small-object allocator.  If they want to
allocate extension objects with the platform malloc, or their own malloc, or
whatever, Python doesn't care.  Forcing everyone to use Python's allocator
wouldn't fly, and if getting a pointer to a type object *required* address
calculation, Python would have to insist that everyone use Python's memory
layer.

All in all, it's a lot easier if people just buy more RAM <0.9 wink>.