[Python-3000] PyUnicodeObject implementation
James Y Knight
foom at fuhm.net
Mon Sep 8 02:23:21 CEST 2008
On Sep 7, 2008, at 6:55 PM, Guido van Rossum wrote:
>> One possibility that occurs to me is to use a PyVarObject variant that
>> allocates space for an additional void pointer before the variable sized
>> section of the object. The builtin type would leave that pointer NULL,
>> but subtypes could perform the second allocation needed to populate it.
>>
>> The question is whether the 4-8 bytes wasted per object would be worth
>> the fact that only one memory allocation would be needed.
>
> I believe that 4-8 bytes is more than the overhead of an extra memory
> allocation from the obmalloc heap. It is probably about the same as
> the overhead for a memory allocation from the regular malloc heap. So
> for short strings (of which there are often a lot) it would be more
> expensive; for longer objects it would probably work out just about
> the same.
>
> There could be a different approach though, whereby the offset from
> the start of the object to the start of the character array wasn't a
> constant but a value stored in the class object. (In fact,
> tp_basicsize could probably be used for this.) It would slow down
> access to the characters a bit though -- a classic time-space
> trade-off that would require careful measurement in order to decide
> which is better.
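
To make the quoted offset-in-the-class idea concrete, here is a minimal C sketch. It is only a model and does not use CPython's real structs: ModelType, ModelStrHead, ModelSubHead, model_new and model_chars are hypothetical names, the basicsize field merely plays the role of tp_basicsize, and plain char stands in for the real character type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, NOT CPython's actual layout. */
typedef struct {
    size_t basicsize;          /* stands in for tp_basicsize: offset of the characters */
} ModelType;

typedef struct {
    const ModelType *type;     /* every object points at its type */
    size_t length;             /* number of characters, excluding the NUL */
} ModelStrHead;

typedef struct {
    ModelStrHead head;
    void *extra;               /* subtype field stored BEFORE the characters */
} ModelSubHead;

static const ModelType base_type = { sizeof(ModelStrHead) };
static const ModelType sub_type  = { sizeof(ModelSubHead) };

/* One allocation holds the fixed header and the character data. */
static void *model_new(const ModelType *t, const char *text)
{
    assert(t->basicsize >= sizeof(ModelStrHead));
    size_t n = strlen(text);
    ModelStrHead *o = calloc(1, t->basicsize + n + 1);
    if (o == NULL)
        return NULL;
    o->type = t;
    o->length = n;
    memcpy((char *)o + t->basicsize, text, n + 1);
    return o;
}

/* The offset is no longer a compile-time constant: each access has to
   load type->basicsize first, which is the time cost mentioned above. */
static char *model_chars(void *obj)
{
    ModelStrHead *o = obj;
    return (char *)o + o->type->basicsize;
}
```

In this model the base type pays nothing in space, and the subtype pays only for the field it actually adds; the price is the extra indirection in model_chars.
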
Given that you can, today, subclass str in Python without wasting an
extra 4-8 bytes of memory or adding anything new to the class object,
why wouldn't anyone who really wanted to write a hypothetical optimized
subclass just use the same mechanism (putting the additional data
*after* the character data) to subclass it in C?

It may be a little tricky, but it's not exactly rocket science, and given
that all these C subclasses of str are so far hypothetical, just
leaving it as "it's possible" seems perfectly reasonable...
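
Putting the additional data *after* the character data could look roughly like the following C sketch. It is a simplified model under stated assumptions, not CPython's real layout: ModelStr, ModelExtra, extra_offset, model_sub_new and model_extra are hypothetical names, and plain char stands in for the real character type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a variable-sized string object. */
typedef struct {
    size_t length;             /* number of characters, excluding the NUL */
    char chars[];              /* character data follows the header directly */
} ModelStr;

/* Hypothetical per-instance data that a C subtype wants to carry. */
typedef struct {
    void *payload;
} ModelExtra;

/* Round n up to a multiple of align. */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* The subtype data starts after the characters and their NUL, rounded
   up to pointer size so that the void* inside it is properly aligned. */
static size_t extra_offset(size_t length)
{
    return round_up(offsetof(ModelStr, chars) + length + 1, sizeof(void *));
}

/* Still a single allocation: header, characters, then subtype data. */
static ModelStr *model_sub_new(const char *text)
{
    size_t n = strlen(text);
    ModelStr *s = calloc(1, extra_offset(n) + sizeof(ModelExtra));
    if (s == NULL)
        return NULL;
    s->length = n;
    memcpy(s->chars, text, n + 1);
    return s;
}

/* Finding the subtype data depends on the length (the "tricky" part);
   finding the characters stays a constant offset, as for the base type. */
static ModelExtra *model_extra(ModelStr *s)
{
    assert(s != NULL);
    return (ModelExtra *)((char *)s + extra_offset(s->length));
}
```

Here the base type's layout and its character access are untouched, and only the subtype pays for the length-dependent lookup of its own data.
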
James