[Python-3000] PyUnicodeObject implementation

James Y Knight foom at fuhm.net
Mon Sep 8 02:23:21 CEST 2008


On Sep 7, 2008, at 6:55 PM, Guido van Rossum wrote:
>> One possibility that occurs to me is to use a PyVarObject variant that
>> allocates space for an additional void pointer before the variable
>> sized section of the object. The builtin type would leave that pointer
>> NULL, but subtypes could perform the second allocation needed to
>> populate it.
>>
>> The question is whether the 4-8 bytes wasted per object would be worth
>> the fact that only one memory allocation would be needed.
>
> I believe that 4-8 bytes is more than the overhead of an extra memory
> allocation from the obmalloc heap. It is probably about the same as
> the overhead for a memory allocation from the regular malloc heap. So
> for short strings (of which there are often a lot) it would be more
> expensive; for longer objects it would probably work out just about
> the same.
>
> There could be a different approach though, whereby the offset from
> the start of the object to the start of the character array wasn't a
> constant but a value stored in the class object. (In fact,
> tp_basicsize could probably be used for this.) It would slow down
> access to the characters a bit though -- a classic time-space
> trade-off that would require careful measurement in order to decide
> which is better.
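
To make the trade-off in that last suggestion concrete, here is a minimal
sketch of a per-type offset. Everything in it (TypeInfo, StrHeader,
SubStrHeader, str_chars, str_new) is a hypothetical stand-in, not CPython's
real layout; the basicsize field only plays the role that tp_basicsize might
play. The point is that reaching the characters costs one extra dependent
load through the type, instead of a compile-time constant offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified type object: only the field this sketch needs. */
typedef struct {
    size_t basicsize;   /* bytes from object start to the character data */
} TypeInfo;

/* Hypothetical header shared by the base string type and its subtypes. */
typedef struct {
    const TypeInfo *type;
    size_t length;
} StrHeader;

/* Hypothetical C subtype: its extra field sits *before* the characters. */
typedef struct {
    StrHeader base;
    void *cache;
} SubStrHeader;

static const TypeInfo base_type = { sizeof(StrHeader) };
static const TypeInfo sub_type  = { sizeof(SubStrHeader) };

/* The offset is read from the type at run time rather than hard-coded. */
static char *str_chars(void *obj) {
    const StrHeader *h = obj;
    return (char *)obj + h->type->basicsize;
}

/* One allocation holds the header, any subtype fields, and the characters. */
static void *str_new(const TypeInfo *type, const char *s) {
    assert(type->basicsize >= sizeof(StrHeader));
    size_t len = strlen(s);
    StrHeader *h = calloc(1, type->basicsize + len + 1);
    if (h == NULL)
        return NULL;
    h->type = type;
    h->length = len;
    memcpy((char *)h + type->basicsize, s, len + 1);
    return h;
}
```

With this layout, str_chars() works unchanged for both types, and the base
type pays no per-object space for fields it does not use.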


Given that you can, today, subclass str in Python without wasting an
extra 4/8 bytes of memory or adding anything new to the class object,
why wouldn't anyone who really wanted to make a hypothetical optimized
subclass just use the same mechanism (putting the additional data
*after* the character data) to subclass it in C?
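
As a rough illustration of the layout I mean, here is a sketch using
hypothetical structs (VarStr, Extra) and helpers (extra_offset,
varstr_new_with_extra, varstr_extra). None of these are CPython's real types
or API, and it assumes a C11 compiler for _Alignof. The only idea it shows is
that the subclass data starts at the first suitably aligned address past the
characters, so plain strings carry no extra pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a variable-sized string object. */
typedef struct {
    size_t length;      /* number of characters, excluding the NUL */
    char data[];        /* character data stored inline */
} VarStr;

/* Hypothetical payload that a C subclass wants to attach. */
typedef struct {
    int tag;
} Extra;

/* Round n up to a multiple of align, which must be a power of two. */
static size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Offset of the subclass data: just past the characters, aligned. */
static size_t extra_offset(size_t length) {
    return round_up(offsetof(VarStr, data) + length + 1, _Alignof(Extra));
}

/* Single allocation: header, characters, then the subclass data. */
static VarStr *varstr_new_with_extra(const char *s, int tag) {
    size_t len = strlen(s);
    size_t off = extra_offset(len);
    VarStr *v = malloc(off + sizeof(Extra));
    if (v == NULL)
        return NULL;
    v->length = len;
    memcpy(v->data, s, len + 1);
    ((Extra *)((char *)v + off))->tag = tag;
    return v;
}

/* Locating the extra data needs the length, so it is computed per object. */
static Extra *varstr_extra(VarStr *v) {
    return (Extra *)((char *)v + extra_offset(v->length));
}
```

The cost moves to the subclass: finding its own fields takes a little
arithmetic on the length, while access to the characters stays at a fixed
offset for everyone.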

It may be a little tricky, but not exactly rocket science, and given  
that all these C subclasses of str are so far hypothetical, just  
leaving it as "it's possible" seems perfectly reasonable...

James

