[Python-3000] PyUnicodeObject implementation

Stefan Behnel stefan_ml at behnel.de
Tue Sep 9 10:31:33 CEST 2008


Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
> > We create a new struct for the type that contains the parent-struct
> > as first field, and then we add the new attributes of the new type behind
> > that.
> 
> I seem to remember there's a field in the type called tp_basicsize
> that's meant to indicate how big the base part of the struct is,
> with any variable-size part placed after it.
> 
> If a variable-size type always uses this field to find the variable
> data, it seems to me that the usual scheme for subclassing should
> still work, with the extra fields existing in between those of the
> base class and the new position of the variable data.
> 
> Does Py_Unicode not take notice of this field? If not, maybe that's
> something that should be fixed.

Look at the layout of PyStringObject. The last entry is

    char ob_sval[1];

That entry marks the start of the character buffer, which is allocated
inline, directly behind the object header. PyString_AS_STRING() exploits
this: the macro expands to the member access "s->ob_sval". Subtypes that
declare their own members behind the parent struct will have them run into
the data stored at ob_sval.
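
To make the overlap concrete, here is a minimal sketch using stand-in
structs. MockString and MockSubString are hypothetical names, and the header
fields only approximate the real PyStringObject layout. It compares the
offset of the inline buffer with the offset of the first subtype member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyStringObject; the real header differs. */
typedef struct {
    long refcnt;        /* stand-in for ob_refcnt */
    void *type;         /* stand-in for ob_type */
    long size;          /* stand-in for ob_size */
    char ob_sval[1];    /* first byte of the inline character buffer */
} MockString;

/* The usual subclassing scheme: parent struct first, new members behind. */
typedef struct {
    MockString base;
    int extra;
} MockSubString;

/* Offset at which the inline string data starts. */
static size_t data_offset(void)
{
    return offsetof(MockString, ob_sval);
}

/* Offset at which the subtype's first own member starts. */
static size_t extra_offset(void)
{
    return offsetof(MockSubString, extra);
}

/* Nonzero if a string of `len` characters plus its NUL terminator,
   stored starting at ob_sval, would reach into the `extra` member. */
static int data_overlaps_extra(size_t len)
{
    /* The subtype member can never start before the parent's buffer. */
    assert(extra_offset() >= data_offset());
    return data_offset() + len + 1 > extra_offset();
}
```

Only strings short enough to fit into the struct's trailing padding leave
"extra" untouched. Anything longer is written straight over it.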

As you noted, a general solution for this problem would be to replace 
PyString_AS_STRING() and the future PyUnicode_AS_DATA() (and, well, all 
occurrences of "->ob_sval" in the CPython source code) by

    ((char *)s + Py_TYPE(s)->tp_basicsize)

since tp_basicsize lives in the type object, not in the instance, and the
offset has to be counted in bytes. But that would have the same impact on
all string data access operations as noted by Martin. I expect that this
could be done for the new PyUnicode type in Py3. The performance impact is
relatively small and it removes the C subclassing problem, so that may be
considered a reasonable trade-off.
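
A rough sketch of that scheme, again with hypothetical stand-ins (MockType,
MockStr, mock_as_string() and mock_new() are made up for illustration and
are not CPython API). The struct has no trailing buffer member, and the data
pointer is computed from the size recorded in the type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the real CPython structs. */
typedef struct {
    size_t basicsize;   /* plays the role of tp_basicsize */
} MockType;

typedef struct {
    MockType *type;     /* plays the role of ob_type */
    size_t length;
} MockStr;              /* note: no trailing ob_sval member */

/* A C subtype: parent struct first, new members behind it. */
typedef struct {
    MockStr base;
    int extra;
} MockSubStr;

static MockType mock_str_type = { sizeof(MockStr) };
static MockType mock_substr_type = { sizeof(MockSubStr) };

/* The character data starts basicsize bytes behind the object's address.
   The cast to char * makes the addition count bytes, not MockStr units. */
static char *mock_as_string(MockStr *s)
{
    return (char *)s + s->type->basicsize;
}

/* Allocate an object of the given type with `text` stored behind the
   fixed-size part. Returns NULL if the allocation fails. */
static MockStr *mock_new(MockType *type, const char *text)
{
    size_t len = strlen(text);
    MockStr *s;

    assert(type->basicsize >= sizeof(MockStr));
    s = calloc(1, type->basicsize + len + 1);
    if (s == NULL)
        return NULL;
    s->type = type;
    s->length = len;
    memcpy(mock_as_string(s), text, len + 1);
    return s;
}
```

With this layout the subtype's "extra" member sits between the parent's
fields and the character data, so writing to it leaves the string intact.
The price is one additional load of the type's size on every data access.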

Regarding Cython (and Pyrex), however, it doesn't solve the problem in general 
for the existing Py2 versions that Cython supports (starting from 2.3), so a 
portable solution implemented by Cython would still be best.

Stefan



