[Python-3000] PyUnicodeObject implementation
Stefan Behnel
stefan_ml at behnel.de
Tue Sep 9 10:31:33 CEST 2008
Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
> > We create a new struct for the type that contains the parent-struct
> > as first field, and then we add the new attributes of the new type behind
> > that.
>
> I seem to remember there's a field in the type called tp_basicsize
> that's meant to indicate how big the base part of the struct is,
> with any variable-size part placed after it.
>
> If a variable-size type always uses this field to find the variable
> data, it seems to me that the usual scheme for subclassing should
> still work, with the extra fields existing in between those of the
> base class and the new position of the variable data.
>
> Does Py_Unicode not take notice of this field? If not, maybe that's
> something that should be fixed.
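For readers less familiar with the C idiom, here is a minimal sketch of the "parent struct as first field" scheme from the quote. It assumes hypothetical, simplified structs (MockObject, MockBase, MockSub), not CPython's real headers. It shows that code written only for the base type can still read a subtype instance through a base pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified object header (not CPython's real PyObject) */
typedef struct {
    long ob_refcnt;
    const char *type_name;
} MockObject;

/* Base type: header first, then the base type's own fields */
typedef struct {
    MockObject head;
    int base_value;
} MockBase;

/* Subtype: the whole parent struct as first field, new attributes behind it */
typedef struct {
    MockBase base;
    int sub_value;
} MockSub;

/* Code written against the base type only knows the MockBase layout.
   Because MockBase is MockSub's first member, a MockSub* converted to
   MockBase* points at that member, so this also works for subtypes. */
static int base_get_value(const MockBase *self) {
    assert(self != NULL);
    return self->base_value;
}
```

This relies on the base type ending at a fixed offset, which is exactly what a variable-size type like PyStringObject does not provide.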
Look at the layout of PyStringObject. The last entry is
char ob_sval[1];
Its only purpose is to mark where the inline character buffer begins. That's
also exploited by PyString_AS_STRING(), a macro that expands to the member
access "s->ob_sval". Subtypes that declare their own members after the base
struct will have them overlap the string data that starts at ob_sval.
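To make the overlap concrete, here is a self-contained sketch using hypothetical mock structs loosely modeled on Py2's PyStringObject (MockVarHead, MockString, MockStringSub and MOCK_AS_STRING are illustrative names, not the real CPython definitions). Rather than actually writing past the struct, it only compares offsets. It checks whether a string of a given length, stored at the fixed ob_sval offset, would cover the subtype's extra member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyObject_VAR_HEAD */
typedef struct {
    ptrdiff_t ob_refcnt;
    void *ob_type;
    ptrdiff_t ob_size;
} MockVarHead;

/* Loosely modeled on Py2's PyStringObject: the characters are stored
   inline, starting at the trailing one-element array */
typedef struct {
    MockVarHead head;
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];
} MockString;

/* A subtype following the usual "parent struct first" scheme */
typedef struct {
    MockString base;
    int extra;          /* new member declared by the subtype */
} MockStringSub;

/* Like PyString_AS_STRING: always the fixed offset of ob_sval */
#define MOCK_AS_STRING(op) (((MockString *)(op))->ob_sval)

/* Returns 1 if a string of len chars (plus its NUL) stored at ob_sval
   would cover the subtype's 'extra' member, 0 otherwise. */
static int extra_overlaps_data(size_t len) {
    size_t data_start = offsetof(MockString, ob_sval);
    size_t data_end = data_start + len + 1;       /* one past the NUL */
    size_t extra_start = offsetof(MockStringSub, extra);
    size_t extra_end = extra_start + sizeof(int);
    assert(extra_start >= sizeof(MockString));    /* extra sits after the base */
    return data_start < extra_end && extra_start < data_end;
}
```

An empty string fits inside the base struct, but any string longer than a few characters runs straight into `extra`.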
As you noted, a general solution for this problem would be to replace
PyString_AS_STRING() and the future PyUnicode_AS_DATA() (and, well, all
occurrences of "->ob_sval" in the CPython source code) by something like
((char *)s + Py_TYPE(s)->tp_basicsize)
But that would add the same overhead to all string data access operations
that Martin noted. I expect that this could be done for the new PyUnicode type
in Py3. The performance impact is relatively small and it removes the C
subclassing problem, so that may be considered a reasonable trade-off.
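A rough sketch of the tp_basicsize-based lookup, again with hypothetical mock types (MockType, MockVarObject, MOCK_AS_DATA and mock_str_new are illustrative only). The real PyString_Type may define tp_basicsize slightly differently, so treat the exact offset arithmetic as an assumption. The point is that the data offset is read from the object's type, so a subtype with extra members gets its data placed after them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for PyTypeObject */
typedef struct {
    const char *tp_name;
    size_t tp_basicsize;   /* size of the fixed part; variable data follows it */
} MockType;

/* Hypothetical stand-in for a variable-size object header */
typedef struct {
    const MockType *ob_type;
    size_t ob_size;        /* number of characters, excluding the NUL */
} MockVarObject;

/* Base string type: just the header, character data follows */
typedef struct {
    MockVarObject head;
} MockStr;

/* Subtype: parent struct first, then its own member */
typedef struct {
    MockStr base;
    int extra;
} MockStrSub;

static const MockType MockStr_Type = { "mockstr", sizeof(MockStr) };
static const MockType MockStrSub_Type = { "mockstrsub", sizeof(MockStrSub) };

/* Find the data via the object's type instead of a fixed struct member:
   one extra load of ob_type->tp_basicsize on every access */
#define MOCK_AS_DATA(op) \
    ((char *)(op) + ((const MockVarObject *)(op))->ob_type->tp_basicsize)

/* Allocate fixed part + characters + NUL and copy s into the data area */
static MockVarObject *mock_str_new(const MockType *type, const char *s) {
    size_t len = strlen(s);
    MockVarObject *op;
    assert(type->tp_basicsize >= sizeof(MockVarObject));
    op = malloc(type->tp_basicsize + len + 1);
    if (op == NULL)
        return NULL;
    op->ob_type = type;
    op->ob_size = len;
    memcpy(MOCK_AS_DATA(op), s, len + 1);
    return op;
}
```

The same accessor works for both the base type and the subtype. The cost is the extra indirection through ob_type on each access, compared with the constant offset that PyString_AS_STRING() compiles to.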
Regarding Cython (and Pyrex), however, it doesn't solve the problem in general
for the existing Py2 versions that Cython supports (starting from 2.3), so a
portable solution implemented by Cython would still be best.
Stefan