[Python-Dev] PEP 393: Special-casing ASCII-only strings

Nick Coghlan ncoghlan at gmail.com
Fri Sep 16 00:42:25 CEST 2011


On Fri, Sep 16, 2011 at 7:39 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Thinking about this, the following may work:
> - ASCIIObject: state, length, hash, wstr*, data follow
> - SingleBlockUnicode: ASCIIObject, wstr_len,
>                      utf8*, utf8_len, data follow
> - UnicodeObject: SingleBlockUnicode, data pointer, no data follow
>
> This is essentially your proposal, except that the wstr_len is dropped for
> ASCII strings, and that it uses nested structs.
>
> The single-block variants would always be "ready", the full unicode object
> is ready only if the data pointer is set.
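A minimal C sketch of the nested layout quoted above, for illustration only. The three struct names come from the proposal. Everything else is hypothetical and invented here: the field order, the `STATE_SINGLE_BLOCK`/`STATE_ASCII` flag bits, and the helpers `ascii_new`, `ascii_data` and `is_ready`. None of it is CPython's actual API. The proposal does not say how the variants are told apart, so the sketch assumes a flag bit in `state`. It shows three things. Each struct begins with its base, so a pointer to any variant can be used as a pointer to `ASCIIObject`. An ASCII-only string is one allocation with the characters directly after the header. The "ready" check only needs to look at the data pointer of the full object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits; the proposal does not specify an encoding. */
#define STATE_SINGLE_BLOCK 0x1u  /* header and data share one allocation */
#define STATE_ASCII        0x2u  /* data follows an ASCIIObject header */

/* Smallest header, for ASCII-only strings. No wstr_len (for pure ASCII
   it would always equal length) and no utf8 fields (ASCII data is
   already valid UTF-8). */
typedef struct {
    unsigned int state;  /* flag bits (assumed layout) */
    size_t length;       /* number of code points */
    long hash;           /* cached hash, -1 if not yet computed */
    wchar_t *wstr;       /* optional wchar_t copy, may be NULL */
    /* character data follows in the same allocation */
} ASCIIObject;

/* Single-block string that is not pure ASCII. */
typedef struct {
    ASCIIObject base;    /* first member, so it is usable as an ASCIIObject */
    size_t wstr_len;
    char *utf8;          /* optional cached UTF-8, may be NULL */
    size_t utf8_len;
    /* character data follows in the same allocation */
} SingleBlockUnicode;

/* Full object: data lives in a separate buffer, nothing follows. */
typedef struct {
    SingleBlockUnicode base;
    void *data;          /* NULL until the string has been made ready */
} UnicodeObject;

/* Allocate an ASCII-only string as one block: header plus NUL-terminated data. */
static ASCIIObject *ascii_new(const char *s)
{
    size_t len = strlen(s);
    ASCIIObject *o = malloc(sizeof(ASCIIObject) + len + 1);
    if (o == NULL)
        return NULL;
    o->state = STATE_SINGLE_BLOCK | STATE_ASCII;
    o->length = len;
    o->hash = -1;
    o->wstr = NULL;
    memcpy(o + 1, s, len + 1);  /* data starts right after the header */
    return o;
}

/* Pointer to the inline character data of an ASCII-only string. */
static const char *ascii_data(const ASCIIObject *o)
{
    assert(o->state & STATE_ASCII);
    return (const char *)(o + 1);
}

/* Single-block variants are always ready; the full object is ready
   only once its data pointer has been set. */
static int is_ready(const ASCIIObject *o)
{
    if (o->state & STATE_SINGLE_BLOCK)
        return 1;
    return ((const UnicodeObject *)o)->data != NULL;
}
```

The cast in `is_ready` is valid C because a pointer to a struct, suitably converted, points to its first member and vice versa. That property is what makes the nested-struct arrangement work.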

In your "UnicodeObject" here, is the 'data pointer' the
any/latin1/ucs2/ucs4 union from the original structure definition?

Also, what are the constraints on the "SingleBlockUnicode"? Does it
only hold strings that can be represented in latin1? Or can the size
of the individual elements be more than 1 byte?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

