[Python-3000] string C API

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Sat Sep 16 11:53:51 CEST 2006

Greg Ewing <greg.ewing at canterbury.ac.nz> writes:

> That places a burden on all creators of strings to ensure
> that they are in the minimal format, which could be
> inconvenient for some operations, e.g. taking a substring
> could require making an extra pass to re-code the data.

Yes, but taking a substring already requires time linear in the
length of the substring, so an extra pass would change only the
constant factor.

Allocating a string from a C array of wide characters (which
determines the format from the contents) will be written once and
called as a function.

Most strings are ASCII, so most of the time there is no need to check
whether the substring could become even narrower.
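
To make the two points above concrete, here is a minimal sketch in C.
Everything in it is an assumption made for illustration: the
compact_str type, the cs_* function names, and the choice of 1, 2 or 4
bytes per character with thresholds at 0x100 and 0x10000 are all
hypothetical, not an existing or proposed Python API. It shows one
allocation function that picks the format from the contents, and a
substring function that skips the extra pass when the source is
already in the narrowest format.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical compact string: every character uses `width` bytes. */
typedef struct {
    size_t length;  /* number of characters */
    int width;      /* bytes per character: 1, 2, or 4 */
    void *data;     /* length * width bytes */
} compact_str;

/* Read character i regardless of the stored width. */
uint32_t cs_get(const compact_str *s, size_t i)
{
    switch (s->width) {
    case 1:  return ((const uint8_t *)s->data)[i];
    case 2:  return ((const uint16_t *)s->data)[i];
    default: return ((const uint32_t *)s->data)[i];
    }
}

/* Store character c at index i; c must fit in the string's width. */
void cs_set(compact_str *s, size_t i, uint32_t c)
{
    switch (s->width) {
    case 1:  ((uint8_t *)s->data)[i] = (uint8_t)c; break;
    case 2:  ((uint16_t *)s->data)[i] = (uint16_t)c; break;
    default: ((uint32_t *)s->data)[i] = c; break;
    }
}

/* Narrowest width able to hold code point max (assumed thresholds). */
int cs_width_for(uint32_t max)
{
    if (max < 0x100) return 1;
    if (max < 0x10000) return 2;
    return 4;
}

/* Allocate an uninitialised string of the given length and width. */
compact_str *cs_alloc(size_t n, int width)
{
    compact_str *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->length = n;
    s->width = width;
    s->data = malloc(n > 0 ? n * (size_t)width : 1);
    if (s->data == NULL) { free(s); return NULL; }
    return s;
}

/* Build a string from wide characters; the format is chosen from
   the contents, so callers never pick a width themselves. */
compact_str *cs_from_wide(const uint32_t *src, size_t n)
{
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (src[i] > max) max = src[i];
    compact_str *s = cs_alloc(n, cs_width_for(max));
    if (s == NULL) return NULL;
    for (size_t i = 0; i < n; i++) cs_set(s, i, src[i]);
    return s;
}

/* Copy s[start, start+n). The caller guarantees the range is valid. */
compact_str *cs_substr(const compact_str *s, size_t start, size_t n)
{
    int width = 1;
    if (s->width > 1) {  /* only wide sources need the extra pass */
        uint32_t max = 0;
        for (size_t i = 0; i < n; i++) {
            uint32_t c = cs_get(s, start + i);
            if (c > max) max = c;
        }
        width = cs_width_for(max);
    }
    compact_str *r = cs_alloc(n, width);
    if (r == NULL) return NULL;
    for (size_t i = 0; i < n; i++) cs_set(r, i, cs_get(s, start + i));
    return r;
}

/* Release a string created by the functions above. */
void cs_free(compact_str *s)
{
    if (s != NULL) { free(s->data); free(s); }
}
```

In this sketch a substring of a 1-byte string is a plain copy, and
only a substring of a wider string pays for the scan that checks
whether it can be narrowed.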

> It would also preclude the possibility of representing
> a substring as a view.

If views were implemented on the level of C pointers, then views would
not have the property of being in the canonical representation wrt.
character width. I think it's still valuable to use a more compact
representation if it would affect most strings.

> I don't see any great advantage given by this restriction
> anyway.

Keeping the canonical representation is not very important. It just
ensures that the advantage of having a more compact representation
is taken as often as possible, even if the string has been cut from
another string which contained a wide character.

   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/
