[Python-3000] string C API

Josiah Carlson jcarlson at uci.edu
Sat Sep 16 10:22:43 CEST 2006


Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> 
> Josiah Carlson wrote:
> > Because all text objects are internally
> > represented in their minimal 'encoding', equal text objects will always be
> > in the same encoding.
> 
> That places a burden on all creators of strings to ensure
> that they are in the minimal format, which could be
> inconvenient for some operations, e.g. taking a substring
> could require making an extra pass to re-code the data.

If Martin says it's not a big deal, I'm not really all that concerned.
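To show the cost Greg describes, here is a minimal Python sketch. It assumes a hypothetical string object (`MinimalStr`, not an actual CPython type) that always stores its code points at the narrowest of 1, 2, or 4 bytes each. Taking a substring has to rescan the selected characters to choose the new width, which is the extra re-coding pass in question.

```python
def min_width(code_points):
    """Narrowest per-code-point width (in bytes) that holds every code point."""
    top = max(code_points, default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4


class MinimalStr:
    """Hypothetical string object that always stores text at its minimal width."""

    def __init__(self, code_points):
        cps = list(code_points)
        self.width = min_width(cps)
        # Pack each code point as a fixed-width little-endian integer.
        self.data = b"".join(cp.to_bytes(self.width, "little") for cp in cps)

    @classmethod
    def from_text(cls, text):
        return cls(ord(ch) for ch in text)

    def __len__(self):
        return len(self.data) // self.width

    def code_points(self):
        w = self.width
        return [int.from_bytes(self.data[i:i + w], "little")
                for i in range(0, len(self.data), w)]

    def substring(self, start, stop):
        # Slicing the raw buffer is not enough: the slice may hold only narrow
        # characters, so the invariant forces a rescan and a re-pack.
        return MinimalStr(self.code_points()[start:stop])


s = MinimalStr.from_text("abc\u20ac")   # U+20AC needs 2 bytes
print(s.width, s.substring(0, 3).width)  # prints "2 1"
```

The wide string narrows to width 1 once the euro sign is sliced off, which is why a substring cannot simply share the parent's buffer under this invariant.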


> It would also preclude the possibility of representing
> a substring as a view.

It doesn't preclude views.  Every operation works as before; the only
difference is that one would need to compare contents even when two
strings store their code points at different widths.
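A sketch of that difference, assuming a hypothetical `(width, bytes)` pair as the storage format. Under the minimal-width invariant, differing widths prove inequality. Once a view into a wider string may hold narrow text, equal strings can sit at different widths, and the contents must be compared code point by code point.

```python
def pack(text, width):
    """Hypothetical helper: store text at a given per-code-point width."""
    return width, b"".join(ord(ch).to_bytes(width, "little") for ch in text)


def unpack(width, data):
    """Decode a fixed-width buffer back into a list of code points."""
    return [int.from_bytes(data[i:i + width], "little")
            for i in range(0, len(data), width)]


def equal_minimal(a, b):
    """Equality that relies on every string being at its minimal width."""
    (wa, da), (wb, db) = a, b
    if wa != wb:
        return False  # the invariant says different widths mean different text
    return da == db


def equal_general(a, b):
    """Equality when equal text may be stored at different widths (e.g. views)."""
    (wa, da), (wb, db) = a, b
    if wa == wb:
        return da == db
    if len(da) // wa != len(db) // wb:
        return False  # different lengths, so different text
    return unpack(wa, da) == unpack(wb, db)


narrow = pack("spam", 1)
wide_view = pack("spam", 2)  # same text, as a view into a 2-byte string might hold it
print(equal_minimal(narrow, wide_view), equal_general(narrow, wide_view))
# prints "False True"
```

The width shortcut gives the wrong answer as soon as the invariant is relaxed, which is the trade-off between views and the fast inequality test.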


> I don't see any great advantage given by this restriction
> anyway. So you could tell two strings were unequal in
> some cases if they happened to have different storage
> formats, but there would still be plenty of cases
> where you did have to compare them. Doesn't look like
> a big deal to me.

It is ultimately about space savings.  In the case of names (which will
all be 8-bit), lookups in the interning table may even be a bit faster
(I believe it is easier to hash 8 chars than 8 shorts).
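A rough illustration of the size argument. It assumes the same kind of hypothetical fixed-width storage and uses 32-bit FNV-1a purely as a stand-in hash over the raw buffer; this is not the hash CPython uses. An 8-character ASCII name occupies, and is hashed over, 8, 16, or 32 bytes depending on the width. Hashing the raw buffer is only consistent because, under the minimal-width invariant, a given name is always stored at the same width.

```python
FNV_OFFSET = 0x811C9DC5  # 32-bit FNV-1a offset basis
FNV_PRIME = 0x01000193   # 32-bit FNV-1a prime


def fnv1a(buf):
    """32-bit FNV-1a over a raw byte buffer (stand-in hash, not CPython's)."""
    h = FNV_OFFSET
    for byte in buf:
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h


def storage(text, width):
    """Hypothetical fixed-width little-endian storage for text."""
    return b"".join(ord(ch).to_bytes(width, "little") for ch in text)


name = "__init__"  # 8 ASCII characters
for width in (1, 2, 4):
    buf = storage(name, width)
    fnv1a(buf)  # the loop runs once per stored byte
    print(f"width {width}: {len(buf)} bytes stored and hashed")
# width 1: 8 bytes stored and hashed
# width 2: 16 bytes stored and hashed
# width 4: 32 bytes stored and hashed
```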

 - Josiah
