[Python-3000] string C API
Josiah Carlson
jcarlson at uci.edu
Sat Sep 16 10:22:43 CEST 2006
Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> Josiah Carlson wrote:
> > Because all text objects are internally
> > represented in their minimal 'encoding', equal text objects will always be
> > in the same encoding.
>
> That places a burden on all creators of strings to ensure
> that they are in the minimal format, which could be
> inconvenient for some operations, e.g. taking a substring
> could require making an extra pass to re-code the data.
If Martin says it's not a big deal, I'm not really all that concerned.
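As a rough illustration of the extra pass Greg describes (a sketch only, not code from this thread or from CPython), the functions below pick the narrowest storage width for a string and recompute it for a slice. The names `minimal_width` and `substring`, and the 1/2/4-byte widths, are assumptions made for the example.

```python
def minimal_width(text):
    """Return the bytes per code point needed for the widest character."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1   # fits in 8 bits
    if highest < 0x10000:
        return 2   # fits in 16 bits
    return 4       # needs the full 32 bits


def substring(text, start, stop):
    """Slice, then re-scan the slice to find its own minimal width."""
    piece = text[start:stop]
    # This second scan is the "extra pass": the parent's width is only
    # an upper bound for the slice, so the slice must be inspected.
    return piece, minimal_width(piece)
```

For instance, `"ab\u20ac"` needs 2 bytes per code point, but its slice `[0:2]` drops to 1 byte per code point once it has been re-scanned.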
> It would also preclude the possibility of representing
> a substring as a view.
It doesn't preclude views. Every operation works as before; the only
difference is that one would need to compare contents even when the two
strings store their code points at different widths.
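A sketch of what such a comparison might look like, assuming strings are held as fixed-width little-endian buffers. The helpers `encode_fixed`, `code_points` and `equal` are hypothetical names for this example, not part of any proposed C API.

```python
def encode_fixed(text, width):
    """Pack each code point into `width` little-endian bytes."""
    return b"".join(ord(ch).to_bytes(width, "little") for ch in text)


def code_points(data, width):
    """Yield the integer code points stored in a fixed-width buffer."""
    for i in range(0, len(data), width):
        yield int.from_bytes(data[i:i + width], "little")


def equal(a, a_width, b, b_width):
    """Compare two buffers that may use different storage widths."""
    if len(a) // a_width != len(b) // b_width:
        return False          # different lengths can never be equal
    if a_width == b_width:
        return a == b         # same width: plain byte comparison
    # Different widths: a view may be stored wider than necessary,
    # so the contents have to be compared code point by code point.
    return all(x == y for x, y in
               zip(code_points(a, a_width), code_points(b, b_width)))
```

Here a view onto the first three characters of a 2-byte string, `encode_fixed("abc\u20ac", 2)[:6]`, compares equal to `encode_fixed("abc", 1)` even though the two buffers differ in width.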
> I don't see any great advantage given by this restriction
> anyway. So you could tell two strings were unequal in
> some cases if they happened to have different storage
> formats, but there would still be plenty of cases
> where you did have to compare them. Doesn't look like
> a big deal to me.
It is ultimately about space savings and, in the case of names (since
all will be 8-bit), perhaps even slightly faster lookups in the
interning table (I believe it is easier to hash 8 chars than 8 shorts).
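To put rough numbers on that, the sketch below counts the payload bytes of an 8-character name at 1 and 2 bytes per code point and runs a byte-wise hash over each. FNV-1a is used purely as a stand-in, since the thread does not say which hash the interning table would use; `payload` and `fnv1a_32` are names made up for the example.

```python
FNV_OFFSET_32 = 2166136261
FNV_PRIME_32 = 16777619


def payload(text, width):
    """Encode text at a fixed width (little-endian), as a stand-in buffer."""
    return b"".join(ord(ch).to_bytes(width, "little") for ch in text)


def fnv1a_32(data):
    """32-bit FNV-1a: one xor and one multiply per input byte."""
    h = FNV_OFFSET_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


name = "__init__"
narrow = payload(name, 1)   # 8 bytes to store and to hash
wide = payload(name, 2)     # 16 bytes to store and to hash
```

With a byte-wise hash like this one, the 8-bit form does half as many loop iterations as the 16-bit form, which is the intuition behind "8 chars versus 8 shorts". Whether that is measurable in practice is a separate question.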
- Josiah