[Python-Dev] PEP 393: Flexible String Representation

Antoine Pitrou solipsis at pitrou.net
Tue Jan 25 00:20:45 CET 2011


On Tuesday, 25 January 2011 at 00:07 +0100, "Martin v. Löwis" wrote:
> >> I'd like to propose PEP 393, which takes a different approach,
> >> addressing both problems simultaneously: by getting a flexible
> >> representation (one that can be either 1, 2, or 4 bytes), we can
> >> support the full range of Unicode on all systems, but still use
> >> only one byte per character for strings that are pure ASCII (which
> >> will be the majority of strings for the majority of users).
> > 
> > For this kind of experiment, I think a concrete attempt at implementing
> > (together with performance/memory savings numbers) would be much more
> > useful than an abstract proposal.
> 
> I partially agree. An implementation is certainly needed, but there is
> nothing wrong (IMO) with designing the change before implementing it.
> Also, several people have offered to help with the implementation, so
> we need to agree on a specification first (which is actually cheaper
> than starting with the implementation only to find out that people
> misunderstood each other).

I'm not sure it's really cheaper. When implementing you will probably
find out that it makes more sense to change the meaning of some fields,
add or remove some, etc. You will also want to try various tweaks since
the whole point is to lighten the footprint of unicode strings in common
workloads.
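
For reference, the selection rule behind the quoted "1, 2, or 4 bytes"
idea can be sketched in a few lines of Python. This is only an
illustration: the function names and the exact cut-offs (0x100 and
0x10000) are assumptions, not something the proposal text quoted above
specifies, and the sketch ignores per-object header overhead entirely.

```python
def bytes_per_char(s):
    """Return the narrowest width (1, 2, or 4) that holds every code point in s.

    The thresholds are assumed for illustration; the quoted proposal only
    states that pure-ASCII strings would use one byte per character.
    """
    highest = max(map(ord, s), default=0)  # empty string -> narrowest width
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


def payload_size(s):
    """Bytes needed for the character data alone (no object header)."""
    return len(s) * bytes_per_char(s)
```

With this rule a single non-BMP character widens the whole string to
4 bytes per character, which is one reason real measurements on common
workloads matter.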

So, the only criticism I have, intuitively, is that the unicode
structure seems to become a bit too large. For example, I'm not sure you
need a generic (pointer, size) pair in addition to the
representation-specific ones.

Incidentally, to slightly reduce the overhead of unicode objects,
there's this proposal: http://bugs.python.org/issue1943

Regards

Antoine.



