[Python-Dev] PEP 393: Flexible String Representation

Stefan Behnel stefan_ml at behnel.de
Sat Jan 29 08:48:18 CET 2011


"Martin v. Löwis", 24.01.2011 21:17:
> I have been thinking about Unicode representation for some time now.
> This was triggered, on the one hand, by discussions with Glyph Lefkowitz
> (who complained that his server app consumes too much memory), and Carl
> Friedrich Bolz (who profiled Python applications to determine that
> Unicode strings are among the top consumers of memory in Python).
> On the other hand, this was triggered by the discussion on supporting
> surrogates in the library better.
>
> I'd like to propose PEP 393, which takes a different approach,
> addressing both problems simultaneously: by getting a flexible
> representation (one that can be either 1, 2, or 4 bytes), we can
> support the full range of Unicode on all systems, but still use
> only one byte per character for strings that are pure ASCII (which
> will be the majority of strings for the majority of users).
>
> You'll find the PEP at
>
> http://www.python.org/dev/peps/pep-0393/

After much discussion, I'm +1 for this PEP. Implementation and benchmarks 
are pending, but there are strong indicators that it will reduce the 
memory overhead of most applications without a major loss in performance, 
at least for Python code. I'll also try to make sure Cython extensions 
won't notice much when switching to CPython 3.3.
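
To make the memory argument concrete, here is a rough sketch in Python of
how a flexible representation could pick its character width. It is only an
illustration, not the PEP's implementation: the names char_width and
payload_bytes are made up for this example, the thresholds (Latin-1 range,
BMP, everything else) are my reading of the PEP, and object header overhead
is ignored entirely.

```python
def char_width(s):
    """Bytes per character that a flexible representation would need.

    Hypothetical helper: the width is chosen from the highest code
    point in the string (1, 2, or 4 bytes), as assumed from the PEP.
    """
    highest = max(map(ord, s), default=0)
    if highest < 0x100:
        return 1  # fits in one byte (pure ASCII falls in this case)
    if highest < 0x10000:
        return 2  # fits in two bytes (Basic Multilingual Plane)
    return 4      # needs the full four bytes


def payload_bytes(s):
    """Size of the character data alone, excluding any object header."""
    return len(s) * char_width(s)


for text in ("hello", "h\u00e9llo", "\u20ac 10", "\U0001F600"):
    print(repr(text), char_width(text), payload_bytes(text))
```

The point of the sketch is that a single wide character widens the whole
string, so the savings depend on strings being mostly ASCII, which is the
assumption the PEP makes about typical applications.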

Martin, this is a smart way of doing it.

Stefan


