[Python-Dev] PEP 393: Flexible String Representation
mal at egenix.com
Tue Jan 25 23:43:52 CET 2011
I'll comment more on this later this week...
From my first impression, I'm
not too thrilled by the prospect of making the Unicode implementation
more complicated by having three different representations on each
object.
I also don't see how this could save a lot of memory. As an example,
take a French text with, say, 10 million code points. This would end up
appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB),
one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending
on how many accents are used). That's a saving of -10MB compared to
today's implementation :-)
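A rough sketch of that arithmetic, in case anyone wants to try it on their own text. The function names and the sample sentence are made up for illustration, and UTF-16-LE stands in for UCS2 here, which gives the same size as long as the text has no characters outside the BMP:

```python
def copy_sizes(text):
    """Byte sizes of three copies of `text`: UCS2, Latin-1 and UTF-8.

    Assumes every character fits in Latin-1; UTF-16-LE stands in for UCS2,
    which has the same size for such text.
    """
    return {
        "ucs2": len(text.encode("utf-16-le")),  # 2 bytes per code point, no BOM
        "latin1": len(text.encode("latin-1")),  # 1 byte per code point
        "utf8": len(text.encode("utf-8")),      # 1 byte for ASCII, 2 for accented letters
    }


def scaled_sizes_mb(text, code_points=10_000_000):
    """Scale the sizes measured on a sample up to `code_points` code points."""
    factor = code_points / len(text)
    return {name: size * factor / 1e6 for name, size in copy_sizes(text).items()}


sample = "Le café était très agréable, même en été."  # hypothetical sample text
sizes = scaled_sizes_mb(sample)
print(sizes, "total MB:", sum(sizes.values()))
```

The UTF-8 figure depends entirely on how many accented letters the sample contains, so the 15MB above is only a guess.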
"Martin v. Löwis" wrote:
> I have been thinking about Unicode representation for some time now.
> This was triggered, on the one hand, by discussions with Glyph Lefkowitz
> (who complained that his server app consumes too much memory), and Carl
> Friedrich Bolz (who profiled Python applications to determine that
> Unicode strings are among the top consumers of memory in Python).
> On the other hand, this was triggered by the discussion on supporting
> surrogates in the library better.
> I'd like to propose PEP 393, which takes a different approach,
> addressing both problems simultaneously: by getting a flexible
> representation (one that can be either 1, 2, or 4 bytes), we can
> support the full range of Unicode on all systems, but still use
> only one byte per character for strings that are pure ASCII (which
> will be the majority of strings for the majority of users).
> You'll find the PEP at
> For convenience, I include it below.
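For reference, the width selection described in the quoted proposal could be sketched roughly as below. This is not the PEP's C implementation; the function name and the exact thresholds (1 byte up to U+00FF, 2 bytes up to U+FFFF, 4 bytes otherwise) are my assumptions:

```python
def bytes_per_character(text):
    """Pick the narrowest fixed width (1, 2 or 4 bytes) that fits every code point.

    Thresholds are assumed for illustration: Latin-1 range, BMP, everything else.
    """
    max_code_point = max(map(ord, text), default=0)
    if max_code_point < 0x100:
        return 1  # pure ASCII and Latin-1 text
    if max_code_point < 0x10000:
        return 2  # Basic Multilingual Plane
    return 4      # supplementary planes


def storage_size(text):
    """Bytes needed for the character data alone, ignoring any object header."""
    return len(text) * bytes_per_character(text)
```

Note that a single character outside the narrow range widens the whole string, so one non-BMP character in a long ASCII text quadruples the character data.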
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611