[Python-Dev] PEP 393: Flexible String Representation

"Martin v. Löwis" martin at v.loewis.de
Thu Jan 27 22:42:39 CET 2011


> From my first impression, I'm
> not too thrilled by the prospect of making the Unicode implementation
> more complicated by having three different representations on each
> object.

Thanks, added as a concern.

> I also don't see how this could save a lot of memory. As an example
> take a French text with, say, 10 million code points. This would end up
> appearing in memory as 3 copies on Windows: one copy stored as UCS-2 (20MB),
> one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending
> on how many accents are used). That's a saving of -10MB compared to
> today's implementation :-)

As others have pointed out: that's not how it works. It actually *will*
save memory, since the alternative representations are optional.
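
To put rough numbers on this, here is a small sketch. It is not the
PEP's implementation: it only measures encoded payload sizes with
str.encode and ignores object headers. The function name and the sample
text are made up for illustration, and it assumes, as stated above, that
only one representation is allocated unless another one is requested.

```python
# Illustrative only: encoded byte counts, not CPython's internal layout.

def representation_sizes(text):
    """Return the payload size in bytes of each representation discussed."""
    return {
        # 1 byte per code point; only works if every code point is < 256
        "latin-1": len(text.encode("latin-1")),
        # 2 bytes per code point for BMP-only text (UCS-2 equivalent)
        "ucs-2": len(text.encode("utf-16-le")),
        # 1 byte for ASCII, 2 bytes for accented Latin-1 characters
        "utf-8": len(text.encode("utf-8")),
    }

# Hypothetical French sample: 23 code points, 4 of them accented.
sample = "Le caf\u00e9 est d\u00e9j\u00e0 pr\u00eat. " * 1000

sizes = representation_sizes(sample)

# Worst case from the quoted mail: all three copies exist at once.
all_three = sum(sizes.values())

# Case where the optional copies are never requested: one compact copy.
canonical_only = sizes["latin-1"]

print(sizes)
print("all three:", all_three, "canonical only:", canonical_only)
```

For this sample, the single Latin-1 copy is half the size of the UCS-2
copy, so memory use only grows if something actually asks for the
other representations.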

Regards,
Martin

