[I18n-sig] Unicode surrogates: just say no!

M.-A. Lemburg mal@lemburg.com
Tue, 26 Jun 2001 15:08:33 +0200


Toby Dickenson wrote:
> 
> On Tue, 26 Jun 2001 04:51:38 -0400, Guido van Rossum
> <guido@digicool.com> wrote:
> 
> >I see only one remaining argument against choosing 3 over 2: FUD about
> >disk and primary memory space usage.
> 
> In a previous discussion about unifying plain strings and Unicode
> strings, someone (I forget who, sorry) proposed a unified string
> type that would store its data in arrays of either 1- or 2-byte
> elements (depending on what was efficient for each string) but
> provide a unified interface independent of the storage option.
> 
> Could the same approach be used to support an option E: individual
> strings use UCS-4 if they have to, but otherwise gain the space
> advantages of UCS-2?

This makes the implementation more complicated: e.g. SRE
would then have to be provided in three flavours: 8-bit, 16-bit
and 32-bit. Same for most of the codecs.

Maintenance will become a nightmare, the Python interpreter will
put on weight and we will probably not gain much with respect to
overall memory usage (external storage will use one of the encodings,
which can be chosen on a per-application basis).
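
To make the trade-off concrete, here is a rough sketch of what
per-string storage widths would look like. It is not taken from any
actual implementation; the names narrowest_width, pack and char_at
are made up for illustration, and the little-endian layout is just
an assumption. Note that every routine touching the raw data has to
know which of the three widths it is dealing with:

```python
def narrowest_width(text):
    """Return the smallest element width in bytes (1, 2 or 4) that can
    hold every code point in text. Hypothetical helper."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


def pack(text):
    """Store text as fixed-width little-endian elements, using the
    narrowest width that fits. Returns (width, raw_bytes)."""
    width = narrowest_width(text)
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, data


def char_at(width, data, index):
    """Read one character back; the caller must pass the right width."""
    start = index * width
    return chr(int.from_bytes(data[start:start + width], "little"))
```

In pure Python the width is just a parameter, but in C code such as
SRE or the codecs each width would typically mean a separately
compiled variant of the inner loops, which is where the three
flavours come from.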
 
-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/