[Python-Dev] Re: Re: Re: Alternative Implementation for PEP 292: Simple String Substitutions

Gareth McCaughan gmccaughan at synaptics-uk.com
Thu Sep 9 10:39:41 CEST 2004


Marc-Andre Lemburg wrote:

> In today's globalized world, the only sane way to deal with
> different scripts is through Unicode, which is why I
> believe that text data should eventually always be stored in
> Unicode objects - regardless of whether it takes more memory
> or not.
> 
> (If you compare development time to prices of a few GB extra
> RAM, the effort needed to maintain text in non-Unicode
> formats simply doesn't pay off anymore.)

This is not as obvious as it seems, because the "few GB
extra RAM" is a price paid by everyone who *uses* the
software. Granted, it's quite common for software only ever
to be run on one or two machines in the company where it
was developed, but not all software is used that way.

Also, the price of "a few GB extra RAM" is not always as
low as it seems. If adding 2GB means moving from 3GB to
5GB, you cross the 4GB limit of a 32-bit address space,
which may mean replacing the CPU and the OS.

That said, I strongly agree that all textual data should
be Unicode as far as the developer is concerned; but, at
least in the USA :-), it makes sense to have an optimized
representation that saves space for ASCII-only text, just
as we have an optimized representation for small integers.
(The benefit is potentially much greater in that case,
though.)
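The idea of an optimized representation for ASCII-only text can be sketched as a string wrapper that picks its storage width from its contents. `CompactText` below is a hypothetical class, not part of Python or of any proposal in this thread. It assumes a modern Python 3 (for `str.isascii`) and uses UTF-32 as a stand-in for the four-bytes-per-character storage of a UCS-4 build. Both layouts keep a fixed width per character, so indexing stays O(1).

```python
class CompactText:
    """Hypothetical sketch: 1 byte per character for ASCII-only text,
    otherwise 4 bytes per code point (UTF-32, standing in for UCS-4)."""

    def __init__(self, text: str) -> None:
        if text.isascii():
            # Every code point is below 128, so one byte each suffices.
            self._codec, self.width = "ascii", 1
        else:
            # General case: one fixed-width 32-bit unit per code point.
            self._codec, self.width = "utf-32-le", 4
        self._data = text.encode(self._codec)

    def __len__(self) -> int:
        # Fixed width per character, so length is a simple division.
        return len(self._data) // self.width

    def __getitem__(self, index: int) -> str:
        n = len(self)
        if index < 0:
            index += n
        if not 0 <= index < n:
            raise IndexError("CompactText index out of range")
        # Fixed width keeps indexing O(1), unlike a variable-width encoding.
        start = index * self.width
        return self._data[start:start + self.width].decode(self._codec)

    def __str__(self) -> str:
        return self._data.decode(self._codec)

    def nbytes(self) -> int:
        # Storage used by the payload, excluding object overhead.
        return len(self._data)


ascii_text = CompactText("hello world")
mixed_text = CompactText("héllo wörld")
print(ascii_text.nbytes(), mixed_text.nbytes())
```

For the same 11 characters, the ASCII-only string needs 11 bytes of payload while the mixed one needs 44, which is the saving the paragraph above has in mind. For comparison, CPython 3.3 later adopted a related scheme (PEP 393), choosing 1, 2 or 4 bytes per character from the widest code point in each string.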

-- 
g
