[Python-Dev] Internationalization Toolkit
M.-A. Lemburg
mal@lemburg.com
Wed, 10 Nov 1999 14:13:10 +0100
Jean-Claude Wippler wrote:
>
> Greg Stein wrote:
> [MAL:]
> > > The downside of using UTF16: it is a variable length format,
> > > so iterations over it will be slower than for UCS4.
> >
> > Bzzt. May as well go with UTF-8 as the internal format, much like Perl
> > is doing (as I recall).
>
> Ehm, pardon me for asking - what is the brief rationale for selecting
> UCS2/4, or whatever it ends up being, over UTF8?
UCS-2 is the native format on major platforms (meaning straight
fixed-length encoding using 2 bytes), so interfacing between
Python's Unicode object and the platform APIs will be simple and
fast.

UTF-8 is compact for ASCII users, but imposes a performance
hit for the CJK (Asian character sets) world, since UTF-8 uses
*variable* length encodings.
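
To make the trade-off concrete, here is a rough sketch in present-day
Python. The sample strings and the helper name encoded_sizes are
made up for illustration; "utf-16-le" is used so that no byte order
mark is counted, which matches a 2-byte fixed-width layout for these
particular characters.

```python
# Illustrative only: the helper name and sample strings are assumptions.
def encoded_sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 bytes without BOM)."""
    return (
        len(text),
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
    )

ascii_text = "hello"
cjk_text = "\u65e5\u672c\u8a9e"  # three CJK characters from the BMP

print(encoded_sizes(ascii_text))  # (5, 5, 10)
print(encoded_sizes(cjk_text))    # (3, 9, 6)

# Per-character UTF-8 lengths vary, so locating the n-th character
# means scanning from the start instead of computing a byte offset.
print([len(ch.encode("utf-8")) for ch in "a\u00e9\u65e5"])  # [1, 2, 3]
```

ASCII text is half the size in UTF-8, while the CJK sample is 50%
larger and loses constant-time indexing.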
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 51 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/