[Python-Dev] just say no...

Fredrik Lundh fredrik@pythonware.com
Fri, 12 Nov 1999 12:23:24 +0100


> Besides, the Unicode object will have a buffer containing the
> <default encoding> representation of the object, which, if all goes
> well, will always hold the UTF-8 value.

<rant>

over my dead body, that one...

(fwiw, over the last 20 years, I've implemented about a
dozen image processing libraries, supporting loads of
pixel layouts and file formats.  one important lesson
from that is to stick to a single internal representation,
and let the application programmers build their own
layers if they need to speed things up -- yes, they're
actually happier that way.  and text strings are not
that different from pixel buffers or sound streams or
scientific data sets, after all...)

(and sticks and modes will break your bones, but you
know that...)

> RE engines etc. can then directly work with this buffer.

sidebar: the RE engine that's being developed for this
project can handle 8-bit, 16-bit, and (optionally) 32-bit
text buffers. a single compiled expression can be used
with any character size, and performance is about the
same for all sizes (at least on any decent cpu).
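
(a minimal sketch of the idea, not the engine described above: a toy
matcher whose compiled form is a list of code points, so one compiled
pattern runs unchanged over 8-bit, 16-bit, and 32-bit buffers.  the
names compile_pattern, search, and to_buffer are hypothetical, and the
'?' wildcard syntax is an assumption made only for this illustration.)

```python
from array import array

def compile_pattern(pattern):
    """Compile a toy pattern into a list of code points.

    '?' matches any single code unit and is stored as None.
    (Hypothetical syntax, chosen only for this sketch.)
    """
    return [None if ch == "?" else ord(ch) for ch in pattern]

def search(compiled, buf):
    """Return the index of the first match in buf, or -1 if none.

    buf can be any indexable sequence of integers, so the width of
    the underlying storage never enters the matching logic.
    """
    n, m = len(buf), len(compiled)
    for start in range(n - m + 1):
        if all(c is None or buf[start + i] == c
               for i, c in enumerate(compiled)):
            return start
    return -1

def to_buffer(text, typecode):
    """Store text as an array of code points with the given typecode."""
    return array(typecode, (ord(ch) for ch in text))

# One compiled pattern, three buffer widths.
# 'B' is 8-bit, 'H' is 16-bit, 'I' is typically 32-bit (platform dependent).
compiled = compile_pattern("b?d")
results = [search(compiled, to_buffer("abcd", tc)) for tc in ("B", "H", "I")]
print(results)
```

the matcher only ever compares integers, which is why the character
size of the buffer does not affect the compiled pattern.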

> > I expect either would work well.  It's at least curious that Perl and Tcl
> > both went with UTF-8 -- does anyone think they know *why*?  I don't.  The
> > people here saying UCS-2 is the obviously better choice are all from the
> > Microsoft camp <wink>.

(hey, I'm not a microsofter.  but I've been writing "i/o
libraries" for various "object types" all my life, so I do
have strong preferences on what works, and what
doesn't...  I use Python for good reasons, you know ;-)

</rant>

thanks.  I feel better now.

</F>