cached encoding (Re: [Python-Dev] Internationalization Toolkit)

Fredrik Lundh fredrik@pythonware.com
Wed, 10 Nov 1999 09:24:16 +0100


Guido van Rossum <guido@CNRI.Reston.VA.US> wrote:
> One specific question: in your discussion of typed strings, I'm not
> sure why you couldn't convert everything to Unicode and be done with
> it.  I have a feeling that the answer is somewhere in your case study
> -- maybe you can elaborate?

Marc-Andre writes:

    Unicode objects should have a pointer to a cached (read-only) char
    buffer <defencbuf> holding the object's value using the current
    <default encoding>.  This is needed for performance and internal
    parsing (see below) reasons. The buffer is filled when the first
    conversion request to the <default encoding> is issued on the object.
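
A minimal sketch of the lazily filled cache described in the excerpt
above, written in Python rather than at the C level. The class name
CachedText, the method as_default_bytes, and the encode_calls counter
are hypothetical and exist only for illustration; the proposal itself
only names <defencbuf> and <default encoding>.

```python
class CachedText:
    """Hypothetical stand-in for a Unicode object with a cached encoding."""

    def __init__(self, value, default_encoding="utf-8"):
        self._value = value
        # Assumption: the default encoding is fixed when the object is created.
        self._default_encoding = default_encoding
        self._defencbuf = None   # stays empty until the first conversion request
        self.encode_calls = 0    # illustration only: counts real encode operations

    def as_default_bytes(self):
        # Fill the buffer on first use, then hand back the same object.
        if self._defencbuf is None:
            self.encode_calls += 1
            self._defencbuf = self._value.encode(self._default_encoding)
        return self._defencbuf   # bytes are immutable, so the buffer is read-only


text = CachedText("sm\u00f6rg\u00e5s")
first = text.as_default_bytes()
second = text.as_default_bytes()
```

Both calls return the same bytes object and only the first one encodes.
Note that the cache is only valid for as long as the default encoding
does not change, which is part of the bookkeeping in question here.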

keeping track of an external encoding is better left
to the application programmers -- I'm pretty sure that
different application builders will want to handle this
in radically different ways, depending on their
environment, underlying user interface toolkit, etc.
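
One way an application might do this, as a rough sketch: keep text as
Unicode internally and name the external encoding explicitly at each
I/O boundary. The helper names read_text and write_text are
hypothetical, and the encodings used are only examples.

```python
def read_text(raw, encoding):
    """Decode incoming bytes using the encoding the application chose."""
    return raw.decode(encoding)


def write_text(text, encoding):
    """Encode outgoing text using the encoding the application chose."""
    return text.encode(encoding)


# The application, not the string object, decides which encoding applies.
incoming = "caf\u00e9".encode("latin-1")     # e.g. bytes from a legacy file
text = read_text(incoming, "latin-1")        # Unicode inside the program
outgoing = write_text(text, "utf-8")         # e.g. bytes for a UI toolkit
```

With this arrangement the string object carries no encoding state, and
each application can pick whatever policy suits its environment.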

besides, this is how Tcl would have done it.  Python's
not Tcl, and I think you need *very* good arguments
for moving in that direction.

</F>