[I18n-sig] Re: [Python-Dev] Pre-PEP: Python Character Model

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Thu, 8 Feb 2001 01:27:29 +0100


> I'm not clear on the status of the concept of "default character set."
> First, I think you mean "default character encoding". 

Both encoding and character set, yes. I disagree with the notion that
any encoding is a Unicode encoding, since not all encodings can
represent all of Unicode; nor were they originally designed to encode
Unicode.
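
As a small illustration of the first point, here is a sketch in
present-day Python 3 (not the 2.x API under discussion in this thread).
The helper name can_encode is made up for the example; the codec names
are the standard ones. It tries to encode the EURO SIGN with three
codecs and records which ones succeed:

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# U+20AC EURO SIGN is a Unicode character outside both ASCII and Latin-1.
sample = "\u20ac"

# Map each codec name to whether it can represent the sample character.
results = {enc: can_encode(sample, enc) for enc in ("ascii", "latin-1", "utf-8")}

for enc, ok in results.items():
    print(enc, "can encode U+20AC" if ok else "cannot encode U+20AC")
```

Only UTF-8 covers the character here; ASCII and Latin-1 both reject it,
which is the sense in which they cannot represent all of Unicode.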

> Second, I thought that that idea was removed from user-view at
> least, wasn't it?

Yes, unless you modify sitecustomize.py.

> I was thinking that we would use that slot to hold the
> char->ord->char conversion (which you can interpret as Latin-1 or
> not depending on your philosophy).

I would interpret it that way. What do you do about t# conversions,
then?
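
For what it is worth, the Latin-1 reading can be checked mechanically.
The sketch below uses present-day Python 3 (an assumption; the thread
concerns 2.x, where the types differ) and the variable names are only
illustrative. It maps each byte value to the character with the same
ordinal and compares the result with decoding the same bytes as Latin-1:

```python
# All 256 possible byte values, in order.
all_bytes = bytes(range(256))

# char -> ord -> char: take each byte's numeric value and build the
# character that has that same ordinal.
via_ord = "".join(chr(b) for b in all_bytes)

# The same bytes decoded with the Latin-1 codec.
via_latin1 = all_bytes.decode("latin-1")

# The two interpretations agree for every byte value.
print(via_ord == via_latin1)
```

The comparison holds because Latin-1 assigns byte value n to code point
n for all n below 256, so the "no encoding" view and the Latin-1 view
give the same string.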

> The documentation says that the PyString_AsString and PyString_AS_STRING
> buffers must never be modified. I forgot that the "real" protocol is
> that that buffer can be modified. We'll need to copy its contents back
> to the Unicode string before the next operation that uses the Unicode
> value. Not rocket science but somewhat tedious.

This scheme is easy to break: the application could hold onto the
pointer and start using the object again before the buffer has been
copied back. It remains to be seen
whether existing code would break; this I can only speculate about as
I don't know the exact scheme that you have in mind.

Regards,
Martin