[I18n-sig] Re: Pre-PEP: Python Character Model

M.-A. Lemburg mal@lemburg.com
Thu, 08 Feb 2001 14:26:02 +0100

Paul Prescod wrote:
> "Martin v. Loewis" wrote:
> >
> > ...
> >
> > So every s and s# conversion would trigger a copying of the
> > string. How is that implemented? Currently, every Unicode object has a
> > reference to a string object that is produced by converting to the
> > default character set. Would it grow another reference to a string
> > object that is carrying the Latin-1-conversion?
> I'm not clear on the status of the concept of "default character set."
> First, I think you mean "default character encoding". Second, I thought
> that that idea was removed from user-view at least, wasn't it? I was
> thinking that we would use that slot to hold the char->ord->char
> conversion (which you can interpret as Latin-1 or not depending on your
> philosophy).

The extra slot is merely needed to implement s and s# conversions
since these pass back references to a real C char buffer. Let's
*not* do more of those...

> > Certainly. Applications expect to write to the resulting memory, and
> > expect to change the underlying string; this is valid only if one had
> > been passing NULL to PyString_FromStringAndSize.
> The documentation says that the PyString_AsString and PyString_AS_STRING
> buffers must never be modified. I forgot that the "real" protocol is
> that that buffer can be modified. We'll need to copy its contents back
> to the Unicode string before the next operation that uses the Unicode
> value. Not rocket science but somewhat tedious.

Paul, please have a look at the es and es# conversions -- I think
these do what you have in mind here. Writing to buffers returned
by s or s# is never permitted; you'd have to use w# to get at
a writeable C buffer.
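
To make the distinction concrete, here is a rough sketch in plain C.
It is only a toy model, not CPython code: ToyUnicode, toy_as_borrowed
and toy_as_owned_copy are hypothetical names invented for this
illustration, and Latin-1 is assumed as the encoding. The first
accessor behaves like s (a borrowed, read-only pointer into the extra
slot), the second like es (a fresh copy that the caller owns, may
modify and must free; with the real es conversion that would be
PyMem_Free rather than free).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Unicode object: code points plus one
   extra slot caching an encoded char buffer. Not CPython's layout. */
typedef struct {
    const unsigned int *code_points;  /* the "Unicode" data */
    size_t length;                    /* number of code points */
    char *cached_bytes;               /* extra slot, filled lazily */
} ToyUnicode;

/* Encode as Latin-1 (an assumption for this sketch) into a freshly
   malloc'ed, NUL-terminated buffer. Returns NULL if a code point is
   above 255 or if allocation fails. */
static char *toy_encode_latin1(const ToyUnicode *u)
{
    assert(u != NULL);
    char *buf = malloc(u->length + 1);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < u->length; i++) {
        if (u->code_points[i] > 0xFF) {
            free(buf);
            return NULL;
        }
        buf[i] = (char)u->code_points[i];
    }
    buf[u->length] = '\0';
    return buf;
}

/* "s"-style access: a borrowed pointer into the cache slot. The
   object keeps ownership, so the caller must not write to it or
   free it. */
static const char *toy_as_borrowed(ToyUnicode *u)
{
    assert(u != NULL);
    if (u->cached_bytes == NULL)
        u->cached_bytes = toy_encode_latin1(u);
    return u->cached_bytes;
}

/* "es"-style access: a private copy. The caller owns it, may modify
   it, and is responsible for calling free() on it. */
static char *toy_as_owned_copy(const ToyUnicode *u)
{
    return toy_encode_latin1(u);
}
```

Repeated calls to toy_as_borrowed() hand back the same cached pointer,
which is why writing through it would corrupt what every later caller
sees. Each call to toy_as_owned_copy() returns independent storage, so
changes to it never reach the object.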

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/