[Python-Dev] New Py_UNICODE doc

Nicholas Bastin nbastin at opnet.com
Sun May 8 20:44:00 CEST 2005


On May 8, 2005, at 5:28 AM, Martin v. Löwis wrote:

> Nicholas Bastin wrote:
>> All of my proposals for what to change the documentation to have been
>> shot down by Martin.  If someone has better verbiage that they'd like
>> to see, I'd be perfectly happy to patch the doc.
>
> I don't look into the specific wording - you speak English much better
> than I do. What I care about is that this part of the documentation
> should be complete and precise. I.e. statements like "should not make
> assumptions" might be fine, as long as they are still followed by
> a precise description of what the code currently does. So it should
> mention that the representation can be either 2 or 4 bytes, that
> the strings "ucs2" and "ucs4" can be used to select one of them,
> that it is always 2 bytes on Windows, that 2 bytes means that non-BMP
> characters can be represented as surrogate pairs, and so on.

It's not always 2 bytes on Windows.  Users can alter the config options
(and not unreasonably so, btw, on 64-bit Windows platforms).

The underlying issue, I think, is that people don't appreciate that we
have to assume some users will build their own Python.  That will
result in 2-byte Pythons on RHL9 and 4-byte Pythons on Windows.  Both
have been claimed in this discussion not to happen, which is untrue.
You can't build a binary extension module on Windows and assume that
Py_UNICODE is 2 bytes, because that isn't enforced in any way.  The
same is true of assuming a 4-byte Py_UNICODE on RHL9.
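
To make the point concrete, here is a minimal sketch of checking the
width at run time instead of inferring it from the platform.  It
assumes an interpreter where sys.maxunicode reflects the build option
(0xFFFF for a 2-byte build, 0x10FFFF for a 4-byte build); the helper
names code_unit_bytes and surrogate_pair are hypothetical, not part of
any Python API.  The second helper only illustrates the surrogate-pair
arithmetic that a 2-byte build relies on for non-BMP characters.

```python
import sys


def code_unit_bytes(maxunicode=sys.maxunicode):
    """Return the Py_UNICODE width (2 or 4 bytes) implied by maxunicode.

    Assumption: the interpreter ties sys.maxunicode to its build option.
    Interpreters that always report 0x10FFFF (Python 3.3 and later) will
    always yield 4 here, whatever their internal representation is.
    """
    if maxunicode == 0xFFFF:
        return 2  # narrow build: non-BMP characters need surrogate pairs
    if maxunicode == 0x10FFFF:
        return 4  # wide build: every code point fits in one unit
    raise ValueError("unexpected maxunicode: %#x" % maxunicode)


def surrogate_pair(code_point):
    """Split a non-BMP code point into its (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a non-BMP code point: %#x" % code_point)
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


if __name__ == "__main__":
    print("this interpreter reports %d-byte units" % code_unit_bytes())
```

A build script for an extension module could run a check like this
against the target interpreter and refuse to proceed on a mismatch,
rather than assuming "Windows means 2 bytes" or "RHL9 means 4 bytes".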

--
Nick


