[Python-Dev] New Py_UNICODE doc

"Martin v. Löwis" martin at v.loewis.de
Sun May 8 11:15:45 CEST 2005


Nicholas Bastin wrote:
>> -1. This breaks existing documentation and usage, and provides only
>> minimum value.
> 
> 
> Have you been missing this conversation?  UTF-16 is *WHAT PYTHON
> CURRENTLY IMPLEMENTS*.  The current documentation is flat out wrong. 
> Breaking that isn't a big problem in my book.

The documentation I refer to is the one that says the equivalent of

'configure takes an option --enable-unicode, with the possible
values "ucs2", "ucs4", "yes" (equivalent to no argument),
and "no" (equivalent to --disable-unicode)'

*THIS* documentation would break. This documentation is factually
correct at the moment (configure does indeed take these options),
and people rely on them in automated build processes. Changing
configure options should not be done lightly, even if the
current names reflect a "wrong mental model". By that rule,
--with-suffix should be renamed to --enable-suffix,
--with-doc-strings to --enable-doc-strings, and so on. The
nitpicking behind the desire to rename the option should yield
to backwards compatibility.

Changing the documentation that goes along with the option
would be fine.
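
As a rough illustration of why the spelling of the option matters
less than what the build does: a script can ask the interpreter
which width it got, whatever the flag was called. This is only a
sketch. The helper name unicode_build_kind is made up for this
example, and on Python 3.3 and later sys.maxunicode is always
0x10FFFF, so the check is only informative on older interpreters.

```python
import sys


def unicode_build_kind(maxunicode=sys.maxunicode):
    """Map a sys.maxunicode value to the configure value it implies.

    Hypothetical helper: a 2-byte Py_UNICODE build reports 0xFFFF,
    a 4-byte build reports 0x10FFFF.
    """
    return "ucs2" if maxunicode == 0xFFFF else "ucs4"


# Report what the running interpreter was built with.
print(unicode_build_kind())
```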

> It provides more than minimum value - it provides the truth.

No. It is just a command line option. It could be named
--enable-quirk=(quork|quark), and would still select UTF-16.
Command line options provide no truth - they don't even
provide statements.

>> With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
>> supporting the full Unicode ccs the same way it supports UCS-2.
> 
> I can't understand what you mean by this.  My point is that if you
> configure python to support UCS-2, then it SHOULD NOT support surrogate
> pairs.  Supporting surrogate pairs is the purview of variable width
> encodings, and UCS-2 is not among them.

So you suggest renaming it to --enable-unicode=utf16, right?
My point is that a Unicode type with UTF-16 would correctly
support all assigned Unicode code points, which the current
2-byte implementation doesn't. So --enable-unicode=utf16 would
*not* be the truth.
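
A small sketch of the distinction, written in present-day Python 3
syntax rather than anything a 2005 build would run. The helper name
utf16_code_units is an assumption made for this example. It shows
that a code point outside the BMP becomes two 16-bit code units in
UTF-16; a 2-byte Py_UNICODE build stores those two units as separate
elements, so len() and indexing count code units, not code points.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of text encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


bmp = "\u20ac"         # EURO SIGN, inside the BMP
astral = "\U00010400"  # DESERET CAPITAL LETTER LONG I, outside the BMP

units_bmp = utf16_code_units(bmp)        # one code unit
units_astral = utf16_code_units(astral)  # a surrogate pair

print([hex(u) for u in units_bmp])
print([hex(u) for u in units_astral])
```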

Regards,
Martin

