[Python-Dev] New Py_UNICODE doc
Nicholas Bastin
nbastin at opnet.com
Sun May 8 20:40:40 CEST 2005
On May 8, 2005, at 5:15 AM, Martin v. Löwis wrote:
> 'configure takes an option --enable-unicode, with the possible
> values "ucs2", "ucs4", "yes" (equivalent to no argument),
> and "no" (equivalent to --disable-unicode)'
>
> *THIS* documentation would break. This documentation is factually
> correct at the moment (configure does indeed take these options),
> and people rely on them in automatic build processes. Changing
> configure options should not be taken lightly, even if they
> may result from a "wrong mental model". By that rule, --with-suffix
> should be renamed to --enable-suffix, --with-doc-strings to
> --enable-doc-strings, and so on. However, the nitpicking that
> underlies the desire to rename the option should be ignored
> in favour of backwards compatibility.
>
> Changing the documentation that goes along with the option
> would be fine.
That is exactly what I proposed originally, which you shot down.
Please actually read the contents of my messages. What I said was
"change the configure option and related documentation".
>> It provides more than minimum value - it provides the truth.
>
> No. It is just a command line option. It could be named
> --enable-quirk=(quork|quark), and would still select UTF-16.
> Command line options provide no truth - they don't even
> provide statements.
Wow, what an inane way of looking at it. I don't know what world you
live in, but in my world, users read the configure options and assume
that they mean something. In fact, they *have* to draw their own
conclusions, because even the documentation you cite above doesn't
say what happens if they choose UCS-2 or UCS-4. A logical assumption
would be that Python uses those CEFs internally, and that would be
incorrect.
>>> With --enable-unicode=ucs2, Python's Py_UNICODE does *not* start
>>> supporting the full Unicode ccs the same way it supports UCS-2.
>>
>> I can't understand what you mean by this. My point is that if you
>> configure Python to support UCS-2, then it SHOULD NOT support
>> surrogate pairs. Supporting surrogate pairs is the purview of
>> variable-width encodings, and UCS-2 is not among them.
>
> So you suggest renaming it to --enable-unicode=utf16, right?
> My point is that a Unicode type with UTF-16 would correctly
> support all assigned Unicode code points, which the current
> 2-byte implementation doesn't. So --enable-unicode=utf16 would
> *not* be the truth.
The current implementation supports the UTF-16 CEF, i.e., a
variable-width encoding form capable of representing the entire
Unicode code space by means of surrogate pairs. Please point out a
code point that the current 2-byte implementation does not support,
either directly or through surrogate pairs.
--
Nick