[Python-Dev] New Py_UNICODE doc
"Martin v. Löwis"
martin at v.loewis.de
Mon May 9 06:59:59 CEST 2005
Nicholas Bastin wrote:
>> Changing the documentation that goes along with the option
>> would be fine.
>
>
> That is exactly what I proposed originally, which you shot down. Please
> actually read the contents of my messages. What I said was "change the
> configure option and related documentation".
What I mean is "change just the documentation, do not change the
configure option". This seems to be different from your proposal,
which I understand as "change both the configure option and the
documentation".
> Wow, what an inane way of looking at it. I don't know what world you
> live in, but in my world, users read the configure options and suppose
> that they mean something. In fact, they *have* to go off on their own
> to assume something, because even the documentation you refer to above
> doesn't say what happens if they choose UCS-2 or UCS-4. A logical
> assumption would be that python would use those CEFs internally, and
> that would be incorrect.
Certainly. That's why the documentation should be improved. Changing
the option breaks existing packaging systems, and should not be done
lightly.
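For illustration only, here is a minimal sketch (not from the original thread) of how a user could check at run time what the UCS-2/UCS-4 build choice actually produced. It assumes a CPython interpreter. On a narrow build a non-BMP character occupies two elements (a UTF-16 surrogate pair), while on a wide build it occupies one. `describe_unicode_storage` is a hypothetical helper name.

```python
import sys

def describe_unicode_storage():
    # Hypothetical helper. On a narrow (UCS-2) build a non-BMP character is
    # stored as two UTF-16 code units (a surrogate pair); on a wide (UCS-4)
    # build it is one element. Python 3.3+ (PEP 393) always acts like the wide case.
    narrow = len(u"\U00011234") == 2
    return {
        "maxunicode": hex(sys.maxunicode),
        "storage": "UTF-16 code units" if narrow else "one element per code point",
    }

print(describe_unicode_storage())
```

The option name says "UCS-2", but the narrow storage it selects behaves like UTF-16 code units. That gap is the one the documentation would need to spell out.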
> The current implementation supports the UTF-16 CEF. i.e., it supports a
> variable width encoding form capable of representing all of the unicode
> space using surrogate pairs. Please point out a code point that the
> current 2 byte implementation does not support, either directly, or
> through the use of surrogate pairs.
Try to match regular expression classes for non-BMP characters:
>>> re.match(u"[\u1234]",u"\u1234").group()
u'\u1234'
works fine, but
>>> re.match(u"[\U00011234]",u"\U00011234").group()
u'\ud804'
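gives strange results: the character is stored as a surrogate pair, the
class treats the two surrogates as separate members, and only the high
surrogate is returned.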
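The following is a minimal Python 3 sketch (not part of the original message) that reproduces this on a modern interpreter. It emulates a narrow build by re-expressing the string as UTF-16 code units. The helper names `surrogate_pair` and `as_code_units` are hypothetical.

```python
import re
import struct

def surrogate_pair(cp):
    """Split a supplementary-plane code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

def as_code_units(text):
    """Re-express text as the sequence of UTF-16 code units a narrow build stores."""
    data = text.encode("utf-16-be")
    units = struct.unpack(">%dH" % (len(data) // 2), data)
    return "".join(chr(u) for u in units)

char = "\U00011234"
print([hex(u) for u in surrogate_pair(ord(char))])  # ['0xd804', '0xde34']

# Emulated narrow build: the class [\ud804\ude34] lists two separate code units,
# so it matches only the first one (the high surrogate).
narrow = as_code_units(char)
narrow_result = re.match("[" + narrow + "]", narrow).group()
print(ascii(narrow_result))  # '\ud804'

# Code-point-based str (wide build, or Python 3.3+): the class holds one character.
wide_result = re.match("[" + char + "]", char).group()
print(ascii(wide_result))  # '\U00011234'
```

The surrogate pair can represent the code point, but code that works element by element (such as a character class) sees two unrelated code units. So "supports UTF-16" does not mean every operation is correct for non-BMP characters.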
Regards,
Martin