[issue5127] Use Py_UCS4 instead of Py_UNICODE in unicodectype.c
report at bugs.python.org
Thu Jul 8 11:16:03 CEST 2010
Marc-Andre Lemburg <mal at egenix.com> added the comment:
Ezio Melotti wrote:
> Ezio Melotti <ezio.melotti at gmail.com> added the comment:
> [This should probably be discussed on python-dev or in another issue, so feel free to move the conversation there.]
> The current implementation says: """All characters except those characters defined in the Unicode character database as following categories are considered printable.
> * Cc (Other, Control)
> * Cf (Other, Format)
> * Cs (Other, Surrogate)
> * Co (Other, Private Use)
> * Cn (Other, Not Assigned)
> * Zl (Separator, Line) ('\u2028', LINE SEPARATOR)
> * Zp (Separator, Paragraph) ('\u2029', PARAGRAPH SEPARATOR)
> * Zs (Separator, Space) other than ASCII space ('\x20')."""
> We could also arbitrarily exclude all the non-BMP chars, but that shouldn't be based on the availability of the fonts IMHO.
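A minimal sketch of the quoted category rule, using the standard `unicodedata` module. `NON_PRINTABLE_CATEGORIES` and `is_printable_char` are illustrative names, not CPython's (the real check is done in C from generated tables). For the sample characters it agrees with `str.isprintable()`, assuming the interpreter's `unicodedata` tables match its built-in character-type tables (both are generated from the same Unicode database).

```python
import unicodedata

# Categories that the quoted definition treats as non-printable.
NON_PRINTABLE_CATEGORIES = frozenset({
    "Cc", "Cf", "Cs", "Co", "Cn",  # Other: control, format, surrogate, private use, unassigned
    "Zl", "Zp", "Zs",              # Separator: line, paragraph, space
})

def is_printable_char(ch: str) -> bool:
    """Return True if ch is printable under the quoted category rule."""
    if ch == " ":
        # ASCII space is the only exception to the Zs exclusion.
        return True
    return unicodedata.category(ch) not in NON_PRINTABLE_CATEGORIES

samples = ["a", " ", "\t", "\u00a0", "\u200b", "\u2028", "\ue000", "\U0001F600"]
for ch in samples:
    # Print code point, category, this sketch's verdict and str.isprintable().
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
          f"{is_printable_char(ch)} {ch.isprintable()}")
```

Note that nothing in this rule looks at fonts: U+1F600 (category So) counts as printable whether or not the console can draw it.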
Without fonts, you can't print the code points, even if the Unicode
database defines the code point as not having one of the above
classes. And that's probably also the reason why the Unicode
database doesn't define a printable property :-)
I also find the use of Zl, Zp and Zs in the definition somewhat
arbitrary: whitespace is certainly printable. This also doesn't
match the isprint() C lib API:
"A printable character is any character that is not a control character."
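The mismatch is easy to check from Python. A sketch, assuming the "C"/POSIX locale for the comparison with C's isprint() (other locales may accept more bytes): inside ASCII both rules pick exactly 0x20 to 0x7E, while outside ASCII, space separators such as U+00A0 are whitespace yet not printable.

```python
# Code points 0..127 that Python 3 reports as printable.
ascii_printable = [c for c in range(128) if chr(c).isprintable()]
# In the "C" locale, isprint() accepts exactly 0x20..0x7E, so these agree.
print(ascii_printable == list(range(0x20, 0x7F)))

# Space separators (category Zs) are whitespace, yet the quoted
# category rule marks them as non-printable.
zs_samples = ["\u00a0", "\u2003", "\u3000"]  # NO-BREAK, EM and IDEOGRAPHIC SPACE
for ch in zs_samples:
    print(f"U+{ord(ch):04X} isspace={ch.isspace()} isprintable={ch.isprintable()}")
```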
>> Note that Python 3 will send printable code points as-is to the
>> console, so whether or not a code point is considered printable
>> should take the common availability of fonts being able to display
>> the code point into account. Otherwise, a user would just see a
>> square box instead of the much more useful escape sequence.
> If the concern is about the usefulness of repr() in the console, note that on the Windows terminal, trying to display most of the characters results in an error (see #5110), and that makes repr() barely usable.
> ascii() might be an alternative if the user wants to see the escape sequence instead of a square box.
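A small sketch of the difference between the two built-ins on a string mixing printable non-ASCII characters with one non-printable separator. The sample string is arbitrary; only the built-in repr() and ascii() are used.

```python
s = "caf\u00e9 \u2028 \U0001F600"  # e-acute, LINE SEPARATOR, an emoji

# repr() leaves printable non-ASCII characters (e-acute, the emoji) as-is
# and escapes only U+2028, which the category rule calls non-printable.
r = repr(s)

# ascii() escapes every non-ASCII character, so the result is pure ASCII
# and can be written to any ASCII-compatible console encoding.
a = ascii(s)
print(a)
```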
That's a different problem, but indeed also related to the
printable property which was introduced as part of the Unicode repr()
change: if the console encoding cannot represent
the printable code points, you get an error.
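A sketch of that failure mode, assuming a console using the legacy cp437 code page (the choice of code page is illustrative): the repr() of a Cyrillic word keeps the letters unescaped, so a strict encode fails, while the standard "backslashreplace" error handler recovers an escaped form.

```python
word = "\u043f\u0440\u0438\u0432\u0435\u0442"  # Cyrillic "privet"
text = repr(word)  # all six letters are printable, so repr() keeps them as-is

console_encoding = "cp437"  # assumed legacy Windows console code page

# Writing repr() output with strict encoding fails: cp437 has no Cyrillic.
try:
    text.encode(console_encoding)
    strict_ok = True
except UnicodeEncodeError as exc:
    strict_ok = False
    print("strict encode failed:", exc.reason)

# "backslashreplace" turns each unencodable character back into an escape;
# for this string the result equals ascii(word).
escaped = text.encode(console_encoding, "backslashreplace").decode(console_encoding)
print("escaped:", escaped)
```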
I was never a fan of the Unicode repr() change to begin with. The
repr() of an object should work in almost all cases. Being able to
read the repr() of an object in clear text is only secondary.
IMHO, allowing all printable code points to pass through unescaped
was not beneficial. We have str() for getting readable representations
of objects. Anyway, we're stuck with it now, so we have to work
around the issues...
Python tracker <report at bugs.python.org>