[issue2630] repr() should not escape non-ASCII characters

Atsuo Ishimoto report at bugs.python.org
Tue Jun 3 13:00:52 CEST 2008


Atsuo Ishimoto <ishimoto at gembook.org> added the comment:

Thank you for your review! 
I filed a new patch just before I see your comments.

On Tue, Jun 3, 2008 at 7:13 PM, Georg Brandl <report at bugs.python.org> wrote:
>
> Georg Brandl <georg at python.org> added the comment:
>
> Review:
>
> * Why is an empty string not printable? In any case, the empty string
> should be among the test cases for isprintable().

Well, my intuition came from str.islower() was wrong. An empty string is
printable, of cource.

> * Why not use PyUnicode_DecodeASCII instead of
> PyUnicode_FromEncodedObject? It should be a bit faster.
>

Okay, thank you.

> * If old-style string formatting gets "%a", .format() must get a "!a"
> specifier.
>
I added the format string in my latest patch.

> * The ascii() and repr() tests should be expanded so that both test the
> same set of objects, and the expected differences. Are there tests for
> failing cases?
>

Okay, thank you.

> * This is just "return ascii" (in builtin_ascii):
> +       if (ascii == NULL)
> +           return NULL;
> +
> +       return ascii;

Fixed in my latest patch.

>
> * For PyBool_FromLong(1) and PyBool_FromLong(0) there is Py_RETURN_TRUE
> and Py_RETURN_FALSE. (You're not to blame, the rest of unicodeobject.c
> seems to use them too, probably a legacy.)

Okay, thank you.

>
> * There appear to be some space indentations in tab-indented files like
> bltinmodule.c and vice versa (unicodeobject.c).
>

I think bltinmodule.c is fixed with latest patch, but I don't know what
is correct indentation for unicodeobject.c. I guess latest patch is
acceptable.

> * C docs/isprintable() docs: The spec
> +   Characters defined in the Unicode character database as "Other"
> +   or "Separator" other than ASCII space(0x20) are not considered
> +   printable.
> is unclear, better say "All character except those ... are considered
> printable".
>
> * ascii() docs:
> +   the non-ASCII
> +   characters in the string returned by :func:`ascii`() are hex-escaped
> +   to generate a same string as :func:`repr` in Python 2.
>
> should be
>
> "the non-ASCII characters in the string returned by :func:`repr` are
> backslash-escaped (with ``\x``, ``\u`` or ``\U``) to generate ...".
>

Okay, thank you.

> * makeunicodedata: len(list(n for n in names if n is not None)) could
> better be expressed as sum(1 for n in names if n is not None).

I don't want to change here, because this is reversion of rev 63378.

> One more thing: with r63891 the encoding and errors arguments for the
> creation of sys.stderr were made configurable; you'll have to adapt the
> patch so that it defaults to backslashescape but can be overridden by
> PYTHONIOENCODING.

I think sys.stderr should be default to 'backslashreplace' always. I'll
post a messege to Py3k-list later.

>
> Otherwise, the patch is fine IMO. (I'm surprised that only so few tests
> needed adaptation, that's a sign that we're not testing Unicode enough.)
>

Thank you very much! I'll file new patch soon.

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2630>
_______________________________________


More information about the Python-bugs-list mailing list