[Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000
Atsuo Ishimoto
ishimoto at gembook.org
Thu May 29 08:40:22 CEST 2008
On Tue, May 27, 2008 at 10:06 AM, Jim Jewett <jimjjewett at gmail.com> wrote:
>> * Characters defined in the Unicode character database as "Separator"
>> (Zl, Zp, Zs) other than ASCII space(0x20).
>
> Please put in a note that Zl and Zp refer only to two specific
> unicode characters, not to what most people think of as line
> separators or paragraph markers.
Thank you for the suggestion.
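
For reference, a small sketch that lists which code points actually fall in the Zl and Zp categories. It relies only on the standard `unicodedata` module; the helper name `chars_in_category` is made up for this illustration, and the exact result depends on the Unicode database bundled with the interpreter.

```python
import sys
import unicodedata


def chars_in_category(category):
    """Return every code point whose general category equals `category`.

    Hypothetical helper for this illustration, not part of the PEP.
    """
    return [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == category]


zl = chars_in_category("Zl")   # LINE SEPARATOR only
zp = chars_in_category("Zp")   # PARAGRAPH SEPARATOR only

print([hex(cp) for cp in zl])
print([hex(cp) for cp in zp])

# The everyday line break is a control character, not a "Separator".
newline_category = unicodedata.category("\n")
print(newline_category)
```

So "\n" and "\r" are not covered by the Zl/Zp wording; they are handled as control characters.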
>
>> * Backslash-escape quote characters(apostrophe, ') and add quote
>> character at the beginning and the end.
>
> Do you just mean the two ASCII quotation marks that python uses?
No, just the apostrophe ('), as in current Python.
>
> As written, I wondered whether it would include backquote or guillemet.
A proposal to change repr() for these characters is not included in this
PEP, although I don't know what a guillemet is.
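
A rough sketch of the quoting behaviour in question, assuming a released Python 3 interpreter (the PEP was still a proposal at the time of this message). Guillemets are the angle quotation marks U+00AB and U+00BB; like the backquote, they are ordinary printable characters and are not treated as quote characters by repr().

```python
# Only the ASCII quote characters used by Python string literals matter here.
both_quotes = repr("a'b\"c")            # apostrophe is backslash-escaped
backquote = repr("`x`")                 # backquote is left alone
guillemets = repr("\u00abx\u00bb")      # guillemets are left alone

assert both_quotes == "'a\\'b\"c'"
assert backquote == "'`x`'"
assert guillemets == "'\u00abx\u00bb'"

# When the string contains an apostrophe but no double quote, repr()
# switches to double quotes instead of escaping.
apostrophe_only = repr("it's")
assert apostrophe_only == '"it\'s"'
```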
>
>> - Add ``'%a'`` string format operator. ``'%a'`` converts any python
>> object to string using ``repr()`` and then hex-escape all non-ASCII
>> characters. ``'%a'`` operator generates same string as ``'%r'`` in
>> Python 2.
>
> Then why not keep the old %r, and add a new one for the unicode repr?
>
repr() and "%r" should be consistent with the object's __repr__() method.
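
A minimal sketch of that consistency, assuming an interpreter that implements the PEP. The class `Label` is hypothetical and exists only for this illustration.

```python
class Label:
    """Hypothetical class whose __repr__ returns non-ASCII text."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return "Label(%s)" % self.text


obj = Label("caf\u00e9")

via_repr = repr(obj)           # whatever __repr__() returned
via_percent_r = "%r" % obj     # same string as repr(obj)
via_percent_a = "%a" % obj     # repr(obj) with non-ASCII hex-escaped

assert via_repr == obj.__repr__() == via_percent_r
print(via_percent_a)           # Label(caf\xe9)
```

With this split, "%r" never disagrees with what the object itself reports, and "%a" is the explicit request for ASCII-only output.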
> Is it again because of the bug where str([..., mystr, ...]) ends up
> doing repr on mystr?
I don't think it is a bug, as other people have explained.
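
The behaviour under discussion, as a short sketch on a Python 3 interpreter: str() of a container formats each element with repr(), not str().

```python
mystr = "caf\u00e9"

as_str = str(mystr)        # the text itself, no quotes
in_list = str([mystr])     # each element is formatted with repr()

assert as_str == mystr
assert in_list == "[" + repr(mystr) + "]"
```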
>
>> - Add ``ascii()`` builtin function. ``ascii()`` converts any python
>> object to string using ``repr()`` and then hex-escape all non-ASCII
>> characters. ``ascii()`` generates same string as ``repr()`` in Python 2.
>
> The problem isn't that I want to be able to write code that acts the
> old way; the problem is that I want to ensure all code running on my
> system acts the old way.
> Adding an ascii() function doesn't help.
I can understand your worry about possible code breakage, but I still
think this PEP is the right thing for Python 3000. ascii() may make
porting code to Python 3000 a bit easier.
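
As a sketch of how ascii() could help with porting, assuming an interpreter that implements the PEP: code that depended on repr() returning pure ASCII can call ascii() instead, and the result is still a valid literal.

```python
import ast

text = "\u65e5\u672c\u8a9e"        # three non-ASCII characters
data = {"name": text}

readable = repr(data)              # keeps the characters as they are
escaped = ascii(data)              # backslash-escapes every non-ASCII character

print(escaped)                     # {'name': '\u65e5\u672c\u8a9e'}

# The escaped form is ASCII-only and round-trips back to the same object.
assert all(ord(ch) < 128 for ch in escaped)
assert ast.literal_eval(escaped) == data
```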
>
>> Strings to be printed for debugging are not only contained by lists or
>> dicts, but also in many other types of object. File objects contain a
>> file name in Unicode, exception objects contain a message in Unicode,
>> etc. These strings should be printed in readable form when repr()ed.
>> It is unlikely to be possible to implement a tool to print all
>> possible object types.
>
> You could go a long way (particularly in Py3k, where everything
> inherits from object) by changing the builtin containers, and changing
Changing the builtin containers is not sufficient, so the way would be too
long to be practical. Do you wish to override the __repr__() method of every
type you encounter?
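
To illustrate why containers alone are not enough, here is a sketch with three unrelated types that embed a string in their repr(). It assumes Python 3.7 or later for `dataclasses`; the `Account` class is hypothetical.

```python
import dataclasses
import pathlib


@dataclasses.dataclass
class Account:
    """Hypothetical user-defined type; its generated repr includes the field."""
    owner: str


text = "\u65e5\u672c\u8a9e"

objects = [
    ValueError(text),               # exception carrying a message
    pathlib.PurePosixPath(text),    # path object carrying a file name
    Account(text),                  # arbitrary user-defined type
]

# None of these is a list or dict, yet each repr embeds the string.
embeds_text = [text in repr(obj) for obj in objects]
ascii_only = [all(ord(ch) < 128 for ch in ascii(obj)) for obj in objects]

print(embeds_text)
print(ascii_only)
```

Fixing only list and dict would leave every one of these printing escapes.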
>> - Make the encoding used by ``unicode_repr()`` adjustable, and make
>> current ``repr()`` as default.
>
>> With adjustable ``repr()``, result of ``repr()`` is unpredictable and
>> would make impossible to write correct code involving ``repr()``.
>
> No more so than 3138. The setting of repr is predictable on a given
> system. (Even if you make it a changeable during a single run, it is
> predictable by checking first.) Across systems, the 3138 proposal is
> already unpredictable, because you don't know which systems will apply
> backslash-replace on which characters (and on which runs).
>
In this PEP, the result of repr() is perfectly predictable: repr()
generates exactly the same string on every system. But in general, strings
printed to the console, whether generated by repr() or not, are less
predictable. Some characters in the string may be backslash-escaped,
may be replaced by '?', or may raise an exception, depending on the user's
configuration.
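
A sketch of those three outcomes, simulating an ASCII-only console by encoding explicitly. The error handler names are the standard codec ones; which one a real console uses depends on the user's configuration.

```python
shown = repr("caf\u00e9")      # the same string on every system

# What reaches an ASCII-only console depends on the error handler in use.
escaped = shown.encode("ascii", errors="backslashreplace")
replaced = shown.encode("ascii", errors="replace")

try:
    shown.encode("ascii", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(escaped)         # b"'caf\\xe9'"
print(replaced)        # b"'caf?'"
print(strict_failed)   # True
```

The variation comes from the output step, not from repr() itself.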