[Python-3000] PEP 3138- String representation in Python 3000

Thu May 29 22:00:35 CEST 2008

Jim Jewett writes:
 > On 5/26/08, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > > Jim Jewett writes:
 > 
 > >   > The only reason for this change is that __repr__ gets used when
 > >   > __str__ *should* be used instead.
 > 
 > > That's not what the advocates say.
 > 
 > I still haven't seen a use case where it *should* be using repr *and*
 > needs to print outside of ASCII.

I suggest that's because you rarely (if ever) read programs, or program
*textual* input or output, that aren't written in ASCII.

 > >  Now, I agree with you about what's "safe".  However, in a text-
 > >  processing application in a Japanese environment, that's hardly
 > >  useful, and our Japanese programmer can argue that in his environment,
 > >  printing all of Unicode *is* safe.
 > 
 > I think he or she will still be wrong, because of confusables -- it is
 > just that "unsafe" characters are far more rare (since byte value
 > alone isn't a problem) and the cost of not printing non-ASCII
 > characters is higher.

AFAIK confusables in strings are generally not a problem; that's part
of what I mean by "environment".  If they are, then you probably need
to set up special controls in the environment anyway, and Python
giving you Unicode escapes instead of glyphs is redundant.

 > > I don't use it myself other than as a way of diagnosing bugs in
 > >  programs I write or maintain; in personal practice, I'm in your camp.
 > >  But my understanding is that there is often an intermediate level,
 > >  such as a website admin, who needs *some* of the precision of repr()
 > >  such as escaped representation of whitespace, but also needs to be
 > >  able to read most of the output.
 > 
 > Could someone who does need this explain more?

I don't think that's useful.  See below.

 > I don't understand needing *exactly* whitespace escaped, but not, say,
 > stray characters from scripts you've never used, even though the rest
 > of the page *is* in an expected script.

Of course *everybody* wants *stray* characters escaped!  The problem is
that to a Japanese, the 21000 kanji are *not* stray characters.  To a
Korean, the 21000 kanji and the 11000 Hangul are not stray
characters.  Etc.
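For illustration only (this post predates the Python 3.0 release): in Python 3 as eventually shipped, repr() decides what to escape using essentially the same notion of "printable" that str.isprintable() exposes. Kanji and Hangul therefore stay as glyphs, while whitespace controls and invisible format characters such as U+200B ZERO WIDTH SPACE come back as escapes. A minimal sketch; the helper name `show` and the sample strings are made up for this example.

```python
# Python 3 only. `show` is an illustrative helper, not a stdlib API.
def show(label, text):
    """Report whether `text` counts as printable, plus its repr()."""
    printable = text.isprintable()
    shown = repr(text)
    print(f"{label:<14} printable={printable!s:<5}  repr={shown}")
    return printable, shown


# Kanji and Hangul are "printable"; a tab and a zero-width space are not,
# so repr() escapes only the latter two.
results = {
    label: show(label, text)
    for label, text in [
        ("kanji", "漢字"),
        ("hangul", "한글"),
        ("tab", "\t"),
        ("zero-width sp", "\u200b"),
    ]
}
```

The "stray" character here is the invisible U+200B, which is exactly the kind of thing a reader wants flagged, whereas the CJK text is not.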

So the first question is "can repr()'s printable repertoire usefully
be made locale-dependent?", and the answer is emphatically "no".
(I'm pretty sure that's a pronouncement from Guido; I could look it up
later.)

The next question is "what is the most useful compromise?", and the
candidates are "ASCII" and "all of Unicode".  You want the former, and
the 5.7 billion people whose native language is not American English
want the latter.  I don't know about the other 300 million
Americans.<wink>
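For reference, the two candidates map onto what Python 3 eventually shipped: repr() is the "all of Unicode" choice, and the ascii() builtin provides the "ASCII" choice. A small sketch, with a sample string chosen only for illustration, showing that both forms escape the tab and newline, that only ascii() escapes the Japanese text, and that both evaluate back to the same string.

```python
import ast

# Python 3: repr() keeps printable non-ASCII characters as glyphs;
# ascii() escapes everything outside ASCII using \x, \u or \U escapes.
text = "日本語\tテキスト\n"

unicode_view = repr(text)
ascii_view = ascii(text)

print(unicode_view)  # Japanese readable, \t and \n escaped
print(ascii_view)    # pure ASCII, every non-ASCII character escaped

# Either form is a valid Python literal for the same string.
assert ast.literal_eval(unicode_view) == text
assert ast.literal_eval(ascii_view) == text
```

Both lines carry the same information; the difference is only which readers can scan them without consulting a code chart.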