[Python-Dev] str() for interpreter output

Tim Peters tim_one@email.msn.com
Sun, 9 Apr 2000 15:42:19 -0400


[Ping]
> No, what i said is what i said.
>
> Let's try this again:
>
>     repr() is not for the machine.

Ping, believe me, I heard that the first 42 times <wink>.  If it wasn't
clear before, I'll spell it out:  we don't agree on this, and I didn't agree
with Donn Cave when he first went down this path.  repr() is a noble attempt
to be usable by both human and machine.

> The documentation for __repr__ says:
>
>     __repr__(self)  Called by the repr() built-in function and by
>     string conversions (reverse quotes) to compute the "official"
>     string representation of an object.  This should normally look
>     like a valid Python expression that can be used to recreate an
>     object with the same value.

Additional docs are in the Built-in Functions section of the Library Ref
(for repr() and str()).

> It only suggests that the output "normally look like a valid
> Python expression".  It doesn't require it, and certainly doesn't
> imply that __repr__ should be the standard way to turn an object
> into a platform-independent serialization.

Alas, the docs for repr and str are vague to the point of painfulness.
Guido's *intent* is more evident in later c.l.py posts, and especially in
what the implementation *does*:  for at least all of ints, longs, floats,
complex numbers and strings, and dicts, lists and tuples composed of those
recursively, the 1.6 repr produces a faithful and platform-independent
eval'able string composed of 7-bit ASCII printable characters.
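
A minimal sketch of that round-trip property, written for a modern Python 3 interpreter rather than 1.6 (so there is no separate long type). The sample values and the helper name round_trips are hypothetical, chosen only for illustration:

```python
# Sample values built only from the basic builtin types named above.
samples = [
    42,
    10 ** 30,
    0.1,
    3 + 4j,
    "tab\there",
    {"point": (1.5, -2), "tags": ["a", "b"]},
]

def round_trips(value):
    """Return True if eval(repr(value)) compares equal to value."""
    # eval is only reasonable here because we produced the text ourselves.
    return eval(repr(value)) == value

# One boolean per sample: did the repr string rebuild an equal object?
results = [round_trips(v) for v in samples]

# Check that every repr string sticks to 7-bit ASCII characters.
ascii_only = all(ord(ch) < 128 for v in samples for ch in repr(v))
```

For these samples every entry of results should be true and ascii_only should hold, which is the "faithful, eval'able, 7-bit ASCII" behavior described above.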

For floats and complex numbers, bit-for-bit reproducibility relies on the
assumption that the platforms are IEEE-754, but all current Windows, Mac and
Unix platforms (even Psion's EPOC32) *are*.  So when you later say

> There are two goals at odds here: readability and serialization.
> You can't have both,

sorry, but the 1.6 repr() implementation already meets both goals for a
great many builtin types (as well as for dozens of classes & types I've
implemented, and likely hundreds of classes & types others have
implemented -- and there would be twice as many if people weren't abusing
repr() to do what str() was intended to do so that the interactive prompt
behaves reasonably).

> If you are using repr(), it's because you are expecting a human to
> look at the thing at some point.

Often, yes.  More often it's because I expect a human to *edit* it (dump
repr to a text file, fiddle it, then read it back in and eval it -- poor
man's database), which they can't reasonably be expected to do with a
pickle.  Often also it's just a way to send a data structure in email,
without needing to attach tedious instructions for how to use pickle to
decipher it.
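
The dump/fiddle/read-back workflow can be sketched roughly as follows. This is an illustration under assumptions, not the method used at the time: the record contents and file name are made up, the "human edit" is simulated with a text replacement, and ast.literal_eval (a modern, literal-only stand-in) is swapped in for the plain eval mentioned above:

```python
import ast
import os
import tempfile

# Hypothetical record made only of basic builtin types.
record = {"name": "spam", "sizes": [1, 2, 3], "ratio": 0.25}

# Dump the repr to a plain text file.
path = os.path.join(tempfile.mkdtemp(), "record.txt")
with open(path, "w") as f:
    f.write(repr(record))

# Simulate a human opening the file in an editor and changing one value.
with open(path) as f:
    text = f.read()
text = text.replace("'spam'", "'eggs'")
with open(path, "w") as f:
    f.write(text)

# Read the edited text back in and rebuild the data structure.
with open(path) as f:
    restored = ast.literal_eval(f.read())
```

The file on disk stays ordinary readable text throughout, which is what makes the hand-editing step practical.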

>> pickles are unreadable by humans; that's why repr() is often preferred.

> Precisely.  You just said it yourself: repr() is for humans.

*Partly*, yes.  You assume an either/or here that I reject:  repr() works
best when it's designed for both -- as Python itself does whenever possible.

> That is why repr() cannot be mandated as a serialization mechanism.

I haven't suggested mandating it.  It's a goal, and one which is often
achievable, and appreciated when it is achieved.  Nobody expects repr() to
capture the state of an open file object -- but then they don't expect
pickle to do that either <wink>.

> There are two goals at odds here: readability and serialization.
> You can't have both, so you must prioritize.  Pickles are more
> about serialization than about readability; repr is more about
> readability than about serialization.

Pickles are more about *efficient* machine serialization, sacrificing all
readability to run as fast as possible.  Sometimes that's the best choice;
other times not.

> repr() is the interpreter's way of communicating with the human.

It is *a* way, sure, but for things like NumPy arrays and Rationals (and
probably also for IEEE doubles) it's rarely the *best* way.

> It makes sense that e.g. the repr() of a string that you see
> printed by the interpreter looks just like what you would type
> in to produce the same string,

Yes, that's repr's job.  But it's often *not* what the interactive user
*wants*.  You don't want it either!  You later say

> Right Out if it means that
>
>     eval(what_the_interpreter_prints_for(x)) == x
>
> no longer holds for objects composed of the basic built-in types.

and that implies the shortest string the prompt can display for 3.1416 -
3.141 is 0.0005999999999999339 (see reply to Christian for details on that
example).  Do you really want to get that string at the prompt?  If you have
a NumPy array with a million elements, do you really want the interpreter to
display all of them -- and in ~17 different widths?  If you're using one of
my Rational classes, do you really want to see a ratio of multi-thousand
digit longs instead of a nice 12-digit floating approximation?
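
The float case can be sketched like this, assuming a modern Python 3 interpreter, where repr() of a float yields the shortest string that round-trips. The 12-digit format is borrowed from the Rational remark above, and the variable names are illustrative only:

```python
x = 3.1416 - 3.141

# Faithful form: evaluating this string gives back exactly the same double.
exact = repr(x)

# Friendly form: 12 significant digits, which hides the rounding noise
# but no longer identifies the same double.
friendly = "%.12g" % x

same_after_exact = float(exact) == x
same_after_friendly = float(friendly) == x
```

The faithful string preserves the value and the friendly one does not, so a single display format cannot serve both purposes.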

I use the interactive prompt a *lot* -- the current behavior plain sucks,
starting about 10 minutes after you finish the Python Tutorial <0.7 wink>.

> And no, even if you argue that we need to have something else,
> whatever you want to call it, it's not called 'str'.

Yes, I've said repeatedly that both str() and repr() are unsuitable.  That's
where SSCTSOOS started, as str() is *more* suitable for more people more of
the time than is repr() -- but still isn't enough.

> ...
> Or, to put it another way: to write Python, it is required that
> you understand how to read and write escaped strings.  Either
> you learn just that, or you learn that plus another, different
> way to read escaped-strings-as-printed-by-the-interpreter.  The
> second case clearly requires you to learn and remember more.

You need to learn whatever it takes to get the job done.  Since the current
alternatives do not get the job done, yes, if anything is ever introduced
that *does* get the job done, there's more to learn.  Complexity isn't
necessarily evil; gratuitous complexity is evil.

> ...
> (However, characters below 0x20 are definitely dangerous to the terminal,
> and would have to be escaped regardless.)

They're no danger on any platform I use, and at least in MS-DOS they're
mapped to useful graphics characters.  Python has no way to know what's
dangerous, and gets in the way by trying to guess.  Even if x does have
control characters that are dangerous, the user will get screwed as soon as
they do

    print x

unless you want (the implied) str() to start escaping "dangerous" characters
too.  Safety and usefulness are definitely at odds here, and I favor
usefulness.  If they want safety, let 'em use Java <wink>.
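
A small illustration of the difference, in Python 3 syntax (where print is a function) and with a made-up string containing a BEL control character:

```python
# Hypothetical string with a control character below 0x20 (BEL).
x = "bell\x07"

# What print(x) would send to the terminal: the raw character, unescaped.
raw = str(x)

# What repr() shows: the control character written as an escape sequence.
escaped = repr(x)

raw_has_control = "\x07" in raw
escaped_has_control = "\x07" in escaped
```

So escaping at the prompt alone protects nothing once the user prints the same value.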

> Getting it passed down as str() seems okay to me.  Making it
> the default action, in my (naturally) subjective opinion, is
> Right Out if it means that
>
>     eval(what_the_interpreter_prints_for(x)) == x
>
> no longer holds for objects composed of the basic built-in types.

Whereas in my daily use, this property is usually a *wrong* thing to shoot
for at an interactive prompt (but is a great thing for repr() to shoot for).
When I want eval'ability, it's just a pair of backticks away; by default,
I'd rather see something *friendly*.  If I type "ping" at the prompt, I
don't want to see a second-by-second account of your entire life history
<wink>.
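
As a rough sketch of a "friendly by default" prompt, assuming a modern Python 3 interpreter where the sys.displayhook customization point is available: the function name friendly_displayhook is hypothetical, and plain str() is used only as a stand-in, since the post argues that str() is still not quite the right thing either.

```python
import builtins
import contextlib
import io
import sys

def friendly_displayhook(value):
    """Display str(value) for an expression result instead of repr(value)."""
    if value is None:
        return
    sys.stdout.write(str(value) + "\n")
    # Like the default hook, remember the last result in the name "_".
    builtins._ = value

# Demonstrate the hook by calling it directly and capturing its output.
# Installing it for real would be: sys.displayhook = friendly_displayhook
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    friendly_displayhook("tab\there")
shown = buffer.getvalue()

# For comparison, the escaped form that a repr-based prompt would show.
default_form = repr("tab\there")
```

With such a hook installed, the eval'able form stays one explicit repr() call away (the backticks mentioned above no longer exist in Python 3).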

the-best-thing-to-do-with-most-info-is-to-suppress-it-ly y'rs  - tim