[Python-3000] Please don't kill the % operator...

Thu Aug 16 16:53:01 CEST 2007

> >     Alex> The PEP abstract says this proposal will replace the '%' operator,

> skip at pobox.com:
> > I hope this really doesn't happen.  printf-style formatting has a long
> > history both in C and Python and is well-understood.  Its few limitations
> > are mostly due to the binary nature of the % operator, not to the power or
> > flexibility of the format strings themselves.  In contrast, the new format
> > "language" seems to have no history (is it based on features in other
> > languages?  does anyone know if it will actually be usable in common
> > practice?) and at least to the casual observer of recent threads on this
> > topic seems extremely baroque.
> >
> > Python has a tradition of incorporating the best ideas from other languages.
> > String formatting is so common that it doesn't seem to me we should need to
> > invent a new, unproven mechanism to do this.

On 8/16/07, Georg Brandl <g.brandl at gmx.net> wrote:
> Not to mention the pain of porting %-style format strings and % formatting
> to {}-style format strings and .format() in Py3k.

There are many aspects to this. First of all, the discussion of PEP
3101 is taking too long, and some of the proposals are indeed outright
scary. I have long stopped following it -- basically I only pay
attention when Talin personally tells me that there's a new proposal.
I *think* that with Monday's breakthrough we're actually close, but
that apparently doesn't stop a few folks from continuing the heated
discussion.

Second, most aspects of the proposal have actually been lifted from
other languages. The {...} notation is common in many web templating
languages and also in .NET. In .NET, for example, you can write {0},
{1} etc. to reference positional parameters just like in the PEP. I
don't recall if it supports {x} to refer to parameters by name,
but that's common in web templating languages. The idea of allowing
{x.name} and {x[key]} or {x.name[1]} also comes from web templating
languages. In .NET, if you have additional formatting requirements, you
can write {0,10} to format parameter 0 with a minimum width of 10, and
{0,-10} to left-align it. In .NET you can also write {0:xxx} where
xxx is a mini-language used to express more details; this is used to
request things like hex output, or the many variants of formatting
floats.

While we're not copying .NET *exactly*, most of the basic ideas are
very similar; the discussion at this point is mostly about the
type-specific mini-languages. My proposal is to use *exactly the same
mini-language as used in 2.x %-formatting* (but without the '%'
character), in particular: {0:10.3f} will format a float in a field
10 characters wide with 3 digits behind the decimal point, and {0:08x}
will format an int in hex with 0-padding in a field 8 characters wide.
For strings, you can write {0:10.20s} to specify a min width of 10 and
a max width of 20 (I betcha you didn't even know you could do this
with %10.20s :-). The only added wrinkle is that you can also write
{0!r} to *force* using repr() on the value. This is similar to %r in
2.x. Of course, non-numeric types can define their own mini-language,
but that's all advanced stuff. (The concept of type-specific
mini-languages is straight from .NET though.)
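
A minimal sketch of the specifiers above, assuming the str.format()
behavior of current Python 3; the sample values and variable names are
illustrative only. One visible difference from %-formatting is that
strings are left-aligned by default.

```python
pi = 3.14159265

as_float = "{0:10.3f}".format(pi)            # float, width 10, 3 decimals
as_hex = "{0:08x}".format(255)               # int in hex, zero-padded to 8
as_str = "{0:10.20s}".format("spam")         # string, min width 10
as_long_str = "{0:10.20s}".format("x" * 30)  # string, cut at max width 20
as_repr = "{0!r}".format("spam")             # force repr() on the value
old_float = "%10.3f" % pi                    # same spec in 2.x style

for value in (as_float, as_hex, as_str, as_long_str, as_repr, old_float):
    print("[" + value + "]")
```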

I'm afraid there's an awful lot of bikeshedding going on trying to
improve on this, e.g. people want the 'f' or 'x' in front, but I think
the first alpha release is so close that we should stop
discussing this and start implementing. (Fortunately at least one
person has already implemented most of this.)

Much of the early discussion was also terribly misguided because of
an earlier assumption that the mini-language should coerce the type.
This caused endless confusion about what to do with types that have
their own __format__ override. In the end we (I) wisely decided that
the object's __format__ always wins and numeric types will just have
to support the same mini-language by convention. The user of the
format() method won't care about any of this.
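
To illustrate "the object's __format__ always wins", here is a small
sketch. The Money class and its 'c' format code are hypothetical,
invented for this example; the fallback to the float mini-language
shows the "by convention" part.

```python
class Money:
    """Hypothetical type with its own one-letter mini-language."""

    def __init__(self, cents):
        self.cents = cents

    def __format__(self, spec):
        # 'c' is a made-up, type-specific code: show the raw cents.
        if spec == "c":
            return "{0}c".format(self.cents)
        # Otherwise follow the float mini-language by convention;
        # an empty spec defaults to two decimals.
        return format(self.cents / 100, spec or ".2f")


price = Money(1250)
custom = "{0:c}".format(price)
floaty = "{0:8.2f}".format(price)
default = "{0}".format(price)
print(custom, floaty, default)
```

The caller only writes the format string; which mini-language applies
is decided entirely by the object being formatted.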

Now on to the transition. On the one hand I always planned this to
*replace* the old %-formatting syntax, which has a number of real
problems: "%s" % x raises an exception if x happens to be a tuple, and
you have to write "%s" % (x,) to format an object if you aren't sure
about its type; also, it's very common to forget the trailing 's' in
"%(name)s" % {'name': ...}.

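The pitfalls above can be reproduced in a few lines. This is a sketch
run against current Python 3; the variable names are illustrative. The
last case shows a forgotten 's' that does not even raise: the space is
read as a flag and the following 'd' as the conversion type.

```python
x = (1, 2)

# A tuple on the right is unpacked as the argument list, so this fails.
try:
    "%s" % x
    tuple_error = None
except TypeError as exc:
    tuple_error = exc

safe_old = "%s" % (x,)        # the defensive 2.x spelling
new_style = "{0}".format(x)   # no special case for tuples

# Forgetting the trailing 's' at the end of the string is an error...
try:
    "Hello %(name)" % {"name": "world"}
    missing_s_error = None
except ValueError as exc:
    missing_s_error = exc

# ...but in the middle of a string it can silently eat a character.
swallowed = "%(count) dogs" % {"count": 3}

print(repr(tuple_error), safe_old, new_style)
print(repr(missing_s_error), repr(swallowed))
```
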
On the other hand it's too close to the alpha 1 release to fix all the
current uses of %. (In fact it would be just short of a miracle if a
working format() implementation made it into 3.0a1 at all. But I
believe in miracles.)

The mechanical translation is relatively straightforward when the
format string is given as a literal, and this part is well within the
scope of the 2to3 tool (someone just has to write the converter). The
problems come, however, when formatting strings are passed around in
variables or arguments. We can't very well assume that every string
that happens to contain a % sign is a format string, and we can't
assume that every use of the % operator is a formatting operator,
either. Talin has jokingly proposed to translate *all* occurrences of
x%y into _legacy_percent(x, y) which would be a function that does
on-the-fly translation of format strings if x is a string, and returns
x%y if it isn't, but that doesn't sound attractive at all.
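
For concreteness, a rough sketch of what such a function might look
like. This is not Talin's code; translate_format and legacy_percent
are names made up here, and only a deliberately small subset of the
%-syntax is handled (%%, an optional %(name) key, an optional zero
flag, width and precision, and the types s, d, r, f, x). Anything else
is left untranslated, and a dict on the right is always treated as
named arguments, which is an assumption that real code could not make.

```python
import itertools
import re

_TOKEN = re.compile(
    r"(?P<brace>[{}])"                  # literal braces must be doubled
    r"|(?P<pct>%%)"                     # %% is a literal percent sign
    r"|%(?:\((?P<key>\w+)\))?"          # optional %(name) mapping key
    r"(?P<spec>0?[0-9]*(?:\.[0-9]+)?)"  # optional zero flag, width, precision
    r"(?P<type>[sdrfx])"                # the few conversion types handled
)


def translate_format(fmt):
    """Translate a small subset of %-style format strings to {}-style."""
    positions = itertools.count()

    def repl(match):
        if match.group("brace"):
            return match.group("brace") * 2
        if match.group("pct"):
            return "%"
        field = match.group("key") or str(next(positions))
        spec, kind = match.group("spec"), match.group("type")
        if kind in "sr":
            # %s and %r right-align, while {} left-aligns strings.
            body = "!" + kind + (":>" + spec if spec else "")
        else:
            body = ":" + spec + kind
        return "{" + field + body + "}"

    return _TOKEN.sub(repl, fmt)


def legacy_percent(x, y):
    """Return x % y, translating the format on the fly if x is a string."""
    if not isinstance(x, str):
        return x % y
    fmt = translate_format(x)
    if isinstance(y, dict):
        return fmt.format_map(y)
    if not isinstance(y, tuple):
        y = (y,)
    return fmt.format(*y)


print(translate_format("%s is %d years old"))
print(legacy_percent("%08x", 255), legacy_percent(7, 3))
```

Even this toy version has to guess at the caller's intent in several
places, which is a large part of why the idea is unattractive.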

I don't know what percentage of %-formatting uses a string literal on
the left; if it's a really high number (high 90s), I'd like to kill
%-formatting and go with mechanical translation; otherwise, I think
we'll have to phase out %-formatting in 3.x or 4.0.

I hope this takes away some of the fears and gives the PEP 3101 crowd
the incentive to stop bikeshedding and start coding!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)