[Python-Dev] PEP 3101 Update

Fri May 19 10:41:33 CEST 2006

Guido van Rossum wrote:
> On 5/6/06, Talin <talin at acm.org> wrote:
> 
>> I've updated PEP 3101 based on the feedback collected so far.
> 
> [http://www.python.org/dev/peps/pep-3101/]
> 
> I think this is a step in the right direction.

Cool, and thanks for the very detailed feedback.

> I wonder if we shouldn't borrow more from .NET. I read this URL that
> you referenced:
> 
> http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp
> 
> They have special syntax to support field width, e.g. {0,10} formats
> item 0 in a field of (at least) 10 positions wide, right-justified;
> {0,-10} does the same left-aligned. This is done independently from

We already have that now, don't we? If you look at the docs for "String 
Formatting Operations" in the library reference, it shows that a 
negative sign on a field width indicates left justification.
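
For reference, a small illustration of that existing behaviour with the
%-operator (the variable names are only for this example):

```python
# Existing %-style formatting: a minus sign before the field width
# left-justifies the value; without it the value is right-justified.
right = "%10s" % "abc"   # padded on the left to 10 characters
left = "%-10s" % "abc"   # padded on the right to 10 characters

print(repr(right))
print(repr(left))
```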

> the type-specific formatting. (I'm not proposing that we use .NET's
> format specifiers after the colon, but I'm also no big fan for keeping
> the C specific stuff we have now; we should put some work in designing
> something with the same power as the current %-based system for floats
> and ints, that would cover it.)

Agreed. As you say, the main work is in handling floats and ints, and 
everything else can either be formatted as plain str(), or use a custom 
format specifier syntax (as in my strftime example.)

> .NET's solution for quoting { and } as {{ and }} respectively also
> sidesteps the issue of how to quote \ itself -- since '\\{' is a
> 2-char string containing one \ and one {, you'd have to write either
> '\\\\{0}' or r'\\{0}' to produce a single literal \ followed by
> formatted item 0. Any time there's the need to quadruple a backslash I
> think we've lost the battle. (Or you might search the web for Tcl
> quoting hell. :-)
> 
> I'm fine with not having a solution for doing variable substitution
> within the format parameters. That could be done instead by building
> up the format string with an extra formatting step: instead of
> "{x:{y}}".format(x=whatever, y=3) you could write
> "{{x,{y}}}".format(y=3).format(x=whatever). (Note that this is subtle:
> the final }}} are parsed as } followed by }}. Once the parser has seen
> a single {, the first } it sees is the matching closing } and adding
> another } after it won't affect it. The specifier cannot contain { or
> } at all.)

There is another solution to this which is equally subtle, although 
fairly straightforward to parse. It involves defining the rules for 
escapes as follows:

    '{{' is an escaped '{'
    '}}' is an escaped '}', unless we are within a field.

So you can write things like {0:10,{1}}, and the final '}}' will be 
parsed as two separate closing brackets, since we're within a field 
definition.

From a parsing standpoint this is unambiguous; however, I've held off 
on suggesting it because it might appear ambiguous to a casual reader.
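
To make the rule concrete, here is a rough sketch of a scanner that
follows it. This is only an illustration under my own assumptions, not
the parser that will accompany the PEP: the name parse_format and the
('text', ...) / ('field', ...) tuples are made up for the example, and
it only splits the template without interpreting the field contents.

```python
def parse_format(template):
    """Split a template into ('text', s) and ('field', s) pieces.

    Rules sketched here:
      '{{' outside a field is a literal '{'
      '}}' outside a field is a literal '}'
      inside a field, braces nest, so '}}' is two separate closers
    """
    pieces = []
    text = []
    i = 0
    n = len(template)
    while i < n:
        ch = template[i]
        if ch == "{":
            if template.startswith("{{", i):
                text.append("{")          # escaped opening brace
                i += 2
                continue
            if text:                      # flush literal text before the field
                pieces.append(("text", "".join(text)))
                text = []
            depth = 1
            j = i + 1
            while j < n and depth:        # scan to the matching closer
                if template[j] == "{":
                    depth += 1
                elif template[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated field")
            pieces.append(("field", template[i + 1:j - 1]))
            i = j
        elif ch == "}":
            if template.startswith("}}", i):
                text.append("}")          # escaped closing brace
                i += 2
            else:
                raise ValueError("single '}' outside a field")
        else:
            text.append(ch)
            i += 1
    if text:
        pieces.append(("text", "".join(text)))
    return pieces
```

With this sketch, parse_format("{0:10,{1}}") yields a single field whose
contents are "0:10,{1}", while parse_format("{{x}}") yields only the
literal text "{x}".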

> I like having a way to reuse the format parsing code while
> substituting something else for the formatting itself.
> 
> The PEP appears silent on what happens if there are too few or too
> many positional arguments, or if there are missing or unused keywords.
> Missing ones should be errors; I'm not sure about redundant (unused)
> ones. On the one hand complaining about those gives us more certainty
> that the format string is correct. On the other hand there are some
> use cases for passing lots of keyword parameters (e.g. simple web
> templating could pass a fixed set of variables using **dict). Even in
> i18n (translation) apps I could see the usefulness of allowing unused
> parameters.

I am undecided on this issue as well, which is the reason that it's not 
mentioned in the PEP (yet).

> On the issue of {a.b.c}: like several correspondents, I don't like the
> ambiguity of attribute vs. key refs much, even though it appears
> useful enough in practice in web frameworks I've used. It seems to
> violate the Zen of Python: "In the face of ambiguity, refuse the
> temptation to guess."
> 
> Unfortunately I'm pretty lukewarm about the proposal to support
> {a[b].c} since b is not a variable reference but a literal string 'b'.
> It is also relatively cumbersome to parse. I wish I could propose
> {a+b.c} for this case but that's so arbitrary...

Actually, it's not all that hard to parse, especially given that there 
is no need to deal with the 'nested' case.
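
As a rough idea of how little is involved, here is a sketch that splits
a field name such as a[b].c into lookup steps and then applies them. It
is an assumption-laden illustration rather than the PEP's parser: the
helper names split_field_name and resolve are hypothetical, the text
inside [] is treated as a literal string key, and nested brackets are
rejected.

```python
import re

# First component: a positional index or a keyword name.
_FIRST = re.compile(r"[A-Za-z_0-9]+")
# Each following step is either .attribute or [key]; no nesting allowed.
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z_0-9]*)|\[([^\[\]]+)\]")


def split_field_name(name):
    """Return (first, steps) where steps is a list of (kind, value)."""
    m = _FIRST.match(name)
    if not m:
        raise ValueError("bad field name: %r" % name)
    first = m.group(0)
    pos = m.end()
    steps = []
    while pos < len(name):
        m = _STEP.match(name, pos)
        if not m:
            raise ValueError("bad field name: %r" % name)
        if m.group(1) is not None:
            steps.append(("attr", m.group(1)))
        else:
            steps.append(("key", m.group(2)))   # literal string key
        pos = m.end()
    return first, steps


def resolve(name, args, kwargs):
    """Look up the value a field name refers to."""
    first, steps = split_field_name(name)
    obj = args[int(first)] if first.isdigit() else kwargs[first]
    for kind, value in steps:
        obj = getattr(obj, value) if kind == "attr" else obj[value]
    return obj
```

Because each step is explicitly either an attribute or a key, there is
no guessing about which kind of lookup is meant.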

I will be supplying a Python implementation of the parser along with the 
PEP. What I would prefer not to supply (although I certainly can if you 
feel it's necessary) is an optimized C implementation of the same 
parser, as well as the implementations of the various type-specific 
formatters.

> Even more unfortunately, I expect that dict key access is a pretty
> important use case so we'll have to address it somehow. I *don't*
> think there's an important use case for the ambiguity -- in any
> particular situation I expect that the programmer will know whether
> they are expecting a dict or an object with attributes.
> 
> Hm, perhaps {a@b.c} might work? It's not an existing binary operator.
> Or perhaps # or !.

[] is the most intuitive syntax by far IMHO. Let's run it up the 
flagpole and see if anybody salutes :)

> It's too late to think straight so this will have to be continued...

One additional issue that I would like some feedback on:

The way I have set up the API for writing custom formatters (not talking 
about the __format__ method here) allows the custom formatter object to 
examine the entire output string, not merely the part that it is 
responsible for; moreover, the custom formatter is free to modify 
the entire string. So for example, a custom formatter could tabify or 
un-tabify all previous text within the string.

The API could be made slightly simpler by eliminating this feature. The 
reason that I added it was specifically so that custom formatters could 
perform column-specific operations, like the old BASIC function that 
would print spaces up to a given column. Having generated my share of 
reports back in the old days (COBOL programming in the USAF), I thought 
it might be useful to have the ability to do operations based on the 
absolute column number.

Currently the API specifies that a custom formatter is passed an array 
object, and the custom formatter should append its data to the end of 
the array, but it is also free to examine and modify the rest of the array.

If I were to remove this feature, then instead of using an array, we'd 
simply have the custom formatter return a string like __format__ does.
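
To show the difference between the two shapes, here is a small sketch.
It is not the API text from the PEP: a plain list of strings stands in
for the array object, and the names tab_to_column and upper_formatter
are invented for the example.

```python
def tab_to_column(buffer, column):
    """Array-style formatter: it sees the whole output so far.

    Pads the shared buffer with spaces up to the given 0-based column,
    which requires knowing how long the current line already is.
    """
    text = "".join(buffer)
    current = len(text) - (text.rfind("\n") + 1)   # length of the last line
    if current < column:
        buffer.append(" " * (column - current))


def upper_formatter(value):
    """Return-a-string formatter: it only sees its own value."""
    return str(value).upper()


buffer = ["Name:"]
tab_to_column(buffer, 10)                  # needs the text before it
buffer.append(upper_formatter("talin"))    # does not
line = "".join(buffer)
print(repr(line))
```

The first function cannot be written in the return-a-string style
without being told the current column some other way, which is the
trade-off I am asking about.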

So the question is - is the use case useful enough to keep this feature? 
What do people think of the use of the Python array type in this case?

-- Talin