[Python-Dev] PEP 3101 Update
Talin
talin at acm.org
Fri May 19 10:41:33 CEST 2006
Guido van Rossum wrote:
> On 5/6/06, Talin <talin at acm.org> wrote:
>
>> I've updated PEP 3101 based on the feedback collected so far.
>
> [http://www.python.org/dev/peps/pep-3101/]
>
> I think this is a step in the right direction.
Cool, and thanks for the very detailed feedback.
> I wonder if we shouldn't borrow more from .NET. I read this URL that
> you referenced:
>
> http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformatting.asp
>
>
> They have special syntax to support field width, e.g. {0,10} formats
> item 0 in a field of (at least) 10 positions wide, right-justified;
> {0,-10} does the same left-aligned. This is done independently from
We already have that now, don't we? If you look at the docs for "String
Formatting Operations" in the library reference, it shows that a
negative sign on a field width indicates left justification.
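
For reference, a small sketch of the existing behaviour with the % operator (the sample values are purely illustrative):

```python
# Existing %-formatting: a plain width right-justifies the value.
right = "%10s" % "abc"

# A minus sign in front of the width requests left justification.
left = "%-10s" % "abc"

# The width may also come from the argument tuple via '*';
# a negative width supplied this way left-justifies as well.
star_right = "%*d" % (6, 42)
star_left = "%*d" % (-6, 42)

print(repr(right))       # '       abc'
print(repr(left))        # 'abc       '
print(repr(star_right))  # '    42'
print(repr(star_left))   # '42    '
```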
> the type-specific formatting. (I'm not proposing that we use .NET's
> format specifiers after the colon, but I'm also no big fan for keeping
> the C specific stuff we have now; we should put some work in designing
> something with the same power as the current %-based system for floats
> and ints, that would cover it.)
Agreed. As you say, the main work is in handling floats and ints, and
everything else can either be formatted as plain str(), or use a custom
format specifier syntax (as in my strftime example).
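
A minimal sketch of a type-specific specifier, assuming the __format__ hook receives the text after the colon as a plain string (which is how str.format behaves in later Python releases; the draft under discussion may differ). The Stamp class is hypothetical and exists only for illustration:

```python
import datetime


class Stamp:
    """Hypothetical wrapper whose format specifier is a strftime pattern."""

    def __init__(self, when):
        self.when = when

    def __format__(self, spec):
        # With no specifier, fall back to plain str() like any other object.
        if not spec:
            return str(self.when)
        # Otherwise treat the specifier as a strftime pattern.
        return self.when.strftime(spec)


stamp = Stamp(datetime.date(2006, 5, 19))
with_spec = "{0:%Y/%m/%d}".format(stamp)
without_spec = "{0}".format(stamp)
print(with_spec, without_spec)
```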
> .NET's solution for quoting { and } as {{ and }} respectively also
> sidesteps the issue of how to quote \ itself -- since '\\{' is a
> 2-char string containing one \ and one {, you'd have to write either
> '\\\\{0}' or r'\\{0}' to produce a single literal \ followed by
> formatted item 0. Any time there's the need to quadruple a backslash I
> think we've lost the battle. (Or you might search the web for Tcl
> quoting hell. :-)
>
> I'm fine with not having a solution for doing variable substitution
> within the format parameters. That could be done instead by building
> up the format string with an extra formatting step: instead of
> "{x:{y}}".format(x=whatever, y=3) you could write
> "{{x,{y}}}".format(y=3).format(x=whatever). (Note that this is subtle:
> the final }}} are parsed as } followed by }}. Once the parser has seen
> a single {, the first } it sees is the matching closing } and adding
> another } after it won't affect it. The specifier cannot contain { or
> } at all.)
There is another solution to this which is equally subtle, although
fairly straightforward to parse. It involves defining the rules for
escapes as follows:
'{{' is an escaped '{'
'}}' is an escaped '}', unless we are within a field.
So you can write things like {0:10,{1}}, and the final '}}' will be
parsed as two separate closing brackets, since we're within a field
definition.
From a parsing standpoint, this is unambiguous; however, I've held off
on suggesting it because it might appear to be ambiguous to a casual reader.
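
The two escape rules above could be sketched roughly as follows. This is only an illustration, not the parser that accompanies the PEP; the function name split_format and the (kind, text) tuples are assumptions made for the example, and field contents are returned raw without further processing:

```python
def split_format(fmt):
    """Split fmt into ('text', s) and ('field', s) pieces.

    Outside a field, '{{' and '}}' are escapes for '{' and '}'.
    Inside a field, '}}' is read as two separate closing brackets.
    """
    parts, text = [], []
    i, n = 0, len(fmt)
    while i < n:
        pair = fmt[i:i + 2]
        if pair in ("{{", "}}"):
            # Escaped bracket in literal text.
            text.append(pair[0])
            i += 2
        elif fmt[i] == "{":
            if text:
                parts.append(("text", "".join(text)))
                text = []
            # Scan to the bracket that closes this field.
            depth, j = 1, i + 1
            while j < n and depth:
                if fmt[j:j + 2] == "{{":
                    j += 2  # '{{' stays an escape, even inside a field
                    continue
                if fmt[j] == "{":
                    depth += 1
                elif fmt[j] == "}":
                    depth -= 1  # '}}' here closes two levels, one at a time
                j += 1
            if depth:
                raise ValueError("unterminated field")
            parts.append(("field", fmt[i + 1:j - 1]))
            i = j
        elif fmt[i] == "}":
            raise ValueError("single '}' outside a field")
        else:
            text.append(fmt[i])
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts


print(split_format("{{x}} costs {0:10,{1}}!"))
```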
> I like having a way to reuse the format parsing code while
> substituting something else for the formatting itself.
>
> The PEP appears silent on what happens if there are too few or too
> many positional arguments, or if there are missing or unused keywords.
> Missing ones should be errors; I'm not sure about redundant (unused)
> ones. On the one hand complaining about those gives us more certainty
> that the format string is correct. On the other hand there are some
> use cases for passing lots of keyword parameters (e.g. simple web
> templating could pass a fixed set of variables using **dict). Even in
> i18n (translation) apps I could see the usefulness of allowing unused
> parameters.
I am undecided on this issue as well, which is the reason that it's not
mentioned in the PEP (yet).
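
To make the two policies concrete, here is a sketch using the string.Formatter class and its check_unused_args hook as found in later Python releases; none of this is specified by the current draft. The StrictFormatter subclass is hypothetical:

```python
import string


class StrictFormatter(string.Formatter):
    """Hypothetical variant that treats unused arguments as an error."""

    def check_unused_args(self, used_args, args, kwargs):
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [k for k in kwargs if k not in used_args]
        if unused:
            raise ValueError("unused arguments: %r" % (unused,))


# Lenient policy: extra arguments are silently ignored.
lenient = string.Formatter().format("{0}", "a", "b")

# Strict policy: the same call is rejected.
try:
    StrictFormatter().format("{0}", "a", "b")
    strict_rejected = False
except ValueError:
    strict_rejected = True

# Missing arguments are errors under either policy.
try:
    "{0} {1}".format("a")
    missing_positional = False
except IndexError:
    missing_positional = True

try:
    "{name}".format(other=1)
    missing_keyword = False
except KeyError:
    missing_keyword = True
```

The lenient default suits the **dict templating case, while the strict subclass gives the extra certainty that the format string matches its arguments.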
> On the issue of {a.b.c}: like several correspondents, I don't like the
> ambiguity of attribute vs. key refs much, even though it appears
> useful enough in practice in web frameworks I've used. It seems to
> violate the Zen of Python: "In the face of ambiguity, refuse the
> temptation to guess."
>
> Unfortunately I'm pretty lukewarm about the proposal to support
> {a[b].c} since b is not a variable reference but a literal string 'b'.
> It is also relatively cumbersome to parse. I wish I could propose
> {a+b.c} for this case but that's so arbitrary...
Actually, it's not all that hard to parse, especially given that there
is no need to deal with the 'nested' case.
I will be supplying a Python implementation of the parser along with the
PEP. What I would prefer not to supply (although I certainly can if you
feel it's necessary) is an optimized C implementation of the same
parser, as well as the implementations of the various type-specific
formatters.
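
As a rough illustration of why the non-nested case is easy to parse (this is a sketch, not the implementation I will supply; the name resolve and the regular expressions are assumptions made for the example):

```python
import re
import types

# One step of a reference: either ".name" or "[key]".
_STEP = re.compile(r"\.(\w+)|\[([^\]]+)\]")


def resolve(ref, namespace):
    """Resolve a reference such as 'a[b].c' against a dict of arguments."""
    head = re.match(r"\w+", ref)
    if head is None:
        raise ValueError("reference must start with a name: %r" % ref)
    obj = namespace[head.group()]
    pos = head.end()
    while pos < len(ref):
        step = _STEP.match(ref, pos)
        if step is None:
            raise ValueError("bad reference at offset %d: %r" % (pos, ref))
        attr, key = step.groups()
        if attr is not None:
            obj = getattr(obj, attr)  # '.c' is always an attribute
        else:
            obj = obj[key]            # '[b]' is the literal string 'b'
        pos = step.end()
    return obj


args = {"a": {"b": types.SimpleNamespace(c=5)}}
print(resolve("a[b].c", args))
```

Because '[b]' always means item access and '.c' always means attribute access, nothing has to be guessed at run time.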
> Even more unfortunately, I expect that dict key access is a pretty
> important use case so we'll have to address it somehow. I *don't*
> think there's an important use case for the ambiguity -- in any
> particular situation I expect that the programmer will know whether
> they are expecting a dict or an object with attributes.
>
> Hm, perhaps {a@b.c} might work? It's not an existing binary operator.
> Or perhaps # or !.
[] is the most intuitive syntax by far IMHO. Let's run it up the
flagpole and see if anybody salutes :)
> It's too late to think straight so this will have to be continued...
One additional issue that I would like some feedback on:
The way I have set up the API for writing custom formatters (not talking
about the __format__ method here) allows the custom formatter object to
examine the entire output string, not merely the part that it is
responsible for; moreover, the custom formatter is free to modify
the entire string. So for example, a custom formatter could tabify or
un-tabify all previous text within the string.
The API could be made slightly simpler by eliminating this feature. The
reason that I added it was specifically so that custom formatters could
perform column-specific operations, like the old BASIC function that
would print spaces up to a given column. Having generated my share of
reports back in the old days (COBOL programming in the USAF), I thought
it might be useful to have the ability to do operations based on the
absolute column number.
Currently the API specifies that a custom formatter is passed an array
object, and the custom formatter should append its data to the end of
the array, but it is also free to examine and modify the rest of the array.
If I were to remove this feature, then instead of using an array, we'd
simply have the custom formatter return a string like __format__ does.
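
A sketch of the column use case, with a plain list of characters standing in for the array object; the function name tab_to_column and its signature are hypothetical and not part of the PEP:

```python
def tab_to_column(buf, column):
    """Hypothetical custom formatter: pad the output with spaces up to `column`.

    `buf` holds everything written so far, so the formatter can work out
    the current column by scanning back to the last newline.
    """
    current = 0
    for ch in reversed(buf):
        if ch == "\n":
            break
        current += 1
    # Append to the shared buffer; nothing is added if we are already past it.
    buf.extend(" " * max(0, column - current))


buf = []
buf.extend("Name")
tab_to_column(buf, 10)
buf.extend("Qty")
line = "".join(buf)
print(repr(line))  # 'Name      Qty'
```

A formatter that only returns a string cannot do this, because it never sees how much text precedes it on the current line.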
So the question is: is the use case useful enough to keep this feature?
What do people think of the use of the Python array type in this case?
-- Talin