[Python-Dev] PEP 3101 implementation vs. documentation

Sat Jun 11 11:16:37 CEST 2011

On Sat, Jun 11, 2011 at 7:15 AM, Ben Wolfson <wolfson at gmail.com> wrote:
[snip very thorough analysis]

To summarise (after both the above post and the discussion on the tracker):

The current str.format implementation differs from the documentation
in two ways:

1. It ignores the presence of an unclosed index field when processing
a replacement field (placing additional restrictions on allowable
characters in index strings).
2. Replacement fields that appear in name specifiers are processed by
the parser for brace-matching purposes, but are not substituted.
More accurate documentation would state that:

1. Numeric name fields start with a digit and are terminated by any
non-numeric character.

2. An identifier name field is terminated by any one of:
    '}' (terminates the replacement field, unless preceded by a
matching '{' character, in which case it is ignored and included in
the string)
    '!' (terminates name field, starts conversion specifier)
    ':' (terminates name field, starts format specifier)
    '.' (terminates current name field, starts new name field for subattribute)
    '[' (terminates name field, starts index field)

3. An index field is terminated by one of:
    '}' (terminates the replacement field, unless preceded by a
matching '{' character, in which case it is ignored and included in
the string)
    '!' (terminates index field, starts conversion specifier)
    ':' (terminates index field, starts format specifier)
    ']' (terminates index field, subsequent character will determine next field)
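
The pieces these rules refer to (name field, conversion specifier,
format specifier) can be inspected with string.Formatter.parse from the
standard library, which splits a format string without looking up any
arguments. A small sketch; split_fields is just a helper name for this
example and the format strings are illustrative only:

```python
from string import Formatter

def split_fields(fmt):
    """Return (literal_text, field_name, format_spec, conversion) tuples."""
    return list(Formatter().parse(fmt))

# '!' ends the name field and starts the conversion specifier;
# ':' starts the format specifier.
full = split_fields("{0.name[key]!r:>10}")

# With neither '!' nor ':', the conversion is None and the spec is empty.
plain = split_fields("x={a.b[c]}")

print(full)
print(plain)
```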

This existing behaviour can certainly be documented as such, but is
rather unintuitive and (given that '}', '!' and ':' will always error
out if appearing in an index field) somewhat silly.

So, the two changes that I believe Ben is proposing would be as follows:

1. When processing a name field, brace-matching is suspended. Between
the opening '{' character and the closing '}', '!' or ':' character,
additional '{' characters are ignored for matching purposes.
2. When processing an index field, all special processing is suspended
until the terminating ']' is reached.

The rules for name fields would then become:

1. Numeric fields start with a digit and are terminated by any
non-numeric character.

2. An identifier name field is terminated by any one of:
    '}' (terminates the replacement field)
    '!' (terminates identifier field, starts conversion specifier)
    ':' (terminates identifier field, starts format specifier)
    '.' (terminates identifier field, starts new identifier field for
subattribute)
    '[' (terminates identifier field, starts index field)

3. An index field is terminated by ']' (subsequent character will
determine next field)
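
As a rough sketch of what these simplified rules would mean for a
parser, here is a pure-Python scanner for the text that follows an
opening '{'. It is not the CPython implementation: scan_name_field and
its return shape are hypothetical, chosen only for this illustration,
and the numeric/identifier distinction from rule 1 is left out for
brevity.

```python
def scan_name_field(body):
    """Scan the text after an opening '{' under the simplified rules.

    Returns (parts, terminator, rest), where parts is a list of
    (kind, text) pairs with kind in {'name', 'attr', 'index'}.
    """
    parts = []
    current = []          # characters of the name/attribute being read
    kind = "name"         # what `current` becomes when flushed
    after_index = False   # True right after a closing ']'
    i = 0

    def flush():
        if current:
            parts.append((kind, "".join(current)))
            current.clear()

    while i < len(body):
        ch = body[i]
        if ch == "[":
            # Rule 3: everything up to the next ']' is the index string.
            end = body.find("]", i + 1)
            if end == -1:
                raise ValueError("missing ']' in index field")
            flush()
            parts.append(("index", body[i + 1:end]))
            after_index = True
            i = end + 1
        elif ch in "}!:":
            # Rule 2: these characters end the whole name field.
            flush()
            return parts, ch, body[i + 1:]
        elif ch == ".":
            # Rule 2: '.' starts a new identifier field (subattribute).
            flush()
            kind = "attr"
            after_index = False
            i += 1
        elif after_index:
            raise ValueError("only '.' or '[' may follow ']'")
        else:
            # Any other character, including '{', is part of the name.
            current.append(ch)
            i += 1
    raise ValueError("expected '}' before end of string")


print(scan_name_field("0[a:b}!]!r:>10}"))
print(scan_name_field("a.b[c]}"))
print(scan_name_field("a{0}}"))
```

With brace matching suspended, a '{' inside the name is just another
character, so scanning "a{0}}" stops at the first '}' and leaves a
stray '}' behind.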

That second set of rules is *far* more in line with the behaviour of
the rest of the language than the status quo, so unless the difficulty
of making the str.format mini-language parser work that way is truly
prohibitive, it certainly seems worthwhile to tidy up the semantics.

The index field behaviour should definitely be fixed, as it poses no
backwards compatibility concerns. The brace matching behaviour should
probably be left alone, as changing it would potentially break
currently valid format strings (e.g. "{a{0}}".format(**{'a{0}':1})
produces '1' now, but would raise an exception if the brace matching
rules were changed).

So +1 on making the str.format parser accept anything other than ']'
inside an index field and turn the whole thing into an ordinary
string, -1 on making any other changes to the brace-matching
behaviour.

That would leave us with the following set of rules for name fields:

1. Numeric fields start with a digit and are terminated by any
non-numeric character.

2. An identifier name field is terminated by any one of:
    '}' (terminates the replacement field, unless preceded by a
matching '{' character, in which case it is ignored and included in
the string)
    '!' (terminates identifier field, starts conversion specifier)
    ':' (terminates identifier field, starts format specifier)
    '.' (terminates identifier field, starts new identifier field for
subattribute)
    '[' (terminates identifier field, starts index field)

3. An index field is terminated by ']' (subsequent character will
determine next field)
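
Rule 3 is easy to check against whichever interpreter is at hand. The
helper below (try_format is a hypothetical name for this example)
returns either the formatted result or a description of the error, so
it runs under both the old and the new behaviour. On recent CPython 3
releases each call should print 1, since the index string is taken to
be everything up to the ']'; an interpreter with the behaviour
described at the top of this message reports an error instead.

```python
def try_format(fmt, *args, **kwargs):
    """Return the formatted string, or a description of the error raised."""
    try:
        return fmt.format(*args, **kwargs)
    except (ValueError, LookupError) as exc:
        return "{}: {}".format(type(exc).__name__, exc)

# Index strings containing characters that are special elsewhere
# in a replacement field.
results = [
    try_format("{0[a:b]}", {"a:b": 1}),
    try_format("{0[!]}", {"!": 1}),
    try_format("{0[}]}", {"}": 1}),
]
for r in results:
    print(r)
```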

Note that brace-escaping currently doesn't work inside name fields, so
that should also be fixed:

>>> "{0[{{]}".format({'{':1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unmatched '{' in format
>>> "{a{{}".format(**{'a{':1})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unmatched '{' in format

As far as I can recall, the details of this question didn't come up
when PEP 3101 was developed, so the PEP isn't a particularly good
source to justify anything in relation to this - it is best to
consider the current behaviour to just be the way it happened to be
implemented rather than a deliberate design choice.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia