[Python-ideas] new format spec for iterable types
Wolfgang Maier
wolfgang.maier at biologie.uni-freiburg.de
Wed Sep 9 15:41:56 CEST 2015
Thanks for all the feedback!
Just to summarize ideas and to clarify what I had in mind when proposing
this:
1)
Yes, I would like to have this work with any (or at least most)
iterables, not just with my own custom type that I used for illustration.
So having this handled by str.format rather than by each object's
__format__ method could make sense. It was simply easier to prototype it
in Python through the __format__ method.
Why did I propose * as the first character of the new format spec string?
Because I think you really need some token to state unambiguously[1]
that what follows is a format specification that involves going through
the elements of the iterable instead of working on the container object
itself. I thought that * is most intuitive to understand because of its
use in unpacking.
[1] Unfortunately, in my original proposal the leading * can still be
ambiguous because *<, *>, *= and *^ could mean either joining the elements
with <, >, = or ^ as the separator, or aligning the container's formatted
string representation using * as the fill character.
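To make the ambiguity concrete, here is a small sketch using only today's
built-in behaviour. The spec "*<10" already has a meaning (fill with *,
left-align to width 10), while under my original proposal it could also
be read as "join the elements with <". The variable names are just for
illustration.

```python
text = "abc"

# Current meaning of the spec "*<10": pad with '*' and left-align to width 10.
padded = format(text, "*<10")

# What the same spec could mean under the original proposal:
# iterate over the elements and join them with '<'.
joined = "<".join(text)

print(padded)  # abc*******
print(joined)  # a<b<c
```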
Ideally, the * should be the very first thing inside a replacement field
- pretty much as suggested by Oscar - and should not be part of the
format spec. This is not feasible with a format spec handled by the
__format__ method, but it is with a modified str.format method, which is
another argument for that approach. Examples:
'foo {*name:<sep>} bar'.format(name=<expr>)
'foo {*0:<sep>} bar {1}'.format(x, y)
'foo {*:<sep>} bar'.format(x)
2)
As for including an additional format spec to apply to the elements of
the iterable:
I decided against including this in the original proposal to keep it
simple and to get feedback on the general idea first.
The problem here is that any solution requires an additional token to
indicate the boundary between the <separator> part and the element
format spec. Since you would not want to have anyone's custom format
spec broken by this, this boils down to disallowing one reserved
character in the <separator> part, like in Oscar's example:
'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)
where <sep> cannot contain a colon.
So that character would have to be chosen carefully (both : and | are
quite readable, but also relatively common element separators I guess).
In addition, the <separator> part should be non-optional (though the
empty string should be allowed) to guarantee the presence of the
delimiter token. This avoids accidentally splitting a lone element
format spec into a <sep> and a <fmt> part:
# format the elements of name using <fmt>, join them using <sep>
'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)
# format the elements of name using <fmt>, join them using ''
'foo {*name::<fmt>} bar'.format(name=<expr>)
# a syntax error
'foo {*name:<fmt>} bar'.format(name=<expr>)
On the other hand, these restrictions do not look too dramatic given the
flexibility gained in most situations.
So to sum up how this could work:
If str.format encounters a leading * in a replacement field, it splits
the format spec (i.e. everything after the first colon) on the first
occurrence of the <sep>|<fmt> separator (possibly ':' or '|') and does,
essentially:
<sep>.join(format(e, <fmt>) for e in iterable)
Without the *, it just works the current way.
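A minimal sketch of that split-and-join step, written as a standalone
helper rather than as a change to str.format. The name join_formatted and
its delimiter parameter are hypothetical, and ':' is only assumed as the
<sep>|<fmt> delimiter here.

```python
def join_formatted(iterable, spec, delimiter=":"):
    """Emulate the proposed handling of one {*name:<sep><delimiter><fmt>} field.

    join_formatted and delimiter are hypothetical names for illustration.
    """
    # Split on the FIRST delimiter only, so <fmt> may contain it again.
    sep, found, fmt = spec.partition(delimiter)
    if not found:
        # The <sep> part is non-optional, so the delimiter must be present.
        raise ValueError("missing %r between <sep> and <fmt>" % delimiter)
    return sep.join(format(e, fmt) for e in iterable)


numbers = [1, 2, 3]
with_sep = join_formatted(numbers, ", :02d")  # like '{*0:, :02d}'
no_sep = join_formatted(numbers, ":02d")      # like '{*0::02d}'

print(with_sep)  # 01, 02, 03
print(no_sep)    # 010203
```

Calling join_formatted(numbers, "02d") raises ValueError in this sketch,
mirroring the rejected '{*name:<fmt>}' form above.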
3)
Finally, the alternative idea of having the new functionality handled by
a new !converter, like:
"List: {0!j:,}".format([1.2, 3.4, 5.6])
I considered this idea before posting the original proposal, but, in
addition to requiring a change to str.format (which would need to
recognize the new token), this approach would need either:
- a new special method (e.g., __join__) to be implemented for every type
that should support it, which is worse than for my original proposal, or
- the str.format method to react directly to the converter flag, which
is then no different from the above solution except that it uses !j
instead of *. Personally, I find the * syntax more readable. In addition,
the !j syntax would suggest that this is a regular converter (calling a
special method of the object) when, in fact, it is not.
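For comparison, the second variant can be emulated today with a
string.Formatter subclass. This is only a sketch of how I understand the
!j idea: JoinFormatter and _Joiner are hypothetical names, and the whole
format spec is treated as the separator, with no element format spec.

```python
import string


class _Joiner:
    """Hypothetical wrapper whose format spec is used as the separator."""

    def __init__(self, iterable):
        self._items = iterable

    def __format__(self, spec):
        # The entire spec is the separator; elements use their default format.
        return spec.join(format(e) for e in self._items)


class JoinFormatter(string.Formatter):
    """Formatter that reacts directly to a 'j' conversion flag."""

    def convert_field(self, value, conversion):
        if conversion == "j":
            return _Joiner(value)
        # Fall back to the regular !s, !r and !a handling.
        return super().convert_field(value, conversion)


result = JoinFormatter().format("List: {0!j:,}", [1.2, 3.4, 5.6])
print(result)  # List: 1.2,3.4,5.6
```

Note that the joining is done by the formatter, not by a special method
of the list, which is exactly why !j would not behave like a regular
converter.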
Please correct me if I have misunderstood something about this
alternative proposal.
Best,
Wolfgang