Mailman 3 new format spec for iterable types - Python-ideas

new format spec for iterable types


Wolfgang Maier

Sept. 8, 2015

noon

Hi, in the parallel "format and print" thread, Andrew Barnert wrote:

...

For example, in a 5-line script I wrote last night, I've got print(head, *names, sep='\t'). I could have used print('\t'.join(chain([head], names))) instead--in fact, any use of multi-argument print can be replaced by print(sep.join(map(str, args)))--but that's less convenient, less readable, and less likely to occur to novices. And there are plenty of other alternatives, from print('{}\t{}'.format(head, '\t'.join(names))) to ...

That last thing, '{}\t{}'.format(head, '\t'.join(names)), is something I find myself writing relatively often - when I do not want to print the result immediately, but store it - but it is ugly to read with its nested method calls and the separators occurring in two very different places.

Now Andrew's comment has prompted me to think about alternative syntax for this and I came up with this idea:

What if built-in iterable types, through their __format__ method, supported a format spec string of the form "*separator" and interpreted it as "join your elements' formatted representations using separator"? A quick and dirty illustration in Python:

    class myList(list):
        def __format__(self, fmt=''):
            if fmt == '':
                return str(self)
            if fmt[0] == '*':
                sep = fmt[1:] or ' '
                return sep.join(format(e) for e in self)
            else:
                raise TypeError()

    head = 99
    data = myList(range(10))
    s = '{}, {:*, }'.format(head, data)
    # or
    s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ')
    print(s)
    print(s2)
    # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Thoughts?

Andrew Barnert

September 2015

2:24 p.m.

On Sep 8, 2015, at 03:00, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

...

Hi,

in the parallel "format and print" thread, Andrew Barnert wrote:

...
For example, in a 5-line script I wrote last night, I've got print(head, *names, sep='\t'). I could have used print('\t'.join(chain([head], names))) instead--in fact, any use of multi-argument print can be replaced by print(sep.join(map(str, args)))--but that's less convenient, less readable, and less likely to occur to novices. And there are plenty of other alternatives, from print('{}\t{}'.format(head, '\t'.join(names))) to ...

That last thing, '{}\t{}'.format(head, '\t'.join(names)), is something I find myself writing relatively often - when I do not want to print the result immediately, but store it - but it is ugly to read with its nested method calls and the separators occurring in two very different places. Now Andrew's comment has prompted me to think about alternative syntax for this and I came up with this idea:

What if built in iterable types through their __format__ method supported a format spec string of the form "*separator" and interpreted it as join your elements' formatted representations using "separator" ? A quick and dirty illustration in Python:

    class myList(list):
        def __format__(self, fmt=''):
            if fmt == '':
                return str(self)
            if fmt[0] == '*':
                sep = fmt[1:] or ' '
                return sep.join(format(e) for e in self)
            else:
                raise TypeError()

    head = 99
    data = myList(range(10))
    s = '{}, {:*, }'.format(head, data)
    # or
    s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ')
    print(s)
    print(s2)
    # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Formatting positional argument #2 with *{sep} as the format specifier makes no sense to me. Even knowing what you're trying to do, I can't understand what *(', ') is going to pass to data.__format__, or why it should do what you want. What is the * supposed to mean there? Is it akin to *args in a function call expression, so you get ',' and ' ' as separate positional arguments? If so, how does the fmt[1] do anything useful? It seems like you would be using [' '] as the separator, and I'm not sure what that would do that you'd want.

Wolfgang Maier

2:55 p.m.

On 08.09.2015 14:24, Andrew Barnert via Python-ideas wrote:

...

Formatting positional argument #2 with *{sep} as the format specifier makes no sense to me. Even knowing what you're trying to do, I can't understand what *(', ') is going to pass to data.__format__, or why it should do what you want. What is the * supposed to mean there? Is it akin to *args in a function call expression, so you get ',' and ' ' as separate positional arguments? If so, how does the fmt[1] do anything useful? It seems like you would be using [' '] as the separator, and I'm not sure what that would do that you'd want.

...

Not sure what happened to the indentation in the posted code. Here's another attempt, copy-pasting from working code as I thought I had done before (sorry for the inconvenience):

    class myList(list):
        def __format__(self, fmt=''):
            if fmt == '':
                return str(self)
            if fmt[0] == '*':
                sep = fmt[1:] or ' '
                return sep.join(format(e) for e in self)
            else:
                raise TypeError()

    head = 99
    data = myList(range(10))
    s = '{}, {:*, }'.format(head, data)
    # or
    s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ')
    print(s)
    print(s2)
    # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Does that make things clearer?

Oscar Benjamin

3:41 p.m.

On 8 September 2015 at 13:24, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:

...

Wolfgang wrote:

...
A quick and dirty illustration in Python:

    class myList(list):
        def __format__(self, fmt=''):
            if fmt == '':
                return str(self)
            if fmt[0] == '*':
                sep = fmt[1:] or ' '
                return sep.join(format(e) for e in self)
            else:
                raise TypeError()

    head = 99
    data = myList(range(10))
    s = '{}, {:*, }'.format(head, data)
    # or
    s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ')
    print(s)
    print(s2)
    # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Formatting positional argument #2 with *{sep} as the format specifier makes no sense to me. Even knowing what you're trying to do, I can't understand what *(', ') is going to pass to data.__format__, or why it should do what you want. What is the * supposed to mean there? Is it akin to *args in a function call expression, so you get ',' and ' ' as separate positional arguments? If so, how does the fmt[1] do anything useful? It seems like you would be using [' '] as the separator, and I'm not sure what that would do that you'd want.

The *{sep} surprised me until I tried

    >>> '{x:.{n}f}'.format(x=1.234567, n=2)
    '1.23'

So format uses a two-level pass over the string for nested curly brackets (I tried a third level of nesting but it didn't work). So following it through:

    '{}{sep}{:*{sep}}'.format(head, data, sep=', ')
    '{}, {:*, }'.format(head, data)
    '{}, {}'.format(head, format(data, '*, '))
    '{}, {}'.format(head, ', '.join(format(e) for e in data))
    '99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9'

Unfortunately there's no way to also give a format string to the inner format call format(e) if I wanted to e.g. format those numbers in hex.

--
Oscar

Wolfgang Maier

4:20 p.m.

On 08.09.2015 15:41, Oscar Benjamin wrote:

...

The *{sep} surprised me until I tried

    >>> '{x:.{n}f}'.format(x=1.234567, n=2)
    '1.23'

So format uses a two-level pass over the string for nested curly brackets (I tried a third level of nesting but it didn't work).

Yes, this is documented behavior (https://docs.python.org/3/library/string.html#format-string-syntax): "A format_spec field can also include nested replacement fields within it. These nested replacement fields can contain only a field name; conversion flags and format specifications are not allowed. The replacement fields within the format_spec are substituted before the format_spec string is interpreted. This allows the formatting of a value to be dynamically specified."
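A minimal sketch of the documented behaviour quoted above, using nothing beyond plain str.format. The names value, fixed and padded are made up for illustration:

```python
# Nested replacement fields inside a format_spec are substituted first;
# the resulting spec is then applied to the outer value.
value = 1.234567

fixed = '{x:.{n}f}'.format(x=value, n=2)            # spec becomes '.2f'
padded = '{x:{w}.{n}f}'.format(x=value, w=8, n=3)   # spec becomes '8.3f'

print(repr(fixed))
print(repr(padded))
```

The nested fields here carry only a field name (n, w), which is all the documentation allows inside a format_spec.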

...

Unfortunately there's no way to also give a format string to the inner format call format(e) if I wanted to e.g. format those numbers in hex.

Right, that would require a much more complex format_spec definition. But the proposed simple version saves me from mistakenly writing:

    '{}\t{}'.format(head, '\t'.join(data))

when some of the elements in data aren't strings and I should have written:

    '{}\t{}'.format(head, '\t'.join(str(e) for e in data))

, a mistake that I seem to never learn to avoid :)
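For readers who have not hit this one, here is a small sketch of the mistake and its fix. The sample values for head and data are illustrative only:

```python
head = 99
data = [0, 1, 2]   # illustrative: elements that are not strings

try:
    '{}\t{}'.format(head, '\t'.join(data))
except TypeError as exc:
    error = exc          # str.join refuses items that are not str
else:
    error = None

# Converting each element first avoids the error.
fixed = '{}\t{}'.format(head, '\t'.join(str(e) for e in data))
print(repr(fixed))
```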

Stephen J. Turnbull

6:27 p.m.

Wolfgang Maier writes:

...

But the proposed simple version saves me from mistakenly writing:

'{}\t{}'.format(head, '\t'.join(data))

when some of the elements in data aren't strings and I should have written:

'{}\t{}'.format(head, '\t'.join(str(e) for e in data))

, a mistake that I seem to never learn to avoid :)

(Note: I don't suffer from that particular mistake, so I may be biased.)

I think it's a nice trick but doesn't clear the bar for adding to the standard iterables yet.

A technical comment: you don't actually need the '*' for myList (although I guess you find it useful to get an error rather than line noise as a separator if it isn't present?)

On the basic idea: if this can be generalized a bit so that

    head = 99
    data = range(10) # optimism!
    s = '{:.1f}, {:.1f*, }'.format(head, data)

produces

    s == '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0'

then I'd be a lot more attracted to it. I would think the simple version is likely to produce rather ugly output if you have a bunch of floats in data. (BTW, that string was actually generated with

    '{:.1f}, {}'.format(99, ', '.join('{:.1f}'.format(x) for x in range(10)))

which doesn't win any beauty contests.)

Bikeshedding in advance, now you pretty much need the '*' (and have to hope that the types in the iterable don't use it themselves!), because '{:.1f, }' really does look like line noise! I might actually prefer '|' (or '/') which is "heavier" and "looks like a separator" to me:

    s = '{:.1f}, {:.1f|, }'.format(head, data)

Finally, another alternative syntax would be the same in the replacement field, but instead of iterables implementing it, the .format method would (using your syntax and example for easier comparison):

    s = '{}, {:*, }'.format(head, *data)

I'm afraid this won't work unless restricted to be the last replacement field, where it just consumes all remaining positional arguments. I think that restriction deserves a loud "ugh", but maybe it will give somebody a better idea.

Steve

Oscar Benjamin

8:15 p.m.

On 8 September 2015 at 17:27, Stephen J. Turnbull <stephen@xemacs.org> wrote:

...

A technical comment: you don't actually need the '*' for myList (although I guess you find it useful to get an error rather than line noise as a separator if it isn't present?)

I think Wolfgang wants it to work with any iterable rather than his own custom type (at least that's what I'd want). For that to work it would be better if it was handled by the format method itself rather than every iterable's __format__ method. Then it could work with generators, lists, tuples etc.

...

On the basic idea: if this can be generalized a bit so that

    head = 99
    data = range(10) # optimism!
    s = '{:.1f}, {:.1f*, }'.format(head, data)

produces

s == '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0'

then I'd be a lot more attracted to it.

ATM the colon separates the part of the format element that is interpreted by the format method to find the formatted object from the part that is passed to the __format__ method of the formatted object. Perhaps an additional colon could be used to separate the separator for when the formatted object is an iterable so that

    'foo {name:<fmt>:<sep>} bar'.format(name=<expr>)

could become

    'foo {_name} bar'.format(_name = '<sep>'.join(format(o, '<fmt>') for o in <expr>))

The example would then be

    >>> '{:.1f}, {:.1f:, }'.format(99, range(10))
    '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0'

--
Oscar

Oscar Benjamin

8:42 p.m.

On 8 September 2015 at 19:15, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:

...

ATM the colon separates the part of the format element that is interpreted by the format method to find the formatted object from the part that is passed to the __format__ method of the formatted object. Perhaps an additional colon could be used to separate the separator for when the formatted object is an iterable so that

'foo {name:<fmt>:<sep>} bar'.format(name=<expr>)

could become

'foo {_name} bar'.format(_name = '<sep>'.join(format(o, '<fmt>') for o in <expr>))

The example would then be

    >>> '{:.1f}, {:.1f:, }'.format(99, range(10))
    '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0'

Except that obviously that wouldn't work because colon can be part of the <fmt> string e.g. for datetime:

    >>> '{:%H:%M}'.format(datetime.datetime.now())
    '19:39'

So you'd need something before the colon to disambiguate. In which case perhaps

    'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)

meaning that if the * is there then everything after the second colon is the format string. Then it would be:

    >>> '{:.1f}, {*:, :.1f}'.format(99, range(10))
    '99.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0'

--
Oscar
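The colon clash can be reproduced with today's str.format. This is only a sketch: the fixed datetime value is an assumption chosen for a repeatable result, and the rpartition split stands in for a hypothetical "split <fmt>:<sep> on a colon" rule that does not exist in Python:

```python
import datetime

# Fixed value instead of now(), so the result is reproducible.
moment = datetime.datetime(2015, 9, 8, 19, 39)

# The whole '%H:%M' spec, colon included, reaches datetime.__format__.
stamp = '{:%H:%M}'.format(moment)

# A naive colon split cannot tell a strftime pattern from a
# <fmt>:<sep> pair: it would read '%H' as <fmt> and '%M' as <sep>.
fmt, _, sep = '%H:%M'.rpartition(':')

print(stamp, repr(fmt), repr(sep))
```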

Stephen J. Turnbull

4:37 a.m.

Oscar Benjamin writes:

...

ATM the colon separates the part of the format element that is interpreted by the format method to find the formatted object from the part that is passed to the __format__ method of the formatted object. Perhaps an additional colon could be used to separate the separator for when the formatted object is an iterable so that

'foo {name:<fmt>:<sep>} bar'.format(name=<expr>)

I thought about a colon, but that loses if the objects are times. I guess that kills '/' and '-', too, since the objects might be dates. Of course there may be a tricky way to use these that I haven't thought of, or they could be escaped for use in <fmt>.

Oscar Benjamin

1:56 p.m.

On 9 September 2015 at 03:37, Stephen J. Turnbull <stephen@xemacs.org> wrote:

...

Oscar Benjamin writes:

...
ATM the colon separates the part of the format element that is interpreted by the format method to find the formatted object from the part that is passed to the __format__ method of the formatted object. Perhaps an additional colon could be used to separate the separator for when the formatted object is an iterable so that

'foo {name:<fmt>:<sep>} bar'.format(name=<expr>)

I thought about a colon, but that loses if the objects are times. I guess that kills '/' and '-', too, since the objects might be dates. Of course there may be a tricky way to use these that I haven't thought of, or they could be escaped for use in <fmt>.

You can use the * at the start of the format element (before the first colon). It can then imply that there will be two colons to separate the three parts, with any further colons being part of fmt, e.g.:

    '{*<expr>:<sep>:<fmt>}'.format(...)

So then you can have:

    >>> '{*numbers:, :.1f}'.format(numbers=numbers)
    '1.0, 2.0, 3.0'
    >>> '{*times:, :%H:%M}'.format(times=times)
    '12:30, 14:50, 22:39'

--
Oscar
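To experiment with the '{*name:<sep>:<fmt>}' idea without touching str.format, one option is a string.Formatter subclass. The sketch below is an assumption-laden prototype, not anything that exists in Python: StarFormatter and _Starred are hypothetical names, and it handles only named or explicitly numbered fields ('{*numbers:...}', '{*0:...}'), not the auto-numbered '{*:...}' form.

```python
import datetime
import string


class _Starred:
    """Hypothetical marker: the field name started with '*'."""

    def __init__(self, items):
        self.items = items


class StarFormatter(string.Formatter):
    """Prototype formatter understanding '{*name:<sep>:<fmt>}'."""

    def get_field(self, field_name, args, kwargs):
        if field_name.startswith('*'):
            # Look up the real field, then tag it for format_field().
            obj, used = super().get_field(field_name[1:], args, kwargs)
            return _Starred(obj), used
        return super().get_field(field_name, args, kwargs)

    def format_field(self, value, format_spec):
        if isinstance(value, _Starred):
            # Split on the first colon only; later colons belong to <fmt>.
            sep, _, fmt = format_spec.partition(':')
            return sep.join(format(e, fmt) for e in value.items)
        return super().format_field(value, format_spec)


f = StarFormatter()
numbers = f.format('{*numbers:, :.1f}', numbers=[1, 2, 3])
times = f.format('{*times:, :%H:%M}',
                 times=[datetime.time(12, 30), datetime.time(14, 50)])
print(numbers)
print(times)
```

Because the split happens on the first colon only, strftime-style element specs such as '%H:%M' pass through intact, which is the point of putting <sep> before <fmt>.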

Ron Adam

6:59 p.m.

On 09/08/2015 09:37 PM, Stephen J. Turnbull wrote:

...

Oscar Benjamin writes:

...
ATM the colon separates the part of the format element that is interpreted by the format method to find the formatted object from the part that is passed to the __format__ method of the formatted object. Perhaps an additional colon could be used to separate the separator for when the formatted object is an iterable so that

'foo {name:<fmt>:<sep>} bar'.format(name=<expr>)

I thought about a colon, but that loses if the objects are times. I guess that kills '/' and '-', too, since the objects might be dates. Of course there may be a tricky way to use these that I haven't thought of, or they could be escaped for use in <fmt>.

This seems to me to need a nested format spec. An outer one to format the whole list, and an inner one to format each item.

    f"foo {', '.join(f'{x:inner_spec}' for x in iter):outer_spec}"

Actually this is how I'd rather write it.

    "foo " + l.fmap(inner_spec).join(', ').fstr(outer_spec)

But sequences don't have the methods to write it that way.

...

...
...
    >>> l = range(10)
    >>> "foo" + format(','.join(map(lambda x: format(x, '>5'), l)), '>50')
    'foo    0,    1,    2,    3,    4,    5,    6,    7,    8,    9'

It took me a few times to get that right. Cheers, Ron

random832＠fastmail.us

9:38 p.m.

On Tue, Sep 8, 2015, at 12:27, Stephen J. Turnbull wrote:

...

I'm afraid this won't work unless restricted to be the last replacement field, where it just consumes all remaining positional arguments. I think that restriction deserves a loud "ugh", but maybe it will give somebody a better idea.

So, this is the second time in as many weeks that I've suggested a new !converter, but this seems like the place for it - have something like "!join" which "converts" [wraps] the argument in a class whose __format__ method knows how to join [and call __format__ on the individual members].

So you could make a list of floating point numbers by

    "List: {0:, |.2f!join}".format([1.2, 3.4, 5.6])

and it will simply call

    Joiner([1.2, 3.4, 5.6]).__format__(", |.2f")
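The wrapper half of this suggestion can be tried today. Here is a minimal sketch of what such a Joiner class might look like, assuming the '<sep>|<fmt>' spec layout from the example above. Since str.format has no "!join" conversion, the argument is wrapped by hand:

```python
class Joiner:
    """Hypothetical wrapper whose format spec is '<sep>|<fmt>'."""

    def __init__(self, iterable):
        self.iterable = iterable

    def __format__(self, spec):
        # Everything before the first '|' is the separator; the rest is
        # handed to each element's own __format__.
        sep, _, fmt = spec.partition('|')
        return sep.join(format(e, fmt) for e in self.iterable)


joined = "List: {0:, |.2f}".format(Joiner([1.2, 3.4, 5.6]))
print(joined)
```

A real "!join" converter would only save the explicit Joiner(...) call; the joining logic itself stays in ordinary Python.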

Andrew Barnert

2:03 a.m.

On Sep 8, 2015, at 12:38, random832@fastmail.us wrote:

...

...
On Tue, Sep 8, 2015, at 12:27, Stephen J. Turnbull wrote: I'm afraid this won't work unless restricted to be the last replacement field, where it just consumes all remaining positional arguments. I think that restriction deserves a loud "ugh", but maybe it will give somebody a better idea.

So, this is the second time in as many weeks that I've suggested a new !converter, but this seems like the place for it - have something like "!join" which "converts" [wraps] the argument in a class whose __format__ method knows how to join [and call __format__ on the individual members].

So you could make a list of floating point numbers by "List: {0:, |.2f!join}".format([1.2, 3.4, 5.6])

and it will simply call Joiner([1.2, 3.4, 5.6]).__format__(", |.2f")

I like this version. Even without the flexibility, just adding another hardcoded 'j' converter for iterables would be nice, but being able to program it would of course be better.

Sven R. Kunze

7:49 p.m.

On 08.09.2015 12:00, Wolfgang Maier wrote:

...

    head = 99
    data = myList(range(10))
    s = '{}, {:*, }'.format(head, data)
    # or
    s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ')
    print(s)
    print(s2)
    # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Thoughts?

I like it and I agree this is an oft-used pattern. From my experience I can tell patterns are workarounds if a language cannot handle it properly.

I cannot tell what a concrete syntax would exactly look like but I would love to see an easy-to-read solution.

Best,
Sven

Chris Angelico

7:58 p.m.

On Wed, Sep 9, 2015 at 3:49 AM, Sven R. Kunze <srkunze@mail.de> wrote:

...

On 08.09.2015 12:00, Wolfgang Maier wrote:

...
    head = 99
    data = myList(range(10))
    s = '{}, {:*, }'.format(head, data)
    # or
    s2 = '{}{sep}{:*{sep}}'.format(head, data, sep=', ')
    print(s)
    print(s2)
    # 99, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

Thoughts?

I like it and I agree this is an oft-used pattern. From my experience I can tell patterns are workarounds if a language cannot handle it properly.

I cannot tell what a concrete syntax would exactly look like but I would love to see an easy-to-read solution.

It looks tempting, but there's a reason Python has join() as a *string* method, not a method on any sort of iterable. For the same reason, I think it'd be better to handle this as a special case inside str.format(), rather than as a format string of the iterables; it would be extremely surprising for code to be able to join a list, a tuple, a ListIterator, or a generator, but not a custom class with __iter__ and __next__ methods. (Even more surprising if it works with some standard library types and not others.) Plus, it'd mean a lot of code duplication across all those types, which is unnecessary.

It'd be rather cool if it could be done as a special format string, though, which says "here's a separator, here's a format string, now iterate over the argument and format them with that string, then join them with that sep, and stick it in here". It might get a bit verbose, though.

ChrisA
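The point about join() living on the string can be seen in a few lines. In this sketch, join_formatted and Countdown are hypothetical names invented for illustration; because the helper relies only on iteration, a list, a generator and a custom iterator all work without any of them implementing anything special:

```python
class Countdown:
    """Illustrative custom iterator with __iter__ and __next__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def join_formatted(iterable, sep=', ', fmt=''):
    """Format each element with fmt, then join the results with sep."""
    return sep.join(format(item, fmt) for item in iterable)


from_list = join_formatted([1, 2, 3])
from_gen = join_formatted(n * n for n in range(4))
from_custom = join_formatted(Countdown(3), sep='-', fmt='02d')
print(from_list, from_gen, from_custom)
```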

Sven R. Kunze

8:21 p.m.

On 08.09.2015 19:58, Chris Angelico wrote:

...

It'd be rather cool if it could be done as a special format string, though, which says "here's a separator, here's a format string, now iterate over the argument and format them with that string, then join them with that sep, and stick it in here". It might get a bit verbose, though.

Most of the time, the "format string" of yours I use is "str". So, defaulting to "str" would suffice, at least from my point of view:

    output = f'Have a look at this comma separated list: {fruits#, }.'

Substitute # by any character that you see fit.

I mean, seriously, you don't use a full-featured template engine, throw an iterable into it and hope that it just works and provides some readable output. Job done and you can move on. What do you expect?

From my point of view, the str + join suffices for, once again, 80% of the use-cases.

Best,
Sven

Wolfgang Maier

3:41 p.m.

Thanks for all the feedback!

Just to summarize ideas and to clarify what I had in mind when proposing this:

1) Yes, I would like to have this work with any (or at least most) iterables, not just with my own custom type that I used for illustration. So having this handled by the format method rather than each object's __format__ method could make sense. It was just simple to implement it in Python through the __format__ method.

Why did I propose * as the first character of the new format spec string? Because I think you really need some token to state unambiguously[1] that what follows is a format specification that involves going through the elements of the iterable instead of working on the container object itself. I thought that * is most intuitive to understand because of its use in unpacking.

[1] unfortunately, in my original proposal the leading * can still be ambiguous because *<, *>, *= and *^ could mean element joining with <, >, = or ^ as separators or aligning of the container's formatted string representation using * as the fill character.

Ideally, the * should be the very first thing inside a replacement field - pretty much as suggested by Oscar - and should not be part of the format spec. This is not feasible through a format spec handled by the __format__ method, but through a modified str.format method, i.e., that's another argument for this approach. Examples:

    'foo {*name:<sep>} bar'.format(name=<expr>)
    'foo {*0:<sep>} bar {1}'.format(x, y)
    'foo {*:<sep>} bar'.format(x)

2) As for including an additional format spec to apply to the elements of the iterable: I decided against including this in the original proposal to keep it simple and to get feedback on the general idea first. The problem here is that any solution requires an additional token to indicate the boundary between the <separator> part and the element format spec. Since you would not want to have anyone's custom format spec broken by this, this boils down to disallowing one reserved character in the <separator> part, like in Oscar's example:

    'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)

where <sep> cannot contain a colon.

So that character would have to be chosen carefully (both : and | are quite readable, but also relatively common element separators I guess). In addition, the <separator> part should be non-optional (though the empty string should be allowed) to guarantee the presence of the delimiter token, which avoids accidental splitting of lonely element format specs into a <sep> and a <fmt> part:

    # format the elements of name using <fmt>, join them using <sep>
    'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)
    # format the elements of name using <fmt>, join them using ''
    'foo {*name::<fmt>} bar'.format(name=<expr>)
    # a syntax error
    'foo {*name:<fmt>} bar'.format(name=<expr>)

On the other hand, these restrictions do not look too dramatic given the flexibility gain in most situations.

So to sum up how this could work: If str.format encounters a leading * in a replacement field, it splits the format spec (i.e. everything after the first colon) on the first occurrence of the <sep>|<fmt> separator (possibly ':' or '|') and does, essentially:

    <sep>.join(format(e, <fmt>) for e in iterable)

Without the *, it just works the current way.

3) Finally, the alternative idea of having the new functionality handled by a new !converter, like:

    "List: {0!j:,}".format([1.2, 3.4, 5.6])

I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either:

- a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or

- the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *. Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not.

Please correct me, if I misunderstood something about this alternative proposal.

Best,
Wolfgang
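The fill-and-align reading behind footnote [1] is how the standard format spec already treats a leading '*' followed by an alignment character. A small sketch with a plain str shows it (a str is used here because it accepts these specs today; the variable names are illustrative):

```python
# '*' followed by '<', '>' or '^' already parses as fill + align.
left = format('abc', '*<10')     # fill '*', left-align, width 10
center = format('abc', '*^9')    # fill '*', centre, width 9

print(left)
print(center)
```

Any type that adopted both the standard mini-language and the original "*separator" spec would have to pick one reading for specs such as '*<', which is the ambiguity the footnote describes.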

Eric V. Smith

4:02 p.m.

At some point, instead of complicating how format works internally, you should just write a function that does what you want. I realize there's a continuum between '{}'.format(iterable) and '{<really-really-complex-stuff>}'.format(iterable). It's not clear where to draw the line. But when the solution is to bake knowledge of iterables into .format(), I think we've passed the point where we should switch to a function: '{}'.format(some_function(iterable)).

In any event, if you want to play with this, I suggest you write some_function(iterable) that does what you want, first.

Eric.

On 9/9/2015 9:41 AM, Wolfgang Maier wrote:

...

Thanks for all the feedback!

Just to summarize ideas and to clarify what I had in mind when proposing this:

1) Yes, I would like to have this work with any (or at least most) iterables, not just with my own custom type that I used for illustration. So having this handled by the format method rather than each object's __format__ method could make sense. It was just simple to implement it in Python through the __format__ method.

Why did I propose * as the first character of the new format spec string? Because I think you really need some token to state unambiguously[1] that what follows is a format specification that involves going through the elements of the iterable instead of working on the container object itself. I thought that * is most intuitive to understand because of its use in unpacking.

[1] unfortunately, in my original proposal the leading * can still be ambiguous because *<, *> *= and *^ could mean element joining with <, >, = or ^ as separators or aligning of the container's formatted string representation using * as the fill character.

Ideally, the * should be the very first thing inside a replacement field - pretty much as suggested by Oscar - and should not be part of the format spec. This is not feasible through a format spec handled by the __format__ method, but through a modified str.format method, i.e., that's another argument for this approach. Examples:

    'foo {*name:<sep>} bar'.format(name=<expr>)
    'foo {*0:<sep>} bar {1}'.format(x, y)
    'foo {*:<sep>} bar'.format(x)

2) As for including an additional format spec to apply to the elements of the iterable: I decided against including this in the original proposal to keep it simple and to get feedback on the general idea first. The problem here is that any solution requires an additional token to indicate the boundary between the <separator> part and the element format spec. Since you would not want to have anyone's custom format spec broken by this, this boils down to disallowing one reserved character in the <separator> part, like in Oscar's example:

'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)

where <sep> cannot contain a colon.

So that character would have to be chosen carefully (both : and | are quite readable, but also relatively common element separators I guess). In addition, the <separator> part should be non-optional (though the empty string should be allowed) to guarantee the presence of the delimiter token, which avoids accidental splitting of lonely element format specs into a "<sep>" and <fmt> part:

    # format the elements of name using <fmt>, join them using <sep>
    'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)
    # format the elements of name using <fmt>, join them using ''
    'foo {*name::<fmt>} bar'.format(name=<expr>)
    # a syntax error
    'foo {*name:<fmt>} bar'.format(name=<expr>)

On the other hand, these restrictions do not look too dramatic given the flexibility gain in most situations.

So to sum up how this could work: If str.format encounters a leading * in a replacement field, it splits the format spec (i.e. everything after the first colon) on the first occurrence of the <sep>|<fmt> separator (possibly ':' or '|') and does, essentially:

<sep>.join(format(e, <fmt>) for e in iterable)

Without the *, it just works the current way.

3) Finally, the alternative idea of having the new functionality handled by a new !converter, like:

"List: {0!j:,}".format([1.2, 3.4, 5.6])

I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either:

- a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or

- the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *. Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not. Please correct me, if I misunderstood something about this alternative proposal.

Best, Wolfgang

Wolfgang Maier

4:32 p.m.

Well, here it is:

    def unpack_format(iterable, format_spec=None):
        if format_spec:
            try:
                sep, element_fmt = format_spec.split('|', 1)
            except ValueError:
                raise TypeError('Invalid format_spec for iterable formatting')
            return sep.join(format(e, element_fmt) for e in iterable)

usage examples:

    # '0.00, 1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, 8.00, 9.00'
    '{}'.format(unpack_format(range(10), ', |.2f'))

    # '0.001.002.003.004.005.006.007.008.009.00'
    '{}'.format(unpack_format(range(10), '|.2f'))

    # invalid syntax
    '{}'.format(unpack_format(range(10), '.2f'))

Best,
Wolfgang

On 09.09.2015 16:02, Eric V. Smith wrote:

...

At some point, instead of complicating how format works internally, you should just write a function that does what you want. I realize there's a continuum between '{}'.format(iterable) and '{<really-really-complex-stuff>}'.format(iterable). It's not clear where to draw the line. But when the solution is to bake knowledge of iterables into .format(), I think we've passed the point where we should switch to a function: '{}'.format(some_function(iterable)).

In any event, If you want to play with this, I suggest you write some_function(iterable) that does what you want, first.

Eric.

On 9/9/2015 9:41 AM, Wolfgang Maier wrote:

...
Thanks for all the feedback!

Just to summarize ideas and to clarify what I had in mind when proposing this:

1) Yes, I would like to have this work with any (or at least most) iterables, not just with my own custom type that I used for illustration. So having this handled by the format method rather than each object's __format__ method could make sense. It was just simple to implement it in Python through the __format__ method.

Why did I propose * as the first character of the new format spec string? Because I think you really need some token to state unambiguously[1] that what follows is a format specification that involves going through the elements of the iterable instead of working on the container object itself. I thought that * is most intuitive to understand because of its use in unpacking.

[1] unfortunately, in my original proposal the leading * can still be ambiguous because *<, *>, *= and *^ could mean element joining with <, >, = or ^ as separators or aligning of the container's formatted string representation using * as the fill character.
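
To make the ambiguity in footnote [1] concrete, here is a small sketch using only the existing format mini-language. It shows that a leading * followed by an alignment character is already a valid fill-and-align prefix for strings. The last part reflects current Python 3 behaviour for plain lists, which reject any non-empty format spec; that check is my addition, not something from the message above.

```python
# '*' followed by an alignment character is already a valid fill/align prefix.
left = format('abc', '*<6')
centered = format('abc', '*^7')
print(left)      # abc***
print(centered)  # **abc**

# A plain list inherits object.__format__, which rejects non-empty specs
# in current Python 3, so '*<6' on a list is an error rather than alignment.
try:
    format([1, 2], '*<6')
    list_accepts_spec = True
except TypeError:
    list_accepts_spec = False
print(list_accepts_spec)  # False
```

So a spec such as '*<' handed to a str subclass, or to any type that delegates to str formatting, would be read as "fill with *, align left" rather than "join with <".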

Ideally, the * should be the very first thing inside a replacement field - pretty much as suggested by Oscar - and should not be part of the format spec. This is not feasible through a format spec handled by the __format__ method, but through a modified str.format method, i.e., that's another argument for this approach. Examples:

'foo {*name:<sep>} bar'.format(name=<expr>)
'foo {*0:<sep>} bar {1}'.format(x, y)
'foo {*:<sep>} bar'.format(x)

2) As for including an additional format spec to apply to the elements of the iterable: I decided against including this in the original proposal to keep it simple and to get feedback on the general idea first. The problem here is that any solution requires an additional token to indicate the boundary between the <separator> part and the element format spec. Since you would not want to have anyone's custom format spec broken by this, this boils down to disallowing one reserved character in the <separator> part, like in Oscar's example:

'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)

where <sep> cannot contain a colon.

So that character would have to be chosen carefully (both : and | are quite readable, but also relatively common element separators I guess). In addition, the <separator> part should be non-optional (though the empty string should be allowed) to guarantee the presence of the delimiter token, which avoids accidental splitting of lonely element format specs into a "<sep>" and <fmt> part:

# format the elements of name using <fmt>, join them using <sep>
'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)

# format the elements of name using <fmt>, join them using ''
'foo {*name::<fmt>} bar'.format(name=<expr>)

# a syntax error
'foo {*name:<fmt>} bar'.format(name=<expr>)

On the other hand, these restrictions do not look too dramatic given the flexibility gain in most situations.

So to sum up how this could work: If str.format encounters a leading * in a replacement field, it splits the format spec (i.e. everything after the first colon) on the first occurrence of the <sep>|<fmt> separator (possibly ':' or '|') and does, essentially:

<sep>.join(format(e, <fmt>) for e in iterable)

Without the *, it just works the current way.

3) Finally, the alternative idea of having the new functionality handled by a new !converter, like:

"List: {0!j:,}".format([1.2, 3.4, 5.6])

I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either:

- a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or

- the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *. Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not. Please correct me, if I misunderstood something about this alternative proposal.

Best, Wolfgang

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/


Paul Moore

4:41 p.m.

On 9 September 2015 at 15:32, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

...

Well, here it is:

def unpack_format(iterable, format_spec=None):
    if format_spec:
        try:
            sep, element_fmt = format_spec.split('|', 1)
        except ValueError:
            raise TypeError('Invalid format_spec for iterable formatting')
        return sep.join(format(e, element_fmt) for e in iterable)

usage examples:

# '0.00, 1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, 8.00, 9.00'
'{}'.format(unpack_format(range(10), ', |.2f'))

# '0.001.002.003.004.005.006.007.008.009.00'
'{}'.format(unpack_format(range(10), '|.2f'))

# invalid syntax
'{}'.format(unpack_format(range(10), '.2f'))

Honestly, it seems to me that

def format_iterable(it, spec, sep=', '):
    return sep.join(format(e, spec) for e in it)

# '0.00, 1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, 8.00, 9.00'
format_iterable(range(10), '.2f')

# '0.001.002.003.004.005.006.007.008.009.00'
format_iterable(range(10), '.2f', sep='')

is perfectly adequate. It reads more clearly to me than the "sep|fmt" syntax does, as well.

Paul

Wolfgang Maier

4:41 p.m.

Or with default behavior when there is no format_spec:

def unpack_format(iterable, format_spec=None):
    if format_spec:
        try:
            sep, element_fmt = format_spec.split('|', 1)
        except ValueError:
            raise TypeError('Invalid format_spec for iterable formatting')
        return sep.join(format(e, element_fmt) for e in iterable)
    else:
        return ' '.join(format(e) for e in iterable)

On 09.09.2015 16:32, Wolfgang Maier wrote:

...

Well, here it is:

def unpack_format(iterable, format_spec=None):
    if format_spec:
        try:
            sep, element_fmt = format_spec.split('|', 1)
        except ValueError:
            raise TypeError('Invalid format_spec for iterable formatting')
        return sep.join(format(e, element_fmt) for e in iterable)

usage examples:

# '0.00, 1.00, 2.00, 3.00, 4.00, 5.00, 6.00, 7.00, 8.00, 9.00'
'{}'.format(unpack_format(range(10), ', |.2f'))

# '0.001.002.003.004.005.006.007.008.009.00'
'{}'.format(unpack_format(range(10), '|.2f'))

# invalid syntax
'{}'.format(unpack_format(range(10), '.2f'))

Best, Wolfgang

On 09.09.2015 16:02, Eric V. Smith wrote:

...
At some point, instead of complicating how format works internally, you should just write a function that does what you want. I realize there's a continuum between '{}'.format(iterable) and '{<really-really-complex-stuff>}'.format(iterable). It's not clear where to draw the line. But when the solution is to bake knowledge of iterables into .format(), I think we've passed the point where we should switch to a function: '{}'.format(some_function(iterable)).

In any event, if you want to play with this, I suggest you write some_function(iterable) that does what you want, first.

Eric.

On 9/9/2015 9:41 AM, Wolfgang Maier wrote:

...
Thanks for all the feedback!

Just to summarize ideas and to clarify what I had in mind when proposing this:

1) Yes, I would like to have this work with any (or at least most) iterables, not just with my own custom type that I used for illustration. So having this handled by the format method rather than each object's __format__ method could make sense. It was just simple to implement it in Python through the __format__ method.

Why did I propose * as the first character of the new format spec string? Because I think you really need some token to state unambiguously[1] that what follows is a format specification that involves going through the elements of the iterable instead of working on the container object itself. I thought that * is most intuitive to understand because of its use in unpacking.

[1] unfortunately, in my original proposal the leading * can still be ambiguous because *<, *>, *= and *^ could mean element joining with <, >, = or ^ as separators or aligning of the container's formatted string representation using * as the fill character.

Ideally, the * should be the very first thing inside a replacement field - pretty much as suggested by Oscar - and should not be part of the format spec. This is not feasible through a format spec handled by the __format__ method, but through a modified str.format method, i.e., that's another argument for this approach. Examples:

'foo {*name:<sep>} bar'.format(name=<expr>)
'foo {*0:<sep>} bar {1}'.format(x, y)
'foo {*:<sep>} bar'.format(x)

2) As for including an additional format spec to apply to the elements of the iterable: I decided against including this in the original proposal to keep it simple and to get feedback on the general idea first. The problem here is that any solution requires an additional token to indicate the boundary between the <separator> part and the element format spec. Since you would not want to have anyone's custom format spec broken by this, this boils down to disallowing one reserved character in the <separator> part, like in Oscar's example:

'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)

where <sep> cannot contain a colon.

So that character would have to be chosen carefully (both : and | are quite readable, but also relatively common element separators I guess). In addition, the <separator> part should be non-optional (though the empty string should be allowed) to guarantee the presence of the delimiter token, which avoids accidental splitting of lonely element format specs into a "<sep>" and <fmt> part:

# format the elements of name using <fmt>, join them using <sep>
'foo {*name:<sep>:<fmt>} bar'.format(name=<expr>)

# format the elements of name using <fmt>, join them using ''
'foo {*name::<fmt>} bar'.format(name=<expr>)

# a syntax error
'foo {*name:<fmt>} bar'.format(name=<expr>)

On the other hand, these restrictions do not look too dramatic given the flexibility gain in most situations.

So to sum up how this could work: If str.format encounters a leading * in a replacement field, it splits the format spec (i.e. everything after the first colon) on the first occurrence of the <sep>|<fmt> separator (possibly ':' or '|') and does, essentially:

<sep>.join(format(e, <fmt>) for e in iterable)

Without the *, it just works the current way.

3) Finally, the alternative idea of having the new functionality handled by a new !converter, like:

"List: {0!j:,}".format([1.2, 3.4, 5.6])

I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either:

- a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or

- the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *. Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not. Please correct me, if I misunderstood something about this alternative proposal.

Best, Wolfgang


Paul Moore

4:58 p.m.

On 9 September 2015 at 15:41, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

...

def unpack_format(iterable, format_spec=None):
    if format_spec:
        try:
            sep, element_fmt = format_spec.split('|', 1)
        except ValueError:
            raise TypeError('Invalid format_spec for iterable formatting')
        return sep.join(format(e, element_fmt) for e in iterable)
    else:
        return ' '.join(format(e) for e in iterable)

...

From the docs, "The default format_spec is an empty string which usually gives the same effect as calling str(value)"

So you can just use format_spec='' and avoid the extra conditional logic. Paul
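
A minimal sketch of what that suggestion could look like, applied to the format_iterable helper from earlier in the thread rather than to unpack_format. Making spec default to '' is my reading of the suggestion and an assumption, not code Paul posted.

```python
def format_iterable(it, spec='', sep=', '):
    # format(e, '') usually gives the same result as str(e),
    # so no separate "no spec given" branch is needed.
    return sep.join(format(e, spec) for e in it)

# The empty spec matches str() for these common values.
samples = [1, 2.5, 'a', None, [1, 2]]
same_as_str = all(format(x, '') == str(x) for x in samples)

print(same_as_str)                                # True
print(format_iterable(range(3)))                  # 0, 1, 2
print(format_iterable([1.5, 2], '.2f', sep=''))   # 1.502.00
```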

Mark Lawrence

midnight

On 09/09/2015 15:02, Eric V. Smith wrote:

...

At some point, instead of complicating how format works internally, you should just write a function that does what you want. I realize there's a continuum between '{}'.format(iterable) and '{<really-really-complex-stuff>}'.format(iterable). It's not clear where to draw the line. But when the solution is to bake knowledge of iterables into .format(), I think we've passed the point where we should switch to a function: '{}'.format(some_function(iterable)).

In any event, if you want to play with this, I suggest you write some_function(iterable) that does what you want, first.

Eric.

Something like this from Nick Coghlan https://code.activestate.com/recipes/577845-format_iter-easy-formatting-of-a... ???

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence

Andrew Barnert

11:28 p.m.

On Sep 9, 2015, at 06:41, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

...

3) Finally, the alternative idea of having the new functionality handled by a new !converter, like:

"List: {0!j:,}".format([1.2, 3.4, 5.6])

I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either:

- a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or

- the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *. Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not. Please correct me, if I misunderstood something about this alternative proposal.

But the format method already _does_ react directly to the conversion flag. As the docs say, the "type coercion" (call to str, repr, or ascii) happens before formatting, and then the __format__ method is called on the result. A new !j would be a "regular converter"; it just calls a new join function (which returns something whose __format__ method then does the right thing) instead of the str, repr, or ascii functions. And random's custom converter idea would work similarly, except that presumably his !join would specify a function registered to handle the "join" conversion in some way rather than being hardcoded to a builtin.
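
The ordering described above (conversion first, then __format__ on the converted result) can be observed with the existing !r and !s converters. This is only an illustration of current str.format behaviour; the values are arbitrary.

```python
# With !r, the spec is applied to repr('ab'), i.e. to the 4-character string "'ab'".
with_repr = '{!r:>8}'.format('ab')
print(with_repr == format(repr('ab'), '>8'))  # True

# With !s, '.3' truncates the *string* '3.14159' to three characters ...
as_string = '{!s:.3}'.format(3.14159)
# ... whereas without a conversion '.3' is float precision (3 significant digits).
as_float = '{:.3}'.format(3.14159)

print(as_string)  # 3.1
print(as_float)   # 3.14
```

The same spec therefore means different things depending on the type the converter returns, which is what a hypothetical !j would rely on.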

Wolfgang Maier

12:03 a.m.

On 09.09.2015 23:28, Andrew Barnert via Python-ideas wrote:

...

On Sep 9, 2015, at 06:41, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

...
3) Finally, the alternative idea of having the new functionality handled by a new !converter, like:

"List: {0!j:,}".format([1.2, 3.4, 5.6])

I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either:

- a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or

- the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *. Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not. Please correct me, if I misunderstood something about this alternative proposal.

But the format method already _does_ react directly to the conversion flag. As the docs say, the "type coercion" (call to str, repr, or ascii) happens before formatting, and then the __format__ method is called on the result. A new !j would be a "regular converter"; it just calls a new join function (which returns something whose __format__ method then does the right thing) instead of the str, repr, or ascii functions.

Ah, I see! Thanks for correcting me here. Somehow, I had the mental picture that the format converters would call the object's __str__ and __repr__ methods directly (and so you'd need an additional __join__ method for the new converter), but that's not the case then.

...

And random's custom converter idea would work similarly, except that presumably his !join would specify a function registered to handle the "join" conversion in some way rather than being hardcoded to a builtin.

How would such a registration work (sorry, I haven't had the time to search the list for his previous mention of this idea)? A new builtin certainly won't fly. Thanks, Wolfgang

Andrew Barnert

12:39 a.m.

On Sep 9, 2015, at 15:03, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

...

...
On 09.09.2015 23:28, Andrew Barnert via Python-ideas wrote:

...
On Sep 9, 2015, at 06:41, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

3) Finally, the alternative idea of having the new functionality handled by a new !converter, like:

"List: {0!j:,}".format([1.2, 3.4, 5.6])

I considered this idea before posting the original proposal, but, in addition to requiring a change to str.format (which would need to recognize the new token), this approach would need either:

- a new special method (e.g., __join__) to be implemented for every type that should support it, which is worse than for my original proposal or

- the str.format method must react directly to the converter flag, which is then no different to the above solution just that it uses !j instead of *. Personally, I find the * syntax more readable, plus, the !j syntax would then suggest that this is a regular converter (calling a special method of the object) when, in fact, it is not. Please correct me, if I misunderstood something about this alternative proposal.

But the format method already _does_ react directly to the conversion flag. As the docs say, the "type coercion" (call to str, repr, or ascii) happens before formatting, and then the __format__ method is called on the result. A new !j would be a "regular converter"; it just calls a new join function (which returns something whose __format__ method then does the right thing) instead of the str, repr, or ascii functions.

Ah, I see! Thanks for correcting me here. Somehow, I had the mental picture that the format converters would call the object's __str__ and __repr__ methods directly (and so you'd need an additional __join__ method for the new converter), but that's not the case then.

...
And random's custom converter idea would work similarly, except that presumably his !join would specify a function registered to handle the "join" conversion in some way rather than being hardcoded to a builtin.

How would such a registration work (sorry, I haven't had the time to search the list for his previous mention of this idea)? A new builtin certainly won't fly.

I believe he posted a more detailed version of the idea on one of the other spinoff threads from the f-string thread, but I don't have a link. But there are lots of possibilities, and if you want to start bikeshedding, it doesn't matter that much what his original color was. For example, here's a complete proposal:

class MyJoiner:
    def __init__(self, value):
        self.value = value
    def __format__(self, spec):
        return spec.join(map(str, self.value))

string.register_converter('join', MyJoiner)

That last line adds it to some global table (maybe string._converters, or maybe it's not exposed at the Python level at all; whatever).

In str.format, instead of reading a single character after a !, it reads until colon or end of field; if that's more than a single character, it looks it up in the global table and calls the registered callable. So, in this case, "{spam!join:-}" would call MyJoiner(spam).__format__('-').

Any more complexity can be added to MyJoiner pretty easily, so this small extension to str.format seems sufficient for anything you might want. For example, if you want a three-part format spec that includes the join string, a format spec to pass to each element, and a format spec to apply to the whole thing:

def __format__(self, spec):
    joinstr, _, spec = spec.partition(':')
    espec, _, jspec = spec.partition(':')
    bits = (format(e, espec) for e in self.value)
    joined = joinstr.join(bits)
    return format(joined, jspec)

Or maybe it would be better to have a standard way to do multi-part format specs--maybe even passing arguments to a converter rather than cramming them in the spec--but this seems simple and flexible enough.

It might also be worth having multiple converters called in a chain, but I can't think of a use case for that, so I'll ignore it.

Most converters will be classes that just store the constructor argument and use it in __format__, so it seems tedious to repeat that boilerplate for 90% of them, but that's easy to fix with a decorator:

def simple_converter(func):
    class Converter:
        def __init__(self, value):
            self.value = value
        def __format__(self, spec):
            return func(self.value, spec)
    return Converter

Meanwhile, maybe you want the register function to be a decorator:

def register_converter(name):
    def decorator(func):
        _global_converter_table[name] = func
        return func
    return decorator

So now, the original example becomes:

@string.register_converter('join')
@string.simple_converter
def my_joiner(values, joinstr):
    return joinstr.join(map(str, values))
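
The wrapper half of this proposal can be tried today without touching str.format, by wrapping the iterable explicitly instead of going through a registered converter. The sketch below uses the three-part spec layout from the message above (joinstr:element_spec:whole_spec). The class name Joiner is a stand-in, and string.register_converter remains hypothetical, so it is not used here.

```python
class Joiner:
    """Wrap an iterable so that format() joins its elements."""

    def __init__(self, value):
        self.value = value

    def __format__(self, spec):
        # Spec layout: joinstr:element_spec:whole_spec (later parts optional).
        joinstr, _, rest = spec.partition(':')
        espec, _, jspec = rest.partition(':')
        joined = joinstr.join(format(e, espec) for e in self.value)
        return format(joined, jspec)

# Join string only.
print('{:-}'.format(Joiner(range(3))))           # 0-1-2
# Join with ', ', format each element as '.1f', right-align the result in 12.
print('[{:, :.1f:>12}]'.format(Joiner([1, 2])))  # [    1.0, 2.0]
```

A consequence of splitting on ':' is that the join string itself cannot contain a colon, the same trade-off discussed earlier in the thread.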

random832＠fastmail.us

1:24 a.m.

On Wed, Sep 9, 2015, at 18:39, Andrew Barnert via Python-ideas wrote:

...

I believe he posted a more detailed version of the idea on one of the other spinoff threads from the f-string thread, but I don't have a link. But there are lots of possibilities, and if you want to start bikeshedding, it doesn't matter that much what his original color was. For example, here's a complete proposal:

class MyJoiner:
    def __init__(self, value):
        self.value = value
    def __format__(self, spec):
        return spec.join(map(str, self.value))

string.register_converter('join', MyJoiner)

Er, I wanted it to be something more like

def __format__(self, spec):
    sep, fmt = # 'somehow' break up spec into two parts
    return sep.join(map(lambda x: x.__format__(fmt), self.value))

And I wasn't the one who actually proposed user-registered converters; I'm not sure who did. At one point in the f-string thread I suggested using a _different_ special !word for stuff like a string that can be inserted into HTML without quoting. I'm also not 100% sure how good an idea it is (since it means either using global state or moving formatting to a class instead of str).

The Joiner class wouldn't have to exist as a builtin, it could be private to the format function.

Andrew Barnert

2:03 a.m.

On Sep 9, 2015, at 16:24, random832@fastmail.us wrote:

...

...
On Wed, Sep 9, 2015, at 18:39, Andrew Barnert via Python-ideas wrote:

I believe he posted a more detailed version of the idea on one of the other spinoff threads from the f-string thread, but I don't have a link. But there are lots of possibilities, and if you want to start bikeshedding, it doesn't matter that much what his original color was. For example, here's a complete proposal:

class MyJoiner:
    def __init__(self, value):
        self.value = value
    def __format__(self, spec):
        return spec.join(map(str, self.value))

string.register_converter('join', MyJoiner)

Er, I wanted it to be something more like

def __format__(self, spec):
    sep, fmt = # 'somehow' break up spec into two parts

I covered later in the same message how this simple version could be extended to a smarter version that does that, or even more, without requiring any further changes to str.format. I just wanted to show the simplest version first, and then show that designing for that doesn't lose any flexibility.

...

And I wasn't the one who actually proposed user-registered converters; I'm not sure who did.

Well, that does make it a bit harder to search for... But anyway, I think the idea is obvious enough once someone's mentioned it that it only matters if everyone decides we should do it, when we want to figure out who to give the credit to.

...

At one point in the f-string thread I suggested using a _different_ special !word for stuff like a string that can be inserted into HTML without quoting. I'm also not 100% sure how good an idea it is (since it means either using global state or moving formatting to a class instead of str).

I don't see why global state is more of a problem here than for any other global registry (sys.modules, pickle/copy, ABCs, registries, etc.). In fact, it seems less likely that, e.g., a multithreaded app container would run into problems with this than with most of those other things, not more likely. And the same ideas for solving those problems (subinterpreters, better IPC so multithreaded app containers aren't necessary, easier switchable contexts, whatever) seem like they'd solve this one just as easily. And meanwhile, the alternative seems to be having something similar, but not exposing it publicly, and just baking in a handful of hardcoded converters for join, html, re-escape, etc., and I don't see why str should know about all of those things, or why extending that set when we realize that we forgot about shlex should require a patch to str and a new Python version.

...

The Joiner class wouldn't have to exist as a builtin, it could be private to the format function.

If it's custom-registerable, it can be on PyPI, or in the middle of your app, although of course there could be some converters, maybe including your Joiner, somewhere in the stdlib, or even private to format, as well.

Wolfgang Maier

10:58 a.m.

On 10.09.2015 02:03, Andrew Barnert via Python-ideas wrote:

...

On Sep 9, 2015, at 16:24, random832@fastmail.us wrote:

...
...
On Wed, Sep 9, 2015, at 18:39, Andrew Barnert via Python-ideas wrote:

I believe he posted a more detailed version of the idea on one of the other spinoff threads from the f-string thread, but I don't have a link. But there are lots of possibilities, and if you want to start bikeshedding, it doesn't matter that much what his original color was. For example, here's a complete proposal:

class MyJoiner:
    def __init__(self, value):
        self.value = value
    def __format__(self, spec):
        return spec.join(map(str, self.value))

string.register_converter('join', MyJoiner)

Er, I wanted it to be something more like

def __format__(self, spec):
    sep, fmt = # 'somehow' break up spec into two parts

I covered later in the same message how this simple version could be extended to a smarter version that does that, or even more, without requiring any further changes to str.format. I just wanted to show the simplest version first, and then show that designing for that doesn't lose any flexibility.

Ok, I think I got the idea. One question though: how would you prevent this from getting completely out of hand?

...

And meanwhile, the alternative seems to be having something similar, but not exposing it publicly, and just baking in a handful of hardcoded converters for join, html, re-escape, etc., and I don't see why str should know about all of those things, or why extending that set when we realize that we forgot about shlex should require a patch to str and a new Python version.

...
The Joiner class wouldn't have to exist as a builtin, it could be private to the format function.

If it's custom-registerable, it can be on PyPI, or in the middle of your app, although of course there could be some converters, maybe including your Joiner, somewhere in the stdlib, or even private to format, as well.

The strength of this idea - flexibility - could also be called its biggest weakness and that is scaring me. Essentially, such converters would be completely free to do anything they want: change their input at will, return something completely unrelated, have side-effects. All of that hidden behind a simple !token in a replacement field. While the idea is really cool and certainly powerful if used responsibly, it could also create completely unreadable code.

Just adding one single hardcoded converter for joining iterables looks like a much more reasonable and realistic idea and now that I understand the concept I have to say I really like it.

Just paraphrasing once more to see if I understood things correctly this time: The !j converter converts the iterable to an instance of a Joiner class just like !s, !r and !a convert to a str instance. After that conversion the __format__ method of the new object gets called with the format_spec string (which specifies the separator and the inner format spec) as argument and that method produces the joint string.

So everything follows the existing logic of a converter and no really new replacement field syntax is required. Great and +1!
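
For anyone who wants to play with the !j idea as paraphrased here, it can be approximated with a string.Formatter subclass, since convert_field is an existing hook for the conversion flag. This is only a sketch under assumptions: the names Joiner and JoinFormatter are made up, and the 'sep|fmt' split with '|' is borrowed from unpack_format earlier in the thread, not an agreed syntax. The built-in str.format itself would still reject !j.

```python
import string

class Joiner:
    """Hypothetical wrapper: format() joins the wrapped iterable's elements."""

    def __init__(self, value):
        self.value = value

    def __format__(self, spec):
        # Assumed spec layout: '<sep>|<element format spec>'; the part
        # after '|' is optional and defaults to the empty spec.
        sep, _, fmt = spec.partition('|')
        return sep.join(format(e, fmt) for e in self.value)

class JoinFormatter(string.Formatter):
    def convert_field(self, value, conversion):
        # Treat 'j' like !s/!r/!a: convert first, then __format__ is
        # called on the converted object with the format spec.
        if conversion == 'j':
            return Joiner(value)
        return super().convert_field(value, conversion)

fmt = JoinFormatter()
print(fmt.format('List: {0!j:,}', [1.2, 3.4, 5.6]))        # List: 1.2,3.4,5.6
print(fmt.format('List: {0!j:, |.2f}', [1.2, 3.4, 5.6]))   # List: 1.20, 3.40, 5.60
```

Because the separator and element spec travel inside the ordinary format_spec, no new replacement-field syntax is needed, which matches the summary above.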

Andrew Barnert

12:27 p.m.

On Sep 10, 2015, at 01:58, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

...

...
On 10.09.2015 02:03, Andrew Barnert via Python-ideas wrote:

...
On Sep 9, 2015, at 16:24, random832@fastmail.us wrote:

...
On Wed, Sep 9, 2015, at 18:39, Andrew Barnert via Python-ideas wrote:

I believe he posted a more detailed version of the idea on one of the other spinoff threads from the f-string thread, but I don't have a link. But there are lots of possibilities, and if you want to start bikeshedding, it doesn't matter that much what his original color was. For example, here's a complete proposal:

class MyJoiner:
    def __init__(self, value):
        self.value = value
    def __format__(self, spec):
        return spec.join(map(str, self.value))

string.register_converter('join', MyJoiner)

Er, I wanted it to be something more like

def __format__(self, spec):
    sep, fmt = # 'somehow' break up spec into two parts

I covered later in the same message how this simple version could be extended to a smarter version that does that, or even more, without requiring any further changes to str.format. I just wanted to show the simplest version first, and then show that designing for that doesn't lose any flexibility.

Ok, I think I got the idea. One question though: how would you prevent this from getting completely out of hand?

Same way we keep types with weird __format__ methods, nested or multi-clause comprehensions, import hooks, operator overloads like using __ror__ to partial functions, metaclasses, subclass hooks, multiple inheritance, dynamic method lookup, descriptors, etc. from getting completely out of hand: trust users to have some taste, and don't write bad documentation that would convince them to abuse it. :)

...

...
And meanwhile, the alternative seems to be having something similar, but not exposing it publicly, and just baking in a handful of hardcoded converters for join, html, re-escape, etc., and I don't see why str should know about all of those things, or why extending that set when we realize that we forgot about shlex should require a patch to str and a new Python version.

...
The Joiner class wouldn't have to exist as a builtin, it could be private to the format function.

If it's custom-registerable, it can be on PyPI, or in the middle of your app, although of course there could be some converters, maybe including your Joiner, somewhere in the stdlib, or even private to format, as well.

The strength of this idea - flexibility - could also be called its biggest weakness and that is scaring me. Essentially, such converters would be completely free to do anything they want: change their input at will, return something completely unrelated, have side-effects. All of that hidden behind a simple !token in a replacement field. While the idea is really cool and certainly powerful if used responsibly, it could also create completely unreadable code.

There aren't any obvious reasons for anyone to write such unreadable code, so I don't see it being a real attractive nuisance.

...

Just adding one single hardcoded converter for joining iterables looks like a much more reasonable and realistic idea and now that I understand the concept I have to say I really like it.

Just paraphrasing once more to see if I understood things correctly this time: The !j converter converts the iterable to an instance of a Joiner class just like !s, !r and !a convert to a str instance. After that conversion the __format__ method of the new object gets called with the format_spec string (which specifies the separator and the inner format spec) as argument and that method produces the joint string.

So everything follows the existing logic of a converter and no really new replacement field syntax is required. Great and +1!

Yep, and I'm +1 on it as well. But I'm also at least +0.5 on the custom converter idea, because joining is the fourth idea people have come up with for converters in the past few weeks, and I'd bet there are another few widely-usable ideas, plus some good uses for specific applications (different web frameworks, scientific computing, etc.). When I get a chance, I'll hack something up to play with it and see if it's as useful as I'm expecting.


29 comments

11 participants

participants (11)

Andrew Barnert
Chris Angelico
Eric V. Smith
Mark Lawrence
Oscar Benjamin
Paul Moore
random832＠fastmail.us
Ron Adam
Stephen J. Turnbull
Sven R. Kunze
Wolfgang Maier
