[Python-3000] PEP - string.format

Sat Apr 22 06:55:52 CEST 2006

Ian Bicking <ianb <at> colorstudy.com> writes:

>

...A lot of good ideas, and I haven't had a chance to consider them all. Let me
respond to what I can.

>      Brace characters ('curly braces') are used to indicate a
>      replacement field within the string:
> 
>          "My name is {0}".format( 'Fred' )
> 
> While I've argued in an earlier thread that $var is more conventional, 
> honestly I don't care (except that %(var)s is not very nice).  A couple 
> other people also preferred $var, but I don't know if they have 
> particularly strong opinions either.

I don't care much either; however, as you noted, the nested form has some
possible uses, which seem harder to support with just a prefix.

> Does } have to be escaped?  Or just optionally escaped?  I assume this 
> is not a change to string literals, so we're relying on '\{' producing 
> the same thing as '\\{' (which of course it does).

Good points.

> Thus you can't nest formatters, e.g., {0:pad(23):xmlquote}, unless the 
> underlying object understands that.  Which is probably unlikely.

At this point, I'm thinking not, although I could be convinced otherwise.
Remember that you can accomplish all of the same things by processing the input
arguments; the conversion specifiers are a convenience.

Also, in your model, there would be a distinction between the first specifier
(which converts the object to a string), and subsequent ones (which modify the
string). My complexity senses are tingling...
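
To make the "process the input arguments" route concrete, here is a minimal
sketch. It assumes a format method that behaves roughly as the PEP describes;
pad and xmlquote are hypothetical helpers named after the specifiers in your
example, not anything the PEP defines.

```python
from xml.sax.saxutils import escape


def pad(text, width):
    """Hypothetical helper: left-justify text to a fixed width."""
    return str(text).ljust(width)


def xmlquote(text):
    """Hypothetical helper: escape XML special characters."""
    return escape(str(text))


value = "a<b"
# The arguments are converted before formatting, so the template itself
# needs no chained specifiers.
result = "Value: {0}|".format(xmlquote(pad(value, 6)))
```

The helpers apply in the same order as format(format(x, 'pad(23)'), 'xmlquote'):
pad first, then quote.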

> Potentially : could be special, but \: would pass the ':' to the 
> underlying formatter.  Then {x:pad(23):xmlquote} would mean 
> format(format(x, 'pad(23)'), 'xmlquote')
>
> Also, I note that {} doesn't naturally nest in this specification, you 
> have to quote those as well.  E.g.: {0:\{a:b\}}.  But I don't really see 
> why you'd be inclined to use {} in a formatter anyway ([] and () seem 
> much more likely).

Actually, I meant for them to nest, I just didn't say it right.
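
One possible reading of nesting, as a sketch only: a replacement field that
appears inside a conversion specifier is expanded first, and its result becomes
part of the specifier. The example assumes a width-and-alignment specifier such
as '>8', which this thread has not settled on.

```python
# Assumption: the inner field {1} is substituted before the outer
# specifier is interpreted, so the width can come from an argument.
template = "[{0:>{1}}]"
padded = template.format("Fred", 8)
```

Under that reading no escaping is needed, because the inner braces are an
ordinary field rather than literal text.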

> Also, some parsing will be required in these formatters, e.g., pad(23) 
> is not parsed in any way and so it's up to the formatter to handle that 
> (and may use different rules than normal Python syntax).

I've been wondering about that too.

>      When using the 'fformat' variant, it is possible to omit the field
>      name entirely, and simply include the conversion specifiers:
> 
>          "My name is {:pad(23)}"
> 
>      This syntax is used to send special instructions to the custom
>      formatter object (such as instructing it to insert padding
>      characters up to a given column.)  The interpretation of this
>      'empty' field is entirely up to the custom formatter; no
>      standard interpretation will be defined in this PEP.
> 
>      If a custom formatter is not being used, then it is an error to
>      omit the field name.
> 
> This sounds similar to (?i) in a regex.  I can't think of a good 
> use-case, though, since most commands would be localized to a specific 
> formatter or to the formatting object constructor.  {:pad(23)} seems 
> like a bad example.  {:upper}?  Also, it applies globally (or does it?); 
> that is, the formatter can't detect what markers come after the command, 
> and which come before.  So {:upper} seems like a bad example.

Right. My thought was that the empty field name would instruct the custom
formatter to do something special *at that point in the string*. If you want
something done to the entire string, it's probably easier to just pass the
string to a function.

> 
>       3) Otherwise, check the internal formatter within
>          string.format that contains knowledge of certain builtin
>          types.
> 
> If it is a language change, could all those types have __format__ 
> methods added?  Is there any way for the object to accept or decline to 
> do formatting?

Good question. I suspect that it may be impractical to add __format__ to all
built-in types, so we should plan to allow a fallback to an internal formatter.

>       4) Otherwise, call str() or unicode() as appropriate.
> 
> Is there a global repr() formatter, like %r?  Potentially {0:repr} could 
> be implemented the same way by convention, including in object.__format__?

Good idea. (Should there be a *global* custom formatter? Plugins? That's the
subject of a separate PEP, I think.)
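
The lookup order under discussion could be sketched roughly as follows.
Everything here is an assumption for illustration: the hook is called
format_hook rather than __format__ so that it cannot collide with anything the
interpreter defines, returning None is one hypothetical way for an object to
decline, and the 'repr' specifier follows the convention you suggest.

```python
def _format_int(value, specifier):
    """Assumed internal formatter: the specifier is a minimum width."""
    width = int(specifier) if specifier else 0
    return str(value).rjust(width)


# Hypothetical table of types the internal formatter knows about.
INTERNAL_FORMATTERS = {int: _format_int}


def format_field(obj, specifier):
    """Sketch of the fallback chain; not the PEP's actual algorithm."""
    if specifier == "repr":              # global convention, like %r
        return repr(obj)
    hook = getattr(obj, "format_hook", None)
    if hook is not None:
        result = hook(specifier)
        if result is not None:           # None means the object declined
            return result
    internal = INTERNAL_FORMATTERS.get(type(obj))
    if internal is not None:
        return internal(obj, specifier)
    return str(obj)                      # last resort


class Point:
    """Example object that handles only the 'xy' specifier."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def format_hook(self, specifier):
        if specifier == "xy":
            return "%d,%d" % (self.x, self.y)
        return None

    def __str__(self):
        return "Point(%d, %d)" % (self.x, self.y)
```

With this shape, an object that declines a specifier falls through to the
internal formatter and then to str(), so built-in types need no hook of their
own.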

>      The formatter should examine the type of the object and the
>      specifier string, and decide whether or not it wants to handle
>      this field. If it decides not to, then it should return False
>      to indicate that the default formatting for that field should be
>      used; otherwise, it should call builder.append() (or whatever
>      is the appropriate method) to concatenate the converted value
>      to the end of the string, and return True.
> 
> Well, I guess this is the use case, but it feels a bit funny to me.  A 
> concrete use case would be appreciated.

The main use case was that the formatter might need to examine the part of the
string that's already been built. For example, it can't handle expansion of tabs
unless it knows the current column index. I had originally planned to pass only
the column index, but that seemed too special-case to me.
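
As a concrete sketch of that use case: the hook below pads out to a requested
column before appending its value, which it can only do by looking at what has
already been built. The signature (value, specifier, builder), the use of a
plain list as the builder, and the 'col(N)' specifier are all assumptions for
illustration.

```python
import re


def column_formatter(value, specifier, builder):
    """Hypothetical custom formatter: start the value at column N."""
    match = re.fullmatch(r"col\((\d+)\)", specifier)
    if match is None:
        return False                     # decline; use default formatting
    built = "".join(builder)
    # Current column = characters since the last newline (or the start).
    column = len(built) - (built.rfind("\n") + 1)
    target = int(match.group(1))
    builder.append(" " * max(target - column, 0))
    builder.append(str(value))
    return True


builder = ["Name:"]
handled = column_formatter("Fred", "col(10)", builder)
line = "".join(builder)
```

Passing only the column index would cover this case, but handing over the
builder leaves room for formatters that need more than the column.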

>      A fairly high degree of convenience for relatively small risk can
>      be obtained by supporting the getattr (.) and getitem ([])
>      operators.  While it is certainly possible that these operators
>      can be overloaded in a way that a maliciously written string could
>      exploit their behavior in nasty ways, it is fairly rare that those
>      operators do anything more than retargeting to another container.
>      On the other hand, the ability of a string to execute function
>      calls would be quite dangerous by comparison.
> 
> It could be a keyword option to enable this.  Though all the keywords 
> are kind of taken.  This itself wouldn't be an issue if ** wasn't going 
> to be used so often.

The keywords are all taken - but there are still plenty of method names
available :) That's why "fformat" has a different method name, so that we can
distinguish the custom formatter parameter from the rest of the params.

Unfortunately, this can't be used too much, or you get a combinatorial explosion
of method names:

   string.format
   string.fformat
   string.format_dict
   string.fformat_dict
   ...

>      One other thing that could be done to make the debugging case
>      more convenient would be to allow the locals() dict to be omitted
>      entirely.  Thus, a format function with no arguments would instead
>      use the current scope as a dictionary argument:
> 
>          print "Error in file {p.file}, line {p.line}".format()
> 
>      An alternative would be to dedicate a special method name, other
>      than 'format' - say, 'interpolate' or 'lformat' - for this
>      behavior.
> 
> It breaks some conventions to have a method that looks into the parent 
> frame; but the use cases are very strong for this.  Also, if attribute 
> access was a keyword argument potentially that could be turned on by 
> default when using the form that pulled from locals().

To be honest, I'd be willing to drop this whole part of the proposal if that's
what the folks here would like. I like to present all options, but that doesn't
mean that I myself am in favor of all of them.

I realize that there are some use cases for it, but I don't know if the use
cases are significantly better.
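
For reference, the "look into the parent frame" variant could be sketched like
this. It is written as a free function named lformat (one of the names floated
above) and relies on sys._getframe, which is a CPython implementation detail;
both choices are assumptions for illustration, not part of the proposal.

```python
import sys


def lformat(template):
    """Sketch: format template using the caller's local variables."""
    caller = sys._getframe(1)            # frame of whoever called lformat
    return template.format(**dict(caller.f_locals))


class Position:
    """Hypothetical object with the attributes used in the PEP example."""

    def __init__(self, file, line):
        self.file = file
        self.line = line


def report():
    p = Position("spam.py", 12)
    return lformat("Error in file {p.file}, line {p.line}")


message = report()
```

The frame lookup is exactly the convention-breaking part: the result depends
on who calls the function, not only on what is passed to it.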

>      - Backquoting. This method has the benefit of minimal syntactical
>        clutter, however it lacks many of the benefits of a function
>        call syntax (such as complex expression arguments, custom
>        formatters, etc.)
> 
> It doesn't have any natural nesting, nor any way to immediately see the 
> difference between opening and closing an expression.  It also implies a 
> relation to shell ``, which evaluates the contents.  I don't see any 
> benefit to backquotes.

Agreed 100%.

> Personally I'm very uncomfortable with using str.format(**args) for all 
> named substitution.  It removes the possibility of non-enumerable 
> dictionary-like objects, and requires a dictionary copy whenever an 
> actual dictionary is used.

Right. Let me think about this one.
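
Both concerns show up with any callable that takes named arguments through **.
A minimal sketch, in which LazyLookup and accept_keywords are hypothetical
stand-ins:

```python
class LazyLookup:
    """Hypothetical dictionary-like object that computes values on
    demand and cannot enumerate its keys."""

    def __getitem__(self, key):
        return key.upper()


def accept_keywords(**kwargs):
    """Stand-in for any method that receives named values via **."""
    return kwargs


original = {"name": "Fred"}
received = accept_keywords(**original)
is_copy = received is not original       # a new dict is built per call

try:
    accept_keywords(**LazyLookup())
    lazy_accepted = True
except TypeError:
    # ** needs to enumerate the keys, which LazyLookup cannot provide.
    lazy_accepted = False
```

A variant that takes the mapping as a single ordinary argument would avoid
both the copy and the need to enumerate keys.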

> In the case of positional arguments it is currently an error if you 
> don't use all your positional arguments with %.  Would it be an error in 
> this case?

I dunno. It certainly could be implemented that way, but I am not sure why it
should be.
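
For comparison, a small sketch of the two behaviours. The second half assumes
that format simply ignores surplus positional arguments, which is one possible
choice rather than something the PEP currently specifies.

```python
# The % operator rejects positional arguments that the template never uses.
try:
    "%s" % ("a", "b")
    percent_raises = False
except TypeError:
    percent_raises = True

# Assumed lenient behaviour: the unused second argument is ignored.
lenient = "{0}".format("a", "b")
```

The lenient form is handy when one argument list feeds several alternative
templates, such as translated messages that do not all mention every value.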

> Should the custom formatter get any opportunity to finalize the 
> formatted string (e.g., "here's the finished string, give me what you 
> want to return")?

Easier to just pass the string to a function, I think.

Great stuff, Ian. Thanks for spending the time to write such a detailed critique.

-- Talin