[Python-Dev] transitioning from % to {} formatting

Thu Oct 1 18:49:24 CEST 2009

On approximately 9/30/2009 4:03 PM, came the following characters from
the keyboard of Vinay Sajip:
> Steven Bethard <steven.bethard <at> gmail.com> writes:
> 
>> There's a lot of code already out there (in the standard library and
>> other places) that uses %-style formatting, when in Python 3.0 we
>> should be encouraging {}-style formatting. We should really provide
>> some sort of transition plan. Consider an example from the logging
>> docs:
>>
>> logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
>>
>> We'd like to support both this style as well as the following style:
>>
>> logging.Formatter("{asctime} - {name} - {levelname} - {message}")
>>
> 
> In logging at least, there are two different places where the formatting issue
> crops up.
> 
> The first is creating the "message" part of the logging event, which is
> made up of a format string and arguments.
> 
> The second is the one Steven's mentioned: formatting the message along with
> other event data such as time of occurrence, level, logger name etc. into the
> final text which is output.

It seems to me that most of the discussion in this thread is
concerned with the first issue... and yet I see the second as the harder
issue, and it has gotten less press.

Abstracting this away from logger, I think the problem has three cases:

1) Both the format message and all the parameters are supplied in a
single API call.  This is really a foolish API, because

    def API( fmt, p1, p2, p3 ):
        str = fmt % (p1, p2, p3)

could have just as easily been documented originally as

    def API( str ):

where the user is welcome to supply a string such as

    API( fmt % (p1, p2, p3 ))

and if done this way, the conversion to .format is obvious... and all
under the user's control.
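
A minimal sketch of that conversion, assuming a hypothetical api() that
takes an already-formatted string (the name and the "[api]" prefix are
made up for illustration):

```python
def api(message):
    """Hypothetical API that accepts an already-formatted string."""
    return "[api] " + message

p1, p2, p3 = "answer", 42, 1.5

# The caller picks the formatting style; the API never sees the format string.
old_style = api("%s is %d (%.1f)" % (p1, p2, p3))
new_style = api("{0} is {1} ({2:.1f})".format(p1, p2, p3))
```

Both calls hand the API the same text, so the API itself needs no change.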

2) The format message and the parameters are supplied to separate APIs,
because the format message is common to many invocations of the other
APIs that supply parameters, and is cached by the API.  This is
sufficient to break the foolishness of #1, but is really just a subset
of #3, so any solutions to #3 apply here.

3) The format message and the parameters for it may be supplied by the
same or separate APIs, but one or both are incomplete, and are augmented
by the API.  In other words, one or both of the following cases:

3a) The user-supplied format message may include references to named
parameters that are documented by the API, and supplied by the API,
rather than by the user.

3b) The user-supplied format string may be embedded into a larger format
string by the API, which contains references to other values that the
user must also supply.

In either case, 3a or 3b, the user has insufficient information to
perform the whole format operation and pass the result to the API.

In both cases, the API that accepts the format string must be informed
whether it is a % or {} string, somehow.  This could be supplied to the
API that accepts the string, or to some other related API that sets a
format mode.  Internally, the code would have to be able to manipulate
both types of formats.
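
As a rough sketch of case 3a with the format type passed to the API that
accepts the string: the class name SketchFormatter and its style
argument are hypothetical, chosen only for illustration, and the values
dict stands in for whatever the API would supply itself.

```python
class SketchFormatter:
    """Hypothetical formatter that is told which format style it was given."""

    def __init__(self, fmt, style="%"):
        if style not in ("%", "{"):
            raise ValueError("style must be '%' or '{'")
        self.fmt = fmt
        self.style = style

    def format(self, api_values):
        # Case 3a: the named values come from the API, not from the user.
        if self.style == "%":
            return self.fmt % api_values
        return self.fmt.format(**api_values)


# Stand-in for values the API would compute on its own.
values = {"name": "root", "levelname": "INFO", "message": "started"}

old = SketchFormatter("%(name)s - %(levelname)s - %(message)s")
new = SketchFormatter("{name} - {levelname} - {message}", style="{")

old_text = old.format(values)
new_text = new.format(values)
```

Because the style lives on each instance, a %-style and a {}-style
formatter can coexist in the same program.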

> Support for both % and {} forms in logging would need to be considered in
> these two places. I sort of liked Martin's proposal about using different
> keyword arguments, but apart from the ugliness of "dicttemplate" and the fact
> that "fmt" is already used in Formatter.__init__ as a keyword argument, it's
> possible that two different keyword arguments "fmt" and "format" both referring
> to format strings might be confusing to some users.
> 
> Benjamin's suggestion of providing a flag to Formatter seems slightly better,
> as it doesn't change what existing positional or keyword parameters do, and
> just adds an additional, optional parameter which can start off with a default
> of False and transition to a default of True.
>
> However, AFAICT these approaches only cover the second area where formatting
> options are chosen - not the creation of the message from the parameters passed
> to the logging call itself. 

The above three paragraphs are unclear to me.  I think they might be
referring to case 2 or 3, though.

> Of course one can pass arbitrary objects as messages which contain their own
> formatting logic. This has been possible since the very first release but I'm
> not sure that it's widely used, as it's usually easier to pass strings. So
> instead of passing a string and arguments such as
> 
> logger.debug("The %s is %d", "answer", 42)
> 
> one can currently pass, for a fictitious class PercentMessage,
> 
> logger.debug(PercentMessage("The %s is %d", "answer", 42))
> 
> and when the time comes to obtain the formatted message, LogRecord.getMessage
> calls str() on the PercentMessage instance, whose __str__ will use %-formatting
> to get the actual message.
> 
> Of course, one can also do for example
> 
> logger.debug(BraceMessage("The {} is {}", "answer", 42))
> 
> where the __str__() method on the BraceMessage will do {} formatting.
> 
> Of course, I'm not suggesting we actually use the names PercentMessage and
> BraceMessage, I've just used them there for clarity.

It seems that the above is only referring to case 1?  And doesn't help
with case 2 or 3?
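
For concreteness, here is a sketch of what the two message classes might
look like. Vinay described them as fictitious, so the bodies below are
my assumption, not an existing logging API. The LogRecord is built by
hand only to show that getMessage() calls str() on the message object.

```python
import logging


class PercentMessage:
    """Illustrative only: defers %-formatting until str() is called."""

    def __init__(self, fmt, *args):
        self.fmt = fmt
        self.args = args

    def __str__(self):
        return self.fmt % self.args


class BraceMessage:
    """Illustrative only: defers {}-formatting until str() is called."""

    def __init__(self, fmt, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)


# Build a record directly, with no logging arguments of its own.
record = logging.LogRecord(
    "demo", logging.DEBUG, "demo.py", 0,
    BraceMessage("The {0} is {1}", "answer", 42), (), None)

percent_text = str(PercentMessage("The %s is %d", "answer", 42))
brace_text = record.getMessage()
```

Note that the format string and all of its arguments travel together in
one object, which is why this looks like case 1 to me.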

> Also, although Raymond has pointed out that it seems likely that no one ever
> needs *both* types of format string, what about the case where application A
> depends on libraries B and C, and they don't all share the same preferences
> regarding which format style to use? ISTM no-one's brought this up yet, but it
> seems to me like a real issue. It would certainly appear to preclude any
> approach that configured a logging-wide or logger-wide flag to determine how to
> interpret the format string.

Agreed here... a single global state would not make modular upgrades to
a complex program easy... the state would be best included with
particular instance objects, especially when such instance objects exist
already.  The format type parameter could be provided to the instance,
instead of globally.

> Another potential issue is where logging events are pickled and sent over
> sockets to be finally formatted and output on different machines. What if a
> sending machine has a recent version of Python, which supports {} formatting,
> but a receiving machine doesn't? It seems that at the very least, it would
> require a change to SocketHandler and DatagramHandler to format the "message"
> part into the LogRecord before pickling and sending. While making this change
> is simple, it represents a potential backwards-incompatible problem for users
> who have defined their own handlers for doing something similar.
> 
> Apart from thinking through the above issues, the actual formatting only
> happens in two locations - LogRecord.getMessage and Formatter.format - so
> making the code do either %- or {} formatting would be simple, as long as it
> knows which of % and {} to pick.
> 
> Does it seem too onerous to expect people to pass an additional "use_format"
> keyword argument with every logging call to indicate how to interpret the
> message format string? Or does the PercentMessage/BraceMessage type approach
> have any mileage? What do y'all think?

These last 3 paragraphs seem to be specific to logger. The first of
the 3 does point out a concern for systems that interoperate across
networks: if the format strings and parameters are exposed separately
across networks, whatever types are sent must be usable at the receiver,
or at least appropriate version control must be required so that
incompatible systems can be detected and reported.
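
One way to sidestep the problem, along the lines Vinay suggests, is to
format the message on the sending side so that only plain text crosses
the wire. A sketch, where prepare_for_wire is a hypothetical helper and
not part of the logging package:

```python
import logging
import pickle


def prepare_for_wire(record):
    """Hypothetical helper: merge msg and args before pickling."""
    data = dict(record.__dict__)
    data["msg"] = record.getMessage()  # formatted by the sender
    data["args"] = None                # receiver has nothing left to format
    data["exc_info"] = None            # tracebacks are not picklable
    return pickle.dumps(data, 1)


record = logging.LogRecord(
    "demo", logging.INFO, "demo.py", 0, "The %s is %d", ("answer", 42), None)
payload = prepare_for_wire(record)

# Receiving side: rebuild a record from the plain dict.
received = logging.makeLogRecord(pickle.loads(payload))
received_text = received.getMessage()
```

The receiver then never needs to know which format style the sender used.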

On approximately 9/30/2009 5:47 PM, came the following characters from
the keyboard of Antoine Pitrou:
> Vinay Sajip <vinay_sajip <at> yahoo.co.uk> writes:
>> Does it seem too onerous to expect people to pass an additional
>> "use_format" keyword argument with every logging call to indicate
>> how to interpret the message format string? Or does the
>> PercentMessage/BraceMessage type approach have any mileage? What do
>> y'all think?
>
> What about the proposal I made earlier?
> (support for giving a callable, so that you pass the "{foobar}".format
> method when you want new-style formatting)

This "callable" technique seems to support only cases 1 and 2, but not 3,
unless I misunderstand it.
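
My reading of the callable idea, as a sketch; render() is a hypothetical
stand-in for whatever the API does internally, and Antoine's actual
proposal may differ:

```python
def render(fmt, *args):
    """Hypothetical API helper: fmt is a %-style string or a callable."""
    if callable(fmt):
        # New style: the caller passed a bound method such as "...".format.
        return fmt(*args)
    # Old style: a plain string is treated as a %-format.
    return fmt % args


percent_result = render("The %s is %d", "answer", 42)
brace_result = render("The {0} is {1}".format, "answer", 42)
```

In this sketch the API only ever sees an opaque callable, so it has no
template text to embed into a larger format string, which is the
situation in 3b.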

-- 
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking