[Python-3000] Updating PEP 3101

Sat Jun 2 09:38:40 CEST 2007

Some more thoughts on this, and some questions.

PEP 3101 defines two layers of APIs for string formatting: a low-level 
formatting engine, and a high-level set of convenience methods 
(primarily str.format).

Both layers have grown complex in the effort to satisfy feature 
requests from various folks. What I would like to do is move the 
design back to a more OOWTDI (Only One Way To Do It) style.

The way I propose to do this is to redesign the low-level engine as a 
class, called Formatter, with overridable methods.

To support the high-level API, there will be a single, built-in global 
singleton instance of Formatter. Calls to str.format will simply be 
routed to this singleton instance.

So for example, when you call:

    "The value is {0}".format(1)

This will call:

    builtin_formatter.format("The value is {0}", 1)

I'm not sure that it makes any sense to allow the built-in formatter 
instance to be replaceable or mutable, since that would cause all string 
formatting behavior to change. Also, there's no way to negotiate 
conflicts between various library modules that might want different 
behavior. Fortunately, the base formatter has no state, so all we have 
to worry about is preventing it from being replaced.

Rather, I think it makes more sense to allow people to create their own 
Formatter instances and use them directly. This does mean, however, that 
people who want to use their own custom Formatter instance won't be able 
to use the high-level convenience methods.

The Formatter class has at least three overridable methods:

    1) A method that parses a format string into constant characters
       and replacement fields.
    2) A method that retrieves a field value given a field name or index.
    3) A method that formats an individual replacement field, given a
       value and a conversion specifier string.
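
A minimal sketch of how those three hooks might fit together. The class 
name SketchFormatter and the method names parse, get_value and 
format_field are placeholders chosen for illustration, not names fixed 
by the PEP, and the toy parser ignores escaped braces and nested fields.

```python
import re


class SketchFormatter:
    """Toy engine with three overridable hooks (all names hypothetical)."""

    # Matches "{name}" or "{name:spec}"; escaped braces are not handled.
    _FIELD = re.compile(r"\{([^{}:]*)(?::([^{}]*))?\}")

    def parse(self, format_string):
        # Hook 1: split the string into (literal, field_name, spec) chunks.
        pos = 0
        for m in self._FIELD.finditer(format_string):
            yield format_string[pos:m.start()], m.group(1), m.group(2) or ""
            pos = m.end()
        # Trailing literal text, with no field attached.
        yield format_string[pos:], None, ""

    def get_value(self, name, args, kwargs):
        # Hook 2: look a field up by positional index or by keyword name.
        return args[int(name)] if name.isdigit() else kwargs[name]

    def format_field(self, value, spec):
        # Hook 3: render one value using its conversion specifier.
        return format(value, spec)

    def format(self, format_string, *args, **kwargs):
        # The driver only calls the hooks, so a subclass can replace any one.
        out = []
        for literal, name, spec in self.parse(format_string):
            out.append(literal)
            if name is not None:
                value = self.get_value(name, args, kwargs)
                out.append(self.format_field(value, spec))
        return "".join(out)


print(SketchFormatter().format("The value is {0}", 1))
```

Because format() does nothing but call the hooks, a subclass that 
overrides one of them changes only that part of the behavior.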

So:

    -- If you want a different syntax for format strings, you override 
method #1. This satisfies the feature requests of people who wanted 
variations in the format string syntax.

    -- If you want to be able to change the way that field values are 
accessed, you override #2. This satisfies the desire of people who want 
to have it automatically access locals() or globals(). You can do this 
by passing in those namespaces as a constructor parameter, or if you 
want to get fancy, you can look at the stack frames and figure it out 
automatically. The main point is that this functionality won't be built 
in by default, but it could be a cookbook recipe.

    Another reason to override this method is to change the rules for 
tracking what field names are legal. The built-in method does not allow 
fields beginning with an underscore to be used as attributes, e.g. you 
cannot say "{0._index}" as a format string. If you override the field 
value method, however, you can change this behavior.

    Similarly, if you want to add or remove the check that ensures 
all positional arguments are used, or change the way errors are handled, 
you can do that here as well.

    -- If you want to change the way that built-in types are converted 
to string form, you override #3. (For non-builtin types you can just add 
a __format__ special method to the type.)

The main point is, however, that none of these overrides affect the 
behavior of the built-in str.format method.
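
The overrides for #2 and #3 described above might look roughly like 
this. The standard library's string.Formatter, whose get_value, 
get_field and format_field hooks have this general shape, stands in here 
for the proposed class. The subclass names are hypothetical, and since 
string.Formatter does not itself reject underscore attributes, the 
second class adds that rule only to show where such a policy would live.

```python
import string


class NamespaceFormatter(string.Formatter):
    """Override #2: fall back to a caller-supplied namespace."""

    def __init__(self, namespace):
        super().__init__()
        self.namespace = namespace

    def get_value(self, key, args, kwargs):
        if isinstance(key, int):
            return args[key]          # positional field such as {0}
        if key in kwargs:
            return kwargs[key]        # an explicit keyword argument wins
        return self.namespace[key]    # otherwise consult the namespace


class NoPrivateFormatter(string.Formatter):
    """Override #2: refuse attribute names starting with an underscore."""

    def get_field(self, field_name, args, kwargs):
        # Rough check: inspect each ".attr" segment after the first name.
        for part in field_name.split(".")[1:]:
            if part.startswith("_"):
                raise ValueError("private attribute in field: " + field_name)
        return super().get_field(field_name, args, kwargs)


class YesNoFormatter(string.Formatter):
    """Override #3: render bools as yes/no instead of True/False."""

    def format_field(self, value, format_spec):
        if isinstance(value, bool):
            value = "yes" if value else "no"
        return super().format_field(value, format_spec)


name = "Talin"
print(NamespaceFormatter(globals()).format("Hello {name}"))
print(YesNoFormatter().format("Enabled: {0}", True))
print("Enabled: {0}".format(True))   # str.format is unaffected
```

Each custom instance is called directly, and the last line shows that 
the plain str.format call keeps its default behavior.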

Now, in the current version of the PEP, all of the things that I just 
mentioned can be changed on a per-call basis by passing in 
specially-named parameters, e.g.:

    "The name is {0._index}".format(1, flags=ALLOW_LEADING_UNDERSCORES)

I'm proposing to eliminate all of that extra flexibility, and instead 
say that if you want to be able to do that, use a custom formatter 
class, but without the syntactical convenience of str.format.

So my first question is to get a sense of how many people would find 
that agreeable. In other words, is it reasonable to require people to 
give up the syntactical convenience of "string".format() when they want 
to do custom formatting?

My second question deals with implementation. Because 'str' is a 
built-in type, all of its methods must be built-in as well, and 
therefore implemented in C. If 'str' depends on a built-in formatter 
singleton instance, that singleton instance must also be implemented in 
C, and must be initialized in the Parser before any calls to str.format.

Since I am not an expert in the internals of the Python interpreter C 
code, I would like to ask: how feasible is this?

-- Talin