[Python-3000] Updating PEP 3101
Talin
talin at acm.org
Sat Jun 2 09:38:40 CEST 2007
Some more thoughts on this, and some questions.
PEP 3101 defines two layers of APIs for string formatting: a low-level
formatting engine, and a high-level set of convenience methods
(primarily str.format).
Both layers have grown complex in an effort to satisfy feature requests
from various folks. What I would like to do is move the design back
toward a more OOWTDI ("one obvious way to do it") style.
The way I propose to do this is to redesign the low-level engine as a
class, called Formatter, with overridable methods.
To support the high-level API, there will be a single built-in, global
instance of Formatter. Calls to str.format will simply be routed to this
singleton instance.
So for example, when you call:
    "The value is {0}".format(1)
This will call:
    builtin_formatter.format("The value is {0}", 1)
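A minimal Python sketch of that routing, assuming the standard library's string.Formatter (the class Python 3 eventually shipped) as a stand-in for the proposed engine. The name builtin_formatter comes from the example above and is hypothetical here, as is the helper format_via_singleton; neither is a real builtin.

```python
import string

# Hypothetical stand-in for the proposal's shared, stateless engine.
builtin_formatter = string.Formatter()


def format_via_singleton(format_string, *args, **kwargs):
    """Route a call the way the post proposes str.format would be routed."""
    return builtin_formatter.format(format_string, *args, **kwargs)


# Both spellings should produce the same text.
print(format_via_singleton("The value is {0}", 1))
print("The value is {0}".format(1))
```

Because the shared instance holds no state, every caller can use it without interfering with other callers; code that needs different behavior would construct its own instance instead of mutating this one.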
I'm not sure that it makes any sense to allow the built-in formatter
instance to be replaceable or mutable, since that would cause all string
formatting behavior to change. Also, there's no way to negotiate
conflicts between various library modules that might want different
behavior. Fortunately, the base formatter has no state, so all we have
to worry about is preventing it from being replaced.
Rather, I think it makes more sense to allow people to create their own
Formatter instances and use them directly. This does mean, however, that
people who want to use their own custom Formatter instance won't be able
to use the high-level convenience methods.
The Formatter class has at least three overridable methods:
1) The method that parses a format string into constant characters
and replacement fields.
2) A method that retrieves a field value given a field name or index.
3) A method that formats an individual replacement field, given a
value and a conversion specifier string.
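The three hooks can be sketched against string.Formatter, whose closest counterparts are parse, get_field (which in turn calls get_value) and format_field. The stdlib class also has convert_field and check_unused_args, which the list above does not separate out. TracingFormatter is a hypothetical name: it only records which hooks fire and otherwise defers to the defaults.

```python
import string


class TracingFormatter(string.Formatter):
    """Records which hooks fire; each one defers to the stdlib default,
    so the output should match str.format."""

    def __init__(self):
        super().__init__()
        self.calls = []

    # 1) Split a format string into literal text and replacement fields.
    def parse(self, format_string):
        self.calls.append("parse")
        return super().parse(format_string)

    # 2) Retrieve a field's value given its name or index.
    def get_field(self, field_name, args, kwargs):
        self.calls.append("get_field")
        return super().get_field(field_name, args, kwargs)

    # 3) Format one value according to its format specifier.
    def format_field(self, value, format_spec):
        self.calls.append("format_field")
        return super().format_field(value, format_spec)


t = TracingFormatter()
print(t.format("The value is {0:>3}", 1))
print(t.calls)
```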
So:
-- If you want a different syntax for format strings, you override
method #1. This satisfies the feature requests of people who wanted
variations in the format string syntax.
-- If you want to be able to change the way that field values are
accessed, you override #2. This satisfies the desire of people who want
to have it automatically access locals() or globals(). You can do this
by passing in those namespaces as a constructor parameter or, if you
want to get fancy, by looking at the stack frames and figuring it out
automatically. The main point is that this functionality won't be built
in by default, but it could be a cookbook recipe.
Another reason to override this method is to change the rules that
determine which field names are legal. The built-in method does not allow
names beginning with an underscore to be used as attributes, i.e. you
cannot say "{0._index}" as a format string. If you override the field
value method, however, you can change this behavior.
Similarly, if you want to add or remove checks that ensure all
positional arguments are used, or change the way errors are handled,
you can do that here as well.
-- If you want to change the way that built-in types are converted
to string form, you override #3. (For non-builtin types you can just add
a __format__ special method to the type.)
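Each override described above can be sketched as a small string.Formatter subclass. All class names are hypothetical, the "$name" syntax is an invented example of an alternative format-string syntax, and note that the stdlib places the "all arguments used" check in check_unused_args rather than in the field-lookup method. The last class shows the __format__ route for a user-defined type, which needs no custom Formatter at all.

```python
import re
import string


# Override #1 (parse): hypothetical "$name" syntax instead of "{name}".
class DollarFormatter(string.Formatter):
    _field = re.compile(r"\$(\w+)")

    def parse(self, format_string):
        # Yield (literal_text, field_name, format_spec, conversion) tuples,
        # the same shape the default parse produces.
        pos = 0
        for m in self._field.finditer(format_string):
            yield format_string[pos:m.start()], m.group(1), "", None
            pos = m.end()
        if pos < len(format_string):
            yield format_string[pos:], None, None, None


# Override #2 (get_value): fall back to a namespace given to the constructor,
# e.g. NamespaceFormatter(locals()).
class NamespaceFormatter(string.Formatter):
    def __init__(self, namespace):
        super().__init__()
        self.namespace = namespace

    def get_value(self, key, args, kwargs):
        # Integer keys are positional; names not passed as keyword
        # arguments are looked up in the stored namespace.
        if isinstance(key, str) and key not in kwargs:
            return self.namespace[key]
        return super().get_value(key, args, kwargs)


# Reject calls that leave any positional or keyword argument unused.
class StrictFormatter(string.Formatter):
    def check_unused_args(self, used_args, args, kwargs):
        unused = [i for i in range(len(args)) if i not in used_args]
        unused += [k for k in kwargs if k not in used_args]
        if unused:
            raise ValueError(f"unused arguments: {unused}")


# Override #3 (format_field): change how a built-in type is rendered.
class YesNoFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        if isinstance(value, bool):
            value = "yes" if value else "no"
        return super().format_field(value, format_spec)


# Non-builtin type: defining __format__ is enough, even with plain str.format.
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, format_spec):
        return format(self.celsius, format_spec or ".1f") + " C"


print(DollarFormatter().format("Hello $name", name="Ann"))
print(YesNoFormatter().format("{0}", True))
print("{0}".format(Temperature(21.5)))
```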
The main point, however, is that none of these overrides affects the
behavior of the built-in str.format method.
Now, in the current version of the PEP, all of the things that I just
mentioned can be changed on a per-call basis by passing in
specially-named parameters, e.g.:
    "The name is {0._index}".format(1, flags=ALLOW_LEADING_UNDERSCORES)
I'm proposing to eliminate all of that extra flexibility, and instead
say that if you want to be able to do that, use a custom formatter
class, but without the syntactical convenience of str.format.
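Under this proposal, an underscore rule lives in a formatter class rather than in a per-call flag. A minimal sketch, assuming string.Formatter as the base: in the Python 3 that eventually shipped, neither str.format nor the default get_field rejects underscore-prefixed attributes, so the rule is added here. The class name and the regex-based check are assumptions for illustration.

```python
import re
import string


class NoUnderscoreAttrFormatter(string.Formatter):
    """Hypothetical formatter that rejects attribute names starting with "_"."""

    # Rough check for ".<underscore>" in a field name such as "0._index".
    # A bracketed key that happens to contain "._" is rejected as well.
    _private_attr = re.compile(r"\._")

    def get_field(self, field_name, args, kwargs):
        if self._private_attr.search(field_name):
            raise ValueError(f"underscore attribute not allowed: {field_name!r}")
        return super().get_field(field_name, args, kwargs)


strict = NoUnderscoreAttrFormatter()
print(strict.format("{0.real}", 5))
```

Code that wants the permissive behavior simply uses the default string.Formatter (or str.format); code that wants the restriction opts in by using this instance, and no call site needs a flag.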
So my first question is to get a sense of how many people would find
that agreeable. In other words, is it reasonable to require people to
give up the syntactical convenience of "string".format() when they want
to do custom formatting?
My second question deals with implementation. Because 'str' is a
built-in type, all of its methods must be built-in as well, and
therefore implemented in C. If 'str' depends on a built-in formatter
singleton instance, that singleton instance must also be implemented in
C, and must be initialized in the Parser before any calls to str.format.
Since I am not an expert in the internals of the Python interpreter's C
code, I would like to ask: how feasible is this?
-- Talin