[Python-3000] More PEP 3101 changes incoming
Talin
talin at acm.org
Fri Aug 3 08:55:03 CEST 2007
Guido van Rossum wrote:
> My personal suggestion is to stay close to the .NET formatting language:
>
> name_specifier [',' width_specifier] [':' conversion_specifier]
>
> where width_specifier is a positive or negative number giving the
> minimum width (negative for left-alignment) and conversion_specifier
> is passed uninterpreted to the object's __format__ method.
Before I comment on this I think I need to clear up a mismatch between
your understanding of how __format__ works and mine. In particular, I
want to explain why it won't work for float and int to define a
__format__ method.
Remember how I said in your office that it made sense to me there were
two levels of format hooks in .Net? I realize that I wasn't being very
clear at the time - as often happens when my thoughts are racing too
fast for my mouth.
What I meant was that conceptually, there are two stages of
customization, which I will call "pre-coercion" and "post-coercion"
customization.
Before I explain what that means, let me say that I don't think that
this is actually how .Net works, and I'm not proposing that there
actually be two customization hooks. What I want to do is describe an
abstract conceptual model of formatting, in which formatting occurs in a
number of stages.
Pre-coercion formatting means that the real type of the value is used to
control formatting. We don't attempt to convert the value to an int or
float or repr() or anything - instead it's allowed to completely
dominate the interpretation of the format codes. So the case of the
DateTime object interpreting its specifiers as a strftime argument falls
into this case.
In most cases, there won't be a pre-coercion hook, in which case the
formatting proceeds to the next two stages, which are type coercion and
then post-coercion formatting. The type coercion is driven by a
*standard interpretation* of the format specifier. After the value is
converted to the type, we then apply formatting that is specific to that
type.
Now, I always envisioned that __format__ would allow reinterpretation of
the format specifier. Therefore, __format__ fits into this model as a
pre-coercion customization hook - it has to come *before* the type
coercion, because otherwise type information would be destroyed and
__format__ wouldn't work.
But the formatters for int and float have to happen *after* type
coercion. Therefore, those formatters can't be the same as __format__.
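To make the staged model concrete, here is a rough sketch in Python. It is only an illustration of the conceptual stages, not a proposed implementation: the function name staged_format, the COERCE table, the Stamp class and the hook name "preformat" are all made up for this example, and the hook stands in for __format__ as I have been describing it.

```python
import datetime

# Hypothetical "standard interpretation": which type each letter coerces to.
COERCE = {"d": int, "x": int, "f": float, "e": float, "s": str, "r": repr}


def staged_format(value, spec):
    # Stage 1: pre-coercion hook. The real type of the value gets the raw
    # specifier and may interpret it however it likes.
    hook = getattr(value, "preformat", None)  # hypothetical hook name
    if hook is not None:
        return hook(spec)
    if not spec:
        return str(value)
    # Stage 2: type coercion, driven by the trailing type letter.
    type_code = spec[-1]
    coerced = COERCE[type_code](value)
    # Stage 3: post-coercion formatting, specific to the coerced type.
    # repr() has already produced text, so it is laid out like a string.
    if type_code == "r":
        spec = spec[:-1] + "s"
    return ("%" + spec) % (coerced,)


class Stamp:
    """Toy date wrapper that reads its specifier as a strftime string."""

    def __init__(self, when):
        self.when = when

    def preformat(self, spec):
        return self.when.strftime(spec)


print(staged_format(12, "6.3f"))
print(staged_format(3.9, "d"))
print(staged_format(Stamp(datetime.date(2007, 8, 3)), "%Y-%m-%d"))
```

Note that by the time the int and float formatting runs in stage 3, the original type of the value is gone, which is why the hook has to sit in stage 1.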
> In order to support the use cases for %s and %r, I propose to allow
> appending a single letter 's', 'r' or 'f' to the width_specifier
> (*not* the conversion_specifier):
>
> 'r' always calls repr() on the object;
> 's' always calls str() on the object;
> 'f' calls the object's __format__() method passing it the
> conversion_specifier, or if it has no __format__() method, calls
> repr() on it. This is also the default.
>
> If no __format__() method was called (either because 'r' or 's' was
> used, or because there was no __format__() method on the object), the
> conversion_specifier (if given) is a *maximum* length; this handles
> the pretty common use cases of %.20s and %.20r (limiting the size of a
> printed value).
>
> The numeric types are the main types that must provide __format__().
> (I also propose that for datetime types the format string ought to be
> interpreted as a strftime format string.) I think that
> float.__format__() should *not* support the integer formatting codes
> (d, x, o etc.) -- I find the current '%d' % 3.14 == '3' an abomination
> which is most likely an incidental effect of calling int() on the
> argument (should really be __index__()). But int.__format__() should
> support the float formatting codes; I think '%6.3f' % 12 should return
> ' 12.000'. This is in line with 1/2 returning 0.5; int values should
> produce results identical to the corresponding float values when used
> in the same context. I think this should be solved inside
> int.__format__() though; the generic formatting code should not have
> to know about this.
I don't agree that using the 'd' format type to print floats is an
abomination, but that's because of a difference in design philosophy.
I'm inclined to be permissive in this, because I don't see the benefit
of being pedantic here, and I do see the potential usefulness of
considering 'd' to be the same as 'f' with a precision of 0.
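One caveat worth checking before equating the two: with the existing % operator, 'd' and 'f' with a precision of 0 do not always agree, since 'd' truncates toward zero while '.0f' rounds. A small check of current behavior (just an illustration, the variable names are arbitrary):

```python
# Compare the existing '%d' and '%.0f' conversions on a few floats.
samples = [3.14, 3.7, -3.7]

for value in samples:
    as_d = "%d" % value     # integer conversion: truncates toward zero
    as_f0 = "%.0f" % value  # float conversion: rounds to nearest
    print(value, as_d, as_f0)
```

For 3.14 both give '3', but for 3.7 the results are '3' and '4', so a permissive 'd' would have to pick one of the two behaviors.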
But that's a detail. I want to think about the larger picture.
Earlier I said that there were 6 attributes being controlled by the
various specifiers, but based on the previous discussion there are
actually 8, in no particular order:
-- minimum width
-- maximum width
-- decimal precision
-- alignment
-- padding
-- treatment of signs and negative numbers
-- type coercion options
-- number formatting options for a given type, such as exponential
notation.
That seems like a lot of parameters to cram into a lowly format string, and I
can't imagine that anyone would like a system that requires these all to
be specified individually. It would be cumbersome and hard to remember.
Fortunately, we recognize that these parameters are not all independent.
Many combinations of parameters are nonsensical, especially when talking
about non-number types. Therefore, we can compress the visual
specification of these attributes into a much smaller number of actual
specified format codes.
Traditionally the C sprintf function has done two kinds of
'multiplexing' of these codes. The first is to change the interpretation
of a particular field (such as precision) based on the number formatting
type. The second is to use letters to represent combinations of
attributes - so for example the letter 'd' implies both that it's an
integer type, and also how that integer type should be formatted.
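Both kinds of multiplexing can be seen with Python's existing % operator, which follows sprintf here. The first half below uses real % behavior; the LETTER_MEANS table and the describe function in the second half are hypothetical, written only to spell out what a single letter implies.

```python
# 1. One field (precision), reinterpreted according to the type letter.
fixed = "%.3f" % 3.14159    # digits after the decimal point
clipped = "%.3s" % "abcdef"  # maximum number of characters
padded = "%.3d" % 7          # minimum number of digits, zero-filled
print(fixed, clipped, padded)

# 2. One letter standing for a bundle of attributes (hypothetical table).
LETTER_MEANS = {
    "d": {"coerce_to": int, "notation": "decimal"},
    "x": {"coerce_to": int, "notation": "hexadecimal"},
    "f": {"coerce_to": float, "notation": "fixed-point"},
    "e": {"coerce_to": float, "notation": "exponential"},
}


def describe(letter):
    """Spell out the attributes implied by a single type letter."""
    meaning = LETTER_MEANS[letter]
    return "%s: coerce to %s, print in %s notation" % (
        letter, meaning["coerce_to"].__name__, meaning["notation"])


print(describe("d"))
```

The first print gives 3.142, abc and 007: the same ".3" means three different things.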
So the challenge is to try and figure out how to represent all of the
sensible permutations of formatting attributes in a way which is both
intuitive and mnemonic.
There are two approaches to making this system programmer-friendly: we
can either try to invent the best possible system out of whole cloth, or
we can steal from the past in the hopes that programmers who already
know a previous syntax for format strings will be able to employ their
prior knowledge.
If we decide to create a new system out of whole cloth, then what do we
have to work with? Well, as I see it we have the following tools at our
disposal for encoding meaning in a short form:
-- Various delimiter characters: :,.!#$ and so on.
-- Letters to represent one or more attributes.
-- Numbers to represent scalar quantities.
-- The relative ordering of all of the above.
We also have to consider what it means to be 'intuitive'. In this case,
we should consider that the various delimiter characters have
connotations - such as the fact that '.' suggests a decimal point, or
that '<' suggests a left-pointing arrow.
(I should also mention that "a:b,c" looks prettier to my eye than
"a,b:c". There's a reason for this, and it's because of Python syntax.
Now, in Python, ':' isn't an operator - but if it were, you would have to
consider its precedence to be very low. Because when we look at an
expression 'if x: a,b' we know that comma binds more tightly than the
colon, and so it's the same thing as saying 'if x: (a,b)'. But in any
case this is purely an aesthetic digression and not terribly weighty.)
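The grouping claim is easy to confirm with the standard ast module. This is only a quick check, and the variable names are arbitrary:

```python
import ast

# Parse the statement and look at the single expression in the if-body.
tree = ast.parse("if x: a, b")
body_expr = tree.body[0].body[0].value
print(type(body_expr).__name__, len(body_expr.elts))

# The explicitly parenthesized form produces the same tree.
same = ast.dump(tree) == ast.dump(ast.parse("if x: (a, b)"))
print(same)
```

The body is a single Tuple node with two elements, so the comma groups before the colon does.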
That's all I have to say for the moment - I'm still thinking this
through. In any case, I think it's worthwhile to be scrutinizing this
issue at a very low level and examining all of the assumptions.
-- Talin