[Python-3000] More PEP 3101 changes incoming

Sun Aug 5 21:41:59 CEST 2007

Talin wrote:
> Ron Adam wrote:
>> Talin wrote:

> Let me define some terms again for the discussion. As noted before, the 
> ',' part is called the alignment specifier. It's no longer appropriate 
> to use the term 'conversion specifier', since we're not doing 
> conversions, so I guess I will stick with the term 'format specifier' 
> for the ':' part.

I don't consider them conversions; it's all going to end up as either a 
string or an exception at the end.  It's just a matter of the best way to 
get there.

The only case where a conversion of any type *doesn't* happen is when a 
value is already a string and either a string specifier is applied to it 
or there is no format specifier.  In almost all other cases, some sort of 
converting process occurs, although it may be a manual reading of 
characters or bytes and not an explicit type cast.  And in those cases, 
it's more a matter of when it happens rather than how it happens that is 
important.

Also, this is a one-directional data path.  The process should never have 
side effects that may affect an object that is passed into a formatter. 
This isn't enforceable, but Python's built-in mechanisms should never do 
that.  Creating new objects in an intermediate step has no such side 
effects.

> What Guido wants is for the general 'apply_format' function to not 
> examine the format specifier *at all*.

Hmmm...  With this, it becomes much harder to determine what a format 
specifier will do because it depends totally on the object's __format__ 
method implementation.  So the behavior of a specific format specifier may 
change depending on the argument object type.

It also makes the __format__ methods much more complex because you need to 
have them know how to handle a wider variety of possibilities.

What will the built-in types' __format__ methods do if they get a 
specifier they don't know how to handle?  Raise an exception, or fall back 
to str, or repr?
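
To make the three options concrete, here is a rough sketch.  The function 
format_int and its on_unknown argument are made up for illustration; they 
are not part of the PEP or of any existing API, and only the 'd' and 'x' 
specifiers are treated as known.

```python
def format_int(value, spec, on_unknown="raise"):
    """Hypothetical stand-in for an int __format__ method."""
    known = {
        "": str,                    # no specifier: plain decimal
        "d": str,                   # decimal
        "x": lambda v: "%x" % v,    # hexadecimal
    }
    if spec in known:
        return known[spec](value)
    # The specifier is not understood; pick one of the three behaviors.
    if on_unknown == "raise":
        raise ValueError("unknown format specifier %r for int" % spec)
    if on_unknown == "str":
        return str(value)
    if on_unknown == "repr":
        return repr(value)
    raise ValueError("unknown policy %r" % on_unknown)


print(format_int(255, "x"))                      # ff
print(format_int(255, "q", on_unknown="str"))    # 255
```

With the fallback policies a typo in a specifier goes unnoticed, which is 
why I'd like to know which behavior is intended.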

> The reason is that for some types, the __format__ method can define its 
> own interpretation of the format string which may include the letters 
> 'rtgd' as part of its regular syntax. Basically, he wants no constraints 
> on what __format__ is allowed to do.

You suggested the format specification be interpreted like a mini language. 
That implies there may be a global format interpreter that an object's 
__format__ method can call.

Such an interpreter would know how to handle the built-in types and be 
extensible. Or we could supply a __format__ method to change the behavior 
if we want something else.  In effect, it moves any 
conversions/interpretations that may happen to later in the event chain.

Is this the direction he wants to go in?

Or does he want each built-in object to have its own __format__ method, 
independent of the others?
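
A rough sketch of what I mean by the first option.  All of the names here 
(register_format, interpret_format, Money) are made up for illustration, 
and the handler only knows two specifiers; nothing like this exists in the 
PEP as written.

```python
# Hypothetical global interpreter: a registry mapping types to handlers.
_handlers = {}


def register_format(type_, handler):
    """Extend the interpreter with a handler taking (value, spec)."""
    _handlers[type_] = handler


def interpret_format(value, spec):
    """Dispatch on the most specific registered type, else fall back to str()."""
    for klass in type(value).__mro__:
        if klass in _handlers:
            return _handlers[klass](value, spec)
    return str(value)


def _format_int(value, spec):
    # Only 'x' is special-cased here; everything else prints as decimal.
    return "%x" % value if spec == "x" else "%d" % value


register_format(int, _format_int)


class Money:
    """An object whose __format__ simply calls the shared interpreter."""

    def __init__(self, cents):
        self.cents = cents

    def __format__(self, spec):
        return interpret_format(self.cents, spec)


print(interpret_format(255, "x"))     # ff
print(Money(1999).__format__("d"))    # 1999
print(interpret_format(None, "d"))    # None
```

In the second option there would be no shared registry, and each type 
would carry its own copy of logic like _format_int.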

> Given this constraint, it becomes pretty obvious which attributes go in 
> which part. Attributes which are actually involved in generating the 
> text (signs and leading digits) would have to go in the 
> format_specifier, and attributes which are interpreted by 
> apply_format (such as left/right alignment) would have to go in the 
> alignment specifier.
> 
> Of course, the two can't be entirely isolated because there is 
> interaction between the two specifiers for some types. For example, it 
> would normally be the case that padding is applied by 'apply_format', 
> which knows about the field width and the padding character. However, in 
> the case of an integer that is printed with leading zeros, the sign must 
> come *before* the padding: '+000000010'. It's not sufficient to simply 
> apply padding blindly to the output of __format__, which would give you 
> '000000+10'.
> 
> (Maybe leading zeros and padding are different things? But the 
> __format__ would still need to know the field width, which is usually 
> part of the alignment spec, since it's usually applied as a 
> post-processing step by 'apply_format')

It is different.  That is why earlier I made the distinction between a 
numeric width and a field width.  This would be a numeric width, and it 
would be inside a field which may have its own minimum width and possibly 
a different fill character.

    '{0:d+6/0,^15/_}'.format(123)  ->      '____+000123____'

This way, the two terms don't have to know about each other.
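
Roughly, in code.  The helper names are mine, and plain arguments stand in 
for the spec syntax, which isn't settled; format_number always shows the 
sign, as the '+' flag in the example asks for.

```python
def format_number(value, numeric_width, zero_fill="0"):
    """Numeric width: pad the digits, keeping the sign in front of the padding."""
    sign = "-" if value < 0 else "+"
    return sign + str(abs(value)).rjust(numeric_width, zero_fill)


def apply_field(text, field_width, fill=" "):
    """Field width: center already-formatted text without looking inside it."""
    return text.center(field_width, fill)


print(apply_field(format_number(123, 6), 15, "_"))   # ____+000123____
print(format_number(10, 9))                          # +000000010
print("+10".rjust(10, "0"))                          # 0000000+10 (blind padding)
```

The last line is the failure Talin describes: padding applied from the 
outside lands in front of the sign.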

The same output in some cases can be generated in more than one way, but I 
don't think that is always a bad thing.  Trying to avoid that makes things 
more complex.

>>> There is no longer any automatic coercion of types based on the 
>>> format string
>>
>> Ever?  This seems to contradict below where you say int needs to 
>> handle float, and float needs to handle int.  Can you explain further?
> 
> What I mean is that a float, upon receiving a format specifier of 'd', 
> needs to print the number so that it 'looks like' an integer. It doesn't 
> actually have to convert it to an int. So 'd' in this case is just a 
> synonym for 'f0'.

I will think about this a bit.  It seems to me, the results are the same 
with more work.

What about rounding behaviors, isn't 'f0' different in that regard?
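
Here is the difference I have in mind, using the existing % operator as a 
stand-in since the proposed 'f0' syntax doesn't exist yet.  I'm assuming 
'f0' would round the way '%.0f' does, which the draft doesn't actually 
say.

```python
value = 2.7

truncated = int(value)           # conversion to int truncates toward zero
as_decimal = "%d" % value        # '%d' truncates as well
rounded = "%.0f" % value         # zero-precision float formatting rounds

print(truncated)                 # 2
print(as_decimal)                # 2
print(rounded)                   # 3
print(int(-2.7), "%.0f" % -2.7)  # -2 -3
```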

>>> - so simply defining an __int__ method for a type is insufficient if 
>>> you want to use the 'd' format type. Instead, if you want to use 'd' 
>>> you can simply write the following:
>>>
>>>    class MyClass:
>>>       def __format__(self, spec):
>>>          return int(self).__format__(spec)
>>
>>
>> So if an item has an __int__ method, but not a __format__ method, and 
>> you tried to print it with a 'd' format type, it would raise an 
>> exception?
>>
>>  From your descriptions elsewhere in this reply it sounds like it 
>> would fall back to string output.  Or am I missing something?
> 
> Yes, we have to have some sort of fallback if there's no __format__ 
> method at all. My thought here is to coerce to str() in this case.

Will a string have a __format__ method and if so, will the format specifier 
term be forwarded to the string's __format__ method in this case too?
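
For reference, this is how I picture the two pieces fitting together.  The 
apply_format below is only my guess at the fallback Talin describes, and 
the classes Temperature and Label are made up.  It assumes int ends up 
with a __format__ method that accepts 'd', and that objects without their 
own __format__ inherit a default one from object.

```python
def apply_format(value, spec):
    """Hypothetical fallback: use str() when the type defines no __format__."""
    method = getattr(type(value), "__format__", None)
    if method is None or method is object.__format__:
        # No type-specific formatting.  The specifier is dropped here;
        # forwarding it to the string's __format__ is the other choice.
        return str(value)
    return method(value, spec)


class Temperature:
    """Has __int__ and delegates formatting to int, as in the quoted example."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __int__(self):
        return int(self.degrees)

    def __format__(self, spec):
        return int(self).__format__(spec)


class Label:
    """Defines __str__ only, so it takes the fallback path."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


print(apply_format(Temperature(21.6), "d"))   # 21
print(apply_format(Label("north"), "d"))      # north
```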

>>> So for example, in .Net having a float field of minimum width 10 and 
>>> a decimal precision of 3 digits would be ':f3,10'.
>>
>> It looks ok to me, but there may be some cases where it could be 
>> ambiguous.   How would you specify leading 0's.  Or would we do that 
>> in the alignment specifier?
>>
>>     {0:f3,-10/0}    '000123.000'
> 
> I'm not sure. This is the one case where the two specifiers interact, as 
> I mentioned above.

Yes, that is why I asked about it. To avoid interaction, floats need to 
have a 'numeric width'.  And to avoid ambiguities with the precision 
term you need the '.'.

      {0:f+6/0.3}       '-000123.000'
      {0:f+6.3}         '+   456.000'
      {0:f6}            '   789.0'
      {0:f.3}           '42.000'
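
A sketch that reproduces these four results, assuming the inputs are 
-123, 456, 789 and 42.  The function format_float and its keyword 
arguments are my own stand-ins for the spec syntax, and it doesn't try to 
handle values that repr() shows in exponent form.

```python
def format_float(value, numeric_width=0, fill=" ", precision=None,
                 show_sign=False):
    """Pad only the whole-number part; the sign stays in front of the fill."""
    if value < 0:
        sign = "-"
    elif show_sign:
        sign = "+"
    else:
        sign = ""
    if precision is None:
        text = repr(abs(float(value)))            # e.g. '789.0'
    else:
        text = "%.*f" % (precision, abs(value))   # e.g. '123.000'
    whole, _, fraction = text.partition(".")
    padded = whole.rjust(numeric_width, fill)
    return sign + padded + ("." + fraction if fraction else "")


# {0:f+6/0.3}
print(format_float(-123, numeric_width=6, fill="0", precision=3,
                   show_sign=True))                  # -000123.000
# {0:f+6.3}
print(format_float(456, numeric_width=6, precision=3,
                   show_sign=True))                  # +   456.000
# {0:f6}
print(format_float(789, numeric_width=6))            #    789.0
# {0:f.3}
print(format_float(42, precision=3))                 # 42.000
```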

>> To me there is an underlying consistency with grouping 
>> abstract/indirect types with more concrete types rather than making 
>> an exception in the field alignment specifier.
>>
>> Moving repr to the format side sort of breaks the original clean idea 
>> of having a field alignment specifier and separate type format 
>> specifiers.
> 
> The reason for this is because of the constraint that apply_format never 
> looks at the format specifier, so overrides for repr() can only go in 
> the thing that it does look at - the alignment spec.

Ok. But I'm -1 on this for the record.   It creates an exceptional case, 
i.e., the format is applied first, except if the alignment term has an 'r' 
in it.

Then what happens to the format specifier term if it exists?  Is it 
forwarded to the string __format__ method here, is it ignored, or is an 
exception raised?

I'm going to think about these issues some more. Maybe I'll change my mind 
or find another way to 'see' this.

Cheers,
    Ron