[Python-3000] More PEP 3101 changes incoming

Sun Aug 5 06:17:21 CEST 2007

Ron Adam wrote:

> Ron Adam wrote:
> 
>> An alternative I thought of this morning is to reuse the alignment symbols 
>> '^', '+', and '-' and require a minimum width if a maximum width is specified.
> 
> One more (or two) additions to this...

(snipped)

I've kind of lost track of what the proposal is at this specific point. 
I like several of the ideas you have proposed, but I think it needs to 
be slimmed down even more.

I don't have a particular syntax in mind - yet - but I can tell you what 
I would like to see in general.

Guido used the term "mini-language" to describe the conversion specifier 
syntax. I think that's a good term, because it implies that it's not 
just a set of isolated properties, but rather a grammar where the 
arrangement and ordering of things matters.

Like real human languages, it has a "Huffman-coding" property, where the 
most commonly uttered phrases are the shortest. This conciseness is 
achieved by sacrificing some degree of orthogonality (in the same way 
that a single CISC machine instruction is shorter than the equivalent 
sequence of RISC instructions). In practical terms, it means that the 
interpretation of a symbol depends on what comes before it.

So in general common cases should be short, uncommon cases should be 
possible. And we don't have to allow every possible combination of 
options, just the ones that are most important.

Another thing I want to point out is that Guido and I (in a private 
discussion) have resolved our argument about the role of __format__. 
Well, not so much *agreed* I guess, more like I capitulated.

But in any case, the deal is that int, float, and decimal all get to 
have a __format__ method which interprets the format string for those 
types. There is no longer any automatic coercion of types based on the 
format string - so simply defining an __int__ method for a type is 
insufficient if you want to use the 'd' format type. Instead, if you 
want to use 'd' you can simply write the following:

    class MyClass:
        def __format__(self, spec):
            # Delegate to int formatting (assumes MyClass defines __int__)
            return int(self).__format__(spec)
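For comparison, here is a small runnable sketch in today's Python 3, which ended up with this no-coercion behaviour. The class names `OnlyInt` and `Delegating` are hypothetical. The first class defines only __int__, so a 'd' spec is rejected. The second adds the delegating __format__ described above.

```python
class OnlyInt:
    """Defines __int__ but no __format__ of its own."""

    def __init__(self, n):
        self.n = n

    def __int__(self):
        return self.n


class Delegating(OnlyInt):
    """Adds the delegation pattern: convert explicitly, then format."""

    def __format__(self, spec):
        return format(int(self), spec)


try:
    # object.__format__ refuses any non-empty spec, so __int__ alone is not enough
    format(OnlyInt(7), 'd')
except TypeError as exc:
    print('OnlyInt rejected:', exc)

print(format(Delegating(7), '03d'))  # 007
```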

This at least has the advantage of simplifying the problem quite a bit. 
The global 'format(value, spec)' function now just does:

    1) check for the 'repr' override; if present, return repr(value)
    2) call value.__format__(spec) if it exists
    3) otherwise, call str(value).__format__(spec)
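The three steps can be sketched in present-day Python as follows. `sketch_format` and its `use_repr` flag are hypothetical stand-ins for the global format() and the proposed 'repr' override. Every Python 3 class inherits object.__format__, so the sketch treats a __format__ that is "not overridden" as "absent".

```python
def sketch_format(value, spec, use_repr=False):
    # Step 1: the hypothetical 'repr' override wins outright
    if use_repr:
        return repr(value)
    # Step 2: use the type's own __format__ if it defines one
    method = type(value).__format__
    if method is not object.__format__:
        return method(value, spec)
    # Step 3: fall back to formatting the str() of the value
    return str(value).__format__(spec)


class Plain:
    """A class with __str__ but no __format__."""

    def __str__(self):
        return 'plain'


print(sketch_format(42, '5d'))
print(sketch_format(3.14159, '.2f'))
print(sketch_format(Plain(), '>7'))
print(sketch_format('x', '', use_repr=True))
```

The fallback in step 3 matters for classes like `Plain`: calling object.__format__ directly with a non-empty spec such as '>7' raises TypeError in current Python.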

Note that this also means that float.__format__ will have to handle 'd' 
and int.__format__ will handle 'f', and so on, although this can be done 
by explicit type conversion in the __format__ method. (No need for float 
to handle 'x' and the like, even though it does work with %-formatting 
today.)
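As a sketch of the "explicit type conversion" route, here is a hypothetical float subclass, `LenientFloat`, whose __format__ converts to int before handling a 'd' spec and defers everything else to float. For reference, in today's Python int.__format__ already accepts 'f', while float.__format__ rejects 'd' with a ValueError.

```python
class LenientFloat(float):
    """Hypothetical float subclass that also accepts the 'd' type."""

    def __format__(self, spec):
        if spec.endswith('d'):
            # Explicit conversion: truncate toward zero, then let int format it
            return format(int(self), spec)
        # Every other spec goes to the normal float formatting
        return super().__format__(spec)


print(repr(format(LenientFloat(3.7), '4d')))   # '   3'
print(repr(format(LenientFloat(2.5), '.1f')))  # '2.5'
print(repr(format(7, '.2f')))                  # '7.00'
```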

> One other feature might be to use the fill syntax form to specify an 
> overflow replacement character...
> 
>     '{0:10+10/#}'.format('Python')                 ->  'Python    '
> 
>     '{0:10+10/#}'.format('To be, or not to be.')   ->  '##########'

Yeah, as Guido pointed out in another message that's not going to fly.

A few minor points on the syntax of the mini-language:

-- I like your idea that :xxxx and ,yyyy can occur in any order.

-- I'm leaning towards the .Net conversion spec syntax convention where 
the type letter comes first: ':f10'. The idea being that the first 
letter changes the interpretation of subsequent letters.

Note that in the .Net case, the numeric quantity after the letter 
represents a *precision* specifier, not a min/max field width.

So for example, in .Net having a float field of minimum width 10 and a 
decimal precision of 3 digits would be ':f3,10'.

Now, as stated above, there's no 'max field width' for any data type 
except strings. So in the case of strings, we can re-use the precision 
specifier just like C printf does: ':s10' limits the string to 10 
characters, and ':s10,5' indicates a max width of 10 and a min width of 5.

-- There's no decimal precision quantity for any data type except 
floats. So ':d10' doesn't mean anything, I think, but ':d,10' means a 
minimum width of 10.

-- I don't have an opinion yet on where the other stuff (sign options, 
padding, alignment) should go, except that sign should go next to the 
type letter, while the rest should go after the comma.

-- For the 'repr' override, Guido suggests putting 'r' in the alignment 
field: '{0,r}'. How that mixes with alignment and padding is unknown, 
although frankly why anyone would want to pad and align a repr() is 
completely beyond me.
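The type-letter-first idea is easiest to see as a tiny parser. The sketch below assumes one possible reading of the points above: a type letter, an optional number whose meaning depends on that letter, then an optional ',<min width>'. It ignores sign, padding and alignment, since their placement is still open, and translates the result into the spec syntax Python's format() accepts today. The `translate` function and its grammar are hypothetical.

```python
import re

# Hypothetical grammar: type letter, optional number, optional ",<min width>"
_SPEC = re.compile(r'^(?P<type>[a-z])(?P<num>\d*)(?:,(?P<width>\d*))?$')


def translate(spec):
    """Map a proposed type-first spec (e.g. ':f3,10') to today's syntax."""
    if spec.startswith(':'):
        spec = spec[1:]
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f'unrecognized spec: {spec!r}')
    kind, num = m.group('type'), m.group('num')
    width = m.group('width') or ''
    if kind in ('f', 's'):
        # After 'f' the number is a decimal precision; after 's' it is a
        # maximum width (Python's precision also truncates strings)
        precision = '.' + num if num else ''
        return f'{width}{precision}{kind}'
    if kind == 'd':
        # Integers have no precision, so a number right after 'd' is an error
        if num:
            raise ValueError("a number directly after 'd' has no meaning")
        return f'{width}d'
    raise ValueError(f'unsupported type letter: {kind!r}')


print(repr(format(3.14159, translate(':f3,10'))))
print(repr(format('To be, or not to be.', translate(':s10,5'))))
print(repr(format(42, translate(':d,10'))))
```

The branches show the "first letter changes the interpretation" property: the same digits are a precision after 'f', a maximum width after 's', and meaningless after 'd'.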

-- Talin