[Python-3000] More PEP 3101 changes incoming
Talin
talin at acm.org
Sun Aug 5 06:17:21 CEST 2007
Ron Adam wrote:
> Ron Adam wrote:
>
>> An alternative I thought of this morning is to reuse the alignment symbols
>> '^', '+', and '-' and require a minimum width if a maximum width is specified.
>
> One more (or two) additions to this...
(snipped)
I've kind of lost track of what the proposal is at this specific point.
I like several of the ideas you have proposed, but I think it needs to
be slimmed down even more.
I don't have a particular syntax in mind - yet - but I can tell you what
I would like to see in general.
Guido used the term "mini-language" to describe the conversion specifier
syntax. I think that's a good term, because it implies that it's not
just a set of isolated properties, but rather a grammar where the
arrangement and ordering of things matters.
Like real human languages, it has a "Huffman-coding" property, where the
most commonly uttered phrases are the shortest. This conciseness is
achieved by sacrificing some degree of orthogonality (in the same way
that a single CISC machine instruction is shorter than the equivalent
sequence of RISC instructions). In practical terms, it means that the
interpretation of a symbol depends on what comes before it.
So, in general, common cases should be short and uncommon cases should be
possible. And we don't have to allow every possible combination of
options, just the ones that are most important.
Another thing I want to point out is that Guido and I (in a private
discussion) have resolved our argument about the role of __format__.
Well, not so much *agreed* I guess, more like I capitulated.
But in any case, the deal is that int, float, and decimal all get to
have a __format__ method which interprets the format string for those
types. There is no longer any automatic coercion of types based on the
format string - so simply defining an __int__ method for a type is
insufficient if you want to use the 'd' format type. Instead, if you
want to use 'd' you can simply write the following:
    class MyClass:
        def __format__(self, spec):
            # Delegate explicitly to int's formatter (MyClass must define __int__)
            return int(self).__format__(spec)
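A runnable version of that delegation idiom, written against the format()/str.format() behaviour Python 3 eventually shipped. The class name Meters is hypothetical and only for illustration; the point is that defining __int__ alone does not enable 'd', while an explicit __format__ that delegates to int does.

```python
class Meters:
    """Hypothetical example type that wants to support the 'd' format type."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        # Truncating conversion used by the delegating __format__ below
        return int(self.value)

    def __format__(self, spec):
        # No automatic coercion: hand the spec to int's own __format__
        return int(self).__format__(spec)


print(format(Meters(42.7), 'd'))     # prints "42"
print('[{0:5d}]'.format(Meters(7)))  # prints "[    7]"
```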
This at least has the advantage of simplifying the problem quite a bit.
The global 'format(value, spec)' function now just does:
1) check for the 'repr' override; if present, return repr(value)
2) call value.__format__(spec) if it exists
3) otherwise, call str(value).__format__(spec)
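The three steps above can be sketched as follows. The function name sketch_format and the use_repr flag are hypothetical stand-ins for however the 'repr' override ends up being spelled. Because every object in today's Python inherits object.__format__, "if it exists" is approximated here as "the type overrides the inherited default"; that is an assumption of the sketch, not part of the proposal.

```python
def sketch_format(value, spec, use_repr=False):
    """Hypothetical three-step dispatch for a global format(value, spec)."""
    # 1) The 'repr' override: return repr(value) unchanged
    if use_repr:
        return repr(value)
    # 2) Call value.__format__(spec) if the type defines its own __format__
    if type(value).__format__ is not object.__format__:
        return type(value).__format__(value, spec)
    # 3) Otherwise, format the string form of the value
    return str(value).__format__(spec)


class Plain:
    """A type with no __format__ of its own, only __str__."""

    def __str__(self):
        return 'plain'


print(sketch_format(42, '5d'))                 # prints "   42"
print(sketch_format(Plain(), '>8'))            # prints "   plain"
print(sketch_format('ab', '', use_repr=True))  # prints "'ab'"
```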
Note that this also means that float.__format__ will have to handle 'd'
and int.__format__ will handle 'f', and so on, although this can be done
by explicit type conversion in the __format__ method. (No need for float
to handle 'x' and the like, even though it does work with %-formatting
today.)
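As a sketch of "explicit type conversion in the __format__ method", here is a hypothetical float subclass, LenientFloat, whose __format__ accepts 'd' by converting to int first (truncating toward zero). This is illustration only: in the Python 3 that was actually released, format(3.7, 'd') raises ValueError, so a subclass like this is one way to opt in to the behaviour.

```python
class LenientFloat(float):
    """Hypothetical float subclass whose __format__ also accepts 'd'."""

    def __format__(self, spec):
        # Handle 'd' by explicit conversion to int (truncating toward zero),
        # then let int's formatter interpret the rest of the spec.
        if spec.endswith('d'):
            return format(int(self), spec)
        # Everything else ('f', 'e', 'g', ...) goes to float's own formatter.
        return super().__format__(spec)


print(format(LenientFloat(3.7), '5d'))   # prints "    3"
print(format(LenientFloat(2.5), '.2f'))  # prints "2.50"
```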
> One other feature might be to use the fill syntax form to specify an
> overflow replacement character...
>
> '{0:10+10/#}'.format('Python') -> 'Python '
>
> '{0:10+10/#}'.format('To be, or not to be.') -> '##########'
Yeah, as Guido pointed out in another message, that's not going to fly.
A few minor points on the syntax of the mini-language:
-- I like your idea that :xxxx and ,yyyy can occur in any order.
-- I'm leaning towards the .Net conversion spec syntax convention where
the type letter comes first: ':f10'. The idea being that the first
letter changes the interpretation of subsequent letters.
Note that in the .Net case, the numeric quantity after the letter
represents a *precision* specifier, not a min/max field width.
So, for example, in that convention a float field of minimum width 10 and
a decimal precision of 3 digits would be ':f3,10'.
Now, as stated above, there's no 'max field width' for any data type
except strings. So in the case of strings, we can reuse the precision
specifier just like C printf does: ':s10' to limit the string to 10
characters, and ':s10,5' to indicate a max width of 10 and a min width of 5.
-- There's no decimal precision quantity for any data type except
floats. So ':d10' doesn't mean anything, I think, but ':d,10' means a
minimum of 10 digits.
-- I don't have an opinion yet on where the other stuff (sign options,
padding, alignment) should go, except that sign should go next to the
type letter, while the rest should go after the comma.
-- For the 'repr' override, Guido suggests putting 'r' in the alignment
field: '{0,r}'. How that mixes with alignment and padding is unknown,
although frankly why anyone would want to pad and align a repr() is
completely beyond me.
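The type-first points above are still undecided, but a rough sketch can make the ordering concrete. The Python below translates a hypothetical type-first spec such as 'f3,10' into the spec syntax that Python 3's format() eventually accepted. The grammar (type letter, optional number, optional ',width'), the name translate_spec, and reading ',10' as a space-padded minimum field width are all assumptions; sign, padding, and alignment options are left out.

```python
import re

# Hypothetical grammar: type letter, optional number, optional ',<min width>'.
# Examples: 'f3,10', 's10,5', 'd,10'.
_SPEC = re.compile(r'^(?P<type>[a-z])(?P<num>\d*)(?:,(?P<width>\d+))?$')


def translate_spec(spec):
    """Translate a type-first spec into a spec that today's format() accepts."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f'cannot parse spec {spec!r}')
    kind, num, width = m.group('type'), m.group('num'), m.group('width') or ''
    if num and kind not in ('f', 's'):
        # Only floats (precision) and strings (max width) take a number here
        raise ValueError(f'{kind!r} does not take a number after the type letter')
    # For 'f' the number is the decimal precision; for 's' it is the maximum
    # width. Both map onto the '.N' precision field of the existing syntax.
    precision = f'.{num}' if num else ''
    return f'{width}{precision}{kind}'


print(translate_spec('f3,10'))                                 # prints "10.3f"
print(format('To be, or not to be.', translate_spec('s10,5')))  # prints "To be, or "
print(translate_spec('d,10'))                                  # prints "10d"
```

Under these assumptions, ':d10' is rejected with ValueError, matching the point that it doesn't mean anything.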
-- Talin
More information about the Python-3000 mailing list