[Python-3000] More PEP 3101 changes incoming

Thu Aug 9 12:42:56 CEST 2007

Talin wrote:
> Ron Adam wrote:
>> Talin wrote:
>>> Ron Adam wrote:
>>>> Now here's the problem with all of this.  As we add the widths back 
>>>> into the format specifications, we are basically saying the idea of 
>>>> a separate field width specifier is wrong.
>>>>
>>>> So maybe it's not really a separate independent thing after all, and 
>>>> it's just a convenient grouping for readability purposes only.
>>>
>>> I'm beginning to suspect that this is indeed the case.
>>
>> Yes, I believe so even more after experimenting last night with 
>> specifier objects.
>>
>> For now I'm using ','s for separating *all* the terms.  I don't intend 
>> that this be used for a final version, but for now it makes parsing 
>> the terms and getting the behavior right much easier.
>>
>>      f,.3,>7     right justify in field width 7, with 3 decimal places.
>>
>>      s,^10,w20   Center in field 10, expands up to width 20.
>>
>>      f,.3,%
>>
>> This allows me to just split on ',' and experiment with ordering and 
>> see how some terms might need to interact with other terms and how to 
>> do that without having to fight the syntax problem for now.
>>
>> Later the syntax can be compressed and tested with a fairly complete 
>> doctest as a separate problem.
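
A rough sketch of the "split on ','" approach quoted above.  The function 
name parse_terms and the dictionary keys are hypothetical, and the meaning 
assigned to each term is only an assumption based on the three examples; 
this is not the experimental code itself.

```python
def parse_terms(spec):
    """Split a comma-separated experimental spec into named terms."""
    result = {}
    for term in spec.split(','):
        term = term.strip()
        if not term:
            continue
        if term[0] in '<>^':
            # Alignment character followed by the field width, e.g. '>7'.
            result['align'] = term[0]
            result['width'] = int(term[1:])
        elif term[0] == '.':
            # Precision, e.g. '.3'.
            result['precision'] = int(term[1:])
        elif term[0] == 'w' and term[1:].isdigit():
            # Assumed meaning: the width the field may expand up to.
            result['max_width'] = int(term[1:])
        elif term == '%':
            result['percent'] = True
        else:
            # Anything else is taken to be the type code, e.g. 'f' or 's'.
            result['type'] = term
    return result
```

Because the terms are classified by their leading character, they can be 
given in any order, which is what makes it easy to experiment with ordering.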
> 
> When you get a chance, can you write down your current thinking in a 
> single document? Right now, there are lots of suggestions scattered in a 
> bunch of different messages, some of which have been superseded, and 
> it's hard to sew them together.

I'll see what I can come up with.  But I think you pretty much covered it 
below.

> At this point, I think that as far as the mini-language goes, after 
> wandering far afield from the original PEP we have arrived at a design 
> that's not very far - at least semantically - from what we started with. 

Yes, I agree.

> In other words, other than the special case of 'repr', we find that 
> pretty much everything can fit into a single specifier string; attempts 
> to break it up into two independent specifiers that are handled by two 
> different entities run into the problem that the specifiers aren't 
> independent and there are interactions between the two. Because the 
> dividing line between "format specifier" and "alignment specifier" 
> changes based on the type of data being formatted, trying to keep them 
> separate results in redundancy and duplication, where we end up with 
> more than one way to specify padding, alignment, or minimum width.

Yes.

Another deciding factor is whether or not users want a general formatting 
language that is very flexible, and allows them to combine and order 
instructions to do a wide variety of things, some of which may not make 
much sense (just as you can create regular expressions that don't make 
sense).

Or do they want an option-based system that limits what they can do to a 
set of well-defined behaviors?

It seems that having well-defined behaviors (limited to things that make 
sense) is preferred.

(Although I prefer the former myself.)

> So I'm tempted to just use what's in the PEP now as a starting point - 
> perhaps re-arranging the order of attributes, as has been discussed, or 
> perhaps not - and then handling 'repr' via a different prefix character 
> other than ':'. The 'repr' flag does nothing more than call __repr__ on 
> the object, and then call __format__ on the result using whatever 
> conversion spec was specified. (There might be a similar flag that does 
> a call to __str__, which has the effect of calling str.__format__ 
> instead of the object's native __format__ function.)

The way to think of 'repr' and 'str' is as a general "object" format 
type/specifier.  That puts str and repr into the same context as the rest 
of the format types.  This is really a point-of-view issue and not so much 
a semantic one.  I think {0:r} and {0:s} are to "object" as {0:d} and 
{0:e} are to "float": just another relationship relative to the value 
being formatted.  So I don't understand the need to treat them differently.
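
A minimal sketch of that point of view, in which 'r' and 's' are handled as 
ordinary type codes.  The function name format_value is hypothetical, and 
the sketch assumes the type code is the final character of the spec; 
neither is taken from the PEP or from any existing implementation.

```python
def format_value(value, spec):
    """Treat a trailing 'r' or 's' as an object-level type code."""
    if spec.endswith('r'):
        # Convert with repr(), then apply the rest of the spec as a string.
        return format(repr(value), spec[:-1])
    if spec.endswith('s'):
        # Convert with str(), then apply the rest of the spec as a string.
        return format(str(value), spec[:-1])
    # Every other type code is left to the value's own formatting.
    return format(value, spec)
```

For example, format_value('hi', '>6r') right-justifies the repr of the 
string in a field of width 6, while format_value(3.5, '.2f') is passed 
through unchanged to float formatting.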

> As far as requiring the different built-in versions of __format__ to 
> have to parse the standard conversion specifier, that is not a problem 
> in practice, as we'll have a little mini-parser that parses the 
> conversion spec and fills in a C struct. There will also be a 
> Python-accessible version of the same thing for people extending 
> formatters in Python.

This is not too far from what I was thinking then.

I'm not sure I can add much to that.

My current experimental implementation allows for pre-parsing a format 
string, so the parsing step can be moved outside of a loop and the string 
doesn't have to be reparsed on each use.  The parsed result can also be 
examined and possibly modified before applying it to arguments.

I'm not sure how useful that is, but instead of iterating a string and 
handling each item sequentially, it parses the whole string and all the 
format fields at one time, then formats all the arguments, then does a 
str.join() operation to combine them.  This may be faster in pure Python, 
but probably slower in C.
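
The parse-once, format-many idea can be sketched as follows.  The class 
name ParsedFormat is hypothetical, and the sketch leans on the standard 
library's string.Formatter.parse() and built-in format() rather than on the 
experimental code.  It assumes every field uses an explicit numeric index 
such as {0} or {1}.

```python
import string


class ParsedFormat:
    """Parse a template once, then apply it to many argument tuples."""

    def __init__(self, template):
        # Each entry is (literal_text, argument_index_or_None, format_spec).
        self.fields = []
        for literal, name, spec, _conv in string.Formatter().parse(template):
            index = int(name) if name is not None else None
            self.fields.append((literal, index, spec or ''))

    def apply(self, *args):
        # Format every field first, then combine the pieces with one join.
        pieces = []
        for literal, index, spec in self.fields:
            pieces.append(literal)
            if index is not None:
                pieces.append(format(args[index], spec))
        return ''.join(pieces)
```

The fields list is plain data, so it can be inspected or modified between 
parsing and applying, and a single ParsedFormat instance can be reused 
inside a loop without reparsing the template.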

> So, the current action items are:
> 
> 1) Get consensus on the syntax of the formatting mini-language.

Putting the syntax first can introduce side effects or limitations as a 
result of the syntax.  So this might be better as a later step.

I think getting a consensus on the exact behaviors first, and then 
proceeding to the implementation, will move things along faster.  While 
this is for the most part in the PEP, I think any loose ends on the behavior 
side should be nailed down completely before the final syntax is worked out.

Then we can find a syntax that works with the implementation, rather than 
try to make the implementation work with the syntax.

> 2) Create a pure-python implementation of the global 'format' function, 
> which will be a new standard library function that formats a single 
> value, given a conversion spec:
> 
>    format(value, conversion)
> 
> 3) Write implementations of str.__format__, int.__format__, 
> float.__format__, decimal.__format__ and so on.
> 
> 4) Create C implementations of the above.
> 
> 5) Write the code for complex, multi-value formatting as specified in 
> the PEP, and hook up to the built-in string class.

I think finishing up #2 and #3 should come first, with very extensive tests 
(using whatever syntax works for now).
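
How items 2 and 3 might fit together can be sketched in a few lines.  The 
name format_sketch and the Celsius class are hypothetical illustrations; 
the only assumption taken from the list above is that a single-value 
format(value, conversion) function hands the conversion spec to the 
value's own __format__ method.

```python
def format_sketch(value, conversion=''):
    """Stand-in for the proposed format(value, conversion) function."""
    # Look the method up on the type, as is done for other special methods.
    result = type(value).__format__(value, conversion)
    if not isinstance(result, str):
        raise TypeError('__format__ must return a string')
    return result


class Celsius:
    """Example type that supplies its own __format__ (illustrative only)."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, conversion):
        # Reuse float formatting for the number, then append a unit.
        return format(self.degrees, conversion or '.1f') + ' C'
```

With this arrangement, format_sketch(Celsius(21.456), '.2f') is handled by 
Celsius.__format__, while format_sketch(255, 'x') falls through to the 
built-in int formatting.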

I've been going over the tests in the sandbox trying to get my 
experimental version to pass them.  Once I get it to pass most of them I'll 
send you a copy.

BTW, I noticed str.center() has an odd behavior: when the padding is uneven, 
the side that gets the extra space alternates between odd- and even-length 
strings.  Is this intentional?

 >>> 'a'.center(2)
'a '
 >>> 'aa'.center(3)
' aa'
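
A small sketch that reproduces the two results above.  The function name 
center_sketch is hypothetical, and the padding rule in it is an assumption 
inferred from observed CPython output, not documented behavior: the extra 
space goes on the left only when both the total padding and the target 
width are odd.

```python
def center_sketch(text, width, fill=' '):
    """Center text the way the two examples above suggest."""
    pad = width - len(text)
    if pad <= 0:
        return text
    # Assumed rule: round the left padding up only when both the total
    # padding and the requested width are odd.
    left = pad // 2 + (pad & width & 1)
    return fill * left + text + fill * (pad - left)
```

Under this rule, center_sketch('a', 2) puts the extra space on the right 
and center_sketch('aa', 3) puts it on the left, matching the session shown.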

Cheers,
    Ron