[Python-Dev] Backporting PEP 3101 to 2.6

Thu Jan 10 20:12:14 CET 2008

Guido van Rossum wrote:
> On Jan 10, 2008 9:57 AM, Eric Smith <eric+python-dev at trueblade.com> wrote:
>> Eric Smith wrote:
>>> 1: How should the builtin format() work?  It takes 2 parameters, an
>>> object o and a string s, and returns o.__format__(s).  If s is None, it
>>> returns o.__format__(empty_string).  In 3.0, the empty string is of
>>> course unicode.  For 2.6, should I use u'' or ''?
>> I just re-read PEP 3101, and it doesn't mention this behavior with None.
>>   The way the code actually works is that the specifier is optional, and
>> if it isn't present then it defaults to an empty string.  This behavior
>> isn't mentioned in the PEP, either.
>>
>> This feature came from a request from Talin[0].  We should either add
>> this to the PEP (and docs), or remove it.  If we document it, it should
>> mention the 2.x behavior (as other places in the PEP do).  If we removed
>> it, it would remove the one place in the backport that's not just hard,
>> but ambiguous.  I'd just as soon see the feature go away, myself.
> 
> IIUC, the 's' argument is the format specifier. Format specifiers are
> written in a very conservative character set, so I'm not sure it
> matters. Or are you assuming that the *type* of 's' also determines
> the type of the output?

Yes, 's' is the format specifier.  I should have used its actual name. 
I am saying that the type of 's' determines the type of the output. 
Maybe that's a needless assumption for the builtin format(), since it 
doesn't inspect the value of 's' (other than to verify its type).  But 
for ''.format() and u''.format(), I was thinking it will be true (but 
see below).

It just seems weird to me that the result of format(3, u'd') would be 
'3', not u'3'.
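
A minimal Python 3 model of the assumption that the spec's type decides the result's type. Python 3 has no separate 8-bit str, so this sketch assumes bytes stands in for the 2.x str and str stands in for 2.x unicode; format_like_spec is a hypothetical helper, not part of any API.

```python
def format_like_spec(value, spec):
    """Hypothetical model: the result has the same string type as the spec.

    bytes stands in for the 2.x 8-bit str, str stands in for 2.x unicode.
    """
    if isinstance(spec, bytes):
        # 8-bit spec: format with the ASCII-decoded spec, then return bytes.
        return format(value, spec.decode('ascii')).encode('ascii')
    if isinstance(spec, str):
        # Text ("unicode") spec: the result stays text.
        return format(value, spec)
    raise TypeError('format spec must be bytes or str, not %s'
                    % type(spec).__name__)


print(repr(format_like_spec(3, 'd')))   # '3'
print(repr(format_like_spec(3, b'd')))  # b'3'
```

Under this model the text-typed spec yields a text result, which is the behaviour expected for format(3, u'd'). The cost is the one raised in item 3: every __format__ has to branch on the spec's type.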

> I may be in the minority here, but I think I like having a default for
> 's' (as implemented -- the PEP ought to be updated) and I also think
> it should default to an 8-bit string, assuming you support 8-bit
> strings at all -- after all in 2.x 8-bit strings are the default
> string type (as reflected by their name, 'str').

As long as it's defined, I'm okay with it.  I think making the 2.6 
default be an empty str is reasonable.
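
For reference, a sketch of the builtin with an optional specifier, written in Python 3 terms (where the default '' is the text type; for 2.6 the default discussed here is an empty 8-bit str). builtin_format is a hypothetical name; it mirrors Python 3's real format(value, format_spec='').

```python
def builtin_format(obj, format_spec=''):
    """Sketch of builtin format(): the spec is optional and defaults to ''."""
    if not isinstance(format_spec, str):
        raise TypeError('format_spec must be str, not %s'
                        % type(format_spec).__name__)
    # Special methods are looked up on the type, not on the instance.
    return type(obj).__format__(obj, format_spec)


print(builtin_format(3))          # 3
print(builtin_format(3, '05d'))   # 00003
```

Note that the builtin never inspects the spec's contents, only its type; interpreting the spec is left entirely to the object's __format__.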

>>> 3: Every overridden __format__() method is going to have to check for
>>> string or unicode, just like object.__format__() does, and return either a
>>> string or unicode object, appropriately.  I don't see any way around
>>> this, but I'd like to hear any thoughts.  I guess there aren't all that
>>> many __format__ methods that will be implemented, so this might not be a
>>> big burden.  I'll of course implement the built-in ones.
>> The PEP actually mentions that this is how 2.x will have to work.  So
>> I'll go ahead and implement it that way, on the assumption that getting
>> string support into 2.6 is desirable.
> 
> I think it is. (But then I still live in a predominantly ASCII world.  :-)

I live in that same world, which is why I started implementing this to 
begin with!  I've always been more interested in the ASCII version for 
2.6 than for the 3.0 unicode version.  Doing it first in 3.0 was my way 
of getting it into 2.6.

> For data types whose output uses only ASCII, would it be acceptable if
> they always returned an 8-bit string and left it up to the caller to
> convert it to Unicode? This would apply to all numeric types. (The
> date/time types have a strftime() style API which means the user must
> be able to specify Unicode.)

I guess in str.format() I could require the result of format(obj, 
format_spec) to be a str, and in unicode.format() I could convert it to 
be unicode, which would either succeed or fail.  I think all I need to 
do is have the numeric formatters work with both unicode and str format 
specifiers, and always return str results.  That should be doable. As 
you say, the format specifiers for the numerics are restricted to 8-bit 
strings, anyway.
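
A rough sketch of that plan, again assuming bytes stands in for the 2.x str and str for unicode. format_number and coerce_for_unicode_format are hypothetical names: the numeric formatter accepts either spec type and always returns 8-bit, and the unicode side converts each result, which either succeeds or fails.

```python
def format_number(value, spec):
    """Hypothetical 2.6-style numeric __format__.

    Accepts the spec as bytes (2.x str) or str (2.x unicode) and always
    returns bytes, i.e. an 8-bit string.
    """
    if isinstance(spec, bytes):
        spec = spec.decode('ascii')
    elif not isinstance(spec, str):
        raise TypeError('format spec must be bytes or str, not %s'
                        % type(spec).__name__)
    # Assumes ASCII output, as the thread does for numeric types.
    return format(value, spec).encode('ascii')


def coerce_for_unicode_format(result):
    """What a hypothetical unicode.format() would do with each field result."""
    if isinstance(result, bytes):
        # 2.x-style implicit conversion: ASCII decode, which succeeds or raises.
        return result.decode('ascii')
    return result


print(repr(format_number(3, 'd')))                               # b'3'
print(repr(coerce_for_unicode_format(format_number(42, '>4'))))  # '  42'
```

The appeal of this design is that only the container (unicode.format) needs to know about both string types; the numeric formatters stay single-typed in their output.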

Now that I think about it, str.__format__() will also need to 
accept unicode and produce a str, for this to work:

u"{0}{1}{2}".format('a', u'b', 3)
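
A small model of how that example could be evaluated, assuming bytes plays the 2.x str and str plays unicode. format_8bit_str, format_field and unicode_format are hypothetical names; only string.Formatter().parse is a real stdlib call. The sketch handles explicit positional fields only and ignores conversions such as !r.

```python
import string


def format_8bit_str(value, spec):
    """Hypothetical 2.6 str.__format__: takes a str or bytes spec, returns bytes."""
    if isinstance(spec, bytes):
        spec = spec.decode('ascii')
    # latin-1 round-trips every byte, so any 8-bit value survives formatting.
    return format(value.decode('latin-1'), spec).encode('latin-1')


def format_field(value, spec):
    """Dispatch to a 2.6-style __format__ based on the argument's type."""
    if isinstance(value, bytes):            # 2.x str
        return format_8bit_str(value, spec)
    if isinstance(value, (int, float)):     # numerics always return 8-bit
        return format(value, spec).encode('ascii')
    return format(value, spec)              # 2.x unicode stays text


def unicode_format(template, *args):
    """Model of u'...'.format(*args): every field result is coerced to text."""
    parts = []
    for literal, field_name, spec, conversion in string.Formatter().parse(template):
        parts.append(literal)
        if field_name is None:              # trailing literal text only
            continue
        result = format_field(args[int(field_name)], spec)
        if isinstance(result, bytes):
            result = result.decode('ascii')  # succeeds for ASCII, raises otherwise
        parts.append(result)
    return ''.join(parts)


print(unicode_format('{0}{1}{2}', b'a', 'b', 3))  # ab3
```

A non-ASCII 8-bit argument such as b'\xe9' makes the conversion raise UnicodeDecodeError, which is the "either succeed or fail" behaviour described above.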

I'll give these ideas a shot and see how far I get.  Thanks for the 
feedback!

Eric.