On Jan 10, 2008 9:57 AM, Eric Smith firstname.lastname@example.org wrote:
Eric Smith wrote:
(I'm posting to python-dev, because this isn't strictly 3.0 related. Hopefully most people read it in addition to python-3000).
I'm working on backporting the changes I made for PEP 3101 (Advanced String Formatting) to the trunk, in order to meet the pre-PyCon release date for 2.6a1.
I have a few questions about how I should handle str/unicode. 3.0 was pretty easy, because everything was unicode.
1: How should the builtin format() work? It takes 2 parameters, an object o and a string s, and returns o.__format__(s). If s is None, it returns o.__format__(empty_string). In 3.0, the empty string is of course unicode. For 2.6, should I use u'' or ''?
I just re-read PEP 3101, and it doesn't mention this behavior with None. The way the code actually works is that the specifier is optional, and if it isn't present then it defaults to an empty string. This behavior isn't mentioned in the PEP, either.
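To make the optional-specifier behavior concrete, here is a minimal sketch of what is described above. The name format_sketch is hypothetical and only stands in for the builtin; looking the method up on the type is an assumption about how the real code dispatches, not something stated in this thread.

```python
def format_sketch(obj, spec=''):
    """Hypothetical stand-in for the builtin format(); not the real code."""
    # The specifier is optional; when omitted it defaults to an empty string.
    # Assumption: the method is looked up on the type, like other special methods.
    return type(obj).__format__(obj, spec)


print(format_sketch(255))       # specifier omitted, so '' is passed
print(format_sketch(255, 'x'))  # explicit specifier
```

The open question for 2.6 is only which string type that default should be: '' or u''.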
This feature came from a request from Talin. We should either add this to the PEP (and docs), or remove it. If we document it, it should mention the 2.x behavior (as other places in the PEP do). If we removed it, it would remove the one place in the backport that's not just hard, but ambiguous. I'd just as soon see the feature go away, myself.
IIUC, the 's' argument is the format specifier. Format specifiers are written in a very conservative character set, so I'm not sure it matters. Or are you assuming that the *type* of 's' also determines the type of the output?
I may be in the minority here, but I think I like having a default for 's' (as implemented -- the PEP ought to be updated) and I also think it should default to an 8-bit string, assuming you support 8-bit strings at all -- after all in 2.x 8-bit strings are the default string type (as reflected by their name, 'str').
3: Every overridden __format__() method is going to have to check for string or unicode, just like object.__format__() does, and return either a string or unicode object, appropriately. I don't see any way around this, but I'd like to hear any thoughts. I guess there aren't all that many __format__ methods that will be implemented, so this might not be a big burden. I'll of course implement the built-in ones.
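A rough sketch of the kind of check point 3 has in mind, assuming the rule is "return the same string type as the specifier". The Celsius class and the text_type alias are made up for illustration. The try/except is only there so the sketch also runs on 3.0, where the two branches collapse into one.

```python
try:
    text_type = unicode  # 2.x: the separate unicode type
except NameError:
    text_type = str      # 3.0: str is already unicode


class Celsius(object):
    """Hypothetical user-defined type with its own __format__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Delegate the numeric part to the builtin float formatting.
        body = format(self.degrees, spec) + ' C'
        # Assumed rule: the result has the same string type as the specifier.
        if isinstance(spec, text_type):
            return text_type(body)
        return str(body)


print(format(Celsius(21.5), '.1f'))
```

Every type that overrides __format__ in 2.x would need some variant of that isinstance branch, which is the burden described above.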
The PEP actually mentions that this is how 2.x will have to work. So I'll go ahead and implement it that way, on the assumption that getting string support into 2.6 is desirable.
I think it is. (But then I still live in a predominantly ASCII world. :-)
For data types whose output uses only ASCII, would it be acceptable if they always returned an 8-bit string and left it up to the caller to convert it to Unicode? This would apply to all numeric types. (The date/time types have a strftime() style API which means the user must be able to specify Unicode.)