[Python-3000] String formatting: Conversion specifiers

Tue Jun 6 09:53:38 CEST 2006

I've been slowly working on PEP 3101, specifically fleshing out the 
details, and there are a couple of issues that I wanted to run by the 
group mind here.

Originally, I decided to punt on the issue of field conversion 
specifiers (i.e. %2.2s etc.) and simply say that they were unchanged 
from the existing implementation.

However, I've been looking over the source for PyString_Format, and I'm 
thinking that the code for handling field conversions is a lot more 
complicated than what we really need here.

Here is a list of the conversion types that are currently supported by 
the % operator. First thing you notice is an eerie similarity between 
this and the documentation for 'sprintf'. :)

Conversion  Meaning                                                  Notes
d           Signed integer decimal.
i           Signed integer decimal.
o           Unsigned octal.                                          (1)
u           Unsigned decimal.
x           Unsigned hexadecimal (lowercase).                        (2)
X           Unsigned hexadecimal (uppercase).                        (2)
e           Floating point exponential format (lowercase).
E           Floating point exponential format (uppercase).
f           Floating point decimal format.
F           Floating point decimal format.
g           Same as "e" if exponent is less than -4 or not less
            than precision, "f" otherwise.
G           Same as "E" if exponent is less than -4 or not less
            than precision, "F" otherwise.
c           Single character (accepts integer or single character
            string).
r           String (converts any Python object using repr()).        (3)
s           String (converts any Python object using str()).         (4)
%           No argument is converted, results in a "%" character
            in the result.

Now, unlike C, in Python we already know the type of the thing we're 
going to print. So there's no need to tell the system 'this is a float' 
or 'this is an integer'. The only way I could see this being useful is 
if you had a type and wanted it to print out as some different type - 
but is that really the proper role of the string formatter?
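
To make that coercion case concrete, here is a small sketch of what the 
existing % operator does when the type code and the value disagree. The 
variable names are only for illustration.

```python
# The type code, not the value's own type, decides the presentation.
as_int = "%d" % 3.7      # float pushed through the integer code
as_float = "%f" % 3      # int pushed through the float code
as_default = "%s" % 3.7  # no coercion, just str() of the value

print(as_int)      # 3
print(as_float)    # 3.000000
print(as_default)  # 3.7
```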

Similarly, what does it mean to have an 'unsigned' quantity in Python? 
If you say "print this negative number as unsigned", what does that 
mean? Does it take the absolute value, or does it do what C does and 
take the number modulo 2^32? Neither seems particularly correct or 
intuitive to me.
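
For comparison, the two readings give very different answers. This is a 
minimal sketch with hypothetical helper names, and the 32-bit width in 
the C-style case is an assumption.

```python
def unsigned_as_abs(n):
    """Reading 1: simply drop the sign."""
    return abs(n)


def unsigned_as_c(n, bits=32):
    """Reading 2: wrap modulo 2**bits, like a C cast to unsigned."""
    return n % (2 ** bits)


print(unsigned_as_abs(-1))  # 1
print(unsigned_as_c(-1))    # 4294967295
print("%u" % -1)            # -1
```

The last line shows that Python 3 takes neither reading: its 'u' code is 
treated exactly like 'd'.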

So I decided to sit down and rethink the whole conversion specifier 
system. I looked at the docs for the '%' operator, and at some other 
languages, and here is what I came up with (this is an excerpt from the 
revised PEP).

Oh, and I should mention that I have a working implementation of what is 
described below.
--------------------------

Standard Conversion Specifiers

     Most built-in types will support a standard set of conversion
     specifiers. These are similar in concept to the conversion
     specifiers used by the existing '%' operator; however, there are
     also a number of significant differences.

     The general form of the standard conversion specifier is:

         [flags][length][.precision][type]

     The brackets ([]) indicate an optional field.

     The flags can be one of the following:

         '+' - indicates that a sign should be used for both
               positive as well as negative numbers (normally only
               negative numbers will have a sign.)

         '<' - Forces the field to be left-aligned within the available
               space (This is the default.)

         '>' - Forces the field to be right-aligned within the
               available space.

         '0' - Causes any leftover space in the field to be filled
               with leading zeros. Note that this option also implies
               that the field is right-aligned.

         ' ' - Causes the leftover space in the field to be filled
               with spaces.

     'length' is the minimum field width. If not specified, then the
     field width will be determined by the content.
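
The flag and length rules above could be sketched with ordinary string 
methods. This is only an illustration of the rules as written, not the 
working implementation mentioned earlier. The name pad_field and its 
signature are hypothetical.

```python
def pad_field(value, length=0, flags=""):
    """Apply the '+', '<', '>' and '0' flags to a value (illustrative only)."""
    text = str(value)
    # '+' asks for an explicit sign on non-negative numbers.
    if "+" in flags and isinstance(value, (int, float)) and value >= 0:
        text = "+" + text
    if "0" in flags:
        # zfill pads on the left and keeps a leading sign in front.
        return text.zfill(length)
    if ">" in flags:
        return text.rjust(length)
    # Left alignment is the default in this draft; space is the default fill.
    return text.ljust(length)


print(repr(pad_field(42, 6)))        # '42    '
print(repr(pad_field(42, 6, ">")))   # '    42'
print(repr(pad_field(42, 6, "+0")))  # '+00042'
```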

     For a numeric value, 'precision' is the number of digits after
     the decimal point that should be displayed.

     Finally, the 'type' determines how the data should be presented.
     It is generally only used for numeric types - string types do
     not need to indicate a type.
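
As a rough illustration of the general form [flags][length][.precision][type], 
a specifier could be split apart with a regular expression. The names 
Spec and parse_spec are hypothetical, and the set of type letters is taken 
from the list of available types in this draft.

```python
import re
from typing import NamedTuple, Optional

# One optional group per part of [flags][length][.precision][type].
_SPEC_RE = re.compile(
    r"(?P<flags>[+<>0 ]*)"
    r"(?P<length>\d*)"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[bcdeEfFgGnorxX%]?)"
)


class Spec(NamedTuple):
    flags: str
    length: Optional[int]
    precision: Optional[int]
    type: str


def parse_spec(spec: str) -> Spec:
    """Split a conversion specifier into its four optional parts."""
    m = _SPEC_RE.fullmatch(spec)
    if m is None:
        raise ValueError("invalid conversion specifier: %r" % spec)
    length = m.group("length")
    precision = m.group("precision")
    return Spec(
        flags=m.group("flags"),
        length=int(length) if length else None,
        precision=int(precision) if precision is not None else None,
        type=m.group("type"),
    )


print(parse_spec("+08.2f"))
# Spec(flags='+0', length=8, precision=2, type='f')
```

Note that a leading '0' is read as a flag rather than as part of the 
length, which is one way of resolving the ambiguity between the two.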

     The available types are:

         'b' - Binary. Outputs the number in base 2.
         'c' - Character. Converts the integer to the corresponding
               unicode character before printing.
         'd' - Decimal Integer. Prints only the whole-number portion
               of the number.
         'e' - Exponent notation. Prints the number in scientific
               notation using the letter 'e' to indicate the exponent.
         'E' - Exponent notation. Same as 'e' except it uses an upper
               case 'E' as the separator character.
         'f' - Fixed point. Displays the number as a fixed-point
               number.
         'F' - Fixed point. Same as 'f'.
         'g' - General format. This prints the number as a fixed-point
               number, unless the number is too large, in which case
               it switches to exponent notation.
         'G' - General format. Same as 'g' except switches to 'E'
               if the number gets too large.
         'n' - Number. This is the same as 'g', except that it uses the
               current locale setting to insert the appropriate
               number separator characters.
         'o' - Octal format. Outputs the number in base 8.
         'r' - Repr format. Outputs the value in a format which is
               likely to be readable by the interpreter. Also works
               with non-numeric fields.
         'x' - Hex format. Outputs the number in base 16, using lower-
               case letters for the upper digits.
         'X' - Hex format. Outputs the number in base 16, using upper-
               case letters for the upper digits.
         '%' - Percentage. Multiplies the number by 100 and displays
               in fixed ('f') format, followed by a percent sign.
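
For a feel of what these type codes produce, the built-in format() in 
today's Python 3 accepts most of the same letters. It descends from this 
proposal, but the final syntax differs from the draft in places, so treat 
this as an approximation rather than the draft's exact behaviour.

```python
# (type code, sample value) pairs fed to the Python 3 built-in format().
samples = [
    ("b", 10),
    ("c", 65),
    ("d", 42),
    ("e", 1234.5),
    ("E", 1234.5),
    ("f", 2.5),
    ("g", 1e20),
    ("o", 255),
    ("x", 255),
    ("X", 255),
    ("%", 0.25),
]

results = {code: format(value, code) for code, value in samples}
for code, text in results.items():
    print(code, text)
```

Two codes from the list are left out: 'n' depends on the current locale, 
and 'r' ended up as a separate '!r' conversion rather than a type code.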

     For non-built-in types, the conversion specifiers will be specific
     to that type.  An example is the 'datetime' class, whose
     conversion specifiers might look something like the arguments
     to the strftime() function:

         "Today is: {0:a b d H:M:S Y}".format(datetime.now())

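One way a type can own its specifiers is a per-type formatting hook. The 
sketch below uses the __format__ method from today's Python 3, which is 
not described in this excerpt. There, datetime passes the specifier on to 
strftime(), so the directives keep their '%' signs. The Temperature class 
and its 'K' and 'C' codes are hypothetical.

```python
from datetime import datetime

# datetime forwards a non-empty specifier to strftime().
stamp = datetime(2006, 6, 6, 9, 53, 38)
posted = "Posted: {0:%Y-%m-%d %H:%M:%S}".format(stamp)
print(posted)  # Posted: 2006-06-06 09:53:38


class Temperature:
    """Hypothetical type with its own specifiers: 'K' (default) and 'C'."""

    def __init__(self, kelvin):
        self.kelvin = kelvin

    def __format__(self, spec):
        if spec in ("", "K"):
            return "%.2f K" % self.kelvin
        if spec == "C":
            return "%.2f C" % (self.kelvin - 273.15)
        raise ValueError("unknown specifier: %r" % spec)


freezing = "{0:K} is {0:C}".format(Temperature(273.15))
print(freezing)  # 273.15 K is 0.00 C
```
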
-- Talin