[Python-3000] String formatting: Conversion specifiers

Wed Jun 7 07:36:49 CEST 2006

Nick Coghlan wrote:
> Talin wrote:
> 
>> So I decided to sit down and rethink the whole conversion specifier 
>> system. I looked at the docs for the '%' operator, and some other 
>> languages, and here is what I came up with (this is an excerpt from 
>> the revised PEP.)
> 
> 
> Generally nice, but I'd format the writeup a bit differently (see below) 
> and reorder the elements so that an arbitrary character can be supplied 
> as the fill character and the old ' ' sign flag behaviour remains 
> available.
> 
> I'd also design it so that the standard conversion specifiers are 
> available 'for free' (i.e., they work for any class, unless the class 
> author deliberately replaces them with something else).
> 
> Cheers,
> Nick.
> 

I've taken your proposal as a base, and made some additional changes to 
it. In addition, I've gone ahead and implemented a prototype of the 
built-in formatter based on the revised text.

Note that I decided not to have different specifier syntax for each 
different data type - the reason is because I have a single parser that 
parses the conversion specifier, and it always parses precision, sign, 
etc., even if they are not used by that particular format type. So 
instead, it is simply the case that some specifier options aren't use 
for some format types.

Here is the new text for the section:

Standard Conversion Specifiers

     If an object does not define its own conversion specifiers, a
     standard set of conversion specifiers are used.  These are similar
     in concept to the conversion specifiers used by the existing '%'
     operator, however there are also a number of significant
     differences.  The standard conversion specifiers fall into three
     major categories: string conversions, integer conversions and
     floating point conversions.

     The general form of a standard conversion specifier is:

         [[fill]align][sign][width][.precision][type]

     The brackets ([]) indicate an optional field.

     Then the optional align flag can be one of the following:

         '<' - Forces the field to be left-aligned within the available
               space (This is the default.)
         '>' - Forces the field to be right-aligned within the
               available space.
         '=' - Forces the padding to be placed between immediately
               after the sign, if any. This is used for printing fields
               in the form '+000000120'.

     Note that unless a minimum field width is defined, the field
     width will always be the same size as the data to fill it, so
     that the alignment option has no meaning in this case.

     The optional 'fill' character defines the character to be used to
     pad the field to the minimum width.  The alignment flag must be
     supplied if the character is a number other than 0 (otherwise the
     character would be interpreted as part of the field width
     specifier). A '0' fill character without an alignment flag
     implies an alignment type of '='.

     The 'sign' field can be one of the following:

         '+'  - indicates that a sign should be used for both
                positive as well as negative numbers
         '-'  - indicates that a sign should be used only for negative
                numbers (this is the default behaviour)
         ' '  - indicates that a leading space should be used on
                positive numbers
         '()' - indicates that negative numbers should be surrounded
                by parentheses

     'width' is a decimal integer defining the minimum field width. If
     not specified, then the field width will be determined by the
     content.

     The 'precision' field is a decimal number indicating how many
     digits should be displayed after the decimal point.

     Finally, the 'type' determines how the data should be presented.
     If the type field is absent, an appropriate type will be assigned
     based on the value to be formatted ('d' for integers and longs,
     'g' for floats, and 's' for everything else.)

     The available string conversion types are:

         's' - String format. Invokes str() on the object.
               This is the default conversion specifier type.
         'r' - Repr format. Invokes repr() on the object.

     There are several integer conversion types. All invoke int() on
     the object before attempting to format it.

     The available integer conversion types are:

         'b' - Binary. Outputs the number in base 2.
         'c' - Character. Converts the integer to the corresponding
               unicode character before printing.
         'd' - Decimal Integer. Outputs the number in base 10.
         'o' - Octal format. Outputs the number in base 8.
         'x' - Hex format. Outputs the number in base 16, using lower-
               case letters for the digits above 9.
         'X' - Hex format. Outputs the number in base 16, using upper-
               case letters for the digits above 9.

     There are several floating point conversion types. All invoke
     float() on the object before attempting to format it.

     The available floating point conversion types are:

         'e' - Exponent notation. Prints the number in scientific
               notation using the letter 'e' to indicate the exponent.
         'E' - Exponent notation. Same as 'e' except it uses an upper
               case 'E' as the separator character.
         'f' - Fixed point. Displays the number as a fixed-point
               number.
         'F' - Fixed point. Same as 'f'.
         'g' - General format. This prints the number as a fixed-point
               number, unless the number is too large, in which case
               it switches to 'e' exponent notation.
         'G' - General format. Same as 'g' except switches to 'E'
               if the number gets to large.
         'n' - Number. This is the same as 'g', except that it uses the
               current locale setting to insert the appropriate
               number separator characters.
         '%' - Percentage. Multiplies the number by 100 and displays
               in fixed ('f') format, followed by a percent sign.

     Objects are able to define their own conversion specifiers to
     replace the standard ones.  An example is the 'datetime' class,
     whose conversion specifiers might look something like the
     arguments to the strftime() function:

         "Today is: {0:a b d H:M:S Y}".format(datetime.now())

Finally, I have two questions:

1) Where would be a good place to stick the rough prototype? I don't 
want to post it here, its rather long.

2) I'd like to know if anyone out there wants to take over the task of 
implementing 3102 so that I can focus my attention on 3101. I have 
fairly limited bandwidth at the moment, and 3101 is by far the more 
complex proposal.

-- Talin