[Python-3000] String formatting: Conversion specifiers
Talin
talin at acm.org
Tue Jun 6 09:53:38 CEST 2006
I've been slowly working on PEP 3101, specifically fleshing out the
details, and there are a couple of issues that I wanted to run by the
group mind here.
Originally, I decided to punt on the issue of field conversion
specifiers (i.e. %2.2s etc.) and simply say that they were unchanged
from the existing implementation.
However, I've been looking over the source for PyString_Format, and I'm
thinking that the code for handling field conversions is a lot more
complicated than what we really need here.
Here is a list of the conversion types that are currently supported by
the % operator. First thing you notice is an eerie similarity between
this and the documentation for 'sprintf'. :)
Conversion  Meaning                                                   Notes
d           Signed integer decimal.
i           Signed integer decimal.
o           Unsigned octal.                                           (1)
u           Unsigned decimal.
x           Unsigned hexadecimal (lowercase).                         (2)
X           Unsigned hexadecimal (uppercase).                         (2)
e           Floating point exponential format (lowercase).
E           Floating point exponential format (uppercase).
f           Floating point decimal format.
F           Floating point decimal format.
g           Same as "e" if exponent is greater than -4 or less than
            precision, "f" otherwise.
G           Same as "E" if exponent is greater than -4 or less than
            precision, "F" otherwise.
c           Single character (accepts integer or single character
            string).
r           String (converts any python object using repr()).         (3)
s           String (converts any python object using str()).          (4)
%           No argument is converted, results in a "%" character in
            the result.
Now, unlike C, in Python we already know the type of the thing we're
going to print. So there's no need to tell the system 'this is a float'
or 'this is an integer'. The only way I could see this being useful is
if you had a type and wanted it to print out as some different type -
but is that really the proper role of the string formatter?
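To make that concrete, here is a small sketch of how the existing '%' operator already treats the type letter as a request for coercion rather than as type information. It reflects the behaviour of current CPython as far as I know; the variable names are only for illustration.

```python
# The type letter in a '%' spec does not describe the argument;
# it asks for a conversion of a value whose type is already known.
as_int = "%d" % 3.9      # float coerced to an integer: '3'
as_float = "%f" % 3      # int displayed as a float: '3.000000'
as_str = "%s" % 3.9      # str() of the value: '3.9'

print(as_int, as_float, as_str)
```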
Similarly, what does it mean to have an 'unsigned' quantity in Python?
If you say "print this negative number as unsigned", what does that
mean? Does it take the absolute value, or does it do what C does and
take the number modulo 2^32? Neither seems particularly correct or
intuitive to me.
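The two readings can be written out side by side. This is only a sketch, assuming a 32-bit width for the C-style case; the last line shows what '%u' does in current CPython, where it behaves like '%d'.

```python
value = -5

# Reading 1: "unsigned" means dropping the sign.
absolute = abs(value)            # 5

# Reading 2: "unsigned" means C-style wraparound (32 bits assumed here).
wrapped = value % 2**32          # 4294967291

# What '%u' does in current CPython: it is treated the same as '%d'.
percent_u = "%u" % value         # '-5'

print(absolute, wrapped, percent_u)
```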
So I decided to sit down and rethink the whole conversion specifier
system. I looked at the docs for the '%' operator, and some other
languages, and here is what I came up with (this is an excerpt from the
revised PEP).
Oh, and I should mention that I have a working implementation of what is
described below.
--------------------------
Standard Conversion Specifiers
Most built-in types will support a standard set of conversion
specifiers. These are similar in concept to the conversion
specifiers used by the existing '%' operator; however, there are
also a number of significant differences.
The general form of the standard conversion specifier is:
[flags][length][.precision][type]
The brackets ([]) indicate an optional field.
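As an illustration of that grammar, here is a minimal sketch of a parser for it. The names `Spec` and `parse_spec` are hypothetical and not part of the PEP or of any implementation; the sketch also assumes that a leading '0' is always read as a flag rather than as part of the length.

```python
import re
from collections import namedtuple

# Hypothetical helper, not part of the PEP: splits a spec of the form
# [flags][length][.precision][type] into its four optional parts.
Spec = namedtuple("Spec", "flags length precision type")

_SPEC_RE = re.compile(
    r"(?P<flags>[+<>0 ]*)"        # any of the flag characters
    r"(?P<length>\d*)"            # minimum field width
    r"(?:\.(?P<precision>\d+))?"  # digits after the decimal point
    r"(?P<type>[a-zA-Z%]?)"       # single type character
)

def parse_spec(spec):
    match = _SPEC_RE.fullmatch(spec)
    if match is None:
        raise ValueError("invalid conversion specifier: %r" % spec)
    length = match.group("length")
    precision = match.group("precision")
    return Spec(
        flags=match.group("flags"),
        length=int(length) if length else None,
        precision=int(precision) if precision is not None else None,
        type=match.group("type"),
    )

print(parse_spec("+08.2f"))
```

Every part is optional, so the empty string is a valid spec and yields a `Spec` with no flags, no length, no precision, and no type.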
The flags can be one of the following:
    '+' - Indicates that a sign should be used for both
          positive as well as negative numbers (normally only
          negative numbers will have a sign).
    '<' - Forces the field to be left-aligned within the available
          space. (This is the default.)
    '>' - Forces the field to be right-aligned within the
          available space.
    '0' - Causes any leftover space in the field to be filled
          with leading zeros. Note that this option also implies
          that the field is right-aligned.
    ' ' - Causes the leftover space in the field to be filled
          with spaces.
'length' is the minimum field width. If not specified, then the
field width will be determined by the content.
For a numeric value, 'precision' is the number of digits after
the decimal point that should be displayed.
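For a feel of how flags, length, and precision combine, here is a short sketch using str.format as it exists in Python today. That syntax differs from this draft in places (for instance, numbers are right-aligned by default there), so treat it as an approximation of the proposal rather than a definition of it.

```python
# Each spec below follows the [flags][length][.precision][type] shape.
signed = "{0:+d}".format(5)                 # '+5'
left = "[{0:<6}]".format("ab")              # '[ab    ]'
right = "[{0:>6}]".format("ab")             # '[    ab]'
zero_filled = "{0:08.2f}".format(3.14159)   # '00003.14'
two_places = "{0:.2f}".format(2.5)          # '2.50'

print(signed, left, right, zero_filled, two_places)
```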
Finally, the 'type' determines how the data should be presented.
It is generally only used for numeric types - string types do
not need to indicate a type.
The available types are:
    'b' - Binary. Outputs the number in base 2.
    'c' - Character. Converts the integer to the corresponding
          unicode character before printing.
    'd' - Decimal Integer. Prints only the whole-number portion
          of the number.
    'e' - Exponent notation. Prints the number in scientific
          notation using the letter 'e' to indicate the exponent.
    'E' - Exponent notation. Same as 'e' except it uses an upper
          case 'E' as the separator character.
    'f' - Fixed point. Displays the number as a fixed-point
          number.
    'F' - Fixed point. Same as 'f'.
    'g' - General format. This prints the number as a fixed-point
          number, unless the number is too large, in which case
          it switches to exponent notation.
    'G' - General format. Same as 'g' except switches to 'E'
          if the number gets too large.
    'n' - Number. This is the same as 'g', except that it uses the
          current locale setting to insert the appropriate
          number separator characters.
    'o' - Octal format. Outputs the number in base 8.
    'r' - Repr format. Outputs the value in a format which is
          likely to be readable by the interpreter. Also works
          with non-numeric fields.
    'x' - Hex format. Outputs the number in base 16, using
          lowercase letters for the digits above 9.
    'X' - Hex format. Outputs the number in base 16, using
          uppercase letters for the digits above 9.
    '%' - Percentage. Multiplies the number by 100 and displays
          in fixed ('f') format, followed by a percent sign.
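Most of these type codes can be tried with the format() built-in and str.format in Python as it exists today. The sketch below assumes that behaviour, not this draft: today 'r' is spelled as the `!r` conversion, and 'd' rejects floats instead of truncating them.

```python
binary = format(10, "b")             # '1010'
char = format(65, "c")               # 'A'
octal = format(8, "o")               # '10'
hex_lower = format(255, "x")         # 'ff'
hex_upper = format(255, "X")         # 'FF'
exponent = format(1234.5, "e")       # '1.234500e+03'
fixed = format(1234.5, ".1f")        # '1234.5'
general_small = format(0.5, "g")     # '0.5'
general_large = format(1e20, "g")    # '1e+20'
percent = format(0.25, ".1%")        # '25.0%'
repr_form = "{0!r}".format("hi")     # "'hi'"

print(binary, char, octal, hex_lower, hex_upper)
print(exponent, fixed, general_small, general_large, percent, repr_form)
```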
For non-built-in types, the conversion specifiers will be specific
to that type. An example is the 'datetime' class, whose
conversion specifiers might look something like the arguments
to the strftime() function:
"Today is: {0:a b d H:M:S Y}".format(datetime.now())
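As a sketch of how a type might interpret its own specifiers, the example below uses the `__format__` hook found in Python today; the draft above does not name the hook, so that part is an assumption. `Point` is a hypothetical class. Note that today's datetime passes its spec straight to strftime(), so the codes carry '%' prefixes, unlike the draft example.

```python
from datetime import datetime

# Hypothetical class, used only to show a type-specific specifier.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __format__(self, spec):
        # Assumption: the spec is applied to each coordinate in turn.
        return "({0}, {1})".format(format(self.x, spec), format(self.y, spec))

point_text = "{0:.1f}".format(Point(1, 2.5))

# datetime hands everything after the first ':' to strftime().
stamp = datetime(2006, 6, 6, 9, 53, 38)
date_text = "Today is: {0:%Y-%m-%d %H:%M:%S}".format(stamp)

print(point_text)
print(date_text)
```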
-- Talin