[Python-checkins] r45928 - peps/trunk/pep-3101.txt

Sun May 7 03:49:44 CEST 2006

Author: talin
Date: Sun May  7 03:49:43 2006
New Revision: 45928

Modified:
   peps/trunk/pep-3101.txt
Log:
Updated based on collected feedback.



Modified: peps/trunk/pep-3101.txt
==============================================================================

--- peps/trunk/pep-3101.txt	(original)
+++ peps/trunk/pep-3101.txt	Sun May  7 03:49:43 2006
@@ -8,7 +8,7 @@
 Content-Type: text/plain
 Created: 16-Apr-2006
 Python-Version: 3.0
-Post-History:
+Post-History: 28-Apr-2006
 
 
 Abstract
@@ -50,8 +50,8 @@
 
     The specification will consist of 4 parts:
 
-    - Specification of a set of methods to be added to the built-in
-      string class.
+    - Specification of a new formatting method to be added to the
+      built-in string class.
 
     - Specification of a new syntax for format strings.
 
@@ -99,13 +99,13 @@
         "My name is Fred :-{}"
 
     The element within the braces is called a 'field'.  Fields consist
-    of a name, which can either be simple or compound, and an optional
-    'conversion specifier'.
+    of a 'field name', which can either be simple or compound, and an
+    optional 'conversion specifier'.
 
-    Simple names are either names or numbers.  If numbers, they must
-    be valid decimal numbers; if names, they must be valid Python
-    identifiers.  A number is used to identify a positional argument,
-    while a name is used to identify a keyword argument.
+    Simple field names are either names or numbers. If numbers, they
+    must be valid base-10 integers; if names, they must be valid
+    Python identifiers.  A number is used to identify a positional
+    argument, while a name is used to identify a keyword argument.
 
     Compound names are a sequence of simple names separated by
     periods:
@@ -118,8 +118,9 @@
     positional argument 0.
 
     Each field can also specify an optional set of 'conversion
-    specifiers'.  Conversion specifiers follow the field name, with a
-    colon (':') character separating the two:
+    specifiers' which can be used to adjust the format of that field.
+    Conversion specifiers follow the field name, with a colon (':')
+    character separating the two:
 
         "My name is {0:8}".format('Fred')
 
@@ -130,11 +131,15 @@
 
     The conversion specifier consists of a sequence of zero or more
     characters, each of which can consist of any printable character
-    except for a non-escaped '}'.  The format() method does not
-    attempt to intepret the conversion specifiers in any way; it
-    merely passes all of the characters between the first colon ':'
-    and the matching right brace ('}') to the various underlying
-    formatters (described later.)
+    except for a non-escaped '}'.
+    
+    Conversion specifiers can themselves contain replacement fields;
+    this will be described in a later section.  Except for this
+    replacement, the format() method does not attempt to interpret the
+    conversion specifiers in any way; it merely passes all of the
+    characters between the first colon ':' and the matching right
+    brace ('}') to the various underlying formatters (described
+    later.)
 
 
 Standard Conversion Specifiers
@@ -142,16 +147,19 @@
     For most built-in types, the conversion specifiers will be the
     same or similar to the existing conversion specifiers used with
     the '%' operator.  Thus, instead of '%02.2x', you will say
-    '{0:2.2x}'.
+    '{0:02.2x}'.
 
     There are a few differences however:
 
     - The trailing letter is optional - you don't need to say '2.2d',
-      you can instead just say '2.2'.  If the letter is omitted, the
-      value will be converted into its 'natural' form (that is, the
-      form that it take if str() or unicode() were called on it)
-      subject to the field length and precision specifiers (if
-      supplied).
+      you can instead just say '2.2'.  If the letter is omitted, a
+      default will be assumed based on the type of the argument.
+      The defaults will be as follows:
+      
+        string or unicode object: 's'
+        integer: 'd'
+        floating-point number: 'f'
+        all other types: 's'
 
     - Variable field width specifiers use a nested version of the {}
       syntax, allowing the width specifier to be either a positional
@@ -159,10 +167,6 @@
 
         "{0:{1}.{2}d}".format(a, b, c)
 
-      (Note: It might be easier to parse if these used a different
-      type of delimiter, such as parens - avoiding the need to create
-      a regex that handles the recursive case.)
-
     - The support for length modifiers (which are ignored by Python
       anyway) is dropped.
 
@@ -171,7 +175,7 @@
     conversion specifiers are identical to the arguments to the
     strftime() function:
 
-        "Today is: {0:%x}".format(datetime.now())
+        "Today is: {0:%a %b %d %H:%M:%S %Y}".format(datetime.now())
 
 
 Controlling Formatting
@@ -203,19 +207,37 @@
 
 User-Defined Formatting Classes
 
-    The code that interprets format strings can be called explicitly
-    from user code.  This allows the creation of custom formatter
-    classes that can override the normal formatting rules.
-
-    The string and unicode classes will have a class method called
-    'cformat' that does all the actual work of formatting; The
-    format() method is just a wrapper that calls cformat.
+    There will be times when customizing the formatting of fields
+    on a per-type basis is not enough.  An example might be an
+    accounting application, which displays negative numbers in
+    parentheses rather than using a negative sign.
+    
+    The string formatting system facilitates this kind of application-
+    specific formatting by allowing user code to directly invoke
+    the code that interprets format strings and fields.  User-written
+    code can intercept the normal formatting operations on a per-field
+    basis, substituting its own formatting methods.
+    
+    For example, in the aforementioned accounting application, there
+    could be an application-specific number formatter, which reuses
+    the string.format templating code to do most of the work. The
+    API for such an application-specific formatter is up to the
+    application; here are several possible examples:
+    
+        cell_format( "The total is: {0}", total )
+        
+        TemplateString( "The total is: {0}" ).format( total )
+        
+    Creating an application-specific formatter is relatively straight-
+    forward.  The string and unicode classes will have a class method
+    called 'cformat' that does all the actual work of formatting; the
+    built-in format() method is just a wrapper that calls cformat.
 
     The parameters to the cformat function are:
 
         -- The format string (or unicode; the same function handles
            both.)
-        -- A field format hook (see below)
+        -- A callable 'format hook', which is called once per field
         -- A tuple containing the positional arguments
         -- A dict containing the keyword arguments
 
@@ -223,15 +245,16 @@
     string, and return a new string (or unicode) with all of the
     fields replaced with their formatted values.
 
-    For each field, the cformat function will attempt to call the
-    field format hook with the following arguments:
+    The format hook is a callable object supplied by the user, which
+    is invoked once per field, and which can override the normal
+    formatting for that field.  For each field, the cformat function
+    will attempt to call the field format hook with the following
+    arguments:
 
-       field_hook(value, conversion, buffer)
+       format_hook(value, conversion, buffer)
 
     The 'value' field corresponds to the value being formatted, which
-    was retrieved from the arguments using the field name.  (The
-    field_hook has no control over the selection of values, only
-    how they are formatted.)
+    was retrieved from the arguments using the field name.
 
     The 'conversion' argument is the conversion spec part of the
     field, which will be either a string or unicode object, depending
@@ -299,11 +322,34 @@
 
     - Other variations include Ruby's #{}, PHP's {$name}, and so
       on.
-
+      
+    Some specific aspects of the syntax warrant additional comments:
+    
+    1) The use of the backslash character for escapes.  A few people
+    suggested doubling the brace characters to indicate a literal
+    brace rather than using backslash as an escape character.  This is
+    also the convention used in the .Net libraries.  Here's how the
+    previously-given example would look with this convention:
+    
+        "My name is {0} :-{{}}".format('Fred')
+    
+    One problem with this syntax is that it conflicts with the use of
+    nested braces to allow parameterization of the conversion
+    specifiers:
+    
+        "{0:{1}.{2}}".format(a, b, c)
+        
+    (There are alternative solutions, but they are too long to go
+    into here.)
+    
+    2) The use of the colon character (':') as a separator for
+    conversion specifiers.  This was chosen simply because that's
+    what .Net uses.
+    
 
 Sample Implementation
 
-    A rought prototype of the underlying 'cformat' function has been
+    A rough prototype of the underlying 'cformat' function has been
     coded in Python, however it needs much refinement before being
     submitted.
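
Reviewer's sketch (not part of the committed diff): the accounting
example and the nested-brace specifiers described in this revision can
be tried out roughly as follows.  This is only an illustration under
stated assumptions: it uses the string.Formatter class from today's
standard library rather than the 'cformat' class method and format
hook proposed in the text, and the AccountingFormatter name is
hypothetical.

```python
import string


class AccountingFormatter(string.Formatter):
    """Hypothetical formatter: negative numbers are shown in parentheses."""

    def format_field(self, value, format_spec):
        # Intercept negative ints and floats only; bools and every other
        # type fall through to the normal per-type formatting.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and value < 0:
            return "(" + format(-value, format_spec) + ")"
        return super().format_field(value, format_spec)


fmt = AccountingFormatter()
negative = fmt.format("The total is: {0:.2f}", -12.5)
positive = fmt.format("The total is: {0:.2f}", 12.5)

# Nested replacement fields supply the width and precision from arguments.
nested = "{0:{1}.{2}f}".format(3.14159, 8, 2)

# A bare width pads the field; strings are left-aligned by default.
padded = "My name is {0:8}".format("Fred")
```

Overriding format_field plays approximately the role that the text
assigns to the format hook: it is invoked once per field with the
already-selected value and the conversion specifier, and may either
produce its own output or defer to the default formatting.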