I've updated PEP 3101 based on the feedback collected so far.

---------------------

PEP: 3101
Title: Advanced String Formatting
Version: $Revision: 45928 $
Last-Modified: $Date: 2006-05-06 18:49:43 -0700 (Sat, 06 May 2006) $
Author: Talin <talin at acm.org>
Status: Draft
Type: Standards
Content-Type: text/plain
Created: 16-Apr-2006
Python-Version: 3.0
Post-History: 28-Apr-2006, 6-May-2006

Abstract

    This PEP proposes a new system for built-in string formatting
    operations, intended as a replacement for the existing '%' string
    formatting operator.

Rationale

    Python currently provides two methods of string interpolation:

    - The '%' operator for strings. [1]

    - The string.Template module. [2]

    The scope of this PEP will be restricted to proposals for built-in
    string formatting operations (in other words, methods of the
    built-in string type).

    The '%' operator is primarily limited by the fact that it is a
    binary operator, and therefore can take at most two arguments.
    One of those arguments is already dedicated to the format string,
    leaving all other variables to be squeezed into the remaining
    argument. The current practice is to use either a dictionary or a
    tuple as the second argument, but as many people have commented
    [3], this lacks flexibility. The "all or nothing" approach
    (meaning that one must choose between only positional arguments,
    or only named arguments) is felt to be overly constraining.

    While there is some overlap between this proposal and
    string.Template, it is felt that each serves a distinct need,
    and that one does not obviate the other. In any case,
    string.Template will not be discussed here.

Specification

    The specification will consist of 4 parts:

    - Specification of a new formatting method to be added to the
      built-in string class.

    - Specification of a new syntax for format strings.

    - Specification of a new set of class methods to control the
      formatting and conversion of objects.

    - Specification of an API for user-defined formatting classes.
String Methods

    The built-in string class will gain a new method, 'format',
    which takes an arbitrary number of positional and keyword
    arguments:

        "The story of {0}, {1}, and {c}".format(a, b, c=d)

    Within a format string, each positional argument is identified
    with a number, starting from zero, so in the above example, 'a' is
    argument 0 and 'b' is argument 1. Each keyword argument is
    identified by its keyword name, so in the above example, 'c' is
    used to refer to the third argument.

    The result of the format call is an object of the same type
    (string or unicode) as the format string.

Format Strings

    Brace characters ('curly braces') are used to indicate a
    replacement field within the string:

        "My name is {0}".format('Fred')

    The result of this is the string:

        "My name is Fred"

    Braces can be escaped using a backslash:

        "My name is {0} :-\{\}".format('Fred')

    Which would produce:

        "My name is Fred :-{}"

    The element within the braces is called a 'field'. Fields consist
    of a 'field name', which can either be simple or compound, and an
    optional 'conversion specifier'.

    Simple field names are either names or numbers. If numbers, they
    must be valid base-10 integers; if names, they must be valid
    Python identifiers. A number is used to identify a positional
    argument, while a name is used to identify a keyword argument.

    Compound names are a sequence of simple names separated by
    periods:

        "My name is {0.name} :-\{\}".format(dict(name='Fred'))

    Compound names can be used to access specific dictionary entries,
    array elements, or object attributes. In the above example, the
    '{0.name}' field refers to the dictionary entry 'name' within
    positional argument 0.

    Each field can also specify an optional set of 'conversion
    specifiers' which can be used to adjust the format of that field.
    Conversion specifiers follow the field name, with a colon (':')
    character separating the two:

        "My name is {0:8}".format('Fred')

    The meaning and syntax of the conversion specifiers depends on the
    type of object that is being formatted; however, many of the
    built-in types will recognize a standard set of conversion
    specifiers.

    The conversion specifier consists of a sequence of zero or more
    characters, each of which can consist of any printable character
    except for a non-escaped '}'.

    Conversion specifiers can themselves contain replacement fields;
    this will be described in a later section. Except for this
    replacement, the format() method does not attempt to interpret the
    conversion specifiers in any way; it merely passes all of the
    characters between the first colon ':' and the matching right
    brace ('}') to the various underlying formatters (described
    later.)

Standard Conversion Specifiers

    For most built-in types, the conversion specifiers will be the
    same or similar to the existing conversion specifiers used with
    the '%' operator. Thus, instead of '%02.2x', you will say
    '{0:02.2x}'.

    There are a few differences however:

    - The trailing letter is optional - you don't need to say '2.2d',
      you can instead just say '2.2'. If the letter is omitted, a
      default will be assumed based on the type of the argument.
      The defaults will be as follows:

        string or unicode object: 's'
        integer: 'd'
        floating-point number: 'f'
        all other types: 's'

    - Variable field width specifiers use a nested version of the {}
      syntax, allowing the width specifier to be either a positional
      or keyword argument:

        "{0:{1}.{2}d}".format(a, b, c)

    - The support for length modifiers (which are ignored by Python
      anyway) is dropped.

    For non-built-in types, the conversion specifiers will be specific
    to that type.
    An example is the 'datetime' class, whose conversion specifiers
    are identical to the arguments to the strftime() function:

        "Today is: {0:%a %b %d %H:%M:%S %Y}".format(datetime.now())

Controlling Formatting

    A class that wishes to implement a custom interpretation of its
    conversion specifiers can implement a __format__ method:

        class AST:
            def __format__(self, specifiers):
                ...

    The 'specifiers' argument will be either a string object or a
    unicode object, depending on the type of the original format
    string. The __format__ method should test the type of the
    specifiers parameter to determine whether to return a string or
    unicode object. It is the responsibility of the __format__ method
    to return an object of the proper type.

    string.format() will format each field using the following steps:

    1) See if the value to be formatted has a __format__ method. If
       it does, then call it.

    2) Otherwise, check the internal formatter within string.format
       that contains knowledge of certain builtin types.

    3) Otherwise, call str() or unicode() as appropriate.

User-Defined Formatting Classes

    There will be times when customizing the formatting of fields on a
    per-type basis is not enough. An example might be an accounting
    application, which displays negative numbers in parentheses rather
    than using a negative sign.

    The string formatting system facilitates this kind of application-
    specific formatting by allowing user code to directly invoke the
    code that interprets format strings and fields. User-written code
    can intercept the normal formatting operations on a per-field
    basis, substituting their own formatting methods.

    For example, in the aforementioned accounting application, there
    could be an application-specific number formatter, which reuses
    the string.format templating code to do most of the work.
    The API for such an application-specific formatter is up to the
    application; here are several possible examples:

        cell_format( "The total is: {0}", total )

        TemplateString( "The total is: {0}" ).format( total )

    Creating an application-specific formatter is relatively
    straightforward. The string and unicode classes will have a class
    method called 'cformat' that does all the actual work of
    formatting; the built-in format() method is just a wrapper that
    calls cformat.

    The parameters to the cformat function are:

        -- The format string (or unicode; the same function handles
           both.)
        -- A callable 'format hook', which is called once per field
        -- A tuple containing the positional arguments
        -- A dict containing the keyword arguments

    The cformat function will parse all of the fields in the format
    string, and return a new string (or unicode) with all of the
    fields replaced with their formatted values.

    The format hook is a callable object supplied by the user, which
    is invoked once per field, and which can override the normal
    formatting for that field. For each field, the cformat function
    will attempt to call the format hook with the following
    arguments:

        format_hook(value, conversion, buffer)

    The 'value' field corresponds to the value being formatted, which
    was retrieved from the arguments using the field name.

    The 'conversion' argument is the conversion spec part of the
    field, which will be either a string or unicode object, depending
    on the type of the original format string.

    The 'buffer' argument is a Python array object, either a byte
    array or unicode character array. The buffer object will contain
    the partially constructed string; the format hook is free to
    modify the contents of this buffer if needed.

    The format_hook will be called once per field. The format_hook
    may take one of two actions:

    1) Return False, indicating that the format_hook will not
       process this field and the default formatting should be used.
       This decision should be based on the type of the value object,
       and the contents of the conversion string.

    2) Append the formatted field to the buffer, and return True.

Alternate Syntax

    Naturally, one of the most contentious issues is the syntax of the
    format strings, and in particular the markup conventions used to
    indicate fields.

    Rather than attempting to exhaustively list all of the various
    proposals, I will cover the ones that are most widely used
    already.

    - Shell variable syntax: $name and $(name) (or in some variants,
      ${name}). This is probably the oldest convention out there, and
      is used by Perl and many others. When used without the braces,
      the length of the variable is determined by lexically scanning
      until an invalid character is found.

      This scheme is generally used in cases where interpolation is
      implicit - that is, in environments where any string can contain
      interpolation variables, and no special substitution function
      need be invoked. In such cases, it is important to prevent the
      interpolation behavior from occurring accidentally, so the '$'
      (which is otherwise a relatively uncommonly-used character) is
      used to signal when the behavior should occur.

      It is the author's opinion, however, that in cases where the
      formatting is explicitly invoked, less care needs to be taken
      to prevent accidental interpolation, in which case a lighter
      and less unwieldy syntax can be used.

    - Printf and its cousins ('%'), including variations that add a
      field index, so that fields can be interpolated out of order.

    - Other bracket-only variations. Various MUDs (Multi-User
      Dungeons) such as MUSH have used brackets (e.g. [name]) to do
      string interpolation. The Microsoft .Net libraries use braces
      ({}), and a syntax which is very similar to the one in this
      proposal, although the syntax for conversion specifiers is quite
      different. [4]

    - Backquoting.
      This method has the benefit of minimal syntactical clutter,
      however it lacks many of the benefits of a function call
      syntax (such as complex expression arguments, custom
      formatters, etc.).

    - Other variations include Ruby's #{}, PHP's {$name}, and so
      on.

    Some specific aspects of the syntax warrant additional comments:

    1) The use of the backslash character for escapes. A few people
    suggested doubling the brace characters to indicate a literal
    brace rather than using backslash as an escape character. This is
    also the convention used in the .Net libraries. Here's how the
    previously-given example would look with this convention:

        "My name is {0} :-{{}}".format('Fred')

    One problem with this syntax is that it conflicts with the use of
    nested braces to allow parameterization of the conversion
    specifiers:

        "{0:{1}.{2}}".format(a, b, c)

    (There are alternative solutions, but they are too long to go
    into here.)

    2) The use of the colon character (':') as a separator for
    conversion specifiers. This was chosen simply because that's
    what .Net uses.

Sample Implementation

    A rough prototype of the underlying 'cformat' function has been
    coded in Python, however it needs much refinement before being
    submitted.

Backwards Compatibility

    Backwards compatibility can be maintained by leaving the existing
    mechanisms in place. The new system does not collide with any of
    the method names of the existing string formatting techniques, so
    both systems can co-exist until it comes time to deprecate the
    older system.

References

    [1] Python Library Reference - String formatting operations
    http://docs.python.org/lib/typesseq-strings.html

    [2] Python Library References - Template strings
    http://docs.python.org/lib/node109.html

    [3] [Python-3000] String formating operations in python 3k
    http://mail.python.org/pipermail/python-3000/2006-April/000285.html

    [4] Composite Formatting - [.Net Framework Developer's Guide]
    http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformattin...
Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
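For readers who want to experiment with the ideas in this draft: a 'format' method did later ship on Python's built-in str type, so the calls below run on a current interpreter. This is only a sketch of that shipped method, not of the draft as written above (the draft's backslash escapes and the cformat hook are not part of it). The Money class and its 'p' specifier are made up for illustration.

```python
from datetime import datetime

# Positional and keyword arguments mixed in one call.
story = "The story of {0}, {1}, and {c}".format("a", "b", c="d")

# Nested replacement fields supply the width and the precision.
padded = "{0:{1}.{2}f}".format(3.14159, 8, 2)

# datetime interprets its specifier using strftime rules.
stamp = "{0:%Y-%m-%d}".format(datetime(2006, 5, 6))


class Money:
    """Hypothetical type with its own conversion specifier."""

    def __init__(self, amount):
        self.amount = amount

    def __format__(self, spec):
        # 'p' (invented for this sketch) shows negatives in parentheses.
        if spec == "p" and self.amount < 0:
            return "({0:.2f})".format(-self.amount)
        return "{0:.2f}".format(self.amount)


loss = "Total: {0:p}".format(Money(-12.5))
```

The __format__ method receives only the text after the colon, which is the division of labour the "Controlling Formatting" section describes.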
Talin wrote:
Braces can be escaped using a backslash:
"My name is {0} :-\{\}".format('Fred')
Which would produce:
"My name is Fred :-{}"
Do backslashes also need to be backslashed then? If not, then what is the translation of this:?

    r'abc\{%s\}' % 'x'

I guess the only sensible translation if backslashes aren't backslashed would be:

    r'abc\\{{0}\\}'.format('x')

But the parsing of that format string seems fairly unintuitive to me. If backslashes do need to be backslashed, then that fact needs to be mentioned.

-Edward
"Edward Loper" <edloper@gradient.cis.upenn.edu> wrote in message news:445DEC49.6050709@gradient.cis.upenn.edu...
Talin wrote:
Braces can be escaped using a backslash:
"My name is {0} :-\{\}".format('Fred')
Which would produce:
"My name is Fred :-{}"
Do backslashes also need to be backslashed then? If not, then what is the translation of this:?
r'abc\{%s\}' % 'x'
I guess the only sensible translation if backslashes aren't backslashed would be:
r'abc\\{{0}\\}'.format('x')
But the parsing of that format string seems fairly unintuitive to me. If backslashes do need to be backslashed, then that fact needs to be mentioned.
AFAICT there would be no way to use raw strings with that method. That method would be using the escape sequence "\{" to indicate a bracket. Raw strings cannot have escape sequences. Additional backslashes are added to raw strings to remove anything that resembles an escape sequence. So either an additional layer of escaping is required (like with regexes) or ".format()" would simply not be usable with raw strings. The problem of having an additional level of escaping is clearly shown with regexes. A single backslash as the end value is either "\\\\" or r"\\".
Joe Smith wrote:
AFAICT there would be no way to use raw strings with that method. ... Additional backslashes are added to raw strings to remove anything that resembles an escape sequence.
You seem to be very confused about the way strings work. If you look at the repr() of a string containing a backslash, you will see two backslashes, but they're only in the repr(), *not* in the string itself. Some things to ponder:

>>> "\{" == r"\{"
True
>>> len("\{")
2
>>> len(r"\{")
2
-- Greg
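The point about repr() can be checked in a script as well as at the prompt. A small sketch follows; it spells the non-raw literal as "\\{" because newer Python versions emit a warning for unrecognised escapes such as "\{", which is an assumption about the reader's interpreter rather than something stated in this thread.

```python
# "\\{" and r"\{" are two spellings of the same two-character string:
# one backslash followed by one open brace.
plain = "\\{"
raw = r"\{"

same = plain == raw        # the strings compare equal
length = len(raw)          # two characters, not three
shown = repr(raw)          # repr() doubles the backslash for display only
```

The doubled backslash exists only in the repr() text; the string itself still holds a single one.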
On 5/7/06, Edward Loper <edloper@gradient.cis.upenn.edu> wrote:
Talin wrote:
Braces can be escaped using a backslash:
"My name is {0} :-\{\}".format('Fred')
Which would produce:
"My name is Fred :-{}"
Do backslashes also need to be backslashed then? If not, then what is the translation of this:?
r'abc\{%s\}' % 'x'
I believe the proposal is taking advantage of the fact that '\{' is not interpreted as an escape sequence -- it is interpreted as a literal backslash followed by an open brace:

>>> '\{'
'\\{'
>>> '\\{'
'\\{'
>>> r'\{'
'\\{'
Thus, for 'abc\{0\}'.format('x'), you should get an error because there are no replacement fields in the format string. STeVe -- Grammar am for people who can't think for myself. --- Bucky Katt, Get Fuzzy
Steven Bethard wrote:
On 5/7/06, Edward Loper <edloper@gradient.cis.upenn.edu> wrote:
Talin wrote:
Braces can be escaped using a backslash:
"My name is {0} :-\{\}".format('Fred')
Which would produce:
"My name is Fred :-{}"
Do backslashes also need to be backslashed then? If not, then what is the translation of this:?
r'abc\{%s\}' % 'x'
I believe the proposal is taking advantage of the fact that '\{' is not interpreted as an escape sequence -- it is interpreted as a literal backslash followed by an open brace:
Yes, I knew that fact, but it's not related to my question. The basic issue is that if you have a quote character ('\\'), then you usually need to be able to quote that quote character ('\\\\') in at least some contexts. So I'll try again with a different example. I'll avoid using raw strings, just because doing so seems to have confused a couple people about what my question is about.

Let's say I have a program now that contains the following line:

    print 'Foo\\%s' % x

And I'd like to translate that to use str.format. How do I do it? If I just replace %s with {0} then I get:

    print 'Foo\\{0}'.format(x)

but this will presumably raise an exception, since the '\\{' (which is identical to '\{') gets treated as a quoted brace, and the '}' is unmatched. If it were possible to backslash backslashes, then I could do:

    print 'Foo\\\\{0}'.format(x)

(which can also be spelled r'Foo\\{0}' -- i.e., the string now contains two backslash characters). But I haven't seen any mention of backslashing backslashes so far.

-Edward
On 5/6/06, Talin <talin@acm.org> wrote:
I've updated PEP 3101 based on the feedback collected so far. [snip] Compound names are a sequence of simple names seperated by periods:
"My name is {0.name} :-\{\}".format(dict(name='Fred'))
Compound names can be used to access specific dictionary entries, array elements, or object attributes. In the above example, the '{0.name}' field refers to the dictionary entry 'name' within positional argument 0.
I'm still not a big fan of mixing together getitem-style access and getattribute-style access. That makes classes that support both ambiguous in this context. You either need to specify the order in which these are checked (e.g. attribute then item or item then attribute), or, preferably, you need to extend the syntax to allow getitem-style access too.

Just to be clear, I'm not suggesting that you support anything more than items and attributes. So this is *not* a request to allow arbitrary expressions. In fact, the only use-case I see in the PEP needs only item access, not attribute access, so maybe you could drop attribute access?

Can't you just extend the syntax for *only* item access? E.g. something like:

    "My name is {0[name]} :-\{\}".format(dict(name='Fred'))

STeVe
--
Grammar am for people who can't think for myself.
    --- Bucky Katt, Get Fuzzy
Steven Bethard <steven.bethard <at> gmail.com> writes:
I'm still not a big fan of mixing together getitem-style access and getattribute-style access. That makes classes that support both ambiguous in this context. You either need to specify the order in which these are checked (e.g. attribute then item or item then attribute), or, preferably, you need to extend the syntax to allow getitem-style access too.
Just to be clear, I'm not suggesting that you support anything more than items and attributes. So this is *not* a request to allow arbitrary expressions. In fact, the only use-case I see in the PEP needs only item access, not attribute access, so maybe you could drop attribute access?
Can't you just extend the syntax for *only* item access? E.g. something like:
"My name is {0[name]} :-\{\}".format(dict(name='Fred'))
I'm not opposed to the idea of adding item access, although I believe that attribute access is also useful. In either case, it's not terribly hard to implement. I'd like to hear what other people have to say on this issue.

-- Talin
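To make the ambiguity concrete, here is a sketch using the str.format that later shipped in Python, which accepts both spellings: [] for items and a period for attributes. That is an observation about current Python, not about the draft under discussion here. The class Both is hypothetical and exists only to show a case where the two lookups disagree.

```python
from types import SimpleNamespace

as_dict = {"name": "Fred"}
as_object = SimpleNamespace(name="Fred")

# Item access: the key inside [] is a literal string, not a variable.
by_item = "My name is {0[name]}".format(as_dict)

# Attribute access uses a period.
by_attr = "My name is {0.name}".format(as_object)


class Both(dict):
    """Hypothetical class where item and attribute lookups disagree."""
    name = "attribute"


both = Both(name="item")
disambiguated = "{0[name]} vs {0.name}".format(both)
```

With distinct syntax for each kind of lookup, an object like Both needs no rule about which lookup is tried first.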
On 5/6/06, Talin <talin@acm.org> wrote:
I've updated PEP 3101 based on the feedback collected so far. [http://www.python.org/dev/peps/pep-3101/]
I think this is a step in the right direction.

I wonder if we shouldn't borrow more from .NET. I read this URL that you referenced:

http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformattin...

They have special syntax to support field width, e.g. {0,10} formats item 0 in a field of (at least) 10 positions wide, right-justified; {0,-10} does the same left-aligned. This is done independently from the type-specific formatting. (I'm not proposing that we use .NET's format specifiers after the colon, but I'm also no big fan for keeping the C specific stuff we have now; we should put some work in designing something with the same power as the current %-based system for floats and ints, that would cover it.)

.NET's solution for quoting { and } as {{ and }} respectively also sidesteps the issue of how to quote \ itself -- since '\\{' is a 2-char string containing one \ and one {, you'd have to write either '\\\\{0}' or r'\\{0}' to produce a single literal \ followed by formatted item 0. Any time there's the need to quadruple a backslash I think we've lost the battle. (Or you might search the web for Tcl quoting hell. :-)

I'm fine with not having a solution for doing variable substitution within the format parameters. That could be done instead by building up the format string with an extra formatting step: instead of "{x:{y}}".format(x=whatever, y=3) you could write "{{x,{y}}}".format(y=3).format(x=whatever). (Note that this is subtle: the final }}} are parsed as } followed by }}. Once the parser has seen a single {, the first } it sees is the matching closing } and adding another } after it won't affect it. The specifier cannot contain { or } at all.)

I like having a way to reuse the format parsing code while substituting something else for the formatting itself.

The PEP appears silent on what happens if there are too few or too many positional arguments, or if there are missing or unused keywords.
Missing ones should be errors; I'm not sure about redundant (unused) ones. On the one hand complaining about those gives us more certainty that the format string is correct. On the other hand there are some use cases for passing lots of keyword parameters (e.g. simple web templating could pass a fixed set of variables using **dict). Even in i18n (translation) apps I could see the usefulness of allowing unused parameters.

On the issue of {a.b.c}: like several correspondents, I don't like the ambiguity of attribute vs. key refs much, even though it appears useful enough in practice in web frameworks I've used. It seems to violate the Zen of Python: "In the face of ambiguity, refuse the temptation to guess."

Unfortunately I'm pretty lukewarm about the proposal to support {a[b].c} since b is not a variable reference but a literal string 'b'. It is also relatively cumbersome to parse. I wish I could propose {a+b.c} for this case but that's so arbitrary...

Even more unfortunately, I expect that dict key access is a pretty important use case so we'll have to address it somehow. I *don't* think there's an important use case for the ambiguity -- in any particular situation I expect that the programmer will know whether they are expecting a dict or an object with attributes.

Hm, perhaps {a@b.c} might work? It's not an existing binary operator. Or perhaps # or !.

It's too late to think straight so this will have to be continued...

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
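The brace-doubling idea and the two-step formatting trick can both be tried with the str.format that later shipped in Python. The sketch below assumes that shipped behaviour: it uses a colon before the width rather than the .NET-style comma from the message above, and it still allows a nested field in the specifier, so it illustrates the mechanics rather than the exact proposal.

```python
# Doubled braces are literal, so a backslash needs no special treatment.
escaped = "My name is {0} :-{{}}".format("Fred")
backslash = "Foo\\{0}".format("x")      # one backslash, then field 0

# One step: the width comes from a nested field.
one_step = "{x:{y}}".format(x=7, y=3)

# Two steps: first build the format string, then apply it.
template = "{{x:{y}}}".format(y=3)      # the intermediate format string
two_step = template.format(x=7)

# In shipped Python, unused keyword arguments are ignored,
# while a missing one raises KeyError.
ignored = "{a}".format(a=1, b=2)
try:
    "{a}".format(b=2)
    missing_raises = False
except KeyError:
    missing_raises = True
```

Note that no quadrupled backslash appears anywhere: the backslash is an ordinary character to the formatter once braces are escaped by doubling.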
Guido van Rossum wrote:
On 5/6/06, Talin <talin@acm.org> wrote:
I've updated PEP 3101 based on the feedback collected so far.
[http://www.python.org/dev/peps/pep-3101/]
I think this is a step in the right direction.
Cool, and thanks for the very detailed feedback.
I wonder if we shouldn't borrow more from .NET. I read this URL that you referenced:
http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformattin...
They have special syntax to support field width, e.g. {0,10} formats item 0 in a field of (at least) 10 positions wide, right-justified; {0,-10} does the same left-aligned. This is done independently from
We already have that now, don't we? If you look at the docs for "String Formatting Operations" in the library reference, it shows that a negative sign on a field width indicates left justification.
the type-specific formatting. (I'm not proposing that we use .NET's format specifiers after the colon, but I'm also no big fan for keeping the C specific stuff we have now; we should put some work in designing something with the same power as the current %-based system for floats and ints, that would cover it.)
Agreed. As you say, the main work is in handling floats and ints, and everything else can either be formatted as plain str(), or use a custom format specifier syntax (as in my strftime example.)
.NET's solution for quoting { and } as {{ and }} respectively also sidesteps the issue of how to quote \ itself -- since '\\{' is a 2-char string containing one \ and one {, you'd have to write either '\\\\{0}' or r'\\{0}' to produce a single literal \ followed by formatted item 0. Any time there's the need to quadruple a backslash I think we've lost the battle. (Or you might search the web for Tcl quoting hell. :-)
I'm fine with not having a solution for doing variable substitution within the format parameters. That could be done instead by building up the format string with an extra formatting step: instead of "{x:{y}}".format(x=whatever, y=3) you could write "{{x,{y}}}".format(y=3).format(x=whatever). (Note that this is subtle: the final }}} are parsed as } followed by }}. Once the parser has seen a single {, the first } it sees is the matching closing } and adding another } after it won't affect it. The specifier cannot contain { or } at all.)
There is another solution to this which is equally subtle, although fairly straightforward to parse. It involves defining the rules for escapes as follows:

    '{{' is an escaped '{'
    '}}' is an escaped '}', unless we are within a field.

So you can write things like {0:10,{1}}, and the final '}}' will be parsed as two separate closing brackets, since we're within a field definition.

From a parsing standpoint, this is unambiguous, however I've held off on suggesting it because it might appear to be ambiguous to a casual reader.
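The escape rule described above can be sketched as a small scanner. This is only one reading of the rule, not the parser promised for the PEP; the function name split_format and the ('text', ...) / ('field', ...) output shape are invented for the illustration.

```python
def split_format(fmt):
    """Split fmt into ('text', s) and ('field', s) pieces.

    '{{' and '}}' are escapes only outside a field; inside a field
    every brace simply opens or closes one nesting level.
    """
    pieces, text, i = [], [], 0
    while i < len(fmt):
        pair = fmt[i:i + 2]
        if pair in ("{{", "}}"):
            text.append(pair[0])          # escaped brace outside a field
            i += 2
        elif fmt[i] == "{":
            depth, start = 1, i + 1
            i += 1
            while i < len(fmt) and depth:
                if fmt[i] == "{":
                    depth += 1
                elif fmt[i] == "}":
                    depth -= 1
                i += 1
            if depth:
                raise ValueError("unterminated field")
            if text:
                pieces.append(("text", "".join(text)))
                text = []
            pieces.append(("field", fmt[start:i - 1]))
        elif fmt[i] == "}":
            raise ValueError("single '}' outside a field")
        else:
            text.append(fmt[i])
            i += 1
    if text:
        pieces.append(("text", "".join(text)))
    return pieces
```

Because the scanner tracks nesting depth once it is inside a field, the trailing '}}' of {0:10,{1}} closes two levels instead of being read as an escape.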
I like having a way to reuse the format parsing code while substituting something else for the formatting itself.
The PEP appears silent on what happens if there are too few or too many positional arguments, or if there are missing or unused keywords. Missing ones should be errors; I'm not sure about redundant (unused) ones. On the one hand complaining about those gives us more certainty that the format string is correct. On the other hand there are some use cases for passing lots of keyword parameters (e.g. simple web templating could pass a fixed set of variables using **dict). Even in i18n (translation) apps I could see the usefulness of allowing unused parameters.
I am undecided on this issue as well, which is the reason that it's not mentioned in the PEP (yet).
On the issue of {a.b.c}: like several correspondents, I don't like the ambiguity of attribute vs. key refs much, even though it appears useful enough in practice in web frameworks I've used. It seems to violate the Zen of Python: "In the face of ambiguity, refuse the temptation to guess."
Unfortunately I'm pretty lukewarm about the proposal to support {a[b].c} since b is not a variable reference but a literal string 'b'. It is also relatively cumbersome to parse. I wish I could propose {a+b.c} for this case but that's so arbitrary...
Actually, it's not all that hard to parse, especially given that there is no need to deal with the 'nested' case. I will be supplying a Python implementation of the parser along with the PEP. What I would prefer not to supply (although I certainly can if you feel it's necessary) is an optimized C implementation of the same parser, as well as the implementations of the various type-specific formatters.
Even more unfortunately, I expect that dict key access is a pretty important use case so we'll have to address it somehow. I *don't* think there's an important use case for the ambiguity -- in any particular situation I expect that the programmer will know whether they are expecting a dict or an object with attributes.
Hm, perhaps {a@b.c} might work? It's not an existing binary operator. Or perhaps # or !.
[] is the most intuitive syntax by far IMHO. Let's run it up the flagpole and see if anybody salutes :)
It's too late to think straight so this will have to be continued...
One additional issue that I would like some feedback on:

The way I have set up the API for writing custom formatters (not talking about the __format__ method here) allows the custom formatter object to examine the entire output string, not merely the part that it is responsible for; moreover, the custom formatter is free to modify the entire string. So for example, a custom formatter could tabify or un-tabify all previous text within the string.

The API could be made slightly simpler by eliminating this feature. The reason that I added it was specifically so that custom formatters could perform column-specific operations, like the old BASIC function that would print spaces up to a given column. Having generated my share of reports back in the old days (COBOL programming in the USAF), I thought it might be useful to have the ability to do operations based on the absolute column number.

Currently the API specifies that a custom formatter is passed an array object, and the custom formatter should append its data to the end of the array, but it is also free to examine and modify the rest of the array.

If I were to remove this feature, then instead of using an array, we'd simply have the custom formatter return a string like __format__ does.

So the question is - is the use case useful enough to keep this feature? What do people think of the use of the Python array type in this case?

-- Talin
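The simpler variant, where a custom formatter returns a string instead of appending to a shared buffer, can be sketched with string.Formatter from the standard library of later Python versions. That class is not the cformat / format hook API from this draft; it is used here only because it exists and has the return-a-string shape. AccountingFormatter is a hypothetical name, following the accounting example in the PEP.

```python
import string


class AccountingFormatter(string.Formatter):
    """Hypothetical formatter: negative numbers appear in parentheses."""

    def format_field(self, value, format_spec):
        # Handle only the case we care about and return a plain string.
        if isinstance(value, (int, float)) and value < 0:
            return "(" + format(-value, format_spec) + ")"
        # Everything else falls through to the default formatting.
        return super().format_field(value, format_spec)


fmt = AccountingFormatter()
loss = fmt.format("The total is: {0:.2f}", -1234.5)
gain = fmt.format("The total is: {0:.2f}", 1234.5)
```

A hook of this shape sees only its own field, so it cannot do column-based operations on earlier output; that is the trade-off the question above is about.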
On 5/19/06, Talin <talin@acm.org> wrote:
Guido van Rossum wrote:
[http://www.python.org/dev/peps/pep-3101/] http://msdn.microsoft.com/library/en-us/cpguide/html/cpconcompositeformattin...
[on width spec a la .NET]
We already have that now, don't we? If you look at the docs for "String Formatting Operations" in the library reference, it shows that a negative sign on a field width indicates left justification.
Yes, but I was proposing to adopt the .NET syntax for the feature. Making it a responsibility of the generic formatting code rather than of each individual type-specific formatter makes it more likely that it will be implemented correctly everywhere.

[on escaping]
There is another solution to this which is equally subtle, although fairly straightforward to parse. It involves defining the rules for escapes as follows:
'{{' is an escaped '{' '}}' is an escaped '}', unless we are within a field.
So you can write things like {0:10,{1}}, and the final '}}' will be parsed as two separate closing brackets, since we're within a field definition.
From a parsing standpoint, this is unambiguous, however I've held off on suggesting it because it might appear to be ambiguous to a casual reader.
Sure. But I still think there isn't enough demand for variable expansion *within* a field to bother. When's the last time you used a * in a % format string? And how essential was it?

[on error handling for unused variables]
I am undecided on this issue as well, which is the reason that it's not mentioned in the PEP (yet).
There's another use case which suggests that perhaps all errors should pass silently (or at least produce a string result instead of an exception). It's a fairly common bug to have a broken format in an except clause for a rare exception. This can cause major issues in large programs because the one time you need the debug info badly you don't get anything at all. (The logging module now has a special work-around for this reason.) That's too bad, but it is a fact of life.

If broken formats *never* caused exceptions (but instead some kind of butchered conversion drawing attention to the problem) then at least one source of frustration would be gone. If people like this I suggest putting some kind of error message in the place of any format conversion for which an error occurred. (If we left the {format} itself in, then this would become a feature that people would rely on in undesirable places.)

I wouldn't want to go so far as to catch exceptions from type-specific formatters; but those formatters themselves should silently ignore bad specifiers, or at least return an error string, rather than raise exceptions.

Still, this may be more painful for beginners learning to write format strings?

Until we have decided, the PEP should list the two alternatives in a fair amount of detail as "undecided".
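The two alternatives can be put side by side in a short sketch. It assumes the str.format and string.Formatter of later Python versions; LenientFormatter and the exact marker text are hypothetical, chosen only to show an error message standing in for the failed conversion.

```python
import string


class LenientFormatter(string.Formatter):
    """Hypothetical formatter that never raises for a missing argument."""

    def get_value(self, key, args, kwargs):
        try:
            return super().get_value(key, args, kwargs)
        except (IndexError, KeyError):
            # Put a visible marker where the value should have been.
            return "<missing {0!r}>".format(key)


# Alternative 1: strict, a missing argument raises.
try:
    "{0} failed: {reason}".format("job")
    strict_raised = False
except KeyError:
    strict_raised = True

# Alternative 2: lenient, the problem shows up in the output instead.
lenient = LenientFormatter().format("{0} failed: {reason}", "job")
```

The lenient variant keeps the rest of the message intact, which matters in the except-clause scenario described above. This sketch handles only missing arguments, so a bad conversion specifier would still raise.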
I will be supplying a Python implementation of the parser along with the PEP. What I would prefer not to supply (although I certainly can if you feel it's necessary) is an optimized C implementation of the same parser, as well as the implementations of the various type-specific formatters.
There's no need for any performance work at this point, and C code is "right out" (as the Pythons say). The implementation should be usable as a readable "spec" to resolve gray areas in the PEP.
[] is the most intuitive syntax by far IMHO. Let's run it up the flagpole and see if anybody salutes :)
Fair enough.
The way I have set up the API for writing custom formatters (not talking about the __format__ method here) allows the custom formatter object to examine the entire output string, not merely the part that it is responsible for. Moreover, the custom formatter is free to modify the entire string. So for example, a custom formatter could tabify or un-tabify all previous text within the string.
Ouch. I don't like that.
The API could be made slightly simpler by eliminating this feature. The reason that I added it was specifically so that custom formatters could perform column-specific operations, like the old BASIC function that would print spaces up to a given column. Having generated my share of reports back in the old days (COBOL programming in the USAF), I thought it might be useful to have the ability to do operations based on the absolute column number.
Please drop it. Python has never had it and AFAICR it's never been requested.
Currently the API specifies that a custom formatter is passed an array object, and the custom formatter should append its data to the end of the array, but it is also free to examine and modify the rest of the array.
If I were to remove this feature, then instead of using an array, we'd simply have the custom formatter return a string like __format__ does.
Yes please.
So the question is - is the use case useful enough to keep this feature? What do people think of the use of the Python array type in this case?
That rather constrains the formatting implementation, so I'd like to drop it.
BTW I think we should move this back to the py3k list -- the PEP is 3101 after all. That should simplify the PEP a bit because it no longer has to distinguish between str and unicode. If we later decide to backport it to 2.6 it should be easy enough to figure out what to do with str vs. unicode (probably the same as we do for %).
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
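To make the two custom-formatter API shapes discussed in this message concrete, here is a small sketch. Both function names and their signatures are hypothetical and do not come from the PEP, and a plain Python list stands in for the array object the draft API passes around.

```python
def pad_to_column(out, value, column):
    """Buffer style (hypothetical): the formatter receives the whole
    output so far and appends to it, so it can work with absolute
    column positions, like a BASIC-style tab function."""
    written = "".join(out)
    # Column of the cursor on the current (last) line of output.
    current = len(written) - (written.rfind("\n") + 1)
    out.append(" " * max(0, column - current))
    out.append(str(value))


def format_value(value, width):
    """Return style (hypothetical): the formatter sees only its own
    value and hands back a string, as __format__ does."""
    return str(value).ljust(width)


out = ["Name:"]
pad_to_column(out, "Fred", 10)
buffer_result = "".join(out)

return_result = "Name:" + format_value("Fred", 6)
```

The buffer style can align "Fred" at column 10 regardless of what preceded it, but it exposes all earlier output to the formatter; the return style cannot do that, and in exchange leaves the implementation free to assemble the result however it likes.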
Guido van Rossum wrote:
[on escaping]
There is another solution to this which is equally subtle, although fairly straightforward to parse. It involves defining the rules for escapes as follows:
'{{' is an escaped '{', and '}}' is an escaped '}', unless we are within a field.
So you can write things like {0:10,{1}}, and the final '}}' will be parsed as two separate closing brackets, since we're within a field definition.
From a parsing standpoint, this is unambiguous; however, I've held off on suggesting it because it might appear to be ambiguous to a casual reader.
Sure. But I still think there isn't enough demand for variable expansion *within* a field to bother. When's the last time you used a * in a % format string? And how essential was it?
True. I'm mainly trying to avoid excess debate by not dropping existing features unnecessarily. (Otherwise, you spend way too much time arguing with the handful of people out there that do rely on that use case.) But if you want to use your special "BDFL superpower" to shortcut the debate, I'm fine with that :)
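For readers following the escaping rule quoted above ('{{' and '}}' are escapes except inside a field), here is a minimal sketch of a scanner that applies it. The function name split_fields and the token shapes are invented for this illustration; it is not the parser that accompanies the PEP.

```python
def split_fields(fmt):
    """Split fmt into ('text', s) and ('field', s) tokens.

    Outside a field, '{{' and '}}' are escaped braces. Inside a
    field every brace is structural, so a trailing '}}' closes a
    nested field and then the enclosing one.
    """
    tokens = []
    text = []
    i, n = 0, len(fmt)
    while i < n:
        ch = fmt[i]
        if ch in "{}" and fmt[i:i + 2] == ch * 2:
            # Doubled brace outside a field: emit one literal brace.
            text.append(ch)
            i += 2
        elif ch == "{":
            if text:
                tokens.append(("text", "".join(text)))
                text = []
            # Scan to the matching close, counting nested fields.
            depth = 1
            j = i + 1
            while j < n and depth:
                if fmt[j] == "{":
                    depth += 1
                elif fmt[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated field")
            tokens.append(("field", fmt[i + 1:j - 1]))
            i = j
        elif ch == "}":
            raise ValueError("single '}' outside a field")
        else:
            text.append(ch)
            i += 1
    if text:
        tokens.append(("text", "".join(text)))
    return tokens


nested = split_fields("{0:10,{1}}")
escaped = split_fields("a {{b}} {0}")
```

One consequence of the rule that this sketch makes visible: a field cannot begin with a nested field, because a leading '{{' outside a field is always read as an escape. That is the sort of case that might look ambiguous to a casual reader.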
BTW I think we should move this back to the py3k list -- the PEP is 3101 after all. That should simplify the PEP a bit because it no longer has to distinguish between str and unicode. If we later decide to backport it to 2.6 it should be easy enough to figure out what to do with str vs. unicode (probably the same as we do for %).
All right, although my understanding is that the PEP should be escalated to c.l.p at some point before acceptance, and I figured py-dev would be a reasonable intermediate point before that. But it sounds like 3101 is going to go back into the shop for the moment, so that's a non-issue.
Since you seem to be in a PEP-review mode, could you have a look at 3102? In particular, it seems that all of the controversies on that one have quieted down; virtually everyone seems in favor of the first part, and you have already ruled in favor of the second part. So I am not sure that there is anything more to discuss.
Perhaps I should go ahead and put 3102 on c.l.p at this point.
-- Talin
On 5/19/06, Talin <talin@acm.org> wrote:
Since you seem to be in a PEP-review mode, could you have a look at 3102? In particular, it seems that all of the controversies on that one have quieted down; virtually everyone seems in favor of the first part, and you have already ruled in favor of the second part. So I am not sure that there is anything more to discuss.
Perhaps I should go ahead and put 3102 on c.l.p at this point.
+1
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
participants (6)
- Edward Loper
- Greg Ewing
- Guido van Rossum
- Joe Smith
- Steven Bethard
- Talin