Added a function to parse str.format() mini-language specifiers
I'd like to add a function (or method) to parse str.format()'s standard mini-language format specifiers. It's hard to get right, and if PEP 378 is accepted, it gets more complex.

The primary use case is for non-builtin numeric types that want to add __format__, and want it to support the same mini-language that the built in types support. For example see issue 2110, where Mark Dickinson implements his own version for Decimal, and suggests it be moved elsewhere.

This function exists in Objects/stringlib/formatter.h, and will just need to be exposed to Python code. I propose a function that takes a single str (or unicode) and returns a named tuple with the appropriate values filled in.

So, is such a function desirable, and if so, where would it go? I could expose it through the string module, which is where the sort-of-related Formatter class lives. It could be a method on str and unicode, but I'm not sure that's most appropriate.

Eric.
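For readers who want something concrete, here is a rough pure-Python sketch of the kind of function being proposed. It is not the code in Objects/stringlib/formatter.h; the names parse_format_spec and FormatSpec, the field names, and the regular expression are all assumptions made for illustration, and it only covers the fill/align, sign, '#', '0', width, ',' (PEP 378), precision and type fields.

```python
import re
from collections import namedtuple

# Hypothetical result type; the field names are illustrative only.
FormatSpec = namedtuple(
    "FormatSpec", "fill align sign alt zero width comma precision type"
)

# Assumed grammar: [[fill]align][sign][#][0][width][,][.precision][type]
_SPEC_RE = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>=^]))?"
    r"(?P<sign>[-+ ])?"
    r"(?P<alt>#)?"
    r"(?P<zero>0)?"
    r"(?P<width>\d+)?"
    r"(?P<comma>,)?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[a-zA-Z%])?"
    r"\Z",
    re.DOTALL,
)


def parse_format_spec(spec):
    """Split a format specifier into its fields, or raise ValueError."""
    m = _SPEC_RE.match(spec)
    if m is None:
        raise ValueError("invalid format specifier: %r" % (spec,))
    d = m.groupdict()
    return FormatSpec(
        fill=d["fill"],
        align=d["align"],
        sign=d["sign"],
        alt=d["alt"] is not None,
        zero=d["zero"] is not None,
        # Fields that are absent from the spec are reported as None.
        width=int(d["width"]) if d["width"] is not None else None,
        comma=d["comma"] is not None,
        precision=int(d["precision"]) if d["precision"] is not None else None,
        type=d["type"],
    )


spec = parse_format_spec("*>+10,.3f")
print(spec.fill, spec.align, spec.width, spec.precision, spec.type)
```

A __format__ method on a non-builtin numeric type could call something like this once and then branch on the returned fields, rather than re-implementing the parsing itself.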
The subject shouldn't have said "Added". It's not a done deal! Eric.
On Mon, Mar 16, 2009 at 2:23 PM, Eric Smith <eric@trueblade.com> wrote:
I'd like to add a function (or method) to parse str.format()'s standard mini-language format specifiers.
+1 from me. (Of course. :-)

Once the 'n' format code goes in, decimal.py will contain over 200 lines of Python code that really has very little to do with the decimal module at all. I'd like to see that code move somewhere else, partly out of a desire to unclutter the decimal module, and partly to make it easier to cope with changes and new features in the formatting mini-language.

Out of curiosity, does anyone know of any numeric types (other than Decimal) that might benefit from this?

Something like the '_format_align' function from decimal.py might also be of general use: it just does the job of padding and aligning a numeric string (as well as dealing with the sign).
I propose a function that takes a single str (or unicode) and returns a named tuple with the appropriate values filled in.
Are there advantages to using a named tuple instead of a dict? If there's a possibility that some fields may or may not be defined depending on the value of other fields, then a dict may make more sense. (Not sure whether this can happen with the mini-language in its current form.)
So, is such a function desirable, and if so, where would it go?
Yes, and don't know!
It could be a method on str and unicode, but I'm not sure that's most appropriate.
Doesn't seem right to me, either. Mark
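To make the named tuple versus dict question a bit more concrete, here is a small sketch. The FormatSpec type and its field names are assumptions for illustration, not a proposed API. It shows one way a named tuple could cope with fields that are not present (store None), and that converting to a dict is cheap if a caller prefers one.

```python
from collections import namedtuple

# Hypothetical result type; field names are illustrative only.
FormatSpec = namedtuple("FormatSpec", "fill align sign width precision type")

# A spec such as ">10d": fields the spec does not mention are None.
spec = FormatSpec(fill=None, align=">", sign=None,
                  width=10, precision=None, type="d")

# Attribute access, as with any named tuple.
print(spec.width, spec.precision)

# A dict view is one call away for callers that want key-based access.
as_dict = spec._asdict()

# Named tuples are immutable; _replace returns a modified copy.
updated = spec._replace(precision=3)
```

The trade-off this illustrates: a named tuple always has every field, so "not defined" has to be spelled as None, whereas a dict could simply omit the key.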
Mark Dickinson wrote:
Something like the '_format_align' function from decimal.py might also be of general use: it just does the job of padding and aligning a numeric string (as well as dealing with the sign).
Stand by. That's next on my list of proposals.
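As a rough illustration of what such a padding and aligning helper does, here is a minimal sketch. It is not the '_format_align' function from decimal.py; the name align_number, its signature and its defaults are assumptions. It takes the sign and the digits separately so that the '=' alignment can put the padding between them.

```python
def align_number(sign, body, width=0, fill=" ", align=">"):
    """Pad sign + body out to width using fill, according to align.

    sign is '', '+', '-' or ' '; body is the already formatted digits.
    """
    padding = fill * max(width - len(sign) - len(body), 0)
    if align == "<":
        return sign + body + padding
    if align == ">":
        return padding + sign + body
    if align == "=":
        # Padding goes after the sign but before the digits.
        return sign + padding + body
    if align == "^":
        # Any odd leftover fill character goes on the right.
        half = len(padding) // 2
        return padding[:half] + sign + body + padding[half:]
    raise ValueError("unknown alignment: %r" % (align,))


print(align_number("-", "1.5", width=8, fill="0", align="="))
```

The results are meant to line up with what the built-in types produce for the same fill, alignment and width, for example format(-1.5, "0=8").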
Eric Smith wrote:
I'd like to add a function (or method) to parse str.format()'s standard mini-language format specifiers. It's hard to get right, and if PEP 378 is accepted, it gets more complex.
The primary use case is for non-builtin numeric types that want to add __format__, and want it to support the same mini-language that the built in types support. For example see issue 2110, where Mark Dickinson implements his own version for Decimal, and suggests it be moved elsewhere.
This function exists in Objects/stringlib/formatter.h, and will just need to be exposed to Python code. I propose a function that takes a single str (or unicode) and returns a named tuple with the appropriate values filled in.
So, is such a function desirable, and if so,
Yes, but I would take it further and consider the string and dict/named-tuple as alternate interfaces to the formatting machinery. So I would

a) add an inverse function that would take a dict or named tuple and produce the field specifier as a string (or raise ValueError). Such a string could be embedded into a complete format string. Some people might prefer this specification method.

b) amend built-in format() to take a dict/n-t as the second argument on the basis that it is silly to transform the parse result back into a string just to be parsed again. This would make repeated calls to format faster by eliminating the parsing step.
where would it go? I could expose it through the string module, which is where the sort-of-related Formatter class lives.
That seems the most obvious place.
It could be a method on str and unicode, but I'm not sure that's most appropriate.
Terry Jan Reedy
Terry Reedy wrote:
Eric Smith wrote:
I'd like to add a function (or method) to parse str.format()'s standard mini-language format specifiers. It's hard to get right, and if PEP 378 is accepted, it gets more complex. ... So, is such a function desirable, and if so,
Yes, but I would take it further and consider the string and dict/named-tuple as alternate interfaces to the formatting machinery. So I would
If the only use case for this is for non-builtin numeric types, I'd vote for a named tuple. But since Mark (who's one of the primary users) also raised the dict issue, I'll give it some thought.
a) add an inverse function that would take a dict or named tuple and produce the field specifier as a string (or raise ValueError). Such a string could be embedded into a complete format string. Some people might prefer this specification method.
This is a pretty simple transformation. I'm not so sure it's all that useful.
b) amend built-in format() to take a dict/n-t as the second argument on the basis that it is silly to transform the parse result back into a string just to be parsed again. This would make repeated calls to format faster by eliminating the parsing step.
But format() works for any type, including ones that don't understand the standard mini-language. What would they do with this info?
Eric Smith wrote:
But format() works for any type, including ones that don't understand the standard mini-language. What would they do with this info?
The same thing they would do (whatever that is) if the second argument were instead the equivalent unparsed format-spec string. The easiest implementation of what I am proposing would be for the parse_spec function, whose output you propose to expose, to recognize when its input is not a string but previous parse output. Just as iter(iter(ob)) is iter(ob), parse(parse(spec)) should be the same as parse(spec). tjr
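A minimal sketch of that idempotence property, under heavy assumptions: the ParsedSpec type, the parse_spec name as used here, and the deliberately tiny grammar (width, precision and type only) are all hypothetical and stand in for the real parser.

```python
import re
from collections import namedtuple

# Hypothetical parse result covering only a small subset of the fields.
ParsedSpec = namedtuple("ParsedSpec", "width precision type")

_SUBSET_RE = re.compile(
    r"(?P<width>\d+)?(?:\.(?P<precision>\d+))?(?P<type>[a-zA-Z%])?\Z"
)


def parse_spec(spec):
    """Parse a spec string; previously parsed output is returned unchanged."""
    if isinstance(spec, ParsedSpec):
        # Already parsed: behave like iter() does on an iterator.
        return spec
    m = _SUBSET_RE.match(spec)
    if m is None:
        raise ValueError("invalid format specifier: %r" % (spec,))
    width, precision, type_ = m.group("width", "precision", "type")
    return ParsedSpec(
        int(width) if width is not None else None,
        int(precision) if precision is not None else None,
        type_,
    )


parsed = parse_spec("10.3f")
print(parse_spec(parsed) is parsed)
```

With a parser shaped like this, a __format__ method that always calls parse_spec on its argument would accept either a string or an earlier parse result without further changes.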
Eric Smith wrote:
So, is such a function desirable, and if so, where would it go? I could expose it through the string module, which is where the sort-of-related Formatter class lives.
string.parse_format and string.build_format perhaps? The inverse operation would be useful if you just wanted to do something like "use a default precision of 3" but otherwise leave things up to the original object.

    def custom_format(fmt, value):
        details = string.parse_format(fmt)
        if details["precision"] is None:  # Assumes None indicates missing
            details["precision"] = 3
        fmt = string.build_format(details)
        return format(fmt, value)

While having to rebuild and reparse the string is a little annoying, changing that would involve changing the spec for the __format__ magic method and I don't think we want to go there.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan wrote:
Eric Smith wrote:
So, is such a function desirable, and if so, where would it go? I could expose it through the string module, which is where the sort-of-related Formatter class lives.
string.parse_format and string.build_format perhaps? The inverse operation would be useful if you just wanted to do something like "use a default precision of 3" but otherwise leave things up to the original object.
    def custom_format(fmt, value):
        details = string.parse_format(fmt)
        if details["precision"] is None:  # Assumes None indicates missing
            details["precision"] = 3
        fmt = string.build_format(details)
        return format(fmt, value)
return format(value, fmt) # ;-)
While having to rebuild and reparse the string is a little annoying,
yes
changing that would involve changing the spec for the __format__ magic method and I don't think we want to go there.
If parse_format were idempotent for its output like iter, then the change would, I think, be pretty minimal. I am assuming here that the __format__ method calls the parse_format(fmt) function that Eric proposed to expose. If details == parse_format(details) == parse_format(build_format(details)), then the rebuild and reparse is not needed and passing details instead of the rebuilt string should be transparent to __format__. tjr
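Putting the pieces of this sub-thread together, here is a runnable sketch of the parse, modify, rebuild round trip. Nothing here exists in the string module: parse_format and build_format are hypothetical stand-ins written for this example, they return and accept a plain dict, and they handle only fill/align, sign, '0', width, precision and type. The final call uses the corrected argument order, format(value, fmt).

```python
import re

# Assumed subset of the mini-language, for illustration only.
_FIELDS_RE = re.compile(
    r"(?:(?P<fill>.)?(?P<align>[<>=^]))?"
    r"(?P<sign>[-+ ])?"
    r"(?P<zero>0)?"
    r"(?P<width>\d+)?"
    r"(?:\.(?P<precision>\d+))?"
    r"(?P<type>[a-zA-Z%])?\Z",
    re.DOTALL,
)


def parse_format(fmt):
    """Hypothetical parser: return a dict of fields, None where missing."""
    m = _FIELDS_RE.match(fmt)
    if m is None:
        raise ValueError("invalid format specifier: %r" % (fmt,))
    details = m.groupdict()
    for key in ("width", "precision"):
        if details[key] is not None:
            details[key] = int(details[key])
    return details


def build_format(details):
    """Hypothetical inverse: turn the dict back into a specifier string."""
    parts = []
    if details["align"] is not None:
        parts.append((details["fill"] or "") + details["align"])
    parts.append(details["sign"] or "")
    parts.append(details["zero"] or "")
    if details["width"] is not None:
        parts.append(str(details["width"]))
    if details["precision"] is not None:
        parts.append("." + str(details["precision"]))
    parts.append(details["type"] or "")
    return "".join(parts)


def custom_format(fmt, value):
    details = parse_format(fmt)
    if details["precision"] is None:  # None indicates "not given"
        details["precision"] = 3
    # Rebuild the string and let the value's own __format__ handle it.
    return format(value, build_format(details))


print(custom_format("10f", 3.14159))
```

This only makes sense for values whose __format__ accepts a precision; an int formatted with type 'd', for instance, would reject the injected precision.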
participants (4)
- Eric Smith
- Mark Dickinson
- Nick Coghlan
- Terry Reedy