Is this the right forum for a proposed new stdlib function?

I'd like to float the idea of adding a new function to the Template class (in the string module). I'm a bit of a newbie around here, though, and unsure of the proper procedure. Is this the right mailing list for that, or should I be using python-list instead? Thanks, - Joe

On Fri, Oct 10, 2008 at 12:56 PM, Joe Strout <joe@strout.net> wrote:
You can try here or there. Both places will provide feedback, although this list tends to focus on language stuff. -Brett

OK, here's my pitch -- in rough PEP form, even though this may be small enough to not merit a PEP. I'd really like your feedback on this idea.

Abstract

I propose we add a new function on the string.Template [1] class, match(), to perform the approximate inverse of the existing substitute() function. That is, it attempts to match an input string against a template, and if successful, returns a dictionary providing the matched text for each template field.

Rationale

PEP 292 [2] added a simplified string substitution feature, allowing users to easily substitute text for named fields in a template string. The inverse operation is also useful: given a template and an input string, one wishes to find the text in the input string matching the fields in the template. However, Python currently has no easy way to do it.

While this named matching operation can be accomplished using regular expressions, the constructions required are somewhat complex and error-prone. It can also be done using third-party modules such as pyparsing, but again the setup requires more code and is not obvious to programmers inexperienced with that module.

In addition, the Template class already has all the data needed to perform this operation, so it is a natural fit to simply add a new method on this class to perform a match, in addition to the existing method to perform a substitution.

Proposal

Proposed is the addition of one new attribute, and one new function, on the existing Template class, as follows:

1. 'greedy' is a new attribute that determines whether the field matches should be done in a greedy manner, equivalent to regex pattern '(.*)'; or in a non-greedy manner, equivalent to '(.*?)'. This attribute defaults to False.

2. 'match' is a new function which accepts one parameter, an input string. If the input string can be matched to the template pattern (respecting the 'greedy' flag), then match returns a dictionary, where each field in the pattern maps to the corresponding part of the input string. If the input string cannot be matched to the template pattern, then match returns None.

Examples:

    >>> from string import Template
    >>> s = Template('$name was born in ${country}')
    >>> print s.match('Guido was born in the Netherlands')
    {'name': 'Guido', 'country': 'the Netherlands'}
    >>> print s.match('Spam was born as a canned ham')
    None

Note that when the match is successful, the resulting dictionary could be passed through Template.substitute to reconstitute the original input string. Conversely, any string created by Template.substitute could be matched by Template.match (though in unusual cases, the resulting dictionary might not exactly match the original, e.g. if the string could be matched in multiple ways). Thus, .match and .substitute are approximately inverse operations.

References

[1] Template Strings http://www.python.org/doc/2.5.2/lib/node40.html
[2] PEP 292: Simpler String Substitutions http://www.python.org/dev/peps/pep-0292/
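
For readers who want to experiment, here is one way the proposal could be prototyped. It is only a sketch under assumptions, not the prototype mentioned in this thread: the subclass name MatchingTemplate and the helper _field_regex are hypothetical, the whole input is required to match, a field that appears twice in the template must match the same text both times, and the code uses Python 3 syntax even though the examples above use the Python 2 print statement.

```python
import re
from string import Template


class MatchingTemplate(Template):
    """Hypothetical subclass sketching the proposed match() method."""

    # Proposed attribute: fields match like '(.*)' if True, '(.*?)' if False.
    greedy = False

    def _field_regex(self):
        """Translate the template into a regex with one named group per field."""
        field = '.*' if self.greedy else '.*?'
        parts, seen, pos = [], set(), 0
        # Template.pattern is the compiled regex the class itself uses
        # to locate $name, ${name} and $$ in the template string.
        for mo in self.pattern.finditer(self.template):
            # Literal text between placeholders must match exactly.
            parts.append(re.escape(self.template[pos:mo.start()]))
            name = mo.group('named') or mo.group('braced')
            if name is None:
                if mo.group('escaped') is None:
                    raise ValueError('invalid placeholder in template')
                # '$$' stands for a single literal delimiter.
                parts.append(re.escape(self.delimiter))
            elif name in seen:
                # Repeated field: require the same text as its first match.
                parts.append('(?P=%s)' % name)
            else:
                seen.add(name)
                parts.append('(?P<%s>%s)' % (name, field))
            pos = mo.end()
        parts.append(re.escape(self.template[pos:]))
        return re.compile(''.join(parts))

    def match(self, text):
        """Return a dict of field values, or None if text does not fit."""
        mo = self._field_regex().fullmatch(text)
        return mo.groupdict() if mo else None


s = MatchingTemplate('$name was born in ${country}')
text = 'Guido was born in the Netherlands'
result = s.match(text)
no_match = s.match('Spam was born as a canned ham')
```

With this sketch, result holds the two fields from the example above, s.substitute(result) reproduces the input string, and no_match is None.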

I don't want to commit to whether this should be in the stdlib or not, but on the design part I'd say it would be better to make 'greedy' an optional parameter to the match method. It's only used in one method and not really a property of the template, but of the matching:
print s.match('Guido was born in the Netherlands', greedy=True)
Jan
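
To see why greediness is a property of a particular match rather than of the template, consider a template whose literal text can occur more than once in the input. The sketch below hard-codes the regex for a template like '$a and $b'; the function name match_pair is hypothetical and merely stands in for a match(text, greedy=...) call.

```python
import re


def match_pair(text, greedy=False):
    """Hypothetical stand-in for Template('$a and $b').match(text, greedy=...)."""
    field = '.*' if greedy else '.*?'
    regex = '(?P<a>%s) and (?P<b>%s)' % (field, field)
    mo = re.fullmatch(regex, text)
    return mo.groupdict() if mo else None


text = 'x and y and z'
lazy = match_pair(text)                 # first field takes as little as possible
eager = match_pair(text, greedy=True)   # first field takes as much as possible

print(lazy)   # {'a': 'x', 'b': 'y and z'}
print(eager)  # {'a': 'x and y', 'b': 'z'}
```

Both results substitute back to the same input string, which is the "matched in multiple ways" case mentioned in the proposal.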

On Oct 10, 2008, at 6:09 PM, Jan Kanis wrote:
That's an excellent point. I had it as a property because of the way my prototype implementation worked, but now that I look at it again, there's no good reason it has to work that way. (We probably want to cache the compiled regex object under the hood, but we can store which greediness option was used, or even cache them both -- all internal implementation detail that the user shouldn't care about.) Thanks, - Joe

On Oct 10, 2008, at 7:38 PM, Jared Grubb wrote:
You can basically do this using regular expressions; it's not as "pretty", but it does exactly the same thing.
That's true; and you can use % to do the same thing as Template.substitute (though it's not as pretty). The point is, we already have a very pretty Template class that does this operation in one direction; it ought to do it in the other direction too. The fact that it doesn't is surprising to a newbie (speaking from personal experience there), and the equivalent 're' incantation is considerably harder to come up with -- even more so than using % is harder than Template.substitute. Best, - Joe
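
For comparison, here is what the two "not as pretty" alternatives look like for the example template, written in Python 3. The regex is hand-written for this one template and is only an illustration of the kind of incantation being discussed.

```python
import re
from string import Template

fields = {'name': 'Guido', 'country': 'the Netherlands'}

# Forward direction: Template.substitute versus the % operator.
via_template = Template('$name was born in ${country}').substitute(fields)
via_percent = '%(name)s was born in %(country)s' % fields

# Reverse direction: a hand-written regex with one named group per field.
pattern = re.compile(r'(?P<name>.*?) was born in (?P<country>.*?)')
mo = pattern.fullmatch(via_template)
recovered = mo.groupdict() if mo else None

print(via_template == via_percent)  # True
print(recovered == fields)          # True
```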

Boris Borcic wrote:
A phrase like 'similar comment' is sometimes hard to expand. Are you saying that in 3.x .split should produce an iterator instead of a list? Or that ''.split(s) should return list(s) instead of [''] as now (in 3.0 at least).

Terry Reedy wrote:
The latter, eg sep.join(sep.split(s))==s. But somewhat tongue-in-cheek. More generally, I guess what I am saying is that sequence-of-chars <--> string conversion is a particularly sore spot when someone tries to think/learn about the operations in Python in a structuralist or "mathematical" manner. There are three quite distinct manners to infer an operation that *should* convert back list(s) to s, but none work. Cheers, BB
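
The identity under discussion can be checked directly. The snippet below shows only current Python 3 behaviour; which three "manners" the author has in mind is not stated in the message, so the cases shown are just the obvious candidates.

```python
s = 'spam'

# With a non-empty separator, join undoes split.
csv = 'a,b,,c'
round_trip = ','.join(csv.split(','))

# With sep == '', sep.split(s) splits the empty string, not s.
empty_split = ''.split(s)
rejoined = ''.join(empty_split)

# Splitting s on an empty separator is rejected outright.
try:
    s.split('')
    split_error = None
except ValueError as exc:
    split_error = str(exc)

# str() on the list gives its repr; ''.join is what actually inverts list(s).
as_repr = str(list(s))
joined = ''.join(list(s))

print(round_trip == csv)  # True
print(empty_split)        # ['']
print(rejoined == s)      # False
print(as_repr)            # ['s', 'p', 'a', 'm']
print(joined == s)        # True
```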

On Fri, Oct 10, 2008 at 6:50 PM, Joe Strout <joe@strout.net> wrote: Proposed is the addition of one new attribute, and one new function, on the
One objection is that the hardcoded pattern '(.*)' or '(.*?)' doesn't seem generally applicable; e.g. the example above would break if the sentence continued "..in the Netherlands at 19XX". It might be possible to generalize it (e.g. by passing keyword arguments with the expected regexp for each template variable, such as "name=r'.*', country=r'\w+'") but in that case you might as well use an explicit regexp. Regardless, you'll need more examples and more compelling use cases before this has any chance to move forward. You might start with the stdlib and see how much things could be simplified if Template.match were available. George
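
The generalization mentioned above could look roughly like the following Python 3 sketch. Everything here is an assumption made for illustration: the function name match_fields, the default pattern '.*?', matching only a prefix of the input so that trailing text is tolerated, and the restriction that each field appears once in the template.

```python
import re
from string import Template


def match_fields(template, text, **patterns):
    """Hypothetical matcher taking one regex fragment per template field.

    Fields without an explicit pattern default to the non-greedy '.*?'.
    Assumes each field name appears only once in the template.
    """
    src = template.template
    parts, pos = [], 0
    for mo in template.pattern.finditer(src):
        parts.append(re.escape(src[pos:mo.start()]))
        name = mo.group('named') or mo.group('braced')
        if name:
            parts.append('(?P<%s>%s)' % (name, patterns.get(name, '.*?')))
        elif mo.group('escaped') is not None:
            parts.append(re.escape(template.delimiter))
        else:
            raise ValueError('invalid placeholder in template')
        pos = mo.end()
    parts.append(re.escape(src[pos:]))
    # re.match anchors at the start only, so trailing text is allowed.
    mo = re.match(''.join(parts), text)
    return mo.groupdict() if mo else None


s = Template('$name was born in ${country}')
text = 'Guido was born in the Netherlands at 19XX'

default = match_fields(s, text)
one_word = match_fields(s, text, name=r'.*', country=r'\w+')
tuned = match_fields(s, text, country=r'(?:the )?\w+')

print(default)   # {'name': 'Guido', 'country': ''}
print(one_word)  # {'name': 'Guido', 'country': 'the'}
print(tuned)     # {'name': 'Guido', 'country': 'the Netherlands'}
```

The three results show the objection in practice: the default pattern captures nothing useful once the sentence continues, r'\w+' stops after the first word, and getting 'the Netherlands' requires a pattern about as involved as writing the explicit regexp.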

participants (7)
- Boris Borcic
- Brett Cannon
- George Sakkis
- Jan Kanis
- Jared Grubb
- Joe Strout
- Terry Reedy