[Python-ideas] Where/how to propose an addition to a standard module?

Joe Strout joe at strout.net
Mon Oct 13 18:16:07 CEST 2008


On Oct 13, 2008, at 8:46 AM, pruebauno at latinmail.com wrote:

> Whenever I needed such functionality I used the re module. The benefit
> is that it uses unix style regular expression syntax and an egrep/awk/
> perl/ruby user can understand it. You should show a few examples where
> your proposal looks better than just using RE.

Well, I suppose if you're already used to RE, then maybe it's not  
obvious that to an RE newbie, this:

  regex = re.compile(r"The (?P<object>.*?) in (?P<location>.*?) falls "
                     r"mainly in the (?P<subloc>.*?)\.$")
  d = regex.match(text).groupdict()

is far harder to read and type correctly than this:

  templ = Template("The $object in $location falls mainly in the $subloc.")
  d = templ.match(text)

Any other example would show the same simplification.

Of course, if you're the sort of person who uses RE, you probably  
don't use Template.substitute either, since you probably like and are  
comfortable with the string % operator.  But Template.substitute was  
introduced to make it easier to handle the common, simple substitution  
operations, and I believe adding a Template.match method would do the  
same thing for common, simple matching operations.

Here's a more fleshed-out proposal, with rationale and references --  
see if this makes it any clearer why I think this would be a fine  
addition to the Template class.


Abstract

Introduces a new method on the string.Template [1] class, match(),
which performs the approximate inverse of the existing substitute()
method.  That is, it attempts to match an input string against a
template, and if successful, returns a dictionary providing the
matched text for each template field.


Rationale

PEP 292 [2] added a simplified string substitution feature, allowing  
users to easily substitute text for named fields in a template  
string.  The inverse operation is also useful: given a template and an  
input string, one wishes to find the text in the input string matching  
the fields in the template.  However, Python currently has no easy way  
to do it.

While this named matching operation can be accomplished using regular
expressions, the constructions required are somewhat complex and
error-prone.  It can also be done using third-party modules such as
pyparsing, but again the setup requires more code and is not obvious to
programmers inexperienced with that module.

In addition, the Template class already has all the data needed to  
perform this operation, so it is a natural fit to simply add a new  
method on this class to perform a match, in addition to the existing  
method to perform a substitution.
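
To illustrate that claim: the compiled Template.pattern regex already
identifies every field in a template, so the field names can be
recovered without any new parsing code.  A small sketch, with variable
names chosen only for illustration:

```python
from string import Template

templ = Template('$name was born in ${country}')

# Template.pattern is the regex the class itself uses to find
# placeholders; its 'named' and 'braced' groups hold the field name.
fields = []
for mo in templ.pattern.finditer(templ.template):
    name = mo.group('named') or mo.group('braced')
    if name is not None:
        fields.append(name)

print(fields)
```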


Proposal

Proposed is the addition of one new method, on the existing Template
class, as follows:

   def match(self, text, greedy=False):

'match' is a new method which accepts one required parameter, an
input string, and one optional parameter, 'greedy', which determines
whether fields are matched greedily, equivalent to the regex pattern
'(.*)', or non-greedily, equivalent to '(.*?)'.  If the input string
can be matched to the template pattern (respecting the 'greedy' flag),
then match returns a dictionary in which each field in the pattern maps
to the corresponding part of the input string.  If the input string
cannot be matched to the template pattern, then match returns None.
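
One way such a method might be built on top of the re module is
sketched below.  This is only an illustration under stated assumptions,
not a reference implementation: the subclass name MatchingTemplate is
hypothetical, the whole input string is assumed to have to match (hence
the trailing \Z), and a field that appears twice in the template is
assumed to have to match the same text both times.

```python
import re
from string import Template


class MatchingTemplate(Template):
    """Hypothetical subclass adding the proposed match() method."""

    def match(self, text, greedy=False):
        wildcard = '.*' if greedy else '.*?'
        parts = []     # regex fragments, in template order
        seen = set()   # field names that already have a named group
        pos = 0
        for mo in self.pattern.finditer(self.template):
            # Literal text between placeholders must match exactly.
            parts.append(re.escape(self.template[pos:mo.start()]))
            pos = mo.end()
            name = mo.group('named') or mo.group('braced')
            if name is not None:
                if name in seen:
                    # Assumption: a repeated field matches the same text.
                    parts.append('(?P=%s)' % name)
                else:
                    seen.add(name)
                    parts.append('(?P<%s>%s)' % (name, wildcard))
            elif mo.group('escaped') is not None:
                # '$$' in a template stands for a literal '$'.
                parts.append(re.escape(self.delimiter))
            else:
                raise ValueError('invalid placeholder in template')
        parts.append(re.escape(self.template[pos:]))
        # Assumption: the entire input must match, so anchor the end.
        mo = re.match(''.join(parts) + r'\Z', text)
        return mo.groupdict() if mo else None
```

With this sketch, MatchingTemplate('$a-$b').match('x-y-z') assigns
'x' to a by default and 'x-y' to a when greedy=True.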

Examples:

    >>> from string import Template
    >>> s = Template('$name was born in ${country}')
    >>> print s.match('Guido was born in the Netherlands')
    {'name': 'Guido', 'country': 'the Netherlands'}
    >>> print s.match('Spam was born as a canned ham')
    None

Note that when the match is successful, the resulting dictionary could  
be passed through Template.substitute to reconstitute the original  
input string.  Conversely, any string created by Template.substitute  
could be matched by Template.match (though in unusual cases, the  
resulting dictionary might not exactly match the original, e.g. if the  
string could be matched in multiple ways).  Thus, .match and
.substitute are approximately inverse operations.
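
The ambiguous case can be demonstrated with the existing substitute()
method and hand-written regular expressions standing in for the
proposed match().  The two patterns below are assumed equivalents of
the template '$first-$rest' in non-greedy and greedy mode; they are an
illustration only, not output of any real Template method.

```python
import re
from string import Template

templ = Template('$first-$rest')

# Hand-written stand-ins for the non-greedy and greedy match modes.
lazy = re.compile(r'(?P<first>.*?)-(?P<rest>.*?)\Z')
greedy = re.compile(r'(?P<first>.*)-(?P<rest>.*)\Z')

original = {'first': 'a-b', 'rest': 'c'}
text = templ.substitute(original)

lazy_fields = lazy.match(text).groupdict()
greedy_fields = greedy.match(text).groupdict()

# Either result substitutes back to the same string ...
assert templ.substitute(lazy_fields) == text
assert templ.substitute(greedy_fields) == text
# ... but the two dictionaries differ, so at most one equals 'original'.
assert lazy_fields != greedy_fields
```

Here the string is always recovered exactly, while the dictionary is
recovered only when the chosen mode happens to split the text the same
way the original values did.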


References

[1] Template Strings
    http://www.python.org/doc/2.5.2/lib/node40.html

[2] PEP 292: Simpler String Substitutions
    http://www.python.org/dev/peps/pep-0292/





