[Python-ideas] Where/how to propose an addition to a standard module?
Joe Strout
joe at strout.net
Mon Oct 13 18:16:07 CEST 2008
On Oct 13, 2008, at 8:46 AM, pruebauno at latinmail.com wrote:
> Whenever I needed such functionality I used the re module. The benefit
> is that it uses unix style regular expression syntax and an egrep/awk/
> perl/ruby user can understand it. You should show a few examples where
> your proposal looks better than just using RE.
Well, I suppose if you're already used to RE, then maybe it's not
obvious that to an RE newbie, this:
regex = re.compile(r"The (?P<object>.*?) in (?P<location>.*?) falls mainly in the (?P<subloc>.*?)\.")
d = regex.match(text).groupdict()
is far harder to read and type correctly than this:
templ = Template("The $object in $location falls mainly in the $subloc.")
d = templ.match(text)
Any other example would show the same simplification.
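To make the comparison concrete, here is a small runnable check of the regular-expression version. The sample sentence is an assumption chosen for illustration. It also shows how easily the pattern goes wrong: if the final period is left unescaped, the last non-greedy group quietly matches nothing.

```python
import re

# Sample input; this sentence is an assumption, not part of the proposal.
text = "The rain in Spain falls mainly in the plain."

# Final "." left unescaped: it matches any character, so the non-greedy
# subloc group is satisfied by the empty string.
loose = re.compile(
    r"The (?P<object>.*?) in (?P<location>.*?) falls mainly in the (?P<subloc>.*?).")

# Final "." escaped: subloc must run up to the literal period.
strict = re.compile(
    r"The (?P<object>.*?) in (?P<location>.*?) falls mainly in the (?P<subloc>.*?)\.")

loose_fields = loose.match(text).groupdict()
strict_fields = strict.match(text).groupdict()

print(repr(loose_fields["subloc"]))   # the empty string
print(strict_fields)
```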
Of course, if you're the sort of person who uses RE, you probably
don't use Template.substitute either, since you probably like and are
comfortable with the string % operator. But Template.substitute was
introduced to make it easier to handle the common, simple substitution
operations, and I believe adding a Template.match method would do the
same thing for common, simple matching operations.
Here's a more fleshed-out proposal, with rationale and references --
see if this makes it any clearer why I think this would be a fine
addition to the Template class.
Abstract
Introduces a new method, match(), on the string.Template [1] class
to perform the approximate inverse of the existing substitute()
method. That is, it attempts to match an input string against a
template, and if successful, returns a dictionary providing the
matched text for each template field.
Rationale
PEP 292 [2] added a simplified string substitution feature, allowing
users to easily substitute text for named fields in a template
string. The inverse operation is also useful: given a template and an
input string, one wishes to find the text in the input string matching
the fields in the template. However, Python currently has no easy way
to do it.
While this named matching operation can be accomplished using regular
expressions, the constructions required are somewhat complex and error
prone. It can also be done using third-party modules such as pyparsing,
but again the setup requires more code and is not obvious to programmers
inexperienced with that module.
In addition, the Template class already has all the data needed to
perform this operation, so it is a natural fit to simply add a new
method on this class to perform a match, in addition to the existing
method to perform a substitution.
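As a rough illustration of that point, the existing class already exposes a compiled placeholder regex (Template.pattern) and the raw template text (Template.template), which is enough to list the fields a match would need to fill. The helper name field_names below is hypothetical and used only for this sketch.

```python
from string import Template


def field_names(template):
    """Return placeholder names in order of appearance (hypothetical helper)."""
    names = []
    # Template.pattern has the named groups "escaped", "named", "braced"
    # and "invalid"; only "named" and "braced" carry a field name.
    for m in template.pattern.finditer(template.template):
        name = m.group("named") or m.group("braced")
        if name is not None:
            names.append(name)
    return names


print(field_names(Template("$name was born in ${country}")))
```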
Proposal
Proposed is the addition of one new method, on the existing Template
class, as follows:
def match(self, text, greedy=False)
'match' is a new method which accepts one required parameter, an
input string; and one optional parameter, 'greedy', which determines
whether matches should be done in a greedy manner, equivalent to regex
pattern '(.*)'; or in a non-greedy manner, equivalent to '(.*?)'. If
the input string can be matched to the template pattern (respecting
the 'greedy' flag), then match returns a dictionary, where each field
in the pattern maps to the corresponding part of the input string. If
the input string cannot be matched to the template pattern, then match
returns None.
Examples:
>>> from string import Template
>>> s = Template('$name was born in ${country}')
>>> print s.match('Guido was born in the Netherlands')
{'name': 'Guido', 'country': 'the Netherlands'}
>>> print s.match('Spam was born as a canned ham')
None
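The behaviour described above could be prototyped today as a subclass. This is only a minimal sketch, not a reference implementation: the class name MatchingTemplate is hypothetical, and two choices are assumptions the proposal does not spell out. The whole input must match the template, and a field that appears twice must match the same text both times.

```python
import re
from string import Template


class MatchingTemplate(Template):
    """Hypothetical Template subclass sketching the proposed match()."""

    def match(self, text, greedy=False):
        wildcard = ".*" if greedy else ".*?"
        parts = []      # regex fragments, in template order
        seen = set()    # field names that already have a capturing group
        pos = 0
        for m in self.pattern.finditer(self.template):
            # Literal text between placeholders must match exactly.
            parts.append(re.escape(self.template[pos:m.start()]))
            pos = m.end()
            name = m.group("named") or m.group("braced")
            if name is not None:
                if name in seen:
                    # Assumption: a repeated field must match the same text.
                    parts.append("(?P=%s)" % name)
                else:
                    seen.add(name)
                    parts.append("(?P<%s>%s)" % (name, wildcard))
            elif m.group("escaped") is not None:
                # "$$" in a template stands for a literal dollar sign.
                parts.append(re.escape(self.delimiter))
            else:
                raise ValueError("invalid placeholder in template")
        parts.append(re.escape(self.template[pos:]))
        # Assumption: the template must account for the entire input string.
        found = re.fullmatch("".join(parts), text, re.DOTALL)
        return found.groupdict() if found else None


s = MatchingTemplate("$name was born in ${country}")
print(s.match("Guido was born in the Netherlands"))
print(s.match("Spam was born as a canned ham"))
```

The greedy flag only matters when the literal text between fields can occur more than once in the input. For example, MatchingTemplate("$a-$b").match("x-y-z") gives a='x' by default and a='x-y' with greedy=True.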
Note that when the match is successful, the resulting dictionary could
be passed through Template.substitute to reconstitute the original
input string. Conversely, any string created by Template.substitute
could be matched by Template.match (though in unusual cases, the
resulting dictionary might not exactly match the original, e.g. if the
string could be matched in multiple ways). Thus, .match and .substitute
are approximately inverse operations.
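The caveat about ambiguous matches can be seen in a short sketch. Since Template.match does not exist yet, a hand-written non-greedy regex stands in for it here; the template and values are made up for illustration.

```python
import re
from string import Template

templ = Template("$a-$b")
original = {"a": "x-y", "b": "z"}
text = templ.substitute(original)   # 'x-y-z'

# Hand-written non-greedy regex standing in for the proposed templ.match(text).
found = re.fullmatch(r"(?P<a>.*?)-(?P<b>.*?)", text).groupdict()

print(found)                             # not the same dictionary as `original`
print(templ.substitute(found) == text)   # but it rebuilds the same string
```

The recovered dictionary differs from the one that produced the string, yet substituting it back yields the identical text, which is the sense in which the two operations are inverses.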
References
[1] Template Strings
http://www.python.org/doc/2.5.2/lib/node40.html
[2] PEP 292: Simpler String Substitutions
http://www.python.org/dev/peps/pep-0292/