Proposed convenience functions for re module
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module: def multimatch(s, *patterns): """Do a re.match on s using each pattern in patterns, returning the first one to succeed, or None if they all fail.""" for pattern in patterns: m = re.match(pattern, s) if m: return m def multisearch(s, *patterns): """Do a re.search on s using each pattern in patterns, returning the first one to succeed, or None if they all fail.""" for pattern in patterns: m = re.search(pattern, s) if m: return m The rationale is to make the following idiom easier: m = re.match(s, pattern1) if not m: m = re.match(s, pattern2) if not m: m = re.match(s, pattern3) if not m: m = re.match(s, pattern4) if m: m.group() which will become: m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4) if m: m.group() Is there any support or objections to this proposal? Any comments? -- Steven D'Aprano
Steven D'Aprano wrote:
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m
The rationale is to make the following idiom easier:
    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
    if not m:
        m = re.match(pattern3, s)
    if not m:
        m = re.match(pattern4, s)
    if m:
        m.group()
which will become:
    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()
Is there any support or objections to this proposal? Any comments?
Extend the current re.match and re.search to accept a tuple of patterns:

    m = re.match((pattern1, pattern2, pattern3, pattern4), s)
    if m:
        print m.group()

This format is already used by some string methods, e.g. str.startswith().
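A minimal sketch of how the tuple-accepting behaviour could be emulated on top of today's re module, for anyone who wants to experiment before any change lands; the name match_any is invented here:

    import re

    def match_any(patterns, s):
        """Return the first successful re.match of s against any of the
        given patterns (a single pattern or a tuple of them), or None."""
        if not isinstance(patterns, (tuple, list)):
            patterns = (patterns,)
        for p in patterns:
            m = re.match(p, s)
            if m:
                return m
        return None

With this, match_any(('cat', 'dog'), 'dogma') returns the match for 'dog', mirroring the str.startswith() convention of accepting either one pattern or a tuple of patterns.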
Why not:

    from itertools import starmap

    for x in starmap(re.match, ((p, s) for p in patterns)):
        if x:
            break

On Tue, Jul 21, 2009 at 8:50 PM, MRAB <python@mrabarnett.plus.com> wrote:
Steven D'Aprano wrote:
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m
The rationale is to make the following idiom easier:
    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
    if not m:
        m = re.match(pattern3, s)
    if not m:
        m = re.match(pattern4, s)
    if m:
        m.group()
which will become:
    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()
Is there any support or objections to this proposal? Any comments?
Extend the current re.match and re.search to accept a tuple of patterns:
    m = re.match((pattern1, pattern2, pattern3, pattern4), s)
    if m:
        print m.group()
This format is already used by some string methods, e.g. str.startswith().
-- Gerald Britton
MRAB wrote:
Extend the current re.match and re.search to accept a tuple of patterns:
    m = re.match((pattern1, pattern2, pattern3, pattern4), s)
    if m:
        print m.group()

Also give the match object an 'index' attribute indicating which pattern matched, so you can do:

    m = re.match((pattern1, pattern2, pattern3, pattern4), s)
    if m:
        if m.index == 0:
            # pattern1 matched
        elif m.index == 1:
            # pattern2 matched
        # etc.

-- Greg
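For comparison, the "which pattern matched" information is already obtainable with a small helper over the current re module; a rough sketch (the name match_indexed is invented here):

    import re

    def match_indexed(patterns, s):
        """Try each pattern in turn; return (index, match) for the first
        one that matches, or (None, None) if none of them do."""
        for i, p in enumerate(patterns):
            m = re.match(p, s)
            if m:
                return i, m
        return None, None

For example, index, m = match_indexed((pattern1, pattern2, pattern3, pattern4), s).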
Steven D'Aprano wrote:
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m
How are you supposed to tell which one matched? -- Greg
On Wed, 22 Jul 2009 03:17:46 -0300, Greg Ewing <greg.ewing-F+z8Qja7x9Xokq/tPzqvJg@public.gmane.org> wrote:
Steven D'Aprano wrote:
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m
How are you supposed to tell which one matched?
m.re contains the matching expression -- Gabriel Genellina
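A quick interactive illustration, assuming Steven's proposed multimatch helper from the top of the thread is defined:

    >>> import re
    >>> pat1, pat2 = re.compile('cat'), re.compile('dog')
    >>> m = multimatch('dogma', pat1, pat2)
    >>> m.re is pat2          # the compiled pattern that produced the match
    True
    >>> m.re.pattern
    'dog'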
Steven D'Aprano wrote:
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m
The rationale is to make the following idiom easier:
    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
    if not m:
        m = re.match(pattern3, s)
    if not m:
        m = re.match(pattern4, s)
    if m:
        m.group()
which will become:
    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()
Is there any support or objections to this proposal? Any comments?
One of the needs I've run across is to enable the program user (possibly a non-programmer) to do logical searches on data. It would be nice if the search patterns specified by the program user could be used directly by the functions. Search functions of this type would take patterns that are more like what you would use for google or yahoo searches instead of the more complex language re requires. Ron
On Wed, 22 Jul 2009 06:02:54 pm Ron Adam wrote:
One of the needs I've run across is to enable the program user (possibly a non-programmer) to do logical searches on data.
It would be nice if the search patterns specified by the program user could be used directly by the functions. Search functions of this type would take patterns that are more like what you would use for google or yahoo searches instead of the more complex language re requires.
I'm not sure if I understand this correctly. Perhaps you could give an example or two? Also, please don't overload my simple little proposal with a multitude of new functionality. My proposal is only meant to be a lightweight convenience function. Additional functionality probably belongs as a different function, maybe even a different module. -- Steven D'Aprano
Steven D'Aprano wrote:
On Wed, 22 Jul 2009 06:02:54 pm Ron Adam wrote:
One of the needs I've run across is to enable the program user (possibly a non-programmer) to do logical searches on data.
It would be nice if the search patterns specified by the program user could be used directly by the functions. Search functions of this type would take patterns that are more like what you would use for google or yahoo searches instead of the more complex language re requires.
I'm not sure if I understand this correctly. Perhaps you could give an example or two?
Also, please don't overload my simple little proposal with a multitude of new functionality. My proposal is only meant to be a lightweight convenience function. Additional functionality probably belongs as a different function, maybe even a different module.
Yes, it would be a different module and not added directly to the re module. While you are thinking of simplifying re for programmers, I'm thinking of simplified searches for users. A different target and purpose. I think your functions would make this idea easier to do.

It would be nice if we could do simple logical searches where:

    [word1 word2]             ; get results with either word1 or word2
    [+word1 +word2]           ; get results with both word1 and word2
    [word1 -word2]            ; get results with word1 and not with word2
    ["word one" "word two"]   ; use quotes to search for phrases

And possibly use '*' and '?' as simple wild cards, but keep it easy to use and simple. More complex searches should use the re module directly.

This would act as a filter for lists and would be suitable for adding a *simple* user search capability to many scripts and applications. An example would be to enhance pydoc's search of the summary lines. Currently if you type "modules key", if the key is multiple words, it only searches on the first word. You cannot do searches on multiple words or exclude results with certain words.

While we could allow regular expression input to work, for many applications it is overkill and it is too complex for many users. For example I would not like to try to teach my parents all the subtleties of regular expressions when they are struggling to understand a lot more basic things. They don't want to learn how to program computers, they just want to get a recipe that has [+chicken +"tomato sauce" -onions].

Ron
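As a rough illustration of the kind of helper Ron is describing (not part of his proposal; the name simple_filter and the exact query semantics are assumptions, and wildcard support is left out):

    import shlex

    def simple_filter(query, lines):
        """Filter lines with a Google-style query: bare words are OR'd
        together, '+word' must appear, '-word' must not appear.  Quoted
        phrases are kept together by shlex.split().  Sketch only."""
        terms = shlex.split(query)
        required = [t[1:].lower() for t in terms if t.startswith('+')]
        excluded = [t[1:].lower() for t in terms if t.startswith('-')]
        optional = [t.lower() for t in terms if t[0] not in '+-']
        for line in lines:
            low = line.lower()
            if any(t in low for t in excluded):
                continue
            if not all(t in low for t in required):
                continue
            if optional and not any(t in low for t in optional):
                continue
            yield line

For example:

    recipes = ["Chicken in tomato sauce", "Onion soup", "Chicken with onions"]
    print list(simple_filter('+chicken +"tomato sauce" -onions', recipes))
    # -> ['Chicken in tomato sauce']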
On Thu, Jul 23, 2009, Ron Adam wrote:
While we could allow regular expression input to work, for many applications it is overkill and it is too complex for many users. For example I would not like to try and teach my parents all the subtleties of regular expressions when they are struggling to understand a lot more basic things. They don't want to learn how to program computers, they just want to get a recipe that has [+chicken +"tomato sauce" -onions].
This sounds like a *great* addition to PyPI.... ;-) (That is, something like this is unlikely to make it into Python unless there's code that has seen uptake in the community.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The volume of a pizza of thickness 'a' and radius 'z' is given by pi*z*z*a"
Ron Adam <rrr@ronadam.com> wrote:
One of the needs I've run across is to enable the program user (possibly a non-programmer) to do logical searches on data.
It would be nice if the search patterns specified by the program user could be used directly by the functions. Search functions of this type would take patterns that are more like what you would use for google or yahoo searches instead of the more complex language re requires.
That's a proposal for another string-matching module ("browsermatch"? :-))

+0.6

*j
Steven D'Aprano wrote:
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m
There's a cute trick that you can use to do this that is much more efficient than testing each regex individually:

    combined_pattern = "|".join("(%s)" % p for p in patterns)
    combined_re = re.compile(combined_pattern)

    m = combined_re.match(string)
    return m.lastindex

Basically, it combines all of the patterns into a single large regex, where each pattern is converted into a capturing group. It then returns match.lastindex, which is the index of the capturing group that matched. This is very efficient because now all of the patterns are combined into a single NFA which can prune possibilities very quickly.

This works for up to 99 patterns, which is the limit on the number of capturing groups that a regex can have.

I use this technique in my Python-based lexer, Reflex: http://pypi.python.org/pypi/reflex/0.1

Now, if we are talking about convenience functions, what I would really like to see is a class that wraps a string and allows matches to be done incrementally, where each successful match consumes the head of the string, leaving the remainder of the string for the next match.

This can be done very efficiently since the compiled regex methods all take a start-index parameter. Essentially, the wrapper class would update the start index each time a successful match was performed.

So something like:

    stream = MatchStream(string)
    while 1:
        m = stream.match(combined_re)
        # m is a match object
        # Do something with m

Or even an iterator over matches (this assumes that you want to use the same regex each time, which may not be the case for a parser):

    stream = MatchStream(string)
    for m in stream.match(combined_re):
        # m is a match object
        # Do something with m

-- Talin
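A minimal sketch of the MatchStream idea described above (this is not Talin's Reflex code; the class name comes from his example, everything else is an assumption):

    import re

    class MatchStream(object):
        """Wrap a string and match against it incrementally, advancing
        past each successful match.  Sketch only."""

        def __init__(self, string):
            self.string = string
            self.pos = 0

        def match(self, compiled_re):
            """Try compiled_re at the current position; on success,
            consume the matched text and return the match object."""
            m = compiled_re.match(self.string, self.pos)
            if m:
                self.pos = m.end()
            return m

        def matches(self, compiled_re):
            """Iterate over successive matches of the same regex."""
            while self.pos < len(self.string):
                m = self.match(compiled_re)
                if m is None or m.start() == m.end():
                    break   # stop on failure or a zero-width match
                yield m

For example:

    token_re = re.compile(r"(\d+)|([a-z]+)|(\s+)")
    stream = MatchStream("spam 42 eggs")
    for m in stream.matches(token_re):
        print m.lastindex, repr(m.group())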
Talin wrote:
Steven D'Aprano wrote:
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m
There's a cute trick that you can use to do this that is much more efficient than testing each regex expression individually:
combined_pattern = "|".join("(%s)" % p for p in patterns) combined_re = re.compile(combined_pattern)
m = combined_re.match(string) return m.lastindex
Basically, it combines all of the patterns into a single large regex, where each pattern is converted into a capturing group. It then returns match.lastindex, which is the index of the capturing group that matched. This is very efficient because now all of the patterns are combined into a single NFA which can prune possibilities very quickly.
This works for up to 99 patterns, which is the limit on the number of capturing groups that a regex can have.
[snip] It won't work properly if the patterns themselves contain capture groups.
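A quick illustration of the renumbering problem, plus one common workaround: wrap each alternative in a synthesized *named* group and read m.lastgroup instead of m.lastindex (the helper name which_pattern is invented here; patterns that contain numbered backreferences or clashing group names still won't combine cleanly):

    import re

    patterns = ['a(b)c', 'xyz']

    # Naive combination: the inner (b) shifts the group numbering, so
    # lastindex no longer maps onto the position in the pattern list.
    naive = re.compile("|".join("(%s)" % p for p in patterns))
    print naive.match('xyz').lastindex        # -> 3, not the hoped-for 2

    def which_pattern(patterns, s):
        """Return (index into patterns, match) for the first alternative
        that matches, or None.  Sketch only."""
        combined = re.compile(
            "|".join("(?P<p%d>%s)" % (i, p) for i, p in enumerate(patterns)))
        m = combined.match(s)
        if m:
            return int(m.lastgroup[1:]), m
        return None

    print which_pattern(patterns, 'xyz')[0]   # -> 1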
Talin wrote:
Now, if we are talking about convenience functions, what I would really like to see is a class that wraps a string that allows matches to be done incrementally, where each successful match consumes the head of the string, leaving the remainder of the string for the next match.
This can be done very efficiently since the regex functions all take a start-index parameter. Essentially, the wrapper class would update the start index each time a successful match was performed.
So something like:
    stream = MatchStream(string)
    while 1:
        m = stream.match(combined_re)
        # m is a match object
        # Do something with m
Or even an iterator over matches (this assumes that you want to use the same regex each time, which may not be the case for a parser):
    stream = MatchStream(string)
    for m in stream.match(combined_re):
        # m is a match object
        # Do something with m
You might be interested in the undocumented re.Scanner class :)

Georg

-- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
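For reference, a small usage sketch of re.Scanner as it exists in the 2.x standard library (undocumented, so the details may change; the token tags here are made up):

    import re

    # Each rule is (pattern, action).  The action is called with the
    # scanner and the matched text; whatever it returns is collected,
    # and a rule whose action is None silently discards the match.
    scanner = re.Scanner([
        (r"\d+",          lambda scanner, token: ("INT", int(token))),
        (r"[a-zA-Z_]\w*", lambda scanner, token: ("NAME", token)),
        (r"\s+",          None),
    ])

    tokens, remainder = scanner.scan("spam 42 eggs")
    print tokens           # -> [('NAME', 'spam'), ('INT', 42), ('NAME', 'eggs')]
    print repr(remainder)  # -> '' (any unconsumed tail of the input)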
On 2009-07-22 19:49, Greg Ewing wrote:
Georg Brandl wrote:
You might be interested in the undocumented re.Scanner class :)
It looks like it could be interesting, but with no documentation, how are we supposed to tell?
Is there *any* information about it available anywhere?
http://mail.python.org/pipermail/python-dev/2003-April/035075.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
2009/7/23 Robert Kern <robert.kern@gmail.com>:
On 2009-07-22 19:49, Greg Ewing wrote:
Georg Brandl wrote:
You might be interested in the undocumented re.Scanner class :)
It looks like it could be interesting, but with no documentation, how are we supposed to tell?
Is there *any* information about it available anywhere?
http://mail.python.org/pipermail/python-dev/2003-April/035075.html
Question: Is there any reason (other than lack of time) why it's undocumented?

I'd be willing to write some documentation, but only if it would stand a chance of being accepted - this isn't an itch of mine, so I don't want to spend ages arguing over whether the class should be documented.

The source code says "experimental stuff (see python-dev discussions for details)". I've not searched the python-dev archives yet, but it seems to me that it'll never be anything other than experimental if people don't know it's there and try it out...

Paul.
On 2009-07-23 11:50, Paul Moore wrote:
2009/7/23 Robert Kern<robert.kern@gmail.com>:
On 2009-07-22 19:49, Greg Ewing wrote:
Georg Brandl wrote:
You might be interested in the undocumented re.Scanner class :)

It looks like it could be interesting, but with no documentation, how are we supposed to tell? Is there *any* information about it available anywhere?

http://mail.python.org/pipermail/python-dev/2003-April/035075.html
Question: Is there any reason (other than lack of time) why it's undocumented?
http://mail.python.org/pipermail/python-dev/2003-April/035070.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
So, in 2003 it was discussed but considered not ready for prime time. If six years is not enough to get it ready, I can't imagine what would be.

On Thu, Jul 23, 2009 at 1:06 PM, Robert Kern <robert.kern@gmail.com> wrote:
On 2009-07-23 11:50, Paul Moore wrote:
2009/7/23 Robert Kern<robert.kern@gmail.com>:
On 2009-07-22 19:49, Greg Ewing wrote:
Georg Brandl wrote:
You might be interested in the undocumented re.Scanner class :)
It looks like it could be interesting, but with no documentation, how are we supposed to tell?
Is there *any* information about it available anywhere?
http://mail.python.org/pipermail/python-dev/2003-April/035075.html
Question: Is there any reason (other than lack of time) why it's undocumented?
http://mail.python.org/pipermail/python-dev/2003-April/035070.html
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
-- Gerald Britton
On 2009-07-23 16:48, Gerald Britton wrote:
So, in 2003 it was discussed but considered not ready for prime time. If six years is not enough to get it ready, I can't imagine what would be.
I'm pretty sure that no one has worked on it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
On 2009-07-23 16:48, Gerald Britton wrote:
So, in 2003 it was discussed but considered not ready for prime time. If six years is not enough to get it ready, I can't imagine what would be.
I'm pretty sure that no one has worked on it.
I worked on it in issue #2636 (turning unnamed capture groups into non-capture groups and forbidding group references and named groups) and also added a generator method.
2009/7/22 Steven D'Aprano <steve@pearwood.info>:
The rationale is to make the following idiom easier:
    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
    if not m:
        m = re.match(pattern3, s)
    if not m:
        m = re.match(pattern4, s)
    if m:
        m.group()
which will become:
    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()
Is there any support or objections to this proposal? Any comments?
I don't like it very much because it would only work for uncompiled patterns. All functions in re have a RegexObject counterpart, but multisearch() and multimatch() obviously would not. For the quoted example I'd usually try to create one regex that matches all four patterns, or use a loop:

    for pat in (pattern1, pattern2, pattern3, pattern4):
        m = re.match(pat, s)
        if m:
            m.group()
            break

-- mvh Björn
BJörn Lindqvist wrote:
2009/7/22 Steven D'Aprano <steve@pearwood.info>:
The rationale is to make the following idiom easier:
    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
    if not m:
        m = re.match(pattern3, s)
    if not m:
        m = re.match(pattern4, s)
    if m:
        m.group()
which will become:
    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()
Is there any support or objections to this proposal? Any comments?
I don't like it very much because it would only work for uncompiled patterns. All functions in re have a RegexObject counterpart, but multisearch() and multimatch() obviously would not. For the quoted example I'd usually try to create one regex that matches all four patterns, or use a loop:

    for pat in (pattern1, pattern2, pattern3, pattern4):
        m = re.match(pat, s)
        if m:
            m.group()
            break
re.match and re.search will accept either a string or a compiled pattern as the first argument. Never used it myself, but...
On Thu, 23 Jul 2009 04:29:31 am BJörn Lindqvist wrote:
I don't like it very much because it would only work for uncompiled patterns. All functions in re have a RegexObject counterpart, but multisearch() and multimatch() obviously would not.
That's incorrect -- they accept pre-compiled regexes as well as strings.
    >>> pat = re.compile('a.c')
    >>> multimatch("abcd", 'z.*z', pat)
    <_sre.SRE_Match object at 0xb7cd8090>
For the quoted example I'd usually try to create one regex that matches all four patterns, or use a loop:
    for pat in (pattern1, pattern2, pattern3, pattern4):
        m = re.match(pat, s)
        if m:
            m.group()
            break
Apart from being in a function, my proposal (which you claim to dislike) is virtually identical to that code (which you say you use). -- Steven D'Aprano
On Wed, Jul 22, 2009 at 1:00 AM, Steven D'Aprano<steve@pearwood.info> wrote:
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m
[...]
Steven, could you show some examples of real(ish)-world use-cases for one or both of these functions? Preferably including the code that might directly follow a multimatch or multisearch call.

It's probably because I haven't used regexes widely enough, but in all the potential examples I can come up with, either

(1) the regexes are similar enough that they can be refactored into a single regex (e.g., just concatenated with '|'), or

(2) they're distinct enough that each regex needs its own handling, so that the multimatch/multisearch would need to be followed by a multiway 'if/elif/else' anyway; in this case, it seems that little is gained.

-- Mark
On Thu, 23 Jul 2009 07:38:11 am Mark Dickinson wrote:
Steven, could you show some examples of real(ish)-world use-cases for one or both of these functions? Preferably including the code that might directly follow a multimatch or multisearch call.
I'm afraid that I don't use regexes anywhere near enough to champion this proposal in the face of serious opposition, or even skepticism. If this isn't a simple enough "no-brainer", then I'm going to have to pass the baton to somebody else (assuming anyone actually likes this idea).

This idea came about from the thread started by Sean Reifschneider, proposing adding regexes to strings. I thought (and Sean seemed to agree) that these convenience functions would solve his primary use-case. So this proposal isn't scratching an itch I have.
It's probably because I haven't used regexes widely enough, but in all the potential examples I can come up with, either
(1) the regexes are similar enough that they can be refactored into a single regex (e.g., just concatenated with '|'), or
(2) they're distinct enough that each regex needs its own handling, so that the multimatch/multisearch would need to be followed by a multiway 'if/elif/else' anyway; in this case, it seems that little is gained.
These are both reasonable approaches. This proposal isn't supposed to solve every multiple-regex-handling problem.

So far support for this has been lukewarm. If anyone really likes this idea, please speak up, otherwise I'll let it drop.

-- Steven D'Aprano
Steven D'Aprano wrote:
On Thu, 23 Jul 2009 07:38:11 am Mark Dickinson wrote:
Steven, could you show some examples of real(ish)-world use-cases for one or both of these functions? Preferably including the code that might directly follow a multimatch or multisearch call.
I'm afraid that I don't use regexes anywhere near enough to champion this proposal in the face of serious opposition, or even skepticism. If this isn't a simple enough "no-brainer", then I'm going to have to pass the baton onto somebody else (assuming anyone actually likes this idea).
This idea came about from the thread started by Sean Reifschneider, proposing adding regexes to strings. I thought (and Sean seemed to agree) that these convenience functions would solve his primary use-case. So this proposal isn't scratching an itch I have.
It's probably because I haven't used regexes widely enough, but in all the potential examples I can come up with, either
(1) the regexes are similar enough that they can be refactored into a single regex (e.g., just concatenated with '|'), or
(2) they're distinct enough that each regex needs its own handling, so that the multimatch/multisearch would need to be followed by a multiway 'if/elif/else' anyway; in this case, it seems that little is gained.
These are both reasonable approaches. This proposal isn't supposed to solve every multiple-regex-handling problem.
So far support for this has been lukewarm. If anyone really likes this idea, please speak up, otherwise I'll let it drop.
If you want to try multiple regexes until one matches, then the approach of re.match accepting a tuple of patterns would, it seems to me, be the one that requires the smallest change and has the greatest precedent (like str.startswith).
On 22-07-2009, 02:00, Steven D'Aprano <steve@pearwood.info> wrote:
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns, returning
        the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m
The rationale is to make the following idiom easier:
    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
    if not m:
        m = re.match(pattern3, s)
    if not m:
        m = re.match(pattern4, s)
    if m:
        m.group()
which will become:
    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()
Is there any support or objections to this proposal? Any comments?
It sounds nice. But why not simply use:

    m = re.match(s, '|'.join((pattern1, pattern2, pattern3, pattern4)))

And if we want the feature anyway, I'd prefer MRAB's:
    m = re.match((pattern1, pattern2, pattern3, pattern4), s)
    if m:
        print m.group()
This format is already used by some string methods, eg str.startswith().
***

But if we are talking about convenience functions in the re module, it'd be IMHO very nice to have such functions:

    def matchgrouping(pattern, string, flags=0, default=None):
        """Do a re.match on string using pattern, returning dict
        containing groups which could be got by index or by name."""
        groups = collections.defaultdict(lambda: default)
        match = re.match(pattern, string, flags)
        if match is not None:
            groups.update(enumerate(match.groups(), 1))
            groups.update(match.groupdict())
        return groups

Plus the analogous function for searching. Plus 2 analogous methods of RegexObject instances.

* Then e.g. -- instead of:

    m = re.search(pattern, s)
    if m:
        first_group = m.group(1)
        surname = m.group('surname')
    else:
        first_group = None
        surname = None

-- we could write simply:

    m = re.matchgrouping(pattern, s)
    first_group = m[1]
    surname = m['surname']

* And e.g. -- instead of:

    withip = log_re.match(logline)
    if withip and withip.group('ip_addr'):
        iplog.append(logline)

-- we could write simply:

    if log_re.matchgrouping(logline)['ip_addr']:
        iplog.append(logline)

What do you think about it?

*j

-- Jan Kaliszewski (zuo) <zuo@chopin.edu.pl>
Jan Kaliszewski <zuo@chopin.edu.pl> wrote:
It sounds nice. But why not to use simply:
m = re.match(s, '|'.join((pattern1, pattern2, pattern3, pattern4)))
Sorry, I meant of course:

    m = re.match('|'.join((pattern1, pattern2, pattern3, pattern4)), s)

***
"""Do a re.match on string using pattern, returning dict containing groups which could be got by index or by name."""
I ment: "...returning collections.DefaultDict..." (as you can see in the code following). Regards, *j -- Jan Kaliszewski <zuo@chopin.edu.pl>
participants (14)

- Aahz
- BJörn Lindqvist
- Gabriel Genellina
- Georg Brandl
- Gerald Britton
- Greg Ewing
- Jan Kaliszewski
- Mark Dickinson
- MRAB
- Paul Moore
- Robert Kern
- Ron Adam
- Steven D'Aprano
- Talin