Mailman 3 Proposed convenience functions for re module - Python-ideas


Proposed convenience functions for re module


Steven D'Aprano

July 22, 2009

midnight

Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m

The rationale is to make the following idiom easier:

    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
        if not m:
            m = re.match(pattern3, s)
            if not m:
                m = re.match(pattern4, s)
    if m:
        m.group()

which will become:

    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()

Is there any support or objections to this proposal? Any comments?

-- Steven D'Aprano
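For anyone who wants to try the proposal, here is a minimal runnable sketch of the two functions as ordinary module-level helpers (they are not part of the re module). The sample strings and patterns are made up for illustration.

```python
import re

def multimatch(s, *patterns):
    """Return the first successful re.match of s against patterns, else None."""
    for pattern in patterns:
        m = re.match(pattern, s)  # re.match takes the pattern first, then the string
        if m:
            return m
    return None

def multisearch(s, *patterns):
    """Return the first successful re.search of s against patterns, else None."""
    for pattern in patterns:
        m = re.search(pattern, s)
        if m:
            return m
    return None

# Hypothetical patterns, tried in the order given.
m = multimatch("2009-07-22", r"\d{2}/\d{2}/\d{4}", r"\d{4}-\d{2}-\d{2}")
matched_text = m.group() if m else None
found = multisearch("posted on 22 Jul", r"\d{4}", r"\d{1,2} [A-Z][a-z]{2}")
```

Both helpers accept strings or compiled patterns, because re.match and re.search do.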


MRAB

July 2009

12:50 a.m.

Steven D'Aprano wrote:

...

Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m

The rationale is to make the following idiom easier:

    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
        if not m:
            m = re.match(pattern3, s)
            if not m:
                m = re.match(pattern4, s)
    if m:
        m.group()

which will become:

    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()

Is there any support or objections to this proposal? Any comments?

Extend the current re.match and re.search to accept a tuple of patterns:

    m = re.match((pattern1, pattern2, pattern3, pattern4), s)
    if m:
        print m.group()

This format is already used by some string methods, e.g. str.startswith().

Gerald Britton

1:21 a.m.

Why not:

    from itertools import starmap
    for x in starmap(re.match, *patterns):
        if x:
            break

On Tue, Jul 21, 2009 at 8:50 PM, MRAB <python@mrabarnett.plus.com> wrote:

...

Steven D'Aprano wrote:

...
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m

The rationale is to make the following idiom easier:

    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
        if not m:
            m = re.match(pattern3, s)
            if not m:
                m = re.match(pattern4, s)
    if m:
        m.group()

which will become:

    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()

Is there any support or objections to this proposal? Any comments?

Extend the current re.match and re.search to accept a tuple of patterns:

    m = re.match((pattern1, pattern2, pattern3, pattern4), s)
    if m:
        print m.group()

This format is already used by some string methods, eg str.startswith(). _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas

-- Gerald Britton
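A note on the starmap suggestion: itertools.starmap takes a function and a single iterable of argument tuples, so the (pattern, string) pairs have to be built first. The sketch below is one way to do that; the names `s` and `patterns` and their values are placeholders, not taken from the thread.

```python
import re
from itertools import starmap

s = "spam and eggs"                     # hypothetical input string
patterns = [r"\d+", r"eggs", r"spam"]   # hypothetical patterns, tried in order

# starmap(f, iterable) calls f(*args) for every args tuple in iterable,
# and it is lazy, so matching stops at the first success.
pairs = ((p, s) for p in patterns)
first = next((m for m in starmap(re.match, pairs) if m), None)
```

Using next() with a default of None gives the same "first match or None" result as the explicit loop with break.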

Greg Ewing

6:22 a.m.

MRAB wrote:

...

Extend the current re.match and re.search to accept a tuple of patterns:

    m = re.match((pattern0, pattern1, pattern2, pattern3), s)
    if m:
        print m.group()

Also give the match object an 'index' attribute indicating which pattern matched, so you can do

    m = re.match((pattern0, pattern1, pattern2, pattern3), s)
    if m:
        if m.index == 0:
            # pattern0 matched
        elif m.index == 1:
            # pattern1 matched
        # etc.

-- Greg
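Match objects have no 'index' attribute today, so the idea can only be approximated outside the re module. A minimal sketch, assuming a hypothetical helper named `match_any` that takes a tuple of patterns (or a single string) and returns the position of the winning pattern alongside the match:

```python
import re

def match_any(patterns, s):
    """Try each pattern in order; return (index, match) or (None, None)."""
    if isinstance(patterns, str):
        patterns = (patterns,)   # allow a single pattern string, like str.startswith
    for index, pattern in enumerate(patterns):
        m = re.match(pattern, s)
        if m:
            return index, m
    return None, None

# Hypothetical patterns: digits, lowercase words, whitespace.
index, m = match_any((r"\d+", r"[a-z]+", r"\s+"), "hello world")
```

Returning a pair keeps the standard match object untouched; a real implementation inside re could expose the index on the match object instead, as suggested above.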

Greg Ewing

6:17 a.m.

Steven D'Aprano wrote:

...

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

How are you supposed to tell which one matched? -- Greg

Gabriel Genellina

7:14 a.m.

On Wed, 22 Jul 2009 03:17:46 -0300, Greg Ewing <greg.ewing-F+z8Qja7x9Xokq/tPzqvJg@public.gmane.org> wrote:

...

Steven D'Aprano wrote:

...
    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

How are you supposed to tell which one matched?

m.re contains the matching expression -- Gabriel Genellina
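A short illustration of that point: every match object carries the compiled pattern that produced it in its `re` attribute, and the pattern text is available as `m.re.pattern`. The patterns and input here are chosen only for the example.

```python
import re

patterns = [r"z.*z", re.compile(r"a.c")]   # strings and compiled patterns both work

m = None
for p in patterns:
    m = re.match(p, "abcd")
    if m:
        break

# m.re is the compiled pattern object that matched; .pattern is its source text.
which = m.re.pattern if m else None
```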

Ron Adam

8:02 a.m.

Steven D'Aprano wrote:

...

Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m

The rationale is to make the following idiom easier:

    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
        if not m:
            m = re.match(pattern3, s)
            if not m:
                m = re.match(pattern4, s)
    if m:
        m.group()

which will become:

    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()

Is there any support or objections to this proposal? Any comments?

One of the needs I've run across is to enable the program user (possibly a non-programmer) to do logical searches on data. It would be nice if the search patterns specified by the program user could be used directly by the functions. Search functions of this type would take patterns that are more like what you would use for google or yahoo searches instead of the more complex language re requires. Ron

Steven D'Aprano

11:58 a.m.

On Wed, 22 Jul 2009 06:02:54 pm Ron Adam wrote:

...

One of the needs I've run across is to enable the program user (possibly a non-programmer) to do logical searches on data.

It would be nice if the search patterns specified by the program user could be used directly by the functions. Search functions of this type would take patterns that are more like what you would use for google or yahoo searches instead of the more complex language re requires.

I'm not sure if I understand this correctly. Perhaps you could give an example or two? Also, please don't overload my simple little proposal with a multitude of new functionality. My proposal is only meant to be a lightweight convenience function. Additional functionality probably belongs as a different function, maybe even a different module. -- Steven D'Aprano

Ron Adam

9:14 a.m.

Steven D'Aprano wrote:

...

On Wed, 22 Jul 2009 06:02:54 pm Ron Adam wrote:

...
One of the needs I've run across is to enable the program user (possibly a non-programmer) to do logical searches on data.

It would be nice if the search patterns specified by the program user could be used directly by the functions. Search functions of this type would take patterns that are more like what you would use for google or yahoo searches instead of the more complex language re requires.

I'm not sure if I understand this correctly. Perhaps you could give an example or two?

Also, please don't overload my simple little proposal with a multitude of new functionality. My proposal is only meant to be a lightweight convenience function. Additional functionality probably belongs as a different function, maybe even a different module.

Yes, it would be a different module and not added directly to the re module. While you are thinking of simplifying re for programmers, I'm thinking of simplified searches for users. A different target and purpose. I think your functions would make this idea easier to do.

It would be nice if we could do simple logical searches where:

    [word1 word2]              ;get results with either word1 or word2
    [+word1 +word2]            ;get results with both word1 and word2
    [word1 -word2]             ;get results with word1 and not with word2
    ["word one" "word two"]    ;use quotes to search for phrases

And possibly use '*' and '?' as simple wild cards but keep it easy to use and simple. More complex searches should use the re module directly.

This would act as a filter for lists and would be suitable for adding a *simple* user search capability to many scripts and applications.

An example would be to enhance pydoc's search of the summary lines. Currently if you type "modules key", if the key is multiple words, it only searches on the first word. You cannot do searches on multiple words or exclude results with certain words.

While we could allow regular expression input to work, for many applications it is overkill and it is too complex for many users. For example I would not like to try and teach my parents all the subtleties of regular expressions when they are struggling to understand a lot more basic things. They don't want to learn how to program computers, they just want to get a recipe that has [+chicken +"tomato sauce" -onions].

Ron
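The query syntax described above can be prototyped without regular expressions at all. The following is a rough sketch under several assumptions that are not in the message: terms are matched as case-insensitive substrings, shlex.split handles the quoted phrases, wildcards are left out, and the function names and recipe titles are invented.

```python
import shlex

def parse_query(query):
    """Split a query into (optional, required, excluded) term lists."""
    optional, required, excluded = [], [], []
    for token in shlex.split(query):          # keeps "tomato sauce" as one term
        if token.startswith("+"):
            required.append(token[1:].lower())
        elif token.startswith("-"):
            excluded.append(token[1:].lower())
        else:
            optional.append(token.lower())
    return optional, required, excluded

def matches(text, query):
    """True if text satisfies the query (plain substring tests, no wildcards)."""
    optional, required, excluded = parse_query(query)
    text = text.lower()
    if any(term in text for term in excluded):
        return False
    if not all(term in text for term in required):
        return False
    # Unmarked words mean "either word": at least one must be present.
    return not optional or any(term in text for term in optional)

recipes = [
    "Chicken in tomato sauce",
    "Chicken with tomato sauce and onions",
    "Tomato soup",
]
hits = [r for r in recipes if matches(r, '+chicken +"tomato sauce" -onions')]
```

A query containing an unbalanced quote or an apostrophe makes shlex.split raise ValueError, so a real tool would need its own tokenizer or error handling.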

Aahz

12:44 p.m.

On Thu, Jul 23, 2009, Ron Adam wrote:

...

While we could allow regular expression input to work, for many applications it is overkill and it is too complex for many users. For example I would not like to try and teach my parents all the subtleties of regular expressions when they are struggling to understand a lot more basic things. They don't want to learn how to program computers, they just want to get a recipe that has [+chicken +"tomato sauce" -onions].

This sounds like a *great* addition to PyPI.... ;-) (That is, something like this is unlikely to make it into Python unless there's code that has seen uptake in the community.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "The volume of a pizza of thickness 'a' and radius 'z' is given by pi*z*z*a"

Jan Kaliszewski

9:25 p.m.

Ron Adam <rrr@ronadam.com> wrote:

...

One of the needs I've run across is to enable the program user (possibly a non-programmer) to do logical searches on data.

It would be nice if the search patterns specified by the program user could be used directly by the functions. Search functions of this type would take patterns that are more like what you would use for google or yahoo searches instead of the more complex language re requires.

It's a proposition of another string matching module ("browsermatch"? :-)) +0.6 *j

Talin

4:23 p.m.

Steven D'Aprano wrote:

...

Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

There's a cute trick that you can use to do this that is much more efficient than testing each regex expression individually:

    combined_pattern = "|".join("(%s)" % p for p in patterns)
    combined_re = re.compile(combined_pattern)

    m = combined_re.match(string)
    return m.lastindex

Basically, it combines all of the patterns into a single large regex, where each pattern is converted into a capturing group. It then returns match.lastindex, which is the index of the capturing group that matched. This is very efficient because now all of the patterns are combined into a single NFA which can prune possibilities very quickly.

This works for up to 99 patterns, which is the limit on the number of capturing groups that a regex can have.

I use this technique in my Python-based lexer, Reflex:

    http://pypi.python.org/pypi/reflex/0.1

Now, if we are talking about convenience functions, what I would really like to see is a class that wraps a string that allows matches to be done incrementally, where each successful match consumes the head of the string, leaving the remainder of the string for the next match.

This can be done very efficiently since the regex functions all take a start-index parameter. Essentially, the wrapper class would update the start index each time a successful match was performed.

So something like:

    stream = MatchStream(string)
    while 1:
        m = stream.match(combined_re)
        # m is a match object
        # Do something with m

Or even an iterator over matches (this assumes that you want to use the same regex each time, which may not be the case for a parser):

    stream = MatchStream(string)
    for m in stream.match(combined_re):
        # m is a match object
        # Do something with m

-- Talin
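The MatchStream idea can be sketched in a few lines using the `pos` argument of a compiled pattern's match method. Everything here is hypothetical: the class does not exist in the standard library, and the `at_end` helper and the token pattern are added only to make the example runnable.

```python
import re

class MatchStream:
    """Hypothetical wrapper: each successful match consumes the head of the string."""

    def __init__(self, string):
        self.string = string
        self.pos = 0   # start index for the next match

    def match(self, pattern):
        """Match pattern at the current position and advance past it on success."""
        # re.compile returns an already compiled pattern unchanged.
        m = re.compile(pattern).match(self.string, self.pos)
        if m:
            self.pos = m.end()
        return m

    def at_end(self):
        return self.pos >= len(self.string)

# A pattern that cannot match the empty string, so every match makes progress.
token_re = re.compile(r"\s*(\d+|[a-z]+)")
stream = MatchStream("abc 123 def")
tokens = []
while not stream.at_end():
    m = stream.match(token_re)
    if m is None:
        break
    tokens.append(m.group(1))
```

No slicing or copying of the string is needed, since only the start index changes. A pattern that can match zero characters would leave the position unchanged, so a real implementation would have to guard against looping forever.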

MRAB

4:42 p.m.

Talin wrote:

...

Steven D'Aprano wrote:

...
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

There's a cute trick that you can use to do this that is much more efficient than testing each regex expression individually:

    combined_pattern = "|".join("(%s)" % p for p in patterns)
    combined_re = re.compile(combined_pattern)

    m = combined_re.match(string)
    return m.lastindex

Basically, it combines all of the patterns into a single large regex, where each pattern is converted into a capturing group. It then returns match.lastindex, which is the index of the capturing group that matched. This is very efficient because now all of the patterns are combined into a single NFA which can prune possibilities very quickly.

This works for up to 99 patterns, which is the limit on the number of capturing groups that a regex can have.

[snip] It won't work properly if the patterns themselves contain capture groups.
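The caveat is easy to reproduce, and one possible workaround (an assumption here, not something proposed in the thread) is to wrap each pattern in a named group and read `m.lastgroup` instead of `m.lastindex`. The group names p0, p1, ... and the sample patterns are invented for the example.

```python
import re

def combine(patterns):
    """Join patterns into one regex, wrapping each in a named group p0, p1, ..."""
    return re.compile("|".join(
        "(?P<p%d>%s)" % (i, p) for i, p in enumerate(patterns)))

patterns = [r"(a)(b)", r"c+"]   # the first pattern contains its own capture groups
combined = combine(patterns)

m = combined.match("ccc")
by_lastindex = m.lastindex        # group number, which also counts the inner groups
by_name = int(m.lastgroup[1:])    # zero-based position of the pattern in the list
```

Here the second pattern matched, but lastindex is shifted by the two inner groups of the first pattern, while the named wrapper still identifies the right pattern. Numbered backreferences inside the patterns, or patterns that already define groups named p0, p1 and so on, would still be affected by the combination.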

Georg Brandl

7:51 p.m.

Talin schrieb:

...

Now, if we are talking about convenience functions, what I would really like to see is a class that wraps a string that allows matches to be done incrementally, where each successful match consumes the head of the string, leaving the remainder of the string for the next match.

This can be done very efficiently since the regex functions all take a start-index parameter. Essentially, the wrapper class would update the start index each time a successful match was performed.

So something like:

    stream = MatchStream(string)
    while 1:
        m = stream.match(combined_re)
        # m is a match object
        # Do something with m

Or even an iterator over matches (this assumes that you want to use the same regex each time, which may not be the case for a parser):

    stream = MatchStream(string)
    for m in stream.match(combined_re):
        # m is a match object
        # Do something with m

You might be interested in the undocumented re.Scanner class :) Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
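For readers who have not seen it, here is a small sketch of how re.Scanner is used in CPython. The class is undocumented, so treat the behaviour shown as an observation of the current implementation that could change; the token names and input are invented.

```python
import re

# The lexicon is a list of (pattern, action) pairs. An action is called as
# action(scanner, matched_text); an action of None discards the matched text.
scanner = re.Scanner([
    (r"\d+", lambda sc, tok: ("INT", int(tok))),
    (r"[a-z]+", lambda sc, tok: ("NAME", tok)),
    (r"\s+", None),
])

# scan() returns the list of action results and the unconsumed remainder.
tokens, remainder = scanner.scan("abc 123 !")
```

Scanning stops at the first position where no lexicon entry matches, and whatever is left is returned as the remainder rather than raising an error.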

Greg Ewing

12:49 a.m.

Georg Brandl wrote:

...

You might be interested in the undocumented re.Scanner class :)

It looks like it could be interesting, but with no documentation, how are we supposed to tell? Is there *any* information about it available anywhere? -- Greg

Robert Kern

1:04 a.m.

On 2009-07-22 19:49, Greg Ewing wrote:

...

Georg Brandl wrote:

...
You might be interested in the undocumented re.Scanner class :)

It looks like it could be interesting, but with no documentation, how are we supposed to tell?

Is there *any* information about it available anywhere?

http://mail.python.org/pipermail/python-dev/2003-April/035075.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

Paul Moore

4:50 p.m.

2009/7/23 Robert Kern <robert.kern@gmail.com>:

...

On 2009-07-22 19:49, Greg Ewing wrote:

...
Georg Brandl wrote:

...
You might be interested in the undocumented re.Scanner class :)

It looks like it could be interesting, but with no documentation, how are we supposed to tell?

Is there *any* information about it available anywhere?

http://mail.python.org/pipermail/python-dev/2003-April/035075.html

Question: Is there any reason (other than lack of time) why it's undocumented? I'd be willing to write some documentation, but only if it would stand a chance of being accepted - this isn't an itch of mine, so I don't want to spend ages arguing over whether the class should be documented. The source code says "experimental stuff (see python-dev discussions for details)". I've not searched the python-dev archives yet, but it seems to me that it'll never be anything other than experimental if people don't know it's there and try it out... Paul.

Robert Kern

5:06 p.m.

On 2009-07-23 11:50, Paul Moore wrote:

...

2009/7/23 Robert Kern<robert.kern@gmail.com>:

...
On 2009-07-22 19:49, Greg Ewing wrote:

...
Georg Brandl wrote:

...
You might be interested in the undocumented re.Scanner class :) It looks like it could be interesting, but with no documentation, how are we supposed to tell?

Is there *any* information about it available anywhere? http://mail.python.org/pipermail/python-dev/2003-April/035075.html

Question: Is there any reason (other than lack of time) why it's undocumented?

http://mail.python.org/pipermail/python-dev/2003-April/035070.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

Gerald Britton

9:48 p.m.

So, in 2003 it was discussed but considered not ready for prime time. If six years is not enough to get it ready, I can't imagine what would be. On Thu, Jul 23, 2009 at 1:06 PM, Robert Kern<robert.kern@gmail.com> wrote:

...

On 2009-07-23 11:50, Paul Moore wrote:

...
2009/7/23 Robert Kern<robert.kern@gmail.com>:

...
On 2009-07-22 19:49, Greg Ewing wrote:

...
Georg Brandl wrote:

...
You might be interested in the undocumented re.Scanner class :)

It looks like it could be interesting, but with no documentation, how are we supposed to tell?

Is there *any* information about it available anywhere?

http://mail.python.org/pipermail/python-dev/2003-April/035075.html

Question: Is there any reason (other than lack of time) why it's undocumented?

http://mail.python.org/pipermail/python-dev/2003-April/035070.html

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco


-- Gerald Britton

Robert Kern

10:47 p.m.

On 2009-07-23 16:48, Gerald Britton wrote:

...

So, in 2003 it was discussed but considered not ready for prime time. If six years is not enough to get it ready, I can't imagine what would be.

I'm pretty sure that no one has worked on it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

MRAB

1:22 a.m.

Robert Kern wrote:

...

On 2009-07-23 16:48, Gerald Britton wrote:

...
So, in 2003 it was discussed but considered not ready for prime time. If six years is not enough to get it ready, I can't imagine what would be.

I'm pretty sure that no one has worked on it.

I worked on it in issue #2636 (turning unnamed capture groups into non-capture groups and forbidding group references and named groups) and also added a generator method.

BJörn Lindqvist

6:29 p.m.

2009/7/22 Steven D'Aprano <steve@pearwood.info>:

...

The rationale is to make the following idiom easier:

    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
        if not m:
            m = re.match(pattern3, s)
            if not m:
                m = re.match(pattern4, s)
    if m:
        m.group()

which will become:

    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()

Is there any support or objections to this proposal? Any comments?

I don't like it very much because it would only work for uncompiled patterns. All functions in re have a RegexObject counterpart, but multisearch() and multimatch() obviously would not.

For the quoted example I'd usually try to create one regex that matches all four patterns, or use a loop:

    for pat in (pattern1, pattern2, pattern3, pattern4):
        m = re.match(pat, s)
        if m:
            m.group()
            break

-- mvh Björn

MRAB

6:44 p.m.

BJörn Lindqvist wrote:

...

2009/7/22 Steven D'Aprano <steve@pearwood.info>:

...
The rationale is to make the following idiom easier:

    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
        if not m:
            m = re.match(pattern3, s)
            if not m:
                m = re.match(pattern4, s)
    if m:
        m.group()

which will become:

    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()

Is there any support or objections to this proposal? Any comments?

I don't like it very much because it would only work for uncompiled patterns. All functions in re have a RegexObject counterpart, but multisearch() and multimatch() obviously would not. For the quoted example I'd usually try to create one regex that matches all four patterns, or use a loop:

    for pat in (pattern1, pattern2, pattern3, pattern4):
        m = re.match(pat, s)
        if m:
            m.group()
            break

re.match and re.search will accept either a string or a compiled pattern as the first argument. Never used it myself, but...

Steven D'Aprano

11 p.m.

On Thu, 23 Jul 2009 04:29:31 am BJörn Lindqvist wrote:

...

I don't like it very much because it would only work for uncompiled patterns. All functions in re have a RegexObject counterpart, but multisearch() and multimatch() obviously would not.

That's incorrect -- they accept pre-compiled regexes as well as strings.

...

...
...
    >>> pat = re.compile('a.c')
    >>> multimatch("abcd", ['z.*z', pat])
    <_sre.SRE_Match object at 0xb7cd8090>

...

For the quoted example I'd usually try to create one regex that matches all four patterns, or use a loop:

    for pat in (pattern1, pattern2, pattern3, pattern4):
        m = re.match(pat, s)
        if m:
            m.group()
            break

Apart from being in a function, my proposal (which you claim to dislike) is virtually identical to that code (which you say you use). -- Steven D'Aprano

Mark Dickinson

9:38 p.m.

On Wed, Jul 22, 2009 at 1:00 AM, Steven D'Aprano<steve@pearwood.info> wrote:

...

Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m

[...]

Steven, could you show some examples of real(ish)-world use-cases for one or both of these functions? Preferably including the code that might directly follow a multimatch or multisearch call.

It's probably because I haven't used regexes widely enough, but in all the potential examples I can come up with, either

(1) the regexes are similar enough that they can be refactored into a single regex (e.g., just concatenated with '|'), or

(2) they're distinct enough that each regex needs its own handling, so that the multimatch/multisearch would need to be followed by a multiway 'if/elif/else' anyway; in this case, it seems that little is gained.

-- Mark

Steven D'Aprano

11:31 p.m.

On Thu, 23 Jul 2009 07:38:11 am Mark Dickinson wrote:

...

Steven, could you show some examples of real(ish)-world use-cases for one or both of these functions? Preferably including the code that might directly follow a multimatch or multisearch call.

I'm afraid that I don't use regexes anywhere near enough to champion this proposal in the face of serious opposition, or even skepticism. If this isn't a simple enough "no-brainer", then I'm going to have to pass the baton onto somebody else (assuming anyone actually likes this idea). This idea came about from the thread started by Sean Reifschneider, proposing adding regexes to strings. I thought (and Sean seemed to agree) that these convenience functions would solve his primary use-case. So this proposal isn't scratching an itch I have.

...

It's probably because I haven't used regexes widely enough, but in all the potential examples I can come up with, either

(1) the regexes are similar enough that they can be refactored into a single regex (e.g., just concatenated with '|'), or

(2) they're distinct enough that each regex needs its own handling, so that the multimatch/multisearch would need to be followed by a multiway 'if/elif/else' anyway; in this case, it seems that little is gained.

These are both reasonable approaches. This proposal isn't supposed to solve every multiple-regex-handling problem. So far support for this has been luke-warm. If anyone really likes this idea, please speak up, otherwise I'll let it drop. -- Steven D'Aprano

MRAB

12:10 a.m.

Steven D'Aprano wrote:

...

On Thu, 23 Jul 2009 07:38:11 am Mark Dickinson wrote:

...
Steven, could you show some examples of real(ish)-world use-cases for one or both of these functions? Preferably including the code that might directly follow a multimatch or multisearch call.

I'm afraid that I don't use regexes anywhere near enough to champion this proposal in the face of serious opposition, or even skepticism. If this isn't a simple enough "no-brainer", then I'm going to have to pass the baton onto somebody else (assuming anyone actually likes this idea).

This idea came about from the thread started by Sean Reifschneider, proposing adding regexes to strings. I thought (and Sean seemed to agree) that these convenience functions would solve his primary use-case. So this proposal isn't scratching an itch I have.

...
It's probably because I haven't used regexes widely enough, but in all the potential examples I can come up with, either

(1) the regexes are similar enough that they can be refactored into a single regex (e.g., just concatenated with '|'), or

(2) they're distinct enough that each regex needs its own handling, so that the multimatch/multisearch would need to be followed by a multiway 'if/elif/else' anyway; in this case, it seems that little is gained.

These are both reasonable approaches. This proposal isn't supposed to solve every multiple-regex-handling problem.

So far support for this has been luke-warm. If anyone really likes this idea, please speak up, otherwise I'll let it drop.

If you want to try multiple regexes until one matches then the approach with re.match accepting a tuple of patterns would, it seems to me, be the one that requires the smallest change and has the strongest precedent (like str.startswith).

Jan Kaliszewski

10:03 p.m.

22-07-2009, 02:00 Steven D'Aprano <steve@pearwood.info>:

...

Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m

The rationale is to make the following idiom easier:

    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
        if not m:
            m = re.match(pattern3, s)
            if not m:
                m = re.match(pattern4, s)
    if m:
        m.group()

which will become:

    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()

Is there any support or objections to this proposal? Any comments?

It sounds nice. But why not simply use:

    m = re.match(s, '|'.join(pattern1, pattern2, pattern3, pattern4))

And if we want the feature anyway, I'd prefer MRAB's:

...

    m = re.match((pattern1, pattern2, pattern3, pattern4), s)
    if m:
        print m.group()

This format is already used by some string methods, eg str.startswith().

***

But if we are talking about convenience functions in the re module, it'd be IMHO very nice to have such functions:

    def matchgrouping(pattern, string, flags=0, default=None):
        """Do a re.match on string using pattern, returning dict
        containing groups which could be got by index or by name."""
        match = re.match(pattern, string, flags)
        groups = collections.defaultdict()
        groups.update(enumerate(match.groups()))
        groups.update(match.groupdict())
        return groups

(Plus the analogous function for searching.)
(Plus 2 analogous methods of RegexObject instances.)

* Then e.g. -- instead of:

    m = re.search(pattern, s)
    if m:
        first_group = m.group(1)
        surname = m.group('surname')
    else:
        first_group = None
        surname = None

-- we could write simply:

    m = re.matchgrouping(pattern, s)
    first_group = m[1]
    surname = m['surname']

* And e.g. -- instead of:

    withip = log_re.match(logline)
    if withip and withip.group('ip_addr'):
        iplog.append(logline)

-- we could write simply:

    if log_re.matchgrouping(logline)['ip_addr']:
        iplog.append(logline)

What do you think about it?

*j

-- Jan Kaliszewski (zuo) <zuo@chopin.edu.pl>
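A runnable version of matchgrouping needs a few decisions that the message leaves open. The sketch below assumes that a failed match yields an empty mapping, that groups are numbered from 1 as with match.group(n), and that the `default` argument is what missing keys and unmatched groups return. The sample pattern and strings are invented.

```python
import collections
import re

def matchgrouping(pattern, string, flags=0, default=None):
    """Return a dict-like object of groups, keyed by number and by name."""
    groups = collections.defaultdict(lambda: default)
    match = re.match(pattern, string, flags)
    if match:
        groups[0] = match.group()
        # Number the groups from 1, as match.group(n) does.
        for i, value in enumerate(match.groups(default), 1):
            groups[i] = value
        groups.update(match.groupdict(default))
    return groups

m = matchgrouping(r"(\w+) (?P<surname>\w+)", "Jan Kaliszewski")
first_group = m[1]
surname = m["surname"]
missing = matchgrouping(r"(\d+)", "no digits")[1]   # no match, so the default
```

Because the result is a defaultdict, looking up a key that is absent returns the default instead of raising KeyError, which is what makes the one-line usage in the message possible.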

Jan Kaliszewski

10:14 p.m.

Jan Kaliszewski <zuo@chopin.edu.pl> wrote:

...

It sounds nice. But why not to use simply:

m = re.match(s, '|'.join(pattern1, pattern2, pattern3, pattern4))

Sorry, I meant of course:

    m = re.match('|'.join((pattern1, pattern2, pattern3, pattern4)), s)

***

...

"""Do a re.match on string using pattern, returning dict containing groups which could be got by index or by name."""

I meant: "...returning collections.defaultdict..." (as you can see in the code following).

Regards,
*j

-- Jan Kaliszewski <zuo@chopin.edu.pl>
