Best way to extract from regex in if statement

George Sakkis george.sakkis at gmail.com
Fri Apr 3 22:24:19 EDT 2009


On Apr 3, 9:56 pm, Jon Clements <jon... at googlemail.com> wrote:
> On 4 Apr, 02:14, bwgoudey <bwgou... at gmail.com> wrote:
>
> > I have a lot of if/elif cases based on regular expressions that I'm using to
> > filter stdin and using print to stdout. Often I want to print something
> > matched within the regular expression, and at the moment I've got a lot of cases
> > like:
>
> > ...
> > elif re.match(r"^DATASET:\s*(.+) ", line):
> >     m = re.match(r"^DATASET:\s*(.+) ", line)
> >     print(m.group(1))
>
> > which is ugly because of the duplication, but I can't think of a nicer way
> > of doing this that will allow for a lot of these sorts of cases. Any
> > suggestions?
> > --
> > View this message in context:http://www.nabble.com/Best-way-to-extract-from-regex-in-if-statement-...
> > Sent from the Python - python-list mailing list archive at Nabble.com.
>
> How about something like:
>
> import re
>
> your_regexes = [
>     re.compile(r'rx1'),
>     re.compile(r'rx2'),
>     # etc....
> ]
>
> for line in lines:
>     for rx in your_regexes:
>         m = rx.match(line)
>         if m:
>             print(m.group(1))
>             break  # keep if only the first matching regex is required;
>                    # leave it out to act on every matching regex
>
> Untested, but seems to make sense
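
For reference, here is a self-contained sketch of the loop approach above. It uses Python 3 syntax, whereas the thread is written for Python 2. The two patterns and the sample lines are hypothetical stand-ins for `rx1`/`rx2` and for stdin, and every pattern is assumed to define capture group 1. The `break` becomes a `return` inside a small helper, which gives the same first-match-wins behaviour:

```python
import re

# Hypothetical patterns standing in for rx1, rx2; each defines group 1.
your_regexes = [
    re.compile(r"^DATASET:\s*(.+)"),
    re.compile(r"^AUTHORS:\s*(.+)"),
]

def first_capture(line, regexes):
    """Return group(1) of the first regex that matches line, else None."""
    for rx in regexes:
        m = rx.match(line)  # match once and reuse the result
        if m:
            return m.group(1)
    return None

# Hypothetical input; in the original use case this would be sys.stdin.
lines = ["DATASET: climate", "AUTHORS: Smith, Jones", "noise"]
for line in lines:
    captured = first_capture(line, your_regexes)
    if captured is not None:
        print(captured)
```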

Or, if you want to handle each regexp differently, you can construct
a dict {regexp: callback_function} that picks the right action
depending on which regexp matched. As for how to populate the dict,
if most handlers are short expressions, lambda comes in pretty
handy, e.g.

{
    rx1: lambda match: match.group(1),
    rx2: lambda match: sum(map(int, match.groups())),
    # ...
}
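
A minimal Python 3 sketch of how such a mapping might be used to dispatch each line. The `dispatch` helper, the `handlers` name and the `SUM:` pattern are hypothetical illustrations, not part of the suggestion above. Patterns are tried in insertion order, which plain dicts guarantee from Python 3.7 onward:

```python
import re

# Hypothetical regex -> handler mapping built with lambdas.
handlers = {
    re.compile(r"^DATASET:\s*(.+)"): lambda match: match.group(1),
    re.compile(r"^SUM:\s*(\d+)\s+(\d+)"): lambda match: sum(map(int, match.groups())),
}

def dispatch(line, mapping):
    """Call the handler of the first regex that matches line, else return None."""
    for rx, handler in mapping.items():  # insertion order (Python 3.7+)
        m = rx.match(line)
        if m:
            return handler(m)
    return None

for line in ["DATASET: climate", "SUM: 3 4", "noise"]:
    result = dispatch(line, handlers)
    if result is not None:
        print(result)
```

If a handler can legitimately return None, a sentinel object would be a safer "no match" marker than None.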

If the handlers need more than a single expression, you can combine
the handler definition with the mapping update by using a simple
decorator factory such as the following (untested):

import re

def rxhandler(rx, mapping):
    # Compile the pattern once and register the decorated function under it.
    rx = re.compile(rx)
    def deco(func):
        mapping[rx] = func
        return func
    return deco

d = {}

@rxhandler(r"^DATASET:\s*(.+) ", d)
def handle_dataset(match):
    ...

@rxhandler(r"^AUTHORS:\s*(.+) ", d)
def handle_authors(match):
    ...
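
A runnable Python 3 sketch of the decorator-based registry, including a loop that uses it. The handler bodies, the `process` helper and the sample lines are hypothetical fillers for the elided parts. The trailing space is also dropped from the patterns so they match the sample lines:

```python
import re

def rxhandler(rx, mapping):
    """Decorator factory: register the decorated function in mapping under re.compile(rx)."""
    rx = re.compile(rx)
    def deco(func):
        mapping[rx] = func
        return func  # the function stays usable under its own name
    return deco

d = {}

@rxhandler(r"^DATASET:\s*(.+)", d)
def handle_dataset(match):
    # Hypothetical handler body.
    return "dataset=" + match.group(1)

@rxhandler(r"^AUTHORS:\s*(.+)", d)
def handle_authors(match):
    # Hypothetical handler body: split a comma-separated author list.
    return [name.strip() for name in match.group(1).split(",")]

def process(line, mapping):
    """Run the handler of the first matching regex; return None if nothing matches."""
    for rx, func in mapping.items():
        m = rx.match(line)
        if m:
            return func(m)
    return None

for line in ["DATASET: climate", "AUTHORS: Smith, Jones"]:
    print(process(line, d))
```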

HTH,
George


