regex problem

bearophileHUGS at lycos.com
Wed Nov 22 15:46:22 EST 2006


> > line I am trying to match is
> > 1959400|Q2BYK3|Q2BYK3_9GAMM Hypothetical outer membra    29.9    0.00011   1
> >
> > regex I have written is
> > re.compile(
> >     r'(\d+?)\|((P|O|Q)\w{5})\|\w{3,6}\_\w{3,5}\s+?.{25}\s{3}(\d+?\.\d)\s+?(\d\.\d+?)')
> >
> > I am trying to extract the 0.00011 value from the above line.
> > Why doesn't it match as group(4) of the match?
> >
> > Any idea what's wrong with it?
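
Two things can be checked without the data file, and both may explain
what you see. This is only a small diagnostic sketch: the names quoted
and tail are made up, and the pattern is copied from your message.

```python
import re

# The pattern exactly as quoted above.
quoted = re.compile(
    r'(\d+?)\|((P|O|Q)\w{5})\|\w{3,6}\_\w{3,5}\s+?.{25}\s{3}(\d+?\.\d)\s+?(\d\.\d+?)'
)

# (P|O|Q) is itself a capturing group, so there are five groups, not four:
# 1 = id, 2 = accession, 3 = the single letter, 4 = score, 5 = last number.
print(quoted.groups)      # 5

# A lazy quantifier at the very end of a pattern stops after one repetition,
# so the final group can never capture more than one digit after the dot.
tail = re.search(r'(\d\.\d+?)', '0.00011')
print(tail.group(1))      # 0.0
```

So group(4) would be the 29.9 column, and the value you want would be
group(5), cut short by the trailing +?.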

I am not an expert on REs yet, but I suggest you use re.VERBOSE
and split your RE into parts, like this:

import re

example = re.compile(r"""
    ^ \s*           # must start at the beginning + optional whitespace
    ( [\[\(] )      # Group 1: opening bracket
    \s*             # optional whitespace
    ( [-+]? \d+ )   # Group 2: first number
    \s* , \s*       # optional whitespace + comma + optional whitespace
    ( [-+]? \d+ )   # Group 3: second number
    \s*             # optional whitespace
    ( [\)\]] )      # Group 4: closing bracket
    \s* $           # optional whitespace + must end at the end
    """, flags=re.VERBOSE)

This way you can debug and maintain it much better.
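
For instance, your own pattern could be laid out along these lines.
This is only a sketch under a few assumptions: the sample line is copied
from your message (the spacing in the real file may differ), the name
hit_re is made up, I replaced the fixed-width pieces such as .{25} and
\s{3} with looser ones, and I am guessing that the last-but-one column
may also show up in exponent form.

```python
import re

# Sample line as it appears in the quoted message; real spacing may differ.
line = "1959400|Q2BYK3|Q2BYK3_9GAMM Hypothetical outer membra    29.9    0.00011   1"

hit_re = re.compile(r"""
    ^ (\d+)                   # Group 1: leading numeric id
    \|
    ( [POQ] \w{5} )           # Group 2: accession, P/O/Q plus five word chars
    \|
    \w+                       # entry name (\w also matches the underscore)
    \s+
    .+?                       # free-text description, as short as possible
    \s+
    ( \d+ \. \d+ )            # Group 3: score
    \s+
    ( \d+ (?: \. \d+ )?       # Group 4: the value to extract,
      (?: [eE] [-+]? \d+ )? ) #          with an optional exponent (assumed)
    \s+ \d+ \s* $             # trailing count + must end at the end
    """, flags=re.VERBOSE)

m = hit_re.match(line)
if m:
    print(m.group(3))   # 29.9
    print(m.group(4))   # 0.00011
```

Using [POQ] instead of (P|O|Q) avoids an extra capturing group, so the
group numbers follow the columns you actually care about.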

Bye,
bearophile



