Help with regular expression in python

Vlastimil Brom vlastimil.brom at gmail.com
Mon Aug 22 15:59:16 CEST 2011


Sorry, if I missed some further specification in the earlier thread or
if the following is oversimplification of the original problem (using
3 numbers instead of 32),
would something like the following work for your data?

>>> import re
>>> data = """2.201000e+01 2.150000e+01 2.199000e+01 : (instance: 0)       :       some description
... 2.201000e+01 2.150000e+01 2.199000e+01 : (instance: 0)       :
  some description
... 2.201000e+01 2.150000e+01 2.199000e+01 : (instance: 0)       :
  some description
... 2.201000e+01 2.150000e+01 2.199000e+01 : (instance: 0)       :
  some description"""
>>> for res in re.findall(r"(?m)^(?:(?:[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+))?\s+){3}(?:.+)$", data): print res
...
2.201000e+01 2.150000e+01 2.199000e+01 : (instance: 0)       :
some description
2.201000e+01 2.150000e+01 2.199000e+01 : (instance: 0)       :
some description
2.201000e+01 2.150000e+01 2.199000e+01 : (instance: 0)       :
some description
2.201000e+01 2.150000e+01 2.199000e+01 : (instance: 0)       :
some description
>>>

i.e. all parentheses are non-capturing (?:...) and there are extra
anchors for line begining and end ^...$ with the multiline flag set
via (?m)
Each result is one matching line in this sample (if you need to acces
single numbers, you could process these matches further or use the new
regex implementation mentioned earlier by mrab (its developer) with
the new match method captures() - using an appropriate pattern with
the needed groupings).

regards,
  vbr



More information about the Python-list mailing list