Please help with regular expression finding multiple floats

Edward Dolan bytecolor at gmail.com
Fri Oct 23 11:48:45 CEST 2009


On Oct 22, 3:26 pm, Jeremy <jlcon... at gmail.com> wrote:
> My question is, how can I use regular expressions to find two OR three
> or even an arbitrary number of floats without repeating %s?  Is this
> possible?
>
> Thanks,
> Jeremy

Any time you have tabular data such as your example, split() is
generally the first choice. But since you asked, and I like fscking
with regular expressions...

import re

# I modified your data set just a bit to show that it will
# match zero or more space separated real numbers.

data = """
1.0000E-08

1.0000E-08 1.58024E-06 0.0048 1.0000E-08 1.58024E-06
0.0048
1.0000E-07 2.98403E-05
0.0018
foo bar
baaz
1.0000E-06 8.85470E-06
0.0026
1.0000E-05 6.08120E-06
0.0032
1.0000E-03 1.61817E-05
0.0022
1.0000E+00 8.34460E-05
0.0014
2.0000E+00 2.31616E-05
0.0017
5.0000E+00 2.42717E-05
0.0017
total 1.93417E-04
0.0012
"""

ntuple = re.compile(r"""
# match beginning of line (re.M in the docs)
^
# chew up anything before the first real (non-greedy -> ?)
.*?
# named match (turn the match into a named atom while allowing
# irrelevant (groups))
(?P<ntuple>
  # match one real
  [-+]?(\d*\.\d+|\d+\.\d*)([eE][-+]?\d+)?
  # followed by zero or more space separated reals
  ([ \t]+[-+]?(\d*\.\d+|\d+\.\d*)([eE][-+]?\d+)?)*)
# match end of line (re.M in the docs)
$
""", re.X | re.M)  # re.X to allow comments and arbitrary whitespace

# print(...) with a single argument works on both Python 2 and Python 3
print([tuple(mo.group('ntuple').split())
       for mo in re.finditer(ntuple, data)])
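
To answer the "without repeating" part of the question more directly,
one option is to write the real-number pattern once and build the line
pattern from it. This is only a minimal sketch: the names REAL, LINE and
sample are made up for illustration, the groups are non-capturing, and
the matched strings are converted with float(), none of which is in the
code above.

```python
import re

# One real number: optional sign, digits with a mandatory decimal point,
# optional exponent. Written once and reused below.
REAL = r"[-+]?(?:\d*\.\d+|\d+\.\d*)(?:[eE][-+]?\d+)?"

# A line that ends in one or more space/tab separated reals.
LINE = re.compile(
    r"^.*?(?P<ntuple>{real}(?:[ \t]+{real})*)$".format(real=REAL),
    re.M)

# Hypothetical sample, shaped like the data set above.
sample = "1.0000E-08 1.58024E-06 0.0048\nfoo bar\ntotal 1.93417E-04\n"

# Convert each matched run of reals to a tuple of floats.
rows = [tuple(float(x) for x in mo.group("ntuple").split())
        for mo in LINE.finditer(sample)]
print(rows)
```

The "foo bar" line produces no match, and "total" is chewed up by the
non-greedy prefix, so only the numbers survive.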

Now compare the previous post using split with this one. Even with the
comments in the re, it's still a bit difficult to read. Regular
expressions are brittle. My code works fine for the data above but if
you change the structure the re will probably fail. At that point, you
have to fiddle with the re to get it back on course.
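
For comparison, here is a rough split()-based sketch of the same idea:
keep the run of numbers at the end of each line. The helper names
is_float and trailing_floats and the sample text are hypothetical, and
this is not the code from the previous post. Note one assumption that
differs from the re: float() also accepts integers and strings such as
"nan" or "inf", which the pattern above rejects.

```python
def is_float(token):
    """Return True if float() accepts the token."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def trailing_floats(line):
    """Return the run of float-like tokens at the end of a line."""
    run = []
    # Walk the tokens from the right and stop at the first non-number.
    for token in reversed(line.split()):
        if not is_float(token):
            break
        run.append(token)
    return tuple(reversed(run))


# Hypothetical sample, shaped like the data set above.
sample = """\
1.0000E-08 1.58024E-06
0.0048
foo bar
total 1.93417E-04
"""

# Drop lines that end in no numbers at all (empty tuples).
rows = [t for t in map(trailing_floats, sample.splitlines()) if t]
print(rows)
```

There is no pattern to maintain here. If the layout changes, the fix is
usually a change to how the tokens are filtered, not a rewrite of a
regular expression.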

Don't get me wrong, regular expressions are hella fun to play with. But
you have to ask yourself, "Do I really _need_ to use a regular
expression here?"


