Please help with regular expression finding multiple floats

Jeremy jlconlin at gmail.com
Fri Oct 23 10:41:25 EDT 2009


On Oct 23, 3:48 am, Edward Dolan <byteco... at gmail.com> wrote:
> On Oct 22, 3:26 pm, Jeremy <jlcon... at gmail.com> wrote:
>
> > My question is, how can I use regular expressions to find two OR three
> > or even an arbitrary number of floats without repeating %s?  Is this
> > possible?
>
> > Thanks,
> > Jeremy
>
> Any time you have tabular data such as your example, split() is
> generally the first choice. But since you asked, and I like fscking
> with regular expressions...
>
> import re
>
> # I modified your data set just a bit to show that it will
> # match zero or more space separated real numbers.
>
> data = """
> 1.0000E-08
>
> 1.0000E-08 1.58024E-06 0.0048 1.0000E-08 1.58024E-06 0.0048
> 1.0000E-07 2.98403E-05 0.0018
> foo bar baaz
> 1.0000E-06 8.85470E-06 0.0026
> 1.0000E-05 6.08120E-06 0.0032
> 1.0000E-03 1.61817E-05 0.0022
> 1.0000E+00 8.34460E-05 0.0014
> 2.0000E+00 2.31616E-05 0.0017
> 5.0000E+00 2.42717E-05 0.0017
> total 1.93417E-04 0.0012
> """
>
> ntuple = re.compile(r"""
> # match beginning of line (re.M in the docs)
> ^
> # chew up anything before the first real (non-greedy -> ?)
> .*?
> # named match (turn the match into a named atom while allowing
> # irrelevant (groups))
> (?P<ntuple>
>   # match one real
>   [-+]?(\d*\.\d+|\d+\.\d*)([eE][-+]?\d+)?
>   # followed by zero or more space separated reals
>   ([ \t]+[-+]?(\d*\.\d+|\d+\.\d*)([eE][-+]?\d+)?)*)
> # match end of line (re.M in the docs)
> $
> """, re.X | re.M) # re.X to allow comments and arbitrary whitespace
>
> print [tuple(mo.group('ntuple').split())
>        for mo in re.finditer(ntuple, data)]
>
> Now compare the previous post using split with this one. Even with the
> comments in the re, it's still a bit difficult to read. Regular
> expressions are brittle. My code works fine for the data above but if
> you change the structure the re will probably fail. At that point, you
> have to fiddle with the re to get it back on course.
>
> Don't get me wrong, regular expressions are hella fun to play with. You
> have to ask yourself, "Do I really _need_ to use a regular expression
> here?"
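
For reference, the quoted pattern spells out the single-float sub-pattern
twice. One way to avoid that repetition is to define it once and build the
row pattern from it. The following is only a minimal sketch, assuming
Python 3; the names FLOAT, ROW, parse_rows and the sample text are
illustrative and are not taken from the script in the quoted message.

```python
import re

# One real number: optional sign, digits around a decimal point, optional
# exponent. Same shape as the sub-pattern in the quoted message, written once.
FLOAT = r"[-+]?(?:\d*\.\d+|\d+\.\d*)(?:[eE][-+]?\d+)?"

# A row: skip anything before the first real (non-greedy), then capture one
# real followed by zero or more space/tab separated reals up to end of line.
ROW = re.compile(r"^.*?(?P<ntuple>{0}(?:[ \t]+{0})*)$".format(FLOAT), re.M)


def parse_rows(text):
    """Return one tuple of floats for each line that ends in a run of reals."""
    return [tuple(float(x) for x in mo.group("ntuple").split())
            for mo in ROW.finditer(text)]


# Illustrative sample, shortened from the data in the quoted message.
sample = """\
1.0000E-08 1.58024E-06 0.0048
foo bar baaz
total 1.93417E-04 0.0012
"""

rows = parse_rows(sample)
print(rows)
```

The number of floats per line is not fixed anywhere: the `(?: ... )*` part
repeats as often as needed, so two, three, or more values per row all go
through the same pattern.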

In this simplified example I don't really need regular expressions.
However, I will need them for more complex problems, and I'm trying to
become more proficient at using them.  I tried to simplify this so as
not to bother the mailing list too much.

Thanks for the great suggestion.  It looks like it will work fine, but
I can't get it to work.  I downloaded the simple script you put on
http://codepad.org/Z7eWBusl  but it only prints an empty list.  Am I
missing something?
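
The cause of the empty list is not known from this message. One generic
way to narrow it down is to apply the single-float pattern to each line
separately and print the repr() of the line, which makes invisible
characters visible. This is a minimal sketch, assuming Python 3; the
helper report_lines and the sample string are hypothetical, and the
carriage return in the sample is there only to show how one would appear.

```python
import re

# Single-float pattern, same shape as the sub-pattern in the quoted message.
FLOAT = re.compile(r"[-+]?(?:\d*\.\d+|\d+\.\d*)(?:[eE][-+]?\d+)?")


def report_lines(text):
    """Return (repr_of_line, floats_found) for every non-blank line.

    repr() shows characters such as tabs or carriage returns explicitly.
    """
    report = []
    for line in text.split("\n"):
        if line.strip():
            report.append((repr(line), FLOAT.findall(line)))
    return report


# Hypothetical input, not the actual data from the downloaded script.
sample = "1.0000E-08 1.58024E-06 0.0048\r\ntotal 1.93417E-04 0.0012\n"

for shown, floats in report_lines(sample):
    print(shown, floats)
```

If every line reports the expected floats here while the full pattern
still matches nothing, the difference most likely lies in the parts of
the full pattern around the floats (the anchors, the flags, or the
layout of the verbose pattern) rather than in the float sub-pattern.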

Thanks,
Jeremy


