Extracting values from text file

Sat Jun 17 21:12:23 EDT 2006

Thus spoke Preben Randhol (on 2006-06-17 23:25):

> The code is a very good starting point for me! I already
> managed to change it and I see I need to make it a bit more robust.

I think the only thing you have to watch is the
congruence of the regex-based filter rules and the text:
each rule has to actually match the text it should extract from.

Suppose you have this text:

   Apples 34
   23 Apples, 234 Lemons 4 Eggs

(note the comma!)
and some rules:
   ...
   '(apples) Apples',
   'Apples (apples)',
   ...
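How does a rule meet the text? Each rule turns into a search pattern once its parenthesized variable name is swapped for the value pattern varscanner, i.e. the regular expression \b(\S+?)\b. Below is a minimal Python 3 sketch. The helper name rule_to_regex is hypothetical, and the swap is done here with a plain str.replace:

```python
import re

# Value pattern: one whitespace-free token between word boundaries.
varscanner = r'\b(\S+?)\b'

text = """
   Apples 34
   23 Apples, 234 Lemons 4 Eggs
"""

def rule_to_regex(rule):
    """Hypothetical helper: swap the '(name)' part of a rule for varscanner."""
    k = re.search(r'\((.+)\)', rule)
    return k.group(1), rule.replace(k.group(0), varscanner)

found = {}
for rule in ['(apples) Apples', 'Apples (apples)']:
    name, pattern = rule_to_regex(rule)
    m = re.search(pattern, text)          # first match of the rule in the text
    found[rule] = m.group(1)
    print(rule, '->', pattern, '->', name, '=', m.group(1))
```

The comma keeps "Apples, 234" from ever matching 'Apples (apples)', because that pattern needs a space right after "Apples".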

The earlier program would handle that correctly
after changing the variable assignment from

   if k.group(1):  varname[k.group(1)] = <value>

to

   if k.group(1):  varname[k.group(1)] += <value>     # note the +=

and the result would be 'apples = 57': instead of
overwriting, it adds up all values belonging to one variable.
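The difference between = and += fits into a small sketch: with =, the second apple rule overwrites the first value, and with += both values are summed. This uses Python 3 with a hypothetical function extract. The value pattern \b(\S+?)\b is assumed for varscanner, and dict.get stands in for the explicit zero initialisation:

```python
import re

varscanner = r'\b(\S+?)\b'      # assumed value pattern: one whitespace-free token
text = """
   Apples 34
   23 Apples, 234 Lemons 4 Eggs
"""
rules = ['(apples) Apples', 'Apples (apples)']

def extract(rules, text, accumulate):
    """Hypothetical helper: apply every rule once, overwriting or summing."""
    varname = {}
    for rule in rules:
        k = re.search(r'\((.+)\)', rule)                      # '(name)' part
        m = re.search(rule.replace(k.group(0), varscanner), text)
        if m is None:
            continue
        value = float(m.group(1))
        if accumulate:
            varname[k.group(1)] = varname.get(k.group(1), 0) + value  # like +=
        else:
            varname[k.group(1)] = value                               # plain =
    return varname

print(extract(rules, text, accumulate=False))  # {'apples': 34.0}
print(extract(rules, text, accumulate=True))   # {'apples': 57.0}
```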

aehhm ... that's how it would work in Perl, but Python
throws a couple more stones at you:

- you have to explicitly initialize a dictionary entry
  (with 0) before you can add to it in place (why is that?)

- a string returned from a regex match object is not
  auto-converted to a number, even if it _is_ a number
  and smells like a number (???)
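Both surprises fit into a few lines of Python 3. The statement `d[key] += x` reads d[key] first, so a missing key raises KeyError. Two ways avoid the explicit zero: dict.get with a default, or collections.defaultdict. Match groups are always strings, so float() (or int()) has to do the conversion. This sketch is not taken from the program below:

```python
import re
from collections import defaultdict

counts = {}
try:
    counts['apples'] += 23               # reads counts['apples'] first
except KeyError as err:
    print('KeyError:', err)              # the key does not exist yet

counts['apples'] = counts.get('apples', 0) + 23   # default of 0 for a missing key
totals = defaultdict(float)                        # missing keys start at 0.0
totals['apples'] += 23

m = re.search(r'(\S+) Apples', '23 Apples')
value = m.group(1)                       # the string '23', not a number
try:
    value + 34
except TypeError as err:
    print('TypeError:', err)             # str + int is not allowed
print(float(value) + 34)                 # 57.0
```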

With these two nasty surprises, our extractor loop
looks like this:

for rule in filter:
    k = re.search(r'\((.+)\)', rule)             # pull out variable name -> k
    if k:
        name = k.group(1)
        if name not in varname:                  # entry must exist before +=
            varname[name] = 0
        # swap '(name)' for the value pattern, then search the text
        m = re.search(rule.replace(k.group(0), varscanner), example)
        if m:                                    # pull its value from the text
            varname[name] += float(m.group(1))   # convert the string explicitly

whereas in the Perl loop, only the = would change to +=:

for (@filter) {
    $v = $1 if s/\((.+)\)/$varscanner/;     # pull out variable names ->$1
    $varname{$v} += $1 if $example =~ /$_/; # pull their values from text
}
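For comparison, Python can come fairly close to the Perl version when collections.defaultdict supplies the zero for missing keys. float() is still needed. This Python 3 sketch runs on a shortened example text. The list name filter_rules is hypothetical and avoids shadowing the built-in filter. The value pattern \b(\S+?)\b is assumed for varscanner:

```python
import re
from collections import defaultdict

varscanner = r'\b(\S+?)\b'   # assumed value pattern: one whitespace-free token
example = """
  Apples 34
  23 Apples, 234 Lemons 4 Eggs
"""
filter_rules = ['(apples) Apples', 'Apples (apples)', '(lemons) Lemons', '(eggs) Eggs']

varname = defaultdict(float)                 # missing keys start at 0.0, as in Perl
for rule in filter_rules:
    k = re.search(r'\((.+)\)', rule)         # variable name -> k.group(1)
    m = re.search(rule.replace(k.group(0), varscanner), example)
    if m:                                    # value found in the text
        varname[k.group(1)] += float(m.group(1))

print(dict(varname))
```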

Here is the complete Python program, which handles
all the cases you mentioned.

Regards

Mirco

==>

DATA = '''
An example text file:
-----------
Some text that can span some lines.
  Apples 34
  23 Apples, 234 Lemons 4 Eggs
  56 Ducks

Some more text.
  0.5 g butter
----------------------------------'''       # data must show up before usage

filter = [                 # define filter table
     '(apples) Apples',
     'Apples (apples)',
     '(ducks) Ducks',
     '(lemons) Lemons',
     '(eggs) Eggs',
     '(butter) g butter',
]
varname = {}                        # variable names to be found in filter
varscanner = r'\b(\S+?)\b'          # expression used to extract values
example = DATA                      # the example text defined above

import re
for rule in filter:                 # iterate over filter rules
    k = re.search(r'\((.+)\)', rule)           # pull out variable name -> k
    if k:
        name = k.group(1)
        if name not in varname:                # entry must exist before +=
            varname[name] = 0
        # swap '(name)' for the value pattern, then search the text
        m = re.search(rule.replace(k.group(0), varscanner), example)
        if m:                                  # pull its value from the text
            varname[name] += float(m.group(1)) # convert the string explicitly

for key, val in varname.items():    # print what's found
    print(key, "\t=", val)

<==
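One robustness note on the program above: re.search returns only the first match of each rule. A second "... Apples" line further down in the text would therefore be ignored. A possible variation sums every occurrence with re.finditer. It is sketched here in Python 3 with a hypothetical helper extract_all and a made-up extra "10 Apples" line, assuming the value pattern \b(\S+?)\b:

```python
import re
from collections import defaultdict

varscanner = r'\b(\S+?)\b'   # assumed value pattern: one whitespace-free token

def extract_all(rules, text):
    """Hypothetical helper: sum every match of every rule, not only the first."""
    totals = defaultdict(float)
    for rule in rules:
        k = re.search(r'\((.+)\)', rule)
        if not k:                            # rule has no '(name)' part
            continue
        pattern = rule.replace(k.group(0), varscanner)
        for m in re.finditer(pattern, text):
            totals[k.group(1)] += float(m.group(1))
    return dict(totals)

text = """
  Apples 34
  23 Apples, 234 Lemons 4 Eggs
  56 Ducks
  10 Apples
"""
rules = ['(apples) Apples', 'Apples (apples)', '(ducks) Ducks']
result = extract_all(rules, text)
print(result)
```

With finditer, a rule that is too loose will also count every stray match, so the congruence of rules and text matters even more here.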