Extracting values from text file
Mirco Wahab
wahab at chemie.uni-halle.de
Sat Jun 17 21:12:23 EDT 2006
Thus spoke Preben Randhol (on 2006-06-17 23:25):
> The code is a very good starting point for me! I already
> managed to change it and I see I need to make it a bit more robust.
I think the only thing you have to watch is that the
regex-based filter rules and the text stay congruent.
Suppose you have a text:
Apples 34
23 Apples, 234 Lemons 4 Eggs
(note the comma!)
and some rules:
...
'(apples) Apples',
'Apples (apples)',
...
The former program would handle that all right
after changing the variable assignment from:
   if k.group(1): varname[k.group(1)] = <value>
to
   if k.group(1): varname[k.group(1)] += <value>    (note the +=)
and the result would be: 'apples = 57'. It would
add up all values corresponding to one variable.
Ahem ... that is how it would be in Perl, but Python throws
a couple of stones in your way:
 - you have to explicitly initialize a dictionary value
   (with 0) before you can add to it in place (why is that?)
 - the string returned by a regex match object is not
   auto-converted to a number, even if it _is_ a number
   and smells like a number (???)
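A small sketch of both points, written for Python 3 as an assumption (the
names totals and auto are made up for illustration). As far as I can tell,
d[k] += v reads d[k] before it writes, so a missing key raises KeyError
instead of springing into existence as in Perl, and a match group is always
a string, so the conversion has to be spelled out:

```python
import re
from collections import defaultdict

totals = {}
try:
    totals["apples"] += 34            # no such key yet
except KeyError:
    # += reads the old value first; a missing key raises KeyError
    totals["apples"] = 0
totals["apples"] += 34

# dict.get supplies the default without a separate membership check
totals["lemons"] = totals.get("lemons", 0) + 234

# defaultdict(float) creates a missing entry as 0.0 on first access
auto = defaultdict(float)
auto["eggs"] += 4

# A match group is always a str; convert it explicitly
m = re.search(r"Apples (\S+)", "Apples 34")
text_value = m.group(1)               # the string "34"
number = float(text_value)            # the float 34.0
```

Either dict.get or defaultdict would replace the explicit "set it to 0
first" line; which one reads better is a matter of taste.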
With these two nasty surprises, our extractor loop
looks like this:
   for rule in filter:
      k = re.search(r'\((.+)\)', rule)        # pull out variable names ->k
      if k.group(1):                          # pull their values from text
         if not varname.has_key(k.group(1)): varname[k.group(1)] = 0;
         varname[k.group(1)] += float( \
            re.search( re.sub(r'\((.+)\)', varscanner, rule), \
                       example ).group(1) )   # use regex in modified 'rule'
whereas in the Perl loop, only = would change to +=:
   for (@filter) {
      $v = $1 if s/\((.+)\)/$varscanner/;        # pull out variable names ->$1
      $varname{$v} += $1 if $example =~ /$_/;    # pull their values from text
   }
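For comparison, here is a rough Python 3 rendering of the same loop, run on
the two-line sample text from above. It is only a sketch: sum_rules and
value_pattern are names I made up, and targeting Python 3 is an assumption
(the loop in this mail uses has_key, which Python 3 no longer has). One
thing to watch: on Python 3.7 and later, re.sub rejects a replacement
string containing \S as a bad escape, so the sketch passes a function as
the replacement, whose return value is inserted literally.

```python
import re

text = "Apples 34\n23 Apples, 234 Lemons 4 Eggs\n"
rules = ["(apples) Apples", "Apples (apples)"]
value_pattern = r"\b(\S+?)\b"    # plays the role of varscanner

def sum_rules(rules, text):
    """Add up the values found for each variable named in the rules."""
    totals = {}
    for rule in rules:
        # The variable name is whatever sits inside the parentheses
        key = re.search(r"\((.+)\)", rule).group(1)
        # Swap '(name)' for the value pattern; a function replacement
        # is inserted as-is, with no escape processing
        pattern = re.sub(r"\((.+)\)", lambda m: value_pattern, rule)
        found = re.search(pattern, text)
        totals[key] = totals.get(key, 0) + float(found.group(1))
    return totals

totals = sum_rules(rules, text)
print(totals)    # {'apples': 57.0}
```

The first rule picks up the 23 in "23 Apples," and the second the 34 in
"Apples 34", which gives the 57 mentioned earlier.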
I'll append the complete Python program, which handles
all the cases you mentioned.
Regards
Mirco
==>
DATA = '''
An example text file:
-----------
Some text that can span some lines.
Apples 34
23 Apples, 234 Lemons 4 Eggs
56 Ducks
Some more text.
0.5 g butter
----------------------------------''' # data must show up before usage
filter = [ # define filter table
'(apples) Apples',
'Apples (apples)',
'(ducks) Ducks',
'(lemons) Lemons',
'(eggs) Eggs',
'(butter) g butter',
]
varname = {} # variable names to be found in filter
varscanner = r'\\b(\S+?)\\b' # expression used to extract values
example = DATA # read the appended example text,
import re
for rule in filter:  # iterate over filter rules, rules will be in 'rule'
   k = re.search(r'\((.+)\)', rule)   # pull out variable names ->k
   if k.group(1):                     # pull their values from text
      if not varname.has_key(k.group(1)): varname[k.group(1)] = 0;
      varname[k.group(1)] += float( \
         re.search( re.sub(r'\((.+)\)', varscanner, rule), \
                    example ).group(1) )   # use regex in modified 'rule'

for key, val in varname.items(): print key, "\t= ", val   # print what's found
<==
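On the robustness point: if a rule and the text are not congruent,
re.search returns None and the .group(1) call fails with an
AttributeError. One possible way to guard against that is sketched below,
again assuming Python 3. The function name extract, the unmatched list and
the decision to skip such rules rather than stop are my own assumptions,
not part of the program above.

```python
import re

def extract(rules, text, value_pattern=r"\b(\S+?)\b"):
    """Return (totals, unmatched): summed values and the rules that failed."""
    totals, unmatched = {}, []
    for rule in rules:
        name = re.search(r"\((.+)\)", rule)
        if name is None:                 # rule has no '(variable)' part
            unmatched.append(rule)
            continue
        pattern = re.sub(r"\((.+)\)", lambda m: value_pattern, rule)
        found = re.search(pattern, text)
        if found is None:                # rule does not fit the text
            unmatched.append(rule)
            continue
        try:
            value = float(found.group(1))
        except ValueError:               # matched something non-numeric
            unmatched.append(rule)
            continue
        key = name.group(1)
        totals[key] = totals.get(key, 0) + value
    return totals, unmatched

totals, unmatched = extract(
    ["(ducks) Ducks", "(pears) Pears", "(lemons) Lemons"],
    "56 Ducks\n23 Apples, 234 Lemons 4 Eggs\n",
)
```

Here the Pears rule finds nothing in the text and ends up in unmatched,
while the other two rules are summed as before.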