Extract value and average

Mon Jun 8 14:08:16 EDT 2009

Tim Chase wrote:
>> I would like to extract the values of the variable DIHED (here
>> 4660.1650) and also get the mean of all the DIHED values.
> 
> To just pull the DIHED values, you can use this:
> 
>    import re
>    # '-' goes first in the class so it is literal; "[.-e\d]" would be
>    # a range from '.' to 'e' that never matches a minus sign.
>    find_dihed_re = re.compile(r'\bDIHED\s*=\s*([-.e\d]+)', re.I)
>    total = count = 0
>    with open('file.txt') as f:
>      for line in f:
>        m = find_dihed_re.search(line)
>        if m:
>          str_value = m.group(1)
>          try:
>            value = float(str_value)
>          except ValueError:
>            print("Not a float: %s" % str_value)
>          else:
>            total += value
>            count += 1
>    print("Total:", total)
>    print("Count:", count)
>    if count:
>      print("Average:", total / count)
> 
> If you want a general parser for the file, it takes a bit more work.

Just because I was a little bored:

   import re

   # a key (anything but '=' or ':'), then '=' or ':', then a number
   pair_re = re.compile(r'\b([^=:]+)\s*[=:]\s*([-.e\d]+)', re.I)

   def builder(fname='file.txt'):
     """Yield one dict of KEY -> float for each block in the file."""
     thing = {}
     with open(fname) as f:
       for line in f:
         if not line.strip(): continue
         line = line.upper()
         if 'NSTEP' in line:   # 1
           # it's a new thing  # 1
           if thing:           # 1
             yield thing       # 1
             thing = {}        # 1
         thing.update(
           (k.strip(), float(v))
           for k, v in pair_re.findall(line)
           )
         #if 'EWALD' in line:   # 2
         #  # it's a new thing  # 2
         #  if thing:           # 2
         #    yield thing       # 2
         #    thing = {}        # 2
     if thing:
       yield thing

   # average DIHED over the blocks that actually contain it
   total = count = 0
   for thing in builder():
     if 'DIHED' in thing:
       total += thing['DIHED']
       count += 1

   print("Total:", total)
   print("Count:", count)
   if count:
     print("Average:", total / count)

This makes a more generic parser.  It yields one dictionary per
block in the input file, mapping each (upper-cased) field name to
its float value, so you can pull out whichever value(s) you want
to manipulate.  As written, a new block starts at a line
containing "NSTEP" (the lines marked "# 1").  If your blocks
instead end with a line containing "EWALD", comment out the "# 1"
lines and uncomment the "# 2" lines.
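If you'd rather not toggle comments, here is a rough Python 3 sketch of the same idea, assuming the "KEY = value" layout above.  The names parse_blocks(), averages(), SAMPLE and the start/end parameters are my own inventions for illustration, and the sample text is made up, not real simulation output.  start or end picks the block delimiter, and averages() takes the mean of every field over the blocks that actually contain it:

```python
import re
from collections import defaultdict

# Same "KEY = value" / "KEY: value" pattern as the parser above.
PAIR_RE = re.compile(r'\b([^=:]+)\s*[=:]\s*([-.e\d]+)', re.I)


def parse_blocks(lines, start=None, end=None):
    """Yield one dict (upper-cased KEY -> float) per block of lines.

    start: a block begins at a line containing this word (e.g. "NSTEP").
    end:   a block ends at a line containing this word (e.g. "EWALD").
    Both are hypothetical parameters replacing the "# 1"/"# 2" toggle.
    """
    start = start.upper() if start else None
    end = end.upper() if end else None
    block = {}
    for line in lines:
        line = line.upper()
        if not line.strip():
            continue
        if start and start in line and block:
            yield block      # the previous block is complete
            block = {}
        for key, value in PAIR_RE.findall(line):
            try:
                block[key.strip()] = float(value)
            except ValueError:
                pass         # e.g. a lone "-" or "E" the pattern also accepts
        if end and end in line and block:
            yield block      # this line closes the current block
            block = {}
    if block:
        yield block


def averages(blocks):
    """Mean of every key, taken over the blocks that contain that key."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for block in blocks:
        for key, value in block.items():
            totals[key] += value
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up sample in the layout discussed above (not real output).
SAMPLE = """\
 NSTEP =        1   TIME(PS) =       0.002  TEMP(K) =   300.00
 BOND   =       10.5000  ANGLE   =       20.0000  DIHED      =     4660.1650
 EWALD ERROR ESTIMATE:   0.1000E-03

 NSTEP =        2   TIME(PS) =       0.004  TEMP(K) =   301.00
 BOND   =       11.5000  ANGLE   =       21.0000  DIHED      =     4661.8350
 EWALD ERROR ESTIMATE:   0.2000E-03
"""

if __name__ == "__main__":
    blocks = list(parse_blocks(SAMPLE.splitlines(), start="NSTEP"))
    print(len(blocks))                          # 2
    print(round(averages(blocks)["DIHED"], 4))  # 4661.0
```

With a real file you'd pass the open file object instead, e.g. parse_blocks(f, end='EWALD') inside a "with open('file.txt') as f:" block.  Averaging only over blocks that have the key avoids dragging the mean down by counting a missing DIHED as 0.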

-tkc