[Tutor] reg exps
Karl Pflästerer
sigurd at 12move.de
Wed Feb 4 20:22:30 EST 2004
On 4 Feb 2004, Kim Branson <- kim.branson at csiro.au wrote:
> i have a program which will spit out data like so:
> % ~/Desktop/Dock4_osx/bin/scorer 1rx3.pdb dock_nrg.mol2
> PK= 6.08 Qual= 2.12 PMF= -159.15 PMF_rb= -144.15 SMoG= -161.27
> SMoG_H= -7.68 ChemScore= -23.86 Clash= 0.85 Int= 2.58 DockNRG=
> -24.51 AutoDock= -20.14
> so i'm working on a script which has a function (below) that checks
> for this data (the program called can spit out other data when inputs
> are bad) then grabs the matches. so now i'm making dictionaries for
> each field, using the line number as a key. my pattern match does not
> return the first match in the list in position 0.
> i.e
> [' ', ' 6.08', ' 2.12', '-159.15', '-144.15', '-161.27', '-7.68',
> '-23.86', ' 0.85', ' 2.58', '-24.51', '-20.14', '\n']
> so i'm grabbing the data from position 1 in the list etc, and working
> from there. Why is this, is this a default behaviour?
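That's the default behaviour if you split the way you do: when the
pattern contains groups, re.split returns the text before the first
match as element 0, followed by the groups.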
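To illustrate, here is a minimal sketch. The pattern in the original script is not shown, so the two-field pattern below is only a hypothetical stand-in; it assumes the script calls split on a compiled pattern that contains capturing groups.

```python
import re

# Hypothetical two-field pattern; the real script's pattern is not shown.
pattern = re.compile(r'PK= (.*) Qual= (.*)')
line = 'PK= 6.08 Qual= 2.12\n'

parts = pattern.split(line)
# parts[0] is the text before the match (empty here),
# parts[-1] is whatever follows the match (the newline).
print(parts)

# Dropping the first and last elements leaves only the captured values.
values = parts[1:-1]
print(values)
```

Using search() or match() and reading the groups directly avoids the extra leading and trailing elements.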
> note the values in the output can be negative or really large, as in
> PK= -6.08 etc, or Qual= 12.12 so i use (.*) to grab the region.
Why not `.+', so that it cannot match an empty string?
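A small sketch of the difference, using a made-up input line in which the PK value is missing (the line and the shortened pattern are assumptions for illustration, not output of the real program):

```python
import re

# Hypothetical bad line: nothing between "PK= " and " Qual= ".
line = 'PK=  Qual= 2.12'

star = re.search(r'PK= (.*) Qual= (.*)', line)
plus = re.search(r'PK= (.+) Qual= (.+)', line)

# .* happily captures an empty value for PK ...
print(repr(star.group(1)))
# ... while .+ needs at least one character, so there is no match at all.
print(plus)
```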
> Oh one more thing, if you declare a global, can i simply add
> dictionary content, or should one declare and then initialise?
The latter; but do you really want that many globals? That's ugly
IMO. A class with the dictionaries as attributes seems to me to be
the right thing here.
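For the global variant, a minimal sketch of "declare and then initialise" could look like this. The names pk_scores and record are hypothetical and not taken from the original script.

```python
# Initialise the dictionary at module level before any function uses it.
pk_scores = {}


def record(line_number, value):
    # Adding a key mutates the existing dictionary, so no `global`
    # statement is needed here; `global` is only required when the
    # function rebinds the name itself (e.g. pk_scores = {}).
    pk_scores[line_number] = value


record(1, '6.08')
print(pk_scores)
```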
Furthermore, in Python you can use named groups in your regexps; that's
sometimes very convenient (and you needn't split here).
[Code]
What do you think about:
********************************************************************
import os
import re


class ScoreOrient(object):
    # One named group per field; the group names double as table names.
    score_lines = re.compile(
        r'PK= (?P<score>.*) Qual= (?P<qual>.*) PMF= (?P<pmf>.*) '
        r'PMF_rb= (?P<pmf_rb>.*) SMoG= (?P<smog>.*) SMoG_H= (?P<smog_h>.*) '
        r'ChemScore= (?P<chemscore>.*) Clash= (?P<clash>.*) Int= (?P<int>.*) '
        r'DockNRG= (?P<docknrg>.*) AutoDock= (?P<autodock>.*)')

    def __init__(self):
        self.counter = 1
        # One empty dictionary per field, keyed by the group name.
        self.tables = dict(
            [(name, {}) for name in
             ["score", "qual", "pmf", "pmf_rb", "smog", "smog_h", "chemscore",
              "clash", "int", "docknrg", "autodock"]])
        self.results = None

    def __str__(self):
        s = []
        for name in self.tables:
            s.append(name + '\n')
            s.append("-" * len(name) + '\n')
            for key, val in sorted(self.tables[name].items()):
                s.append(str(key) + ' -> ' + val + '\n')
        return ''.join(s)

    def _pull(self, prog='/Users/kbranson/Desktop/Dock4_osx/bin/scorer'):
        # receptor_pdb must be defined by the calling script.
        self.results = os.popen("%s %s dock_nrg" % (prog, receptor_pdb))
        return self.results.readlines()

    def update(self):
        for line in self._pull():
            m = self.score_lines.search(line)
            if m:
                # Store every captured value in the table named after its group.
                for grp, val in m.groupdict().items():
                    self.tables[grp][self.counter] = val
                self.counter += 1
********************************************************************
What is convenient here is the use of the tables as entries in a
dictionary; the keys are the same as the names of the named
groups. This makes it extremely easy to access the right table for the
right value.
You just create an instance of the class and call its update method to
fill the tables.
If you use named groups within a regexp, a match object has a method,
groupdict(), that returns a dictionary: the key is the name you gave the
group, the value is the matched string. That is convenient here, since
the group name is also the key of the right table in the dictionary of
tables.
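A stripped-down sketch of that lookup, using only two of the fields and a literal input line instead of the scorer program (the shortened pattern and the fixed key 1 are assumptions made for illustration):

```python
import re

# Shortened, hypothetical pattern with two of the fields from above.
score_line = re.compile(r'PK= (?P<score>.*) Qual= (?P<qual>.*)')

# One table per group name, exactly as in the class.
tables = {'score': {}, 'qual': {}}

m = score_line.search('PK= 6.08 Qual= 2.12')
if m:
    for name, value in m.groupdict().items():
        # The group name selects the table; 1 stands in for the counter.
        tables[name][1] = value

print(tables)
```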
The __str__ method is mostly a bit of fun. You can now just print the
instance and the values in the tables are printed in a more or less nice
fashion.
I couldn't really test the code since I don't have your program here,
so there may be little bugs in the code (but I hope not).
Karl
--
Please do *not* send copies of replies to me.
I read the list