[Tutor] parsing chemical formula

Martijn Faassen M.Faassen@vet.uu.nl
Mon, 29 Mar 1999 18:34:41 +0200


Tim Wilson wrote:
> 
> I'm still a little intimidated by the OOP
> features that Python offers. I should probably just bite the bullet and dig
> in and learn it.

For starters, you might try thinking of a class instance as a Python
dictionary.

For instance, the dictionary:

    # make a dictionary
    aFoo = {}
    # fill dictionary with data
    aFoo['one'] = 1
    aFoo['two'] = 2
    aFoo['three'] = 3

is quite similar to the class:

    # make a class (and also say somehow how the data inside is
    # structured)
    class Foo:
        def __init__(self, one, two, three):
            self.one = one
            self.two = two
            self.three = three

    # fill an instance of Foo (aFoo) with data
    aFoo = Foo(1, 2, 3)    

Of course, initialization in __init__ isn't necessary; you could do:

    class Foo:
        pass # empty class

    aFoo = Foo()
    aFoo.one = 1
    aFoo.two = 2
    aFoo.three = 3

But the nice thing about a class (as compared to a dictionary) is that
it's easy to add functions to a class that actually do something with
the data stored inside (for instance, do some calculation). By
initializing (most of) the data in an __init__ function, you're sure
these data members are there for the instance's whole lifetime, as
__init__ is always executed when an instance of the class is created:

    class Foo:
        def __init__(self, one, two, three):
            self.one = one
            self.two = two
            self.three = three

        def add_all(self):
            return self.one + self.two + self.three

    aFoo = Foo(1, 2, 3)
    print(aFoo.add_all()) # prints 6
    aBar = Foo(3, 3, 3)
    print(aBar.add_all()) # prints 9

Of course, it's possible to write functions that do this with
dictionaries, but classes add some nice extra structure that is helpful.
They can be useful to bundle data and the functions that operate on that
data together, without the outside world needing to know what's going on
exactly inside the data and the functions (data hiding). Classes of
course also offer inheritance, but that's for later.
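
To make the comparison concrete, here's a small side-by-side sketch in
current Python. The free-standing add_all function for dictionaries is
something I made up for this example; it isn't part of anything above.

```python
# Plain-dictionary version: the function has to know the key names,
# and nothing guarantees the dictionary actually has them.
def add_all(foo_dict):
    return foo_dict['one'] + foo_dict['two'] + foo_dict['three']

# Class version: the data and the function travel together.
class Foo:
    def __init__(self, one, two, three):
        self.one = one
        self.two = two
        self.three = three

    def add_all(self):
        return self.one + self.two + self.three

dict_total = add_all({'one': 1, 'two': 2, 'three': 3})
class_total = Foo(1, 2, 3).add_all()
print(dict_total, class_total)
```

Both give the same sum; the difference is only in where the knowledge
about the data's layout lives.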

> I think I understand how a list would be useful to store
> the atoms until the total mass can be calculated. I don't see where you
> parse the user input here. 

> I'll be more specific: 

> How will it know the difference between CO (carbon monoxide)
> and Co (a cobalt atom)?

Hmm. 

You have a dictionary, let's call it 'elements', with the element names
as keys and their masses as values, and a string called 'molecule' that
describes a molecule. One approach may be:

# untested code! is probably slow!
import string

def getWeightForNextElement(molecule, elements):
    # assumption: all elements are a maximum of two characters, where
    # the first is a capital, and the second is lowercase

    # if 'molecule' doesn't start with an atom identifier at all we
    # return None
    if not molecule or molecule[0] not in string.ascii_uppercase:
        return None
    if len(molecule) < 2 or molecule[1] not in string.ascii_lowercase:
        # okay, we're dealing with a single char element, look it up:
        if molecule[0] in elements:
            # return weight and length of what we read
            return (elements[molecule[0]], 1)
        else:
            return None # not a known element
    else:
        # okay, we're dealing with a two char element:
        if molecule[0:2] in elements:
            # return weight and length of str we read
            return (elements[molecule[0:2]], 2)
        else:
            return None # not a known element

This function, if it works at all :), could be fed with a molecule
string. If it doesn't return None, and so recognizes the element, it'll
return the weight and the length (1 or 2) of the characters it read.
You can then strip those characters from the front of the string, feed
in the string again, and get the weight of the next element, until
you've read the whole string. Of course it doesn't work with '()' or
numbers yet.
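
Here's a rough sketch of that strip-and-repeat loop, written for current
Python. The names next_element and molecule_weight are made up for this
example, and the weights are approximate values for just three elements,
so treat it as an illustration rather than a finished parser.

```python
import string

# Approximate atomic weights; only a few elements, for illustration.
ELEMENTS = {'C': 12.011, 'O': 15.999, 'Co': 58.933}

def next_element(molecule, elements):
    """Return (weight, length) for the element at the front, or None."""
    if not molecule or molecule[0] not in string.ascii_uppercase:
        return None
    # A lowercase second character means a two-character symbol.
    if len(molecule) > 1 and molecule[1] in string.ascii_lowercase:
        symbol = molecule[:2]
    else:
        symbol = molecule[:1]
    if symbol not in elements:
        return None
    return (elements[symbol], len(symbol))

def molecule_weight(molecule, elements):
    """Sum the weights by repeatedly stripping the element just read."""
    total = 0.0
    while molecule:
        result = next_element(molecule, elements)
        if result is None:
            raise ValueError('cannot parse: ' + molecule)
        weight, length = result
        total = total + weight
        molecule = molecule[length:]   # strip what we just read
    return total

co_weight = molecule_weight('CO', ELEMENTS)      # carbon + oxygen
cobalt_weight = molecule_weight('Co', ELEMENTS)  # a single cobalt atom
```

Because the second character decides between a one- and a two-character
symbol, 'CO' is read as C followed by O, while 'Co' is read as cobalt.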
            
> How will the program be
> able to figure out how many atoms of each type are in a molecule like
> (NH4)3PO4? 

Numbers first. You can adapt the previous function (better to rewrite it
anyway, it was just a bad first attempt :) so that it recognizes when
digits are involved (string.digits). As soon as it encounters a digit,
it should check whether more digits follow and compose a string of all
the digits read. It should then convert this string (with int(), let's
say) to an actual amount, and somehow notify the calling function that
it read a number, and what the value of that number is. Since the
function as written can't report this, I suggest rewriting it. It's
perhaps better to do any weight calculations later anyway, and just
return the elements read (not their weights).
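
One possible shape for such a rewrite, as a sketch in current Python: it
returns (symbol, count) pairs and leaves the weights for later. The name
read_formula and the pair format are my own choices for this example,
and parentheses still aren't handled.

```python
import string

def read_formula(molecule):
    """Split a formula without parentheses into (symbol, count) pairs."""
    pairs = []
    i = 0
    while i < len(molecule):
        if molecule[i] not in string.ascii_uppercase:
            raise ValueError('expected an element at position %d' % i)
        # read the element symbol: a capital plus an optional lowercase
        j = i + 1
        if j < len(molecule) and molecule[j] in string.ascii_lowercase:
            j = j + 1
        symbol = molecule[i:j]
        # compose a string of all digits that follow the symbol
        k = j
        while k < len(molecule) and molecule[k] in string.digits:
            k = k + 1
        digits = molecule[j:k]
        if digits:
            count = int(digits)
        else:
            count = 1          # no digits means a single atom
        pairs.append((symbol, count))
        i = k                  # continue after what we just read
    return pairs

pairs = read_formula('C6H12O6')
```

For 'C6H12O6' this gives the pairs ('C', 6), ('H', 12) and ('O', 6),
which the caller can then multiply by the weights from 'elements'.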

Parentheses. You could do something like this:

* If you read a '(' parenthesis, create a new empty list of elements,
and add this to the list of elements read.

* Do this whenever you see another '(' (a nested '('). So you might get
a list nested in a list nested in a list, if you have a lot of ((()))
stuff.

* Until you read a ')' parenthesis, add any read elements to the current
list (or their mass). If you read numbers, of course do the right
multiplications, or simply add as many elements to the current list as
the number indicates.

When you've read the string (and the string makes sense syntactically;
it doesn't contain unknown elements or unbalanced parentheses such as
'(()'), you'll end up with a big master list of elements (or masses)
that may contain sublists of a similar structure.
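
The steps above could be sketched like this in current Python. It
stores element symbols (not masses) and handles a number by repeating
the previous item, which is the second of the two options mentioned.
The name parse_formula and the use of a stack of lists are my own
assumptions, and it doesn't check the symbols against 'elements'.

```python
import string

def parse_formula(molecule):
    """Turn a formula into a (possibly nested) list of element symbols."""
    master = []
    stack = [master]          # stack[-1] is the list we're filling now
    i = 0
    while i < len(molecule):
        ch = molecule[i]
        if ch == '(':
            # new empty list, added to the current list and made current
            sublist = []
            stack[-1].append(sublist)
            stack.append(sublist)
            i = i + 1
        elif ch == ')':
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            stack.pop()       # go back to the enclosing list
            i = i + 1
        elif ch in string.ascii_uppercase:
            j = i + 1
            if j < len(molecule) and molecule[j] in string.ascii_lowercase:
                j = j + 1
            stack[-1].append(molecule[i:j])
            i = j
        elif ch in string.digits:
            j = i
            while j < len(molecule) and molecule[j] in string.digits:
                j = j + 1
            count = int(molecule[i:j])
            if not stack[-1]:
                raise ValueError('number with nothing before it')
            # repeat the previous item (an element or a whole sublist)
            # so that it appears 'count' times in total; repeated
            # sublists are the same list object, fine for read-only use
            last = stack[-1][-1]
            stack[-1].extend([last] * (count - 1))
            i = j
        else:
            raise ValueError('unexpected character: ' + ch)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return master

parsed = parse_formula('(NH4)3PO4')
```

For '(NH4)3PO4' the master list holds the sublist ['N', 'H', 'H', 'H',
'H'] three times, followed by 'P' and four times 'O'.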

Now you want to add the weight of it all:

# untested!
def add_everything(master_list, elements):
    total = 0.0
    for el in master_list:
        if isinstance(el, list):
            # recursion; add everything in the sublist and add it to the
            # master sum
            total = total + add_everything(el, elements)
        else:
            total = total + elements[el]
    return total
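
As a quick worked example in current Python, here is the same recursive
sum applied to a hand-built master list for (NH4)3PO4. The function is
restated under the made-up name total_weight so the snippet runs on its
own, and the atomic weights are approximate.

```python
# Approximate atomic weights for the elements in (NH4)3PO4.
elements = {'H': 1.008, 'N': 14.007, 'O': 15.999, 'P': 30.974}

def total_weight(items, elements):
    total = 0.0
    for el in items:
        if isinstance(el, list):
            # a sublist: add up its contents recursively
            total = total + total_weight(el, elements)
        else:
            total = total + elements[el]
    return total

# (NH4)3PO4 written out by hand as a nested list
ammonium = ['N', 'H', 'H', 'H', 'H']
master_list = [ammonium, ammonium, ammonium, 'P', 'O', 'O', 'O', 'O']

weight = total_weight(master_list, elements)
print(round(weight, 3))
```

With these weights the total comes out at roughly 149.09.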

A whole long post. I hope I'm making sense somewhat and that it helps.
:) Please let us all know!

Regards,

Martijn