[Tutor] Another example of closures: a function that intercepts exceptions

Thu Dec 12 15:29:05 2002

[The following post is slightly advanced, and written in a badly rushed
fashion; I'm still at work at the moment, but thought this was a cute
example to show folks.]

Hi everyone,

During yesterday's Baypiggies Python meeting, someone brought up a
question about closures: why would anyone want to use something so
academic and weird?

Here's one concrete example where they might come in handy:

######
def wrapErrorsToDefaultValue(function, default_value=None):
    """
    wrapErrorsToDefaultValue(function, default_value=None)
        -> wrapped function

    This adds a small exception handling wrapper around a given
    function.  This wrapped function should behave similarly to the
    input, but if an exception occurs, the wrapper intercepts and
    returns the default_value instead.
    """
    def new_function(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except:
            return default_value
    return new_function
######

wrapErrorsToDefaultValue() is a pretty cute function: it takes in a
function and returns a new function that mimics the original, except that
it acts as a safety net if the function dies:

###
>>> def divide(a, b):
...     return a / b
...
>>> divide(3, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in divide
ZeroDivisionError: integer division or modulo by zero
>>>
>>> wrapped_divide = wrapErrorsToDefaultValue(divide, "This is bad!")
>>> wrapped_divide(42, 2)
21
>>> wrapped_divide(42, 0)
'This is bad!'
###

The wrapped_divide() function doesn't raise an error like the original
divide() function: rather, it returns the default value that we passed in.
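
What makes new_function() a closure is that it keeps hold of `function`
and `default_value` even after wrapErrorsToDefaultValue() has returned.
Here's a small sketch of how to peek at that, assuming a reasonably modern
Python where functions expose `__closure__` and `__code__`.  The wrapper is
repeated (with `except Exception` rather than a bare `except`) only so the
snippet runs by itself:

```python
def wrapErrorsToDefaultValue(function, default_value=None):
    def new_function(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except Exception:
            return default_value
    return new_function

def divide(a, b):
    return a / b

wrapped_divide = wrapErrorsToDefaultValue(divide, "This is bad!")

# Names that new_function borrows from the enclosing function's scope.
free_names = wrapped_divide.__code__.co_freevars

# Each borrowed name has a "cell" that keeps the captured value alive.
cells = wrapped_divide.__closure__
captured = dict(zip(free_names, [cell.cell_contents for cell in cells]))

# A second wrapper gets its own, independent set of captured values.
other_divide = wrapErrorsToDefaultValue(divide, -1)
```

Here `captured` maps 'function' to divide and 'default_value' to
"This is bad!", while other_divide(1, 0) gives back its own default of -1.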

A more realistic example where something like this might be useful is XML
DOM parsing: I'm finding myself diving through some XML data without
knowing whether a certain element exists in the marked-up data file.
In the XML files that I'm reading, there is a list of "gene models", each
of which may contain a "CDS" or "PROTEIN" sequence element.  (For
people who are interested, the DTD of the data format I'm parsing is:

    ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/BACS/CHR1/tigrxml.dtd
)

I'm using the pulldom module:

    http://www.python.org/doc/lib/module-xml.dom.pulldom.html

to parse these files, just because the biological sequences involved can
get quite large.  Anyway, back to my example!  Here's a bit of code that
I'm using to pull the text contents of a CDS_SEQUENCE element nested in a
MODEL element:

###
def get_cds_sequence(dom_node):
    """Tries to return the coding region sequence of a model gene."""
    if not dom_node.getElementsByTagName('MODEL'):
        return None
    first_model = dom_node.getElementsByTagName('MODEL')[0]
    if not first_model.getElementsByTagName('CDS_SEQUENCE'):
        return None
    return first_model.getElementsByTagName('CDS_SEQUENCE')[0]\
            .firstChild.data
###

I'm doing all these checks because if I'm not careful, I'll get an
IndexError.  But the code feels like it's just tiptoeing around a
minefield.
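
To make the setup concrete, here's a rough sketch of how a function like
that can be driven from pulldom.  The sample XML is made up for
illustration: the GENES and GENE element names are my own placeholders and
are not taken from the TIGR DTD.  The checking function is a slightly
condensed version of the one above:

```python
from xml.dom import pulldom

# Made-up sample data: the second record has no CDS_SEQUENCE element.
SAMPLE = """<GENES>
  <GENE><MODEL><CDS_SEQUENCE>ATGGCC</CDS_SEQUENCE></MODEL></GENE>
  <GENE><MODEL></MODEL></GENE>
</GENES>"""

def get_cds_sequence(dom_node):
    """Tries to return the coding region sequence of a model gene."""
    models = dom_node.getElementsByTagName('MODEL')
    if not models:
        return None
    sequences = models[0].getElementsByTagName('CDS_SEQUENCE')
    if not sequences:
        return None
    return sequences[0].firstChild.data

results = []
events = pulldom.parseString(SAMPLE)
for event, node in events:
    if event == pulldom.START_ELEMENT and node.tagName == 'GENE':
        # Build a DOM subtree for just this record, not the whole file.
        events.expandNode(node)
        results.append(get_cds_sequence(node))

print(results)
```

The point of expandNode() is that only one record at a time is held in
memory as a DOM tree, which is what makes pulldom attractive for large
sequence files.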

There is a good solution though: I can use exception handling to make this
code less awkward:

###
def get_cds_sequence(dom_node):
    """Tries to return the coding region sequence of a model gene."""
    try:
        first_model = dom_node.getElementsByTagName('MODEL')[0]
        return first_model.getElementsByTagName('CDS_SEQUENCE')[0]\
                .firstChild.data
    except IndexError:
        return None
###

But it's still a slight hassle having to wrap everything in try/except
blocks.  I'll be doing this for about ten of my functions, and I don't
want to wrap each function with its own little try/except.

The code becomes even nicer, though, when I take advantage of that
wrapErrorsToDefaultValue() function:

###
def _get_cds_sequence(dom_node):
    """Tries to return the coding region sequence of a model gene."""
    first_model = dom_node.getElementsByTagName('MODEL')[0]
    return first_model.getElementsByTagName('CDS_SEQUENCE')[0]\
            .firstChild.data

def _get_protein_sequence(dom_node):
    """Tries to return the protein sequence of a model gene."""
    first_model = dom_node.getElementsByTagName('MODEL')[0]
    return first_model.getElementsByTagName('PROTEIN_SEQUENCE')[0]\
            .firstChild.data

for func, name in [ (_get_cds_sequence, 'getCdsSequence'),
                    (_get_protein_sequence, 'getProteinSequence')] :
    globals()[name] = wrapErrorsToDefaultValue(func)
###

(The only problem here is that wrapErrorsToDefaultValue() is too strong a
safety net: it should really be weakened so that it only responds to
that specific IndexError, rather than to everything.)
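
One way that weaker safety net might look, as a sketch: the name
wrapSomeErrorsToDefaultValue() and its `exceptions` parameter are
hypothetical additions of mine, and first_item() is just a toy function to
exercise it.

```python
def wrapSomeErrorsToDefaultValue(function, default_value=None,
                                 exceptions=(IndexError,)):
    """Like wrapErrorsToDefaultValue(), but only intercepts the
    exception classes listed in `exceptions`."""
    def new_function(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except exceptions:
            # Only the listed exceptions are swallowed; anything else
            # propagates to the caller as usual.
            return default_value
    return new_function

def first_item(sequence):
    return sequence[0]

safe_first = wrapSomeErrorsToDefaultValue(first_item, "empty")
```

With this version, safe_first([]) returns "empty", but safe_first(None)
still raises a TypeError, so genuine bugs aren't silently hidden.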

And we can take closures even further: the code of _get_cds_sequence() and
_get_protein_sequence() is almost identical, so we can tease the common
elements out:

###
def make_model_seq_function(sequence_type):
    """Makes a new function for extracting the MODEL/[sequence_type]
    of a model gene."""
    def new_function(dom_node):
        first_model = dom_node.getElementsByTagName('MODEL')[0]
        return first_model.getElementsByTagName(sequence_type)[0]\
                .firstChild.data
    wrapped = wrapErrorsToDefaultValue(new_function)
    # A "..." % value expression isn't picked up as a docstring, so we
    # attach the documentation to the returned function by hand.
    wrapped.__doc__ = ("Extracts the %s sequence of a model gene"
                       % sequence_type)
    return wrapped

for seq_type, name in [ ('CDS_SEQUENCE', 'getCdsSequence'),
                        ('PROTEIN_SEQUENCE', 'getProteinSequence')]:
    globals()[name] = make_model_seq_function(seq_type)
###
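
One thing to watch for with wrappers like this: the function you get back
is really new_function(), so its name and docstring are not those of the
function you passed in.  Here's a sketch of one way around that, assuming a
Python new enough to have decorator syntax and functools.wraps (neither is
needed for the code above).  first_word() is only a toy example:

```python
import functools

def wrapErrorsToDefaultValue(function, default_value=None):
    @functools.wraps(function)      # copies __name__, __doc__, and so on
    def new_function(*args, **kwargs):
        try:
            return function(*args, **kwargs)
        except Exception:
            return default_value
    return new_function

@wrapErrorsToDefaultValue
def first_word(text):
    """Returns the first whitespace-separated word of text."""
    return text.split()[0]
```

Now first_word("") returns None instead of raising an IndexError, and
help(first_word) still shows the original name and docstring.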

Anyway, I hope that made some sort of sense.  Back to work for me...
*grin*