[Tutor] What techniques should I use to make my code run faster? Informational

Thu Aug 10 00:16:37 CEST 2006

      What techniques should I use to make my code run faster?         Always profile before you optimize for speed.   You should always optimize for readability first: it's easier to tune   readable code than to read 'optimized' code, especially if the optimizations   are not effective. Before using any technique that makes the code less   readable, you should check that it's actually a bottleneck in your   application by running your application with the built-in profile.py   script. If your program spends 10% of its time running a particular   method, even if you increase its speed tenfold you've only shaved 9% off the   total running time.   
      Always use a good algorithm when it is available.   The exception to the above rule is when there are known large differences in the   time complexity of alternative algorithms. Reducing running time from   quadratic to linear, or from exponential to polynomial, is always worth   doing unless you are sure that the data sets will always be tiny (less than   a couple of dozen items).   
       Use the simplest option that could possibly work.   Don't use a regular expression if you just want to see if a string starts   with a particular substring: use .startswith instead. Don't use   .index if you just want to see if a string contains a particular   letter: use in instead. Don't use StringIO if you could   just use a list of strings. In general, keeping it simple cuts down on   bugs and makes your code more readable. Even a complicated combination of   .index calls will be much faster than a regular expression, and   probably easier to decipher if you're just matching rather than capturing   the result.   
      Build strings as a list and use ''.join at the end.   Yes, you already saw this one above under "Python Idioms", but it's such   an important one that I thought I'd mention it again.   join is a string method called on the separator,   not the list. Calling it from the empty string concatenates the pieces   with no separator, which is a Python quirk and rather surprising at first.   This is important: string building with + is quadratic time instead of    linear!   

Wrong: 
for s in strings: result += s
Right: 
result = ''.join(strings)

       Use tests for object identity when appropriate: if x is not    None rather than if x != None. It is much more efficient to   test objects for identity than equality, because identity only checks   their address in memory (two objects are identical if they are the same   object in the same physical location) and not their actual data.   
       Use dictionaries for searching, not lists. To find items in common   between two lists, make the first into a dictionary and then look for items   in the second in it. Searching a list for an item is linear-time, while   searching a dict for an item is constant time. This can often let you   reduce search time from quadratic to linear.   
       Use the built-in sort wherever possible. sort can   take a custom comparison function as a parameter, but this makes it very   slow because the function has to be called at least O(n log n) times in the   inner loop. To save time, turn the list of items into a list of tuples,   where the first element of each tuple has the precalculated value of the   function for each item (e.g. extracting a field), and the last element is the   item itself.   
      This idiom is called DSU for 'decorate-sort-undecorate.'   In the 'decorate' step, make a list of tuples   containing (transformed_value, second_key, ... , original value).   In the 'sort' step, use the built-in sort on the tuples. In the   'undecorate' step, retrieve the original list in the sorted order by   extracting the last item from each tuple. For example:   

aux_list = [i.Count, i.Name, ... i) for i in items]
aux_list.sort()    #sorts by Count, then Name, ... , then by item itself
sorted_list = [i[-1] for i in items] #extracts last item

       Use map and/or filter to apply functions to lists.   map applies a function to each item in a list (technically, sequence)   and returns a list of the results. filter applies a function to each   item in a sequence, and returns a list containing only those items for which   the function evaluated True (using the __nonzero__ built-in   method). These functions can make code much shorter. They also make it much   faster, since the loop takes place entirely in the C API and never has to   bind loop variables to Python objects.   

Worse:
strings = []
for d in data:
    strings.append(str(d))

Better:
strings = map(str, data)

       Use list comprehensions where there are conditions attached, or where    the functions are methods or take more than one parameter. These are   cases where map and filter do badly, since you have to   make up a new one-argument function that does the operation you want. This   makes them much slower, since more work is done in the Python layer. List   comprehensions are often surprisingly readable.   

Worse:
result = []
for d in data:
    if d.Count > 4:
        result.append[3*d.Count]

Better:
result = [3*d.Count for d in data if d.Count > 4]

      If you find yourself making the same list comprehension repeatedly, make   utility functions and use map and/or filter:   

def triple(x):
    """Returns 3 * x.Count: raises AttributeError if .Count missing."""
    return 3 * x.Count

def check_count(x):
    """Returns 1 if x.Count exists and is greater than 3, 0 otherwise."""
    try:
        return x.Count > 3
    except:
        return 0

result = map(triple, filter(check_count, data))

       Use function factories to create utility functions.   Often, especially if you're using map and filter a lot,   you need utility functions that convert other functions or methods to   taking a single parameter. In particular, you often want to bind some data   to the function once, and then apply it repeatedly to different objects.   In the above example, we needed a function that multiplied a particular field   of an object by 3, but what we really want is a factory that's able to return   for any field name and amount a multiplier function in that family:   

def multiply_by_field(fieldname, multiplier):
    """Returns function that multiplies field "fieldname" by multiplier."""
    def multiplier(x):
        return getattr(x, fieldname) * multiplier
    return multiplier

triple = multiply_by_field('Count', 3)
quadruple = multiply_by_field('Count', 4)
halve_sum = multiply_by_field('Sum', 0.5)

      This is a very powerful and general technique for producing functions that   might do something like search a specified field for a list of words, or   perform several actions on different fields of a particular object, etc.   It's a pain to write a lot of little functions that do very similar things,   but if they're produced by a function factory it's easy.   
       Use the operator module and reduce to get sums, products, etc.   reduce takes a function and a sequence. First it applies the   function to the first two items, then it takes the result and applies the   function to the result and the next item, takes that result and applies the   function to it and the next item, and so on until the end of the list. This   makes it very easy to accumulate items along a list (or, in fact, any   sequence). Note that Python 2.3 has a built-in sum() function (for numbers only),   making this less necessary than it used to be.   

Worse:
sum = 0
for d in data:
    sum += d
product = 1
for d in data:
    product *= d

Better:
from operator import add, mul
sum = reduce(add, data)
product = reduce(mul, data)

       Use zip and dict to map fields to names.   zip turns a pair of sequences into a list of tuples containing   the first, second, etc. values from each sequence. For example,   zip('abc', [1,2,3]) == [('a',1),('b',2),('c',3)]. You can use   this to save a lot of typing when you have fields in a known order that   you want to map to names:   

Bad:
fields = '|'.split(line)
gi = fields[0]
accession = fields[1]
description = fields[2]
#etc.
lookup = {}
lookup['GI'] = gi
lookup['Accession'] = accession
lookup['Description'] = description
#etc.

Good:
fieldnames = ['GI', 'Accession', 'Description'] #etc.
fields = '|'.split(line)
lookup = dict(zip(fieldnames, fields))

Ideal:
def FieldWrapper(fieldnames, delimiter, constructor=dict):
    """Returns function that splits a line and wraps it into an object.

    Field names are passed in as keyword args, so constructor must be
    expecting them as such.
    """
    def FieldsToObject(line):
        fields = [field.strip() for field in line.split(delimiter)]
        result = constructor(**dict(zip(fieldnames, fields)))
    return FieldsToObject

FastaFactory = FieldWrapper(['GI','Accession','Description'], '|', Fasta)
TaxonFactory = FieldWrapper(['TaxonID', 'ParentID', ...], '|', Taxon)
CodonFreqFactory = FieldWrapper(['UUU', 'UUC', 'UUA',...], ' ', CodonFreq)
#etc for similar data, including any database tables you care to wrap

---------------------------------
Talk is cheap. Use Yahoo! Messenger to make PC-to-Phone calls.  Great rates starting at 1¢/min.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/tutor/attachments/20060809/16a5be90/attachment-0001.html