Re: [Tutor] make code less ugly/more pythonic?

Wed Dec 10 15:04:03 EST 2003

ashleigh smythe wrote:
> Is that awkard and unpythonic?  

You've mainly missed a few conveniences in Python that
would make your code a little more convenient. Using
some more features would remove the need for some of
your functions, but in general it's quite normal that
functions call functions that call functions. This is
a normal way of solving problems. Divide the problem
in smaller problems and postpone the solution of those 
smaller problems by making them separate functions or
classes or modules or whatever.

Learning the features of the builtins and standard
library takes some time, and before we do that, we
are bound to reinvent the wheel... (That's why it's
reccomended that you keep the library manual handy.)

> Should the whole thing be in a class - how would I do that?

I see no compelling reason in this case, but it would
be a small change.

> The program simply parses a file that has 10,000 iterations of the
> following:

Ok. In that case it's obviously correct to work through
the file in chunks. (They could be more than one line at
a time though.) If it had been convenient to read the
whole file into memory, I would have used re.findall which
is my favourite in the re module. :)

Something like (not really tested):

import re

data = file(file_name).read()
pat = re.compile(r'Component information \(consensus fork\) = (\d+) ')
nums = [float(n) for n in pat.findall(data)]
print sum(nums)/len(nums)

(By using float() instead of int(), I avoid the problem
of having to fix the fact that int / int => int, but other
solutions would be

from __future__ import division
import re

data = file(file_name).read()
pat = re.compile(r'Component information \(consensus fork\) = (\d+) ')
nums = [int(n) for n in pat.findall(data)]
print sum(nums)/len(nums)

or

import re

data = file(file_name).read()
pat = re.compile(r'Component information \(consensus fork\) = (\d+) ')
nums = [int(n) for n in pat.findall(data)]
print sum(nums)/float(len(nums))

I don't know if there is any performance difference. But either
way, this isn't really suitable with a big file. I'll return to
your code, but I'll skip the re module (you don't need that for
such simple needs) and simplify some other things.

> def make_int(mylist):                     #turns the list of consensus 
>     an_int_list=[int(n) for n in mylist]  #fork values from strings to
>     return an_int_list                    #integers so I can average
>                                           #them.

I'd convert each item to int (or float as suggested above) before
I appended it to the list. Then this step can be skipped.

> def add_up(mylist):                       #calls function to turn
>     int_list=make_int(mylist)             #strings into          
>     asum=0                                #integers.
>     for x in int_list:                    #sums up all the values
>         asum +=x
>     return asum

Use the sum() builtin function.

> def average(mylist):                      #takes the sum of consensus
>     sum=add_up(mylist)                    #fork values and finds their  
>     length=len(mylist)                    #average.
>     average_confork=sum/length
>     return average_confork

With the advice above, this turns into:

def average(mylist):
    return sum(mylist)/len(mylist)

or possibly

def average(mylist):
    return sum(mylist)/float(len(mylist))

That's hardly worth putting in a separate function.

> mainlist=[]                               #initialize the list of 
>                                           #consensus fork values.

Global variables should be avoided!!!

> def search(file_to_search):               
>     infile=open(file_to_search, 'r')
>     for line in infile:
>         if re.search('Component', line):  #searches the input file for
>             confork=line[45:49]           #line containing cons. fork,
>             mainlist.append(confork)      #removes the value and puts 
>     return mainlist                       #it into a list.

And since you return mainlist, you could have defined it inside your
search function instead.

> def aveconfork(file_to_search):           #this executes the whole thing
>     thelist=search(file_to_search)        #so I start the program with 

But you never use that return value! (You could have used
"thelist" instead of "mainlist" in the next line of code.)

>     finalaverage=average(mainlist) <= HERE!        #aveconfork.aveconfork
>     return finalaverage                   #('file_to_search')

Maybe something like this would be an improvement?

from __future__ import division # Get new style division: int / int => float

def search(file_to_search):
    infile = file(file_to_search)
    mainlist = []
    for line in infile:
        if 'Component' in line:
            confork = int(line[45:49])
            mainlist.append(confork)
    return mainlist

def aveconf(file_to_search):
    list_of_numbers = search(file_to_search)
    final_average = sum(list_of_numbers)/len(list_of_numbers)
    return final_average 

There is one thing I still don't like. The search function
would be more generic if it worked with any sequence. I'd
rather open the file in "aveconf". That brings us to:

from __future__ import division 

def search(lines):
    mainlist = []
    for line in lines:
        if 'Component' in line:
            confork = int(line[45:49])
            mainlist.append(confork)
    return mainlist

def aveconf(file_to_search):
    my_lines = file(file_to_search)
    list_of_numbers = search(my_lines)
    final_average = sum(list_of_numbers)/len(list_of_numbers)
    return final_average 

-- 
Magnus Lycka, Thinkware AB
Alvans vag 99, SE-907 50 UMEA, SWEDEN
phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
http://www.thinkware.se/  mailto:magnus at thinkware.se