Program inefficiency?
Pablo Ziliani
pablo at decode.com.ar
Sat Sep 29 12:11:33 EDT 2007
hall.jeff at gmail.com wrote:
> Is there a solution here that I'm missing? What am I doing that is so
> inefficient?
>
Hi Jeff,
Yes, it seems your code has several performance problems.
Please see my notes below.
> def massreplace():
>     editfile = open("pathname\editfile.txt")
>     filestring = editfile.read()
>     filelist = filestring.splitlines()
>     ## errorcheck = re.compile('(a name=)+(.*)(-)+(.*)(></a>)+')
>     for i in range(len(filelist)):
>         source = open(filelist[i])
>
>
Read this post:
http://mail.python.org/pipermail/python-list/2004-August/275319.html
Instead of reading the whole document, storing it in a variable,
splitting it, and then iterating, you could simply iterate over the file
directly (note the strip(): each line keeps its trailing newline, which
would otherwise end up in the filename):
def massreplace():
    editfile = open("pathname\editfile.txt")
    for line in editfile:
        source = open(line.strip())
> starttext = source.read()
> interimtext = replacecycle(starttext)
> (...)
>
Excuse me, but this is insane. Do just one call (or none at all, I don't
see why you need to split this into two functions) and let the function
manage the replacement "layers".
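For instance, a minimal sketch of folding everything into one function that walks the replacement layers in order (the patterns and names here are made up, stand-ins for your p1/q2 pairs):

```python
import re

def massreplace(text):
    # Hypothetical (pattern, replacement) layers, stand-ins for the
    # p1/q2 pairs in the original code.
    layers = [
        (r'old', 'new'),
        (r'\bfoo\b', 'bar'),
    ]
    # One function applies every replacement layer in order.
    for pattern, repl in layers:
        text = re.sub(pattern, repl, text)
    return text
```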
I'm skipping the next part (don't want to understand all your logic now).
> (...)
>
> def replacecycle(starttext):
>
Unneeded, IMHO.
> p1= re.compile('(href=|HREF=)+(.*)(#)+(.*)( )+(.*)(">)+')
> (...)
> interimtext = p100.sub(q2, interimtext)
>
The same comment applies here. I might be wrong, but I'm pretty confident
you can do all of this in one regex.
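As an illustration (assuming the replacements are simple enough), many patterns can often be collapsed into one regex with alternation, dispatching each match through a replacement function; the mapping below is made up:

```python
import re

# Hypothetical mapping from matched text to its replacement.
mapping = {'HREF=': 'href=', '#': '-'}

# One pattern whose alternatives cover all the cases at once.
pattern = re.compile('|'.join(re.escape(k) for k in mapping))

def fix(text):
    # The replacement function looks up each match in the mapping.
    return pattern.sub(lambda m: mapping[m.group(0)], text)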
Anyway, although regexes are supposed to be cached, you don't need to
define them every time the function gets called. Do it once, outside the
function. At the very least you save one of the most important
performance hits in Python: function calls. Read this:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
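A rough sketch of the compile-once idea (the pattern itself is made up):

```python
import re

# Compiled once at import time, not on every call.
ANCHOR_RE = re.compile(r'href="([^"]*)#([^"]*)"')

def fix_links(text):
    # Reuses the precompiled pattern; no per-call re.compile.
    return ANCHOR_RE.sub(r'href="\1-\2"', text)
```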
Also, if you are parsing HTML, consider using BeautifulSoup or
ElementTree, or some other parser (particularly if you don't feel
confident with regexes).
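For example, a minimal sketch with the standard library's html.parser (BeautifulSoup has a friendlier API, but this avoids the extra dependency); it pulls out href attributes instead of regex-matching the raw markup, and the parser lowercases attribute names, so HREF= works too:

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    # Collects href attributes instead of regex-matching raw markup.
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names come lowercased.
        for name, value in attrs:
            if name == 'href':
                self.hrefs.append(value)

parser = HrefCollector()
parser.feed('<a HREF="page.html#top">top</a>')
```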
Hope you find this helpful.
Pablo