Program inefficiency?
Pablo Ziliani
pablo at decode.com.ar
Sat Sep 29 12:11:33 EDT 2007
hall.jeff at gmail.com wrote:
> Is there a solution here that I'm missing? What am I doing that is so
> inefficient?
>
Hi Jeff,
Yes, it seems your code has several performance problems.
Please see my notes below.
> def massreplace():
>     editfile = open("pathname\editfile.txt")
>     filestring = editfile.read()
>     filelist = filestring.splitlines()
>     ## errorcheck = re.compile('(a name=)+(.*)(-)+(.*)(></a>)+')
>     for i in range(len(filelist)):
>         source = open(filelist[i])
>
>
Read this post:
http://mail.python.org/pipermail/python-list/2004-August/275319.html
Instead of reading the whole document, storing it in a variable,
splitting it, and then iterating, you could simply iterate over the file
directly (note the strip(): each line keeps its trailing newline, which
would otherwise end up in the filename):
def massreplace():
    editfile = open("pathname\editfile.txt")
    for line in editfile:
        source = open(line.strip())
> starttext = source.read()
> interimtext = replacecycle(starttext)
> (...)
>
Excuse me, but this is insane. Do just one call (or none at all, I don't
see why you need to split this into two functions) and let the function
manage the replacement "layers".
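For instance, a minimal sketch of folding everything into one function that walks the replacement layers in order (the patterns and names here are made up, stand-ins for your p1/q2 pairs):

```python
import re

def massreplace(text):
    # Hypothetical (pattern, replacement) layers, stand-ins for the
    # p1/q2 pairs in the original code.
    layers = [
        (r'old', 'new'),
        (r'\bfoo\b', 'bar'),
    ]
    # One function applies every replacement layer in order.
    for pattern, repl in layers:
        text = re.sub(pattern, repl, text)
    return text
```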
I'm skipping the next part (don't want to understand all your logic now).
> (...)
>
> def replacecycle(starttext):
>
Unneeded, IMHO.
> p1= re.compile('(href=|HREF=)+(.*)(#)+(.*)( )+(.*)(">)+')
> (...)
> interimtext = p100.sub(q2, interimtext)
>
The same comment applies here. I might be wrong, but I'm pretty confident
you can do all of this in one regex.
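As an illustration (assuming the replacements are simple enough), many patterns can often be collapsed into one regex with alternation, dispatching each match through a replacement function; the mapping below is made up:

```python
import re

# Hypothetical mapping from matched text to its replacement.
mapping = {'HREF=': 'href=', '#': '-'}

# One pattern whose alternatives cover all the cases at once.
pattern = re.compile('|'.join(re.escape(k) for k in mapping))

def fix(text):
    # The replacement function looks up each match in the mapping.
    return pattern.sub(lambda m: mapping[m.group(0)], text)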
Anyway, although regexes are supposed to be cached, you don't need to
define them every time the function gets called. Do it once, outside the
function. At the very least you save one of the most important
performance hits in Python: function calls. Read this:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
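A rough sketch of the compile-once idea (the pattern itself is made up):

```python
import re

# Compiled once at import time, not on every call.
ANCHOR_RE = re.compile(r'href="([^"]*)#([^"]*)"')

def fix_links(text):
    # Reuses the precompiled pattern; no per-call re.compile.
    return ANCHOR_RE.sub(r'href="\1-\2"', text)
```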
Also, if you are parsing HTML, consider using BeautifulSoup or
ElementTree, or some other parser (particularly if you don't feel
confident with regexes).
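For example, a minimal sketch with the standard library's html.parser (BeautifulSoup has a friendlier API, but this avoids the extra dependency); it pulls out href attributes instead of regex-matching the raw markup, and the parser lowercases attribute names, so HREF= works too:

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    # Collects href attributes instead of regex-matching raw markup.
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names come lowercased.
        for name, value in attrs:
            if name == 'href':
                self.hrefs.append(value)

parser = HrefCollector()
parser.feed('<a HREF="page.html#top">top</a>')
```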
Hope you find this helpful.
Pablo