Beginner's performance problem

Cameron Laird claird at starbase.neosoft.com
Tue Apr 9 16:42:31 EDT 2002


In article <slrnab68mh.15p.gerhard at lilith.my-fqdn.de>,
Gerhard Häring <gerhard at bigfoot.de> wrote:
>Mark Charsley wrote in comp.lang.python:
>> For reasons best not gone into, I needed to correct the case of a whole 
>> bunch of SQL code. As such I created the little script below...
>> 
>> It reads in a source file, then for each line it does a case-insensitive 
>> search for each TableName and ColumnName (as contained in the "names" 
>> collection), checks that it's not just matching a substring, and then 
>> corrects the case of the word if necessary.
>>
>> [snip code]
>>
>> 1) why is it so slow? I wouldn't have thought that processing 13,000 lines 
>> * 200 names should take more than a few seconds on a modern PC.
>
>String manipulation is costly in Python because strings are immutable.
>Every string manipulation creates a new string object in effect. A
>good approach is to tokenize your string into a list (which is
>mutable) and operate on the list only, then join the list into a
>string again.
>
>I haven't tried it, but an approach using the re module should be really
>fast, and shorter. Untested code follows:
>
>import re
>names = ["ALGORITHM_CLASS", ...]
>
>text = open("ThirteenThousandLines.sql").read()
>for name in names:
>    regex = re.compile(name, re.I) # case insensitive search for name
>    text = regex.sub(name, text)   # replace all occurrences
>print text
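
For what it's worth, Gerhard's untested sketch can be made to honour Mark's
"don't just match a substring" requirement by anchoring each pattern on word
boundaries. The variant below is my own illustration, not code from either
poster; the names list is a placeholder, and the use of re.escape and \b
assumes the identifiers are ordinary word characters:

import re

names = ["ALGORITHM_CLASS", "TABLE_NAME"]   # placeholder; use the real 200 names

text = open("ThirteenThousandLines.sql").read()
for name in names:
    # \b keeps ALGORITHM_CLASS from matching inside a longer identifier,
    # and re.escape guards against regex metacharacters in a name.
    regex = re.compile(r"\b" + re.escape(name) + r"\b", re.I)
    text = regex.sub(name, text)
print text

The tokenize-into-a-list idea from Gerhard's first paragraph could be sketched
along these lines (again my own illustration, with the same placeholder names);
it makes a single pass over the file no matter how many names there are:

import re

names = ["ALGORITHM_CLASS", "TABLE_NAME"]        # placeholder names
lookup = dict([(n.lower(), n) for n in names])   # lower-case -> canonical form

def fix_case(line):
    # Split on runs of non-word characters, keeping the separators,
    # map each token through the lookup, then join the list again.
    tokens = re.split(r"(\W+)", line)
    return "".join([lookup.get(t.lower(), t) for t in tokens])

print "".join([fix_case(line) for line in open("ThirteenThousandLines.sql")])
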
				.
				.
				.
It's part of the Python Way.  Write a seven-line solution,
and it works.  Write a hand-tuned hundred-line alternative,
and not only does it take a day longer to get right, but it
runs two orders of magnitude slower.  Guido's Master Plan
emphasizes this element of behavioral conditioning.
-- 

Cameron Laird <Cameron at Lairds.com>
Business:  http://www.Phaseit.net
Personal:  http://starbase.neosoft.com/~claird/home.html


