Beginner's performance problem
Cameron Laird
claird at starbase.neosoft.com
Tue Apr 9 16:42:31 EDT 2002
In article <slrnab68mh.15p.gerhard at lilith.my-fqdn.de>,
Gerhard Häring <gerhard at bigfoot.de> wrote:
>Mark Charsley wrote in comp.lang.python:
>> For reasons best not gone into, I needed to correct the case of a whole
>> bunch of SQL code. As such I created the little script below...
>>
>> It reads in a source file, then for each line it does a case-insensitive
>> search for each TableName and ColumnName (as contained in the "names"
>> collection), checks that it's not just matching a substring, and then
>> corrects the case of the word if necessary.
>>
>> [snip code]
>>
>> 1) why is it so slow? I wouldn't have thought that processing 13,000 lines
>> * 200 names should take more than a few seconds on a modern PC.
>
>String manipulation is costly in Python because strings are immutable:
>in effect, every string manipulation creates a new string object. A
>good approach is to tokenize your string into a list (which is
>mutable), operate only on the list, then join the list back into a
>string.
>
>I haven't tried but an approach using the re module should be really
>fast, and shorter. Untested code follows:
>
>import re
>names = ["ALGORITHM_CLASS", ...]
>
>text = open("ThirteenThousandLines.sql").read()
>for name in names:
>    regex = re.compile(name, re.I)  # case-insensitive search for name
>    text = regex.sub(name, text)    # replace all occurrences
>print text
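One caveat with the untested sketch above: a plain `re.compile(name, re.I)`
will also match `name` inside longer identifiers, dropping the
"not just matching a substring" check the original script performed. A
minimal sketch (the names below are made-up stand-ins for the ~200 real
table and column names) that keeps whole-word matching via `\b` and does
the entire job in a single pass over the text, rather than one pass per
name:

```python
import re

# Hypothetical stand-ins for the real table/column names.
names = ["ALGORITHM_CLASS", "CUSTOMER_ID"]

# Map each lowercased name to its canonical spelling for O(1) lookup.
canonical = {name.lower(): name for name in names}

# One alternation wrapped in \b word boundaries: whole identifiers only,
# so e.g. "my_algorithm_class_ext" is left untouched.
pattern = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
    re.IGNORECASE,
)

def fix_case(text):
    # Replace every match with its canonical-case spelling in one pass.
    return pattern.sub(lambda m: canonical[m.group(0).lower()], text)

print(fix_case("select customer_id from t where algorithm_class = 1"))
# -> select CUSTOMER_ID from t where ALGORITHM_CLASS = 1
```

Because the alternation is compiled once and `sub` takes a callable, the
whole file is scanned once instead of 200 times, which should also help
with the speed complaint.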
.
.
.
It's part of the Python Way. Write a seven-line solution,
and it works. Write a hand-tuned hundred-line alternative,
and not only does it take a day longer to get right, but it
runs two orders of magnitude slower. Guido's Master Plan
emphasizes this element of behavioral conditioning.
--
Cameron Laird <Cameron at Lairds.com>
Business: http://www.Phaseit.net
Personal: http://starbase.neosoft.com/~claird/home.html