An Un-optimization anecdote with text-processing

Chad Netzer chad at vision.arc.nasa.gov
Sat Jun 19 08:25:53 EDT 1999


Phil Mayes wrote:

> A raw email always has a blank line between the header and the body.
> So you can read it in and find the gap by looking for 2 EOLs:
>
> [Code snipped]
> --
> Phil Mayes    pmayes AT olivebr DOT com

Well, I was about to start writing a Perl program for a simple text
processing task I have to do, when this post steered me back to my
language of choice.  As I wrote my script, which is called by the
Unix "find" command, I got to thinking about how anti-optimized it
was...  Here is what I wrote:

import string
import sys

# Way stupid algorithm, but I'm running it overnight :)
filename = sys.argv[1]
f = open(filename, 'r')
text = f.read()
while 1:
    i = string.find(text, r'/"')
    if i == -1: break
    text = text[:i+1] + "index.html" + text[i+1:]

f.close()
f = open(filename, 'w')
f.write(text)
f.close()


As you can see, the core loop scans the entire file for a string
match, builds a new string by concatenation, then starts scanning from
the beginning yet again.  It almost doesn't seem like it could be made
much worse, unless one deliberately tried.
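
For contrast, here is a minimal sketch of the same substitution done both ways, assuming modern Python 3 string methods rather than the "string" module used above. The function names are hypothetical, chosen only for illustration. With k matches in an n-character file, the rescanning version does roughly k*n work, since every iteration searches from the start and copies the whole string; the single-pass version touches each character about once.

```python
def add_index_rescan(text):
    """Same approach as the loop above: rescan from the start after each insertion."""
    while True:
        i = text.find('/"')
        if i == -1:
            break
        # Rebuilds the entire string on every match.
        text = text[:i + 1] + "index.html" + text[i + 1:]
    return text


def add_index_single_pass(text):
    """Single left-to-right pass; str.replace never revisits text it has already copied."""
    return text.replace('/"', '/index.html"')


# Hypothetical sample input, not taken from the original files.
sample = '<a href="docs/">x</a> <a href="/">y</a>'
print(add_index_rescan(sample))
print(add_index_single_pass(sample))
```

Both functions should produce the same output, because inserting "index.html" between the slash and the quote can never create a new match.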

Which brings me to my new game...  Rather than finding a nice hack
that would speed things up and use fewer lines of code, how about
finding the hack that uses the least code to make this as slow as
possible?  Adding pointless operations or other obfuscations doesn't
count.  Try to make this operation as elegantly short, and as
painfully slow, as possible...

For example:

I had originally coded this with string.index() using a try/except
pair to break out of the loop, which took more lines of code but is
probably faster than my loop (it avoids the 'if' test on every
iteration, at the cost of one exception when the search finally
fails).  So I changed it to be shorter and (possibly) slower.  Can
anyone think of other elegant unoptimizations?
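
The try/except variant described above was not included in the post. A rough reconstruction, assuming Python 3's str.index() (which raises ValueError when the substring is missing) in place of string.index(), might look like this. The function name is hypothetical.

```python
def add_index_with_exception(text):
    """Loop until index() fails, instead of testing a return value each time."""
    try:
        while True:
            i = text.index('/"')  # raises ValueError once no match remains
            text = text[:i + 1] + "index.html" + text[i + 1:]
    except ValueError:
        pass  # the single exception that ends the loop
    return text


# Hypothetical sample input for illustration only.
print(add_index_with_exception('<a href="pics/">pics</a>'))
```

It still rescans from the beginning on every pass, so whether it actually beats the 'if' version would have to be measured.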

Chad Netzer

PS.  I'm not suggesting running elaborate benchmarks or anything, just
hoping to hear some thoughts from the real Python masters.  It's
Saturday morning, 5:18 AM, and I'm STILL at work, so that might
explain the loopy idea.





