An Un-optimization anecdote with text-processing
Bernhard Herzog
herzog at online.de
Sat Jun 19 14:05:16 EDT 1999
Chad Netzer <chad at vision.arc.nasa.gov> writes:
> Well, I was in the process of starting to write a Perl program
> for a simple text processing task I have to do, when this post
> steered me back to my language of choice. As I wrote my script,
> which is called by the Unix "find" command, I got to thinking
> about how it was anti-optimized... Here is what I wrote:
>
> import string
> import sys
>
> # Way stupid algorithm, but I'm running it overnight :)
> filename = sys.argv[1]
> f = open(filename, 'r')
> text = f.read()
> while 1:
>     i = string.find(text, r'/"')
>     if i == -1: break
>     text = text[:i+1] + "index.html" + text[i+1:]
>
> f.close()
> f = open(filename, 'w')
> f.write(text)
> f.close()
>
>
>
> As you can see, the core loop scans the entire file for a string
> match, builds a new string by concatenation, then starts scanning from
> the beginning yet again. It almost doesn't seem like it could be made
> much worse, unless one deliberately tried.
>
> Which brings me to my new game... Rather than finding a nice hack
> that would speed things up and use fewer lines of code, how about
> finding the hack that uses the least amount of code to make this as
> slow as possible? Adding pointless operations, or other obfuscations
> don't count. Try to make this operation as elegantly short, and as
> painfully slow as possible...
>
> For example:
>
> I had originally coded this with string.index() using a try/except
> pair to break out of the loop, which took more lines of code but is
> probably faster than my loop (it avoids the 'if' statement in the
> loop, for the overhead of one exception). So I changed it to be
> shorter and (possibly) slower. Can anyone think of other elegant
> unoptimizations?
My first attempt at the shortest possible implementation of the
string-replacement part is this:
Python 1.5.2 (#3, Apr 21 1999, 16:51:54) [GCC 2.7.2.3] on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> from string import *
>>> s = ' "www.python.org/" xyzzy/plugh "fee/fie/foe/foo/" '
>>> join(split(s, '/"'), '/index.html"')
' "www.python.org/index.html" xyzzy/plugh "fee/fie/foe/foo/index.html" '
>>>
But I fear it's also one of the fastest...
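Dropped back into the quoted script, that one-liner might look something
like the sketch below (untested, and it keeps the same one-file-per-run
structure as the original):

    import string
    import sys

    # single pass over the file: split on '/"' and rejoin with
    # '/index.html"' instead of repeated find() plus concatenation
    filename = sys.argv[1]
    f = open(filename, 'r')
    text = f.read()
    f.close()

    text = string.join(string.split(text, '/"'), '/index.html"')

    f = open(filename, 'w')
    f.write(text)
    f.close()

The split/join pair touches the text only once and builds the result in a
single join, which is exactly why it is hard to make it slow.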
Or how about:
>>> reduce(lambda x,y: x[-1] == '/' and y == '"' and x+'index.html"' or x+y, s)
' "www.python.org/index.html" xyzzy/plugh "fee/fie/foe/foo/index.html" '
You don't even need the string module. This is veerrryyyy
sssssllllllooooooowwwwwwww because the result string is copied over and
over again and you call a Python function for every character.
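Spelled out as an explicit loop, the reduce() call does roughly the
following (just a sketch to show where the time goes):

    # rough equivalent of the reduce() one-liner: every iteration
    # builds a brand-new string, so the total copying grows
    # quadratically with len(s)
    result = s[0]
    for ch in s[1:]:
        if result[-1] == '/' and ch == '"':
            result = result + 'index.html"'
        else:
            result = result + ch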
To give you an impression of the speed difference:
>>> def test1(s):
...     reduce(lambda x,y: x[-1] == '/' and y == '"' and x+'index.html"' or x+y, s)
...
>>> def test2(s):
...     join(split(s, '/"'), '/index.html"')
...
>>> import time
>>> from string import *
>>> s = open("data.html").read()
>>> t = time.clock(); test1(s); print time.clock() - t
69.49
>>> t = time.clock(); test2(s); print time.clock() - t
0.01
>>>
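A small helper in the same spirit (not from the original post; the name is
made up) makes such comparisons easier to repeat:

    import time

    def time_one_call(func, arg):
        # time a single call; time.clock() measures CPU time on Unix,
        # with roughly 0.01 s resolution on many systems, so the
        # test2() figure above is at the limit of what it can resolve
        t = time.clock()
        func(arg)
        return time.clock() - t

With it, the comparison above becomes time_one_call(test1, s) versus
time_one_call(test2, s).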
--
Bernhard Herzog | Sketch, a python based drawing program
herzog at online.de | http://www.online.de/home/sketch/