optimization question

Peter Hansen peter at engcorp.com
Sun Aug 11 20:15:15 CEST 2002


Andrew Koenig wrote:
> 
> Peter> Anyway, you are optimizing before you've profiled and determined
> Peter> you have a problem.
> 
> Correct.
> 
> The application I'm thinking of will do lots of comparisons of the form
> 
>         s[i:j] == t
> 
> [...] And if I don't do
> something about it in advance, it will be a real nuisance to find all of
> the places where the code needs to change.
> 
> So, for example, in this case, I believe it is worth writing a function
> that might look like this:
> 
>         def eqsub(s, i, j, t):
>                 return s[i:j] == t

Defining your comparison as such a routine would certainly be the most
readable and maintainable way to go about it (perhaps with a more descriptive
name), regardless of optimization issues.  Therefore do it that way 
and don't worry about performance issues yet.

> and call this function everywhere instead of doing direct comparisons.
> Then the optimization becomes trivial:
> 
>         def eqsub(s, i, j, t):
>                 return (len(t) == j-i) and s[i:j] == t
> 
> which avoids building the substrings unless necessary.

That optimization might be adequate, or it might not be.  You might
find that building the useless extra string with s[i:j] is
still a problem in the final system.  In that case, you will 
easily be able to replace the eqsub() routine with an extension
written in C, for example, and guarantee that you never build
a string you don't need at all.

I understand now more about what you're doing; yes, it's sometimes
useful to consider these things in advance.  I don't think it 
changes the fact that readable *working* code is where you should
start, and that optimization should still never be done without
profiling to determine the bottlenecks.

Some of this attitude comes from XP's "YAGNI" (you ain't gonna
need it)... maybe you'll find after all the coding that the
overall algorithm or concept was flawed, and that you don't
want the s[i:j] == t comparison at all.  You'll find that out
much faster if you just code it, cleanly, and get it working.
Or you can take the time you saved, to write the C extension
that will put performance through the roof.

...nothing I'm sure you haven't heard before.

Cheers,
-Peter



More information about the Python-list mailing list