Some optimization tale

Stephan Diehl stephan.diehlNOSPAM at gmx.net
Sun Dec 28 07:32:24 EST 2003


Terry Reedy wrote:

[...]
> 
> and easily messed up;-)  If len(item) < len(prefix), item[i] throws
> exception.  For this approach to work, prefix should be set as shortest
> string of m in preliminary loop.  Also, you reslice m several times.  Do
> it once before outer loop.
> 
>> It is just not nessesary to compare all strings in the list.
> 
> Every string has to be compared to something at least once.

You are right, of course. I have to admit that I've been too sloppy in my
descriptions (and too sloppy in my thinking).

>
 
[...]

>> Michael Dyck then pointed out that instead of using 'sort', 'min' and
> 'max'
>> should be used. While tests suggest that this is true, I have no idea why
>> that should be, since finding a minimum or maximum uses some sorting
> anyway
> 
> No.  max and min each do a linear scan.  No sorting.  But each does at
> least as many character comparisons as modified f1 or f2.  The speedup is
> from looping and comparing in C, even though at least twice as many
> compares are done.

Makes sense.

> 
>> You might have realized that the optimization so far was done one the
> number
>> of strings. There is still another dimension to optimize in and that is
> the
>> actual string comparing.
>> Raymond Hettinger suggests using a binary search:
> 
> Since this only affects the final comparison of min and max, and not the n
> comparisons done to calculate each, the effect is minimal and constant,
> independent of number of strings.
> 
> Since this compares slices rather than chars in each loop, I wonder
> whether
> this is really faster than linear scan anyway.  I would like to see timing
> of f5 with min/max of f4 combined with linear scan of f3.  (Basically, f3
> with sort removed and min/max added.)  Since you changed two parts of f3
> to get f4, we cannot be sure that both changes are each an improvement
> even though the combination of two is.
> 
> def f5(seq):
>     if not seq: return ''
>     s1 = min(seq)
>     s2 = max(seq)
>     n = min(len(s1), len(s2))
>     if not n: return '' # not required since s1[0:0] == ''
>     for i in xrange(n) :
>         if s1[i] != s2[i] :
>             return s1[0:i]
>     return s1[0:n]

Your f5 function runs virtually at the same speed than Raymonds version.
Even with growing string length, there is no dicernable difference.

Cheers

Stephan
> 
> Terry J. Reedy





More information about the Python-list mailing list