Some optimization tale
Stephan Diehl
stephan.diehlNOSPAM at gmx.net
Sun Dec 28 13:32:24 CET 2003
Terry Reedy wrote:
[...]
>
> and easily messed up;-) If len(item) < len(prefix), item[i] throws
> exception. For this approach to work, prefix should be set as shortest
> string of m in preliminary loop. Also, you reslice m several times. Do
> it once before outer loop.
>
>> It is just not nessesary to compare all strings in the list.
>
> Every string has to be compared to something at least once.
You are right, of course. I have to admit that I've been too sloppy in my
descriptions (and too sloppy in my thinking).
>
[...]
>> Michael Dyck then pointed out that instead of using 'sort', 'min' and
> 'max'
>> should be used. While tests suggest that this is true, I have no idea why
>> that should be, since finding a minimum or maximum uses some sorting
> anyway
>
> No. max and min each do a linear scan. No sorting. But each does at
> least as many character comparisons as modified f1 or f2. The speedup is
> from looping and comparing in C, even though at least twice as many
> compares are done.
Makes sense.
>
>> You might have realized that the optimization so far was done one the
> number
>> of strings. There is still another dimension to optimize in and that is
> the
>> actual string comparing.
>> Raymond Hettinger suggests using a binary search:
>
> Since this only affects the final comparison of min and max, and not the n
> comparisons done to calculate each, the effect is minimal and constant,
> independent of number of strings.
>
> Since this compares slices rather than chars in each loop, I wonder
> whether
> this is really faster than linear scan anyway. I would like to see timing
> of f5 with min/max of f4 combined with linear scan of f3. (Basically, f3
> with sort removed and min/max added.) Since you changed two parts of f3
> to get f4, we cannot be sure that both changes are each an improvement
> even though the combination of two is.
>
> def f5(seq):
> if not seq: return ''
> s1 = min(seq)
> s2 = max(seq)
> n = min(len(s1), len(s2))
> if not n: return '' # not required since s1[0:0] == ''
> for i in xrange(n) :
> if s1[i] != s2[i] :
> return s1[0:i]
> return s1[0:n]
Your f5 function runs virtually at the same speed than Raymonds version.
Even with growing string length, there is no dicernable difference.
Cheers
Stephan
>
> Terry J. Reedy
More information about the Python-list
mailing list