improving a huge double-for cycle

Sat Sep 20 13:22:35 EDT 2008

Harald Luessen a écrit :
> On Thu, 18 Sep 2008 Bruno Desthuilliers wrote:
> 
>> # Harald : uncomment this and run test_results. As far as I can tell, it
>> # doesn't yields the expected results
>>
>> ## IN7 = IN[:]
>> ## def sortk7(n):
>> ##     return n.coordinates[0]
>>
>> ## def doubles7():
>> ##     "is ordering better ? - Nope Sir, it's broken..."
>> ##     IN7.sort(key=sortk)
>> ##     SN = []
>> ##     sn_append = SN.append
>> ##     in_len = len(IN)
>> ##     for i in xrange(in_len):
>> ##         node_i = IN[i]
>> ##         coords_i = node_i.coordinates
>> ##         for j in xrange(i+1, in_len):
>> ##             if coords_i[0] == IN[j].coordinates[0]:
>> ##                 if coords_i[1] == IN[j].coordinates[1]:
>> ##                     sn_append(node_i)
>> ##             else:
>> ##                 break
>> ##     return SN
> ...
>> Results here (py2.5, gentoo linux, athlonxp1800, 512 ram):
>>
>>>>> test_results()
>> True
>>>>> test_times()
>> doubles0 : 1.55667901039
>> doubles1 : 0.719144105911
>> doubles2 : 0.703393936157
>> doubles3 : 0.700654983521
>> doubles4 : 0.706257104874
>> doubles5 : 0.528184890747
>> doubles6 : 0.461633205414
>> doubles8 : 0.0134379863739
>> doubles9 : 0.0108540058136
> 
> When you change my code then do it right. :-)
> You forgot to change the IN to IN7 at _every_ place.
> And the sortk should be sortk7 in _both_ places.

doh :(

Sorry Harald, my fault, you're right...

> I never let the code run before myself. I just wrote it 
> in the newsreader. But now i did and I have a second 
> version as bonus.
> 
> 
> IN7 = IN[:]
> 
> def sortk7(n):
>     return n.coordinates[0], n.coordinates[1]
> 
> def doubles7():
>     IN7.sort(key=sortk7)
>     SN = []
>     sn_append = SN.append
>     in_len = len(IN7)
>     for i in xrange(in_len):
>         node_i = IN7[i]
>         coords_i = node_i.coordinates
>         for j in xrange(i+1, in_len):
>             if coords_i[0] == IN7[j].coordinates[0]:
>                 if coords_i[1] == IN7[j].coordinates[1]:
>                     sn_append(node_i)
>             else:
>                 break
>     return SN
> 
> 
> def comp7( x, y ):
>      return cmp( x.coordinates, y.coordinates )
> 
> def doubles7a():
>     IN7.sort( comp7 )
>     SN = []
>     sn_append = SN.append
>     in_len = len(IN7)
>     for i in xrange(in_len):
>         node_i = IN7[i]
>         for j in xrange(i+1, in_len):
>             node_j = IN7[j]
>             if comp7( node_i, node_j ) == 0:
>                 sn_append(node_i)
>             else:
>                 break
>     return SN
> 
> 
> Here are the results. (py2.5, WindowsXP, Pentium4, 2.6GHz, 1.5GB):
> My version is not so bad.
> 
> doubles0 : 1.03830598582
> doubles1 : 0.47943719104
> doubles2 : 0.487412506338
> doubles3 : 0.475924733451
> doubles4 : 0.466548681466
> doubles5 : 0.340487967046
> doubles6 : 0.278480365521
> doubles7 : 0.0953190978183
> doubles7a : 0.0784233750379
> doubles8 : 0.010236496538
> doubles9 : 0.00742803903848
> 

Point taken. Now there's one thing I find questionnable with your 
solution (which is probably why I didn't bother double-checking it when 
it *appeared* to be broken) : you assume that it's ok to loose original 
ordering, which I strongly suspect is not the case for the OP use case, 
and given the OP use case list's size, working on a copy might not be an 
acceptable solution.

But anyway: this is not an excuse for me having broken your code. My 
apologies.