[Tutor] Aschenputtel problem - Shorter is not better?

Fri Sep 16 19:17:22 CEST 2005

Well, I have some comments ^_^

First, about the function 6, it is exactly equivalent (and a little bit
quicker) to use:

===8<======8<======8<======8<======8<======8<======8<======8<======8<===
def cinderella6(original_list, condition):
    """Using while and list popping."""

    list1 = []
    i = 0
    try:
      while True:
        if not condition(original_list[i]):
          list1.append(original_list.pop(i))
        else:
          i += 1
    except IndexError:
      pass
    return original_list, list1
===8<======8<======8<======8<======8<======8<======8<======8<======8<===

That is, put the try block outside the loop ... however, if this method
is the quickest it is the only destructive one !

Then, you shouldn't time the creation of the list together with
algorithm. Thus the main loop should be:

===8<======8<======8<======8<======8<======8<======8<======8<======8<===
def test(list_length=100, num_func_calls=1000):
    """Profile different implementations of cinderella idiom."""

    global olist, rlist
    olist = range( list_length )
    rlist = array( olist )
    l = []
    for i in range(1,10):
        func = "cinderella%i" % i
        if i == 8:
          stmt = """%s(rlist, check)""" % (func,)
        else:
          stmt = """%s(olist, check)""" % (func,)
        timer = timeit.Timer(stmt,"from __main__ import %s, check,
array, compress, olist, rlist" % func)
        l.append(timer.repeat(3, int(num_func_calls)))
        print "%s: (1) %f, (2) %f, (3) %f" % (func, l[-1][0], l[-1][1],
l[-1][2])
===8<======8<======8<======8<======8<======8<======8<======8<======8<===

Well, I added the creation of a Numeric array ... as the function 8 is
supposed to work with it. So here are the corrected versions of the
functions 7 et 8 :

===8<======8<======8<======8<======8<======8<======8<======8<======8<===
def cinderella7(original_list, condition):
    """Using Numeric and array compression."""

    selection = array( [condition(i) for i in original_list] )
    list1 = compress(selection, original_list)
    list2 = compress(~selection, original_list)
    return list1, list2

def cinderella8(original_list, condition):
    """Using Numeric and array compression."""

    selection = condition( original_list )
    list1 = compress(selection, original_list)
    list2 = compress(~selection, original_list)
    return list1, list2
===8<======8<======8<======8<======8<======8<======8<======8<======8<===

In the end, Numeric rocks :) Solution 8 is even faster than solution 6 !

List length: 100, Function calls: 3 x 1000

cinderella1: (1) 0.228345, (2) 0.356115, (3) 0.229700
cinderella2: (1) 0.306550, (2) 0.308528, (3) 0.312639
cinderella3: (1) 0.469183, (2) 0.471858, (3) 0.474165
cinderella4: (1) 0.311316, (2) 0.327631, (3) 0.449626
cinderella5: (1) 0.309293, (2) 0.305037, (3) 0.293698
cinderella6: (1) 0.096513, (2) 0.095803, (3) 0.096810
cinderella7: (1) 0.661629, (2) 0.786803, (3) 0.673522
cinderella8: (1) 0.197300, (2) 0.198183, (3) 0.198790

List length: 1000, Function calls: 3 x 1000

cinderella1: (1) 2.248727, (2) 2.267243, (3) 2.301675
cinderella2: (1) 3.023581, (2) 3.046691, (3) 3.076899
cinderella3: (1) 29.806438, (2) 28.742620, (3) 31.848374
cinderella4: (1) 3.037407, (2) 3.162915, (3) 3.227520
cinderella5: (1) 4.260166, (2) 4.831273, (3) 3.991498
cinderella6: (1) 0.989976, (2) 1.346485, (3) 0.938494
cinderella7: (1) 5.095143, (2) 6.546663, (3) 6.203648
cinderella8: (1) 0.686484, (2) 0.686852, (3) 0.688778

List length: 10000, Function calls: 3 x 1000

cinderella6: (1) 8.399953, (2) 8.712989, (3) 8.998076
cinderella8: (1) 6.499856, (2) 6.388803, (3) 6.108556

Pierre

Christopher Arndt a écrit :
> Terry Kemmerer schrieb:
> 
>>
>>"Bearing in mind that shorter is not necessarily better..."
>>
>>Wouldn't shorter, as a rule of thumb, as in less language statements, mean fewer 
>>executions, and therefore faster?
> 
> 
> Well, see for yourself...
> 
> I wrote a little benchmarking script for different solutions to my problem. It
> turns out, that the original "traditional for-loop" approach is not only the
> most readable but also one of the fastest.
> 
> Here are my timings (Python 2.4, Athlon64 1.8 Ghz, Ubuntu Hoary) (see the
> attached script for the functions names):
> 
> List length: 100, Function calls: 3 x 1000
> 
> cinderella1: (1) 0.188769, (2) 0.189790, (3) 0.188901
> cinderella2: (1) 0.263702, (2) 0.254624, (3) 0.254050
> cinderella3: (1) 0.430694, (2) 0.270807, (3) 0.265921
> cinderella4: (1) 0.154761, (2) 0.142940, (3) 0.139610
> cinderella5: (1) 0.145273, (2) 0.139491, (3) 0.133930
> cinderella6: (1) 0.144191, (2) 0.139056, (3) 0.141840
> cinderella7: (1) 0.737320, (2) 0.726961, (3) 0.732526
> 
> List length: 1000, Function calls: 3 x 1000
> 
> cinderella1: (1) 1.639245, (2) 0.975745, (3) 0.971775
> cinderella2: (1) 1.337324, (2) 1.330888, (3) 1.319431
> cinderella3: (1) 18.772992, (2) 18.764382, (3) 18.759283
> cinderella4: (1) 1.140375, (2) 1.120827, (3) 1.119281
> cinderella5: (1) 1.460923, (2) 1.444996, (3) 1.444842
> cinderella6: (1) 1.481483, (2) 1.469315, (3) 1.478990
> cinderella7: (1) 7.211108, (2) 7.208998, (3) 7.202029
> 
> "cinderella1" ist the traditional approach, "cinderella3" builds up a list of
> positives and then checks that against the original list by using the "in"
> operator. This scales very badly for longer lists, as you can see in the second
> timing.
> 
> "cinderella7" ist the Numeric approach proposed by Pierre Barbier de Reuille.
> It doesn't compare very well, probably because it has to iterate 3 times over
> the original list and once more over the list of positives.
> 
> The solution using sets ("cinderella4") seems to do very well too, but has
> several limitations:
> 
> - does not preserver order of elements
> - elements must be unique
> - only available from Python 2.3
> 
> I leave it as an axcercise to the reader to try how the different soltutions
> behave when condition(item) is more expansive (i.e takes longer).
> 
> Chris
> 
> 
-- 
Pierre Barbier de Reuille

INRA - UMR Cirad/Inra/Cnrs/Univ.MontpellierII AMAP
Botanique et Bio-informatique de l'Architecture des Plantes
TA40/PSII, Boulevard de la Lironde
34398 MONTPELLIER CEDEX 5, France

tel   : (33) 4 67 61 65 77    fax   : (33) 4 67 61 56 68