Sorting: too different times. Why?

Diez B. Roggisch deets at nospam.web.de
Sun Nov 22 12:06:01 CET 2009


n00m schrieb:
> Any comment:
> 
> class Vector:
>     def __init__(self, x, y):
>         self.x = x
>         self.y = y
>     def __cmp__(self, v):
>         if self.x < v.x and self.y > v.y:
>             return -1
>         return 0
> 
> def v_cmp(v1, v2):
>     if v1.x < v2.x and v1.y > v2.y:
>         return -1
>     return 0
> 
> from random import randint
> from time import time
> 
> a = []
> for i in range(200000):

Use xrange instead (unless you are under python3), because for loops you 
don't need the full list range creates - xrange is just a generator.

>     a += [Vector(randint(0, 500000), randint(0, 500000))]

Better use .append here, looks nicer and should also be a bit faster.

> b = a[:]
> c = a[:]
> 
> print 'Sorting...'
> 
> t = time()
> b.sort(cmp=v_cmp)
> print time() - t
> 
> t = time()
> c.sort()
> print time() - t
> 
> print b == c
> 
> 
> 
>>>> ===================================== RESTART ======
>>>>
> Sorting...
> 0.906000137329
> 6.57799983025

I think the main reason is that method-dispatch is more expensive than 
function-dispatch. The former must create a bound method before calling, 
the latter just works out of the box.

Things get better if you do this:

t = time()
c.sort(cmp=Vector.__cmp__)
print time() - t


Although not the exact same performance - I get

Sorting...
0.677843093872
1.4283311367
True

Diez




More information about the Python-list mailing list