[Tutor] Testing if a number occurs more than once [profiling
a function using profile.run()]
Danny Yoo
dyoo@hkn.eecs.berkeley.edu
Mon Dec 2 17:10:03 2002
On Mon, 2 Dec 2002, Yann Le Du wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Mon, 2 Dec 2002, Michael Williams wrote:
>
> > as not only convoluted, but slow (this is the dominant loop of the
> > program). There must be a better way! Any suggestions?
>
> try :
>
> def morethanone(l):
> for n in l:
> if l.count(n)>1:
> return 1
> return 0
This is an expensive way to do the uniqueness check, since l.count()
itself will be doing a loop to run through every element in l. There are
really two nested loops that Python is running through.
If we use morethanone() on long lists, in a worst-case scenario, the time
it takes to check for uniqueness will be proportional to the number of
elements, squared. It's squared because we can roughly estimate the cost
of l.count()ing to be len(l), and we're doing the l.count()ing len(l)
times, so the total cost is roughly len(l)**2.
We can test this performance empirically by using the nice 'profile'
module:
http://www.python.org/doc/lib/profile-instant.html
For example:
###
>>> import profile
>>> def morethanone(l):
... for n in l:
... if l.count(n) > 1:
... return 1
... return 0
...
>>> l1, l2, l3 = range(1000), range(10000), range(100000)
>>> profile.run('morethanone(l1)')
3 function calls in 0.180 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.140 0.140 0.140 0.140 <stdin>:1(morethanone)
1 0.000 0.000 0.140 0.140 <string>:1(?)
1 0.040 0.040 0.180 0.180 profile:0(morethanone(l1))
0 0.000 0.000 profile:0(profiler)
>>> profile.run('morethanone(l2)')
3 function calls in 15.630 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 15.630 15.630 15.630 15.630 <stdin>:1(morethanone)
1 0.000 0.000 15.630 15.630 <string>:1(?)
1 0.000 0.000 15.630 15.630 profile:0(morethanone(l2))
0 0.000 0.000 profile:0(profiler)
>>> profile.run('morethanone(l3)')
# ... time passes
###
Actually, 'morethanone(l3)' was taking way way too long, so I cancelled
it. But already we can see the difference between the first two calls of
morethanone(l1) and morethanone(l2) --- the second call takes about 100
times as long to finish as the first, and we'd expect morethanone(l3) to
take about 0.180 seconds * (100)*2 = 1800 seconds = 30 minutes. Ouch.
A better approach could be to first sort() the list of numbers, and run
through adjacent pairs of elements. The duplicates, if they exist, will
clump up together during the sort, and will be easy to pick up when we
compare adjacent elements.
###
>>> def morethanone2(l):
... l.sort()
... for i in range(1, len(l)):
... if l[i-1] == len[i]:
... return 1
... return 0
...
>>> def morethanone2(l):
... l.sort()
... for i in range(1, len(l)):
... if l[i-1] == l[i]:
... return 1
... return 0
...
>>> profile.run('morethanone2(l1)')
3 function calls in 0.000 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <stdin>:1(morethanone2)
1 0.000 0.000 0.000 0.000 <string>:1(?)
1 0.000 0.000 0.000 0.000 profile:0(morethanone2(l1))
0 0.000 0.000 profile:0(profiler)
>>> profile.run('morethanone2(l2)')
3 function calls in 0.010 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.010 0.010 0.010 0.010 <stdin>:1(morethanone2)
1 0.000 0.000 0.010 0.010 <string>:1(?)
1 0.000 0.000 0.010 0.010 profile:0(morethanone2(l2))
0 0.000 0.000 profile:0(profiler)
>>> profile.run('morethanone2(l3)')
3 function calls in 0.160 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.160 0.160 0.160 0.160 <stdin>:1(morethanone2)
1 0.000 0.000 0.160 0.160 <string>:1(?)
1 0.000 0.000 0.160 0.160 profile:0(morethanone2(l3))
0 0.000 0.000 profile:0(profiler)
###
Here, we can see that the time taken as we bump up the size of the input
grows a little faster than linear:
size of input time it takes
------------ -------------
1000 0.0
10000 0.010
100000 0.160
which makes sense: sort()ing isn't too expensive, and we just need to do a
single scan across the sorted list to find duplicates.
Does anyone want to check how Michael's original morethanone() function
performs?
Good luck to you!