Is there such an idiom?

Ron Adam rrr at ronadam.com
Mon Mar 20 05:15:00 CET 2006


Per wrote:
> Thanks Ron,
>  surely set is the simplest way to understand the question, to see
> whether there is a non-empty intersection. But I did the following
> thing in a silly way, still not sure whether it is going to be linear
> time.
> def foo():
>     l = [...]
>     s = [...]
>     dic = {}
>     for i in l:
>         dic[i] = 0
>     k=0
>     while k <len(s):
>         if s[k] in dic:
>             return True
>         else: pass
>         k+=1
>     if k == len(s):
>         return False
> 
> 
> I am still a rookie, and partly just migrated from Haskell...
> I am not clear about how making one of the lists a dictionary is
> helpful

Lets compare them by checking different length with no overlap which is 
the worst case situation.


## is_interstion comparison

def ii_set(a, b):
     return len(set(a).intersection(b))>0

def ii_dict(l, s):
     dic = {}
     for i in l:
         dic[i] = 0
     for i in s:
         if i in dic:
             return True
     return False

def ii_dict2(l, s):
     dic = dict.fromkeys(l)
     for i in s:
         if i in dic:
             return True
     return False

import time

foos = [ii_set, ii_dict, ii_dict2]
lsts = [10, 100, 1000, 10000, 100000, 1000000]

for f in foos:
     for lst in lsts:
         a = range(lst)
         b = range(lst, lst*2)
         start = time.clock()
         assert f(a,b) == False
         t = time.clock()-start
         print f.__name__, lst, t
     print



ii_set 10 1.25714301678e-005
ii_set 100 2.45841301059e-005
ii_set 1000 0.000162031766607
ii_set 10000 0.0020256764477
ii_set 100000 0.0238681173166
ii_set 1000000 0.23067484834

ii_dict 10 2.31873045317e-005
ii_dict 100 6.73269926764e-005
ii_dict 1000 0.000442234976792
ii_dict 10000 0.0047891561637
ii_dict 100000 0.0502407428877
ii_dict 1000000 0.506360165887

ii_dict2 10 2.70984161395e-005
ii_dict2 100 5.55936578532e-005
ii_dict2 1000 0.000317358770458
ii_dict2 10000 0.00366638776716
ii_dict2 100000 0.0394256811969
ii_dict2 1000000 0.39200764343


The sets solution seems to be the fastest.  And it is usually is when 
you can move more of your task into the built-in methods which are 
programmed in C.

 From what I recently read here (not sure where) in another thread, in 
python 2.3 sets were implemented as a python module using dictionaries, 
and in 2.4 it was written in C code based on the dictionary C code.

Cheers,
    Ron




More information about the Python-list mailing list