Slow comparison between two lists

Jani Tiainen redetin at gmail.com
Thu Oct 23 14:46:05 CEST 2008


On 23 loka, 15:24, Peter Otten <__pete... at web.de> wrote:
> Jani Tiainen wrote:
> > I have rather simple 'Address' object that contains streetname,
> > number, my own status and x,y coordinates for it. I have two lists
> > both containing approximately 30000 addresses.
>
> > I've defined __eq__ method in my class like this:
>
> >     def __eq__(self, other):
> >         return self.xcoord == other.xcoord and \
> >             self.ycoord == other.ycoord and \
> >             self.streetname == other.streetname and \
> >             self.streetno == other.streetno
>
> > But it turns out to be very, very slow.
>
> > Then I setup two lists:
>
> > list_external = getexternal()
> > list_internal = getinternal()
>
> > Now I need get all all addresses from 'list_external' that are not in
> > 'list_internal', and mark them as "new".
>
> > I did it like this:
>
> > for addr in list_external:
> >     if addr not in list_internal:
> >         addr.status = 1 # New address
>
> > But in my case running that loop takes about 10 minutes. What I am
> > doing wrong?
>
> Even if list_external and list_internal contain the same items you need
> about len(list_external)*(len(list_internal)/2), or 45 million comparisons.
> You can bring that down to 30000*some_small_factor if you make your
> addresses hashable and put the internal ones into a dict or set
>
> def __hash__(self):
>    return hash((self.xcoord, self.yccord, self.streetname, self.streetno))
> def __eq__(self, other):
>    # as above
>
> Then
>
> set_internal = set(list_internal)
> for addr in list_external:
>     if addr not in set_internal:
>         addr.status = 1
>
> Note that the attributes relevant for hash and equality must not change
> during this process.

Very complete answer, thank you very much.

I tried that hash approach and sets but it seemed to get wrong results
first time and it was all due my hash method.

Now it takes like 2-3 seconds to do all that stuff and result seem to
be correct. Apparently I have lot to learn about Python... :)

--

Jani Tiainen



More information about the Python-list mailing list