testing for uniqueness in a large list

bearophile bearophileHUGS at lycos.com
Wed Oct 20 18:30:13 CEST 2004


Alex Martelli's solution is *very* good, but it has a sampling
problem. You can see it by adding a simple progress print to his
program:

if not (len(results) % 5000):
    print len(results)

You can see that it slows down a lot once the dictionary contains
about 100000-120000 different sequences, because there are many
collisions, and it keeps getting slower after that. A small speed-up
of this code probably won't help; a different algorithm may be needed.
I don't know... direct generation of the sequences doesn't seem
possible. Maybe a partially direct generation would be okay.
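
Here is a minimal sketch of why this kind of loop slows down. It
assumes the cost comes from redrawing sequences that are already in
the dictionary. It is not Alex Martelli's program: the function name
draws_per_batch, its parameters, and the use of plain integers from
range(space_size) in place of the real sequences are all hypothetical
stand-ins. It counts how many random draws each batch of new unique
values costs.

```python
import random

def draws_per_batch(space_size, target, batch, seed=0):
    """Rejection-sample unique integers from range(space_size).

    Returns a list holding the number of draws spent on each full
    batch of `batch` new unique values. The integer space is only a
    stand-in (an assumption) for the real sequences.
    """
    if target > space_size:
        raise ValueError("cannot collect more unique values than exist")
    rng = random.Random(seed)
    results = {}   # dict used as a set, like `results` in the check above
    counts = []    # draws needed for each completed batch
    draws = 0
    while len(results) < target:
        candidate = rng.randrange(space_size)
        draws += 1
        if candidate in results:
            continue   # collision: already generated, draw again
        results[candidate] = True
        if len(results) % batch == 0:
            counts.append(draws)
            draws = 0
    return counts
```

Calling draws_per_batch(1000, 900, 100) returns nine counts. The
later ones are much larger than the first, because a growing share of
the draws land on values that are already stored.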

Hugs,
bearophile


