matching strings in a large set of strings

Karin Lagesen karin.lagesen at bio.uio.no
Thu Apr 29 11:38:28 CEST 2010


Hello.

I have approx 83 million strings, all 14 characters long. I need to be
able to take another string and find out whether this one is present
within the 83 million strings.

Now, I have tried storing these strings as a list, a set and a dictionary.
I know that finding things in a set and a dictionary is very much faster
than working with a list, so I tried those first. However, I run out of
memory building both the set and the dictionary, so what I seem to be left
with is the list, and using the in method.

I imagine that there should be a faster/better way than this?

TIA,

Karin




More information about the Python-list mailing list