matching strings in a large set of strings

Stefan Behnel stefan_ml at behnel.de
Thu Apr 29 06:23:52 EDT 2010


Karin Lagesen, 29.04.2010 11:38:
> I have approx 83 million strings, all 14 characters long. I need to be
> able to take another string and find out whether this one is present
> within the 83 million strings.
>
> Now, I have tried storing these strings as a list, a set and a dictionary.
> I know that finding things in a set and a dictionary is very much faster
> than working with a list, so I tried those first. However, I run out of
> memory building both the set and the dictionary, so what I seem to be left
> with is the list, and using the in operator.
>
> I imagine that there should be a faster/better way than this?

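Try one of the dbm modules in the stdlib (anydbm in Python 2, the dbm
package in Python 3). They give you dictionary-like lookups on top of a
persistent on-disk database, so the keys do not all have to fit in memory
at once.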
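
A minimal sketch of what that could look like, assuming Python 3 and its
dbm package. The helper names build_index() and contains(), the file name
"strings_index" and the three sample strings are made up for illustration.
Which dbm backend you get, and how fast it is with 83 million keys, depends
on your platform, so treat this as a starting point rather than a tuned
solution.

```python
import dbm
import os
import tempfile


def build_index(strings, path):
    """Store every string as a key in an on-disk dbm database."""
    # Flag "n" always creates a new, empty database at `path`.
    with dbm.open(path, "n") as db:
        for s in strings:
            # dbm stores keys and values as bytes; the value is a
            # placeholder because only key membership matters here.
            db[s.encode("ascii")] = b"1"


def contains(path, candidate):
    """Return True if `candidate` was stored by build_index()."""
    # Flag "r" opens the existing database read-only.
    with dbm.open(path, "r") as db:
        return candidate.encode("ascii") in db


# Tiny stand-in for the 83 million 14-character strings (hypothetical data).
sample = ["string00000001", "string00000002", "string00000003"]
index_path = os.path.join(tempfile.mkdtemp(), "strings_index")

build_index(sample, index_path)
found = contains(index_path, "string00000002")
missing = contains(index_path, "string99999999")
```

The database is built once and lives on disk, so later runs only need the
lookup. If you have many strings to check, open the database once and test
them all against it instead of reopening it per lookup as contains() does.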

Stefan



