matching strings in a large set of strings

Duncan Booth duncan.booth at invalid.invalid
Thu Apr 29 10:40:14 EDT 2010

MRAB <python at> wrote:

> Karin Lagesen wrote:
>> Hello.
>> I have approx 83 million strings, all 14 characters long. I need to
>> be able to take another string and find out whether this one is
>> present within the 83 million strings.
>> Now, I have tried storing these strings as a list, a set and a
>> dictionary. I know that finding things in a set and a dictionary is
>> very much faster than working with a list, so I tried those first.
>> However, I run out of memory building both the set and the
>> dictionary, so what I seem to be left with is the list, and using the
>> in method. 
>> I imagine that there should be a faster/better way than this?
> You could sort the list and then use a binary chop (Google can tell
> you what that is if you don't know already).

... and Python comes batteries included. You can use bisect.bisect_left to 
do the binary chop and then you just have to check whether it found the 

Duncan Booth

More information about the Python-list mailing list