matching strings in a large set of strings
News123
news1234 at free.fr
Sat May 1 07:48:02 EDT 2010
Dennis Lee Bieber wrote:
> On Thu, 29 Apr 2010 11:38:28 +0200, "Karin Lagesen"
> <karin.lagesen at bio.uio.no> declaimed the following in comp.lang.python:
>
>> Hello.
>>
>> I have approx 83 million strings, all 14 characters long. I need to be
>> able to take another string and find out whether this one is present
>> within the 83 million strings.
>>
>>
> So don't load them into memory... First use a file-based (not memory
>
>
> That lets you do a binary search on the file. Much faster than a
> linear search (linear search will average out to 41.5M read operations;
> binary should be around 10000 reads)
Don't you meant 27 reads instead of 41.5 M reads?
>>> from math import log
>>> log(83e6)/log(2)
26.306608000671101
>>>
N
More information about the Python-list
mailing list