Advice on optimum data structure for billion-long list?

Darrell news at dorb.com
Sun May 13 08:12:57 EDT 2001


Working with billions of anything will take a long time.

Have you considered parallel or distributed algorithms?
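A minimal sketch of what the parallel version could look like, assuming the sequence can be split into chunks and the per-chunk counts merged afterwards. The function count_combinations() here just counts every 15-mer as a stand-in for the real pattern search; the chunk size and worker count are arbitrary choices, not anything from the original post.

    from multiprocessing import Pool
    from collections import Counter

    def count_combinations(chunk):
        """Stand-in for the real search: here it just counts every 15-mer."""
        counts = Counter()
        for i in range(len(chunk) - 14):
            counts[chunk[i:i + 15]] += 1
        return counts

    def parallel_count(sequence, n_workers=4, chunk_size=1_000_000):
        # Overlap chunks by 14 characters so no 15-mer is split at a boundary.
        chunks = [sequence[i:i + chunk_size + 14]
                  for i in range(0, len(sequence), chunk_size)]
        total = Counter()
        with Pool(n_workers) as pool:
            for partial in pool.imap_unordered(count_combinations, chunks):
                total.update(partial)   # merge per-chunk counts
        return total

The same split-then-merge shape would work across machines too, with each host writing its partial counts to a file instead of returning a Counter.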

Use a dictionary until it reaches some huge size, then prune it by some
method, storing the pruned values to disk, and loop.
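A rough sketch of that count-then-prune loop, assuming the counts can be merged later from the flushed files. The max_keys threshold and the plain "key count" file format are arbitrary choices for illustration, and combos stands for whatever iterable yields the 15-character keys.

    def flush(counts, part, prefix):
        """Write 'key count' lines for one chunk of counts."""
        name = "%s_%d.txt" % (prefix, part)
        with open(name, "w") as f:
            for key, n in counts.items():
                f.write("%s %d\n" % (key, n))
        return name

    def count_with_flush(combos, max_keys=5_000_000, prefix="counts_part"):
        """Count keys in a dict, flushing to disk whenever the dict gets huge."""
        counts = {}
        files = []
        part = 0
        for key in combos:
            counts[key] = counts.get(key, 0) + 1
            if len(counts) >= max_keys:      # prune: dump everything and restart
                files.append(flush(counts, part, prefix))
                part += 1
                counts = {}
        if counts:                           # flush whatever is left at the end
            files.append(flush(counts, part, prefix))
        return files

The partial files would still need a merge pass (for example an external sort on the key column) to get the final totals per pattern combination.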

From a recent posting I recall that dictionaries won't give memory back.
In your case that could work as a speed-up after the dictionary's size peaks.

My best guess.
--Darrell

"Mark blobby Robinson" wrote:
> Hey guys,
>
> I'd just like to pick the best of the world's Python brains, if that's ok.
> I am building a program that is pattern searching in DNA sequences and
> generating a list of combinations of 3 patterns that meet certain
> criteria. My problem is that this list could potentially get as large as
> ~1.4 billion entries. Now originally I was using a dictionary with the
> key as a 15-character string (the patterns concatenated) and the value simply a
> count of the number of hits for that pattern combination. Obviously that
> hit memory problems pretty soon, so I started storing the information as
[snip]




