Advice on optimum data structure for billion long list?
news at dorb.com
Sun May 13 08:12:57 EDT 2001
Working with billions of anything will take a long time.
Have you considered parallel or distributed algorithms?
Use a dictionary until it reaches some huge size, then use some method
to prune it, storing the pruned values to disk.
From a recent posting I recall that dictionaries won't give back memory
after they shrink. In your case that should be a speedup once the size peaks.
My best guess.
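
A rough sketch of the dictionary-then-disk idea, in case it helps. This
assumes "prune" means writing the whole dictionary out and emptying it
whenever it reaches a threshold. The names count_with_spill, spill,
merge_spills and the MAX_ENTRIES value are made up for illustration, and
pickle is just one convenient on-disk format.

```python
import os
import pickle
from collections import Counter

MAX_ENTRIES = 1_000_000  # assumed threshold; tune to available memory


def spill(counts, directory, index):
    """Write the current counts to a numbered file, then empty the dict."""
    path = os.path.join(directory, f"counts_{index:06d}.pkl")
    with open(path, "wb") as fh:
        pickle.dump(dict(counts), fh, protocol=pickle.HIGHEST_PROTOCOL)
    counts.clear()  # reuse the same dictionary object for the next batch
    return path


def count_with_spill(keys, directory, max_entries=MAX_ENTRIES):
    """Count keys in memory, spilling to disk each time the dict fills up."""
    counts = {}
    spilled = []
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
        if len(counts) >= max_entries:
            spilled.append(spill(counts, directory, len(spilled)))
    if counts:  # write out whatever is left at the end
        spilled.append(spill(counts, directory, len(spilled)))
    return spilled


def merge_spills(paths):
    """Sum the partial counts. Demo only: this loads everything into memory."""
    total = Counter()
    for path in paths:
        with open(path, "rb") as fh:
            total.update(pickle.load(fh))
    return total
```

The same key can land in several spill files, so the partial counts have
to be added up afterwards. merge_spills does that in memory, which is
fine for a test run but not for a billion entries; at that size the
merge would itself have to work from disk, for example by sorting the
spilled keys.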
"Mark blobby Robinson" wrote:
> Hey guys,
> I'd just like to pick the best of the world's Python brains if that's ok.
> I am building a program that is pattern searching in DNA sequences and
> generating a list of combinations of 3 patterns that meet certain
> criteria. My problem is that this list could potentially get as large as
> ~1.4 billion entries. Now originally I was using a dictionary with the
> key as a 15 length string (the patterns catted) and the value simply a
> count of the number of hits for that pattern combination. Obviously that
> hit memory problems pretty soon, so started storing the information as