Writing huge Sets() to disk

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Mon Jan 10 15:33:32 EST 2005


Istvan Albert wrote:
> Martin MOKREJŠ wrote:
> 
> 
>> But nevertheless, imagine 1E6 words of size 15. That's maybe 1.5GB of raw
>> data. Will sets be appropriate you think?
> 
> 
> You started out with 20E20 then cut back to 1E15 keys
> now it is down to one million but you claim that these
> will take 1.5 GB.

I gave up on the theoretical approach. In practice, I might need to
store up to maybe those 1E15 keys.

So you are saying it is better to store 1 million words in a dictionary
than in a set, and to use my own function to pull out the unique or
common words?
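
For comparison, here is a minimal sketch of what the set-based version of that "unique or common words" step could look like. The function name compare_word_sets and the sample words are made up for illustration, and it assumes both word lists fit in memory at the same time.

```python
def compare_word_sets(words_a, words_b):
    """Return the words common to both inputs and those unique to each."""
    set_a = set(words_a)  # duplicates collapse here
    set_b = set(words_b)
    return {
        "common": set_a & set_b,   # intersection
        "only_a": set_a - set_b,   # present only in the first input
        "only_b": set_b - set_a,   # present only in the second input
    }


# Hypothetical sample data, just to show the call.
result = compare_word_sets(["alpha", "beta", "alpha"], ["beta", "gamma"])
print(sorted(result["common"]))
print(sorted(result["only_a"]))
print(sorted(result["only_b"]))
```

A dictionary with dummy values would give the same answers through key lookups; the set operators just spell out the intent more directly.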

> 
> On my system storing 1 million words of length 15
> as keys of a python dictionary is around 75MB.

Fine, that's what I wanted to hear. How do you improve the algorithm?
Do you delay indexing until the very last moment, or do you let your
computer index 999 999 times just for fun?
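
A memory figure like the one quoted above can be checked roughly on any machine. The sketch below is only an approximation: the helper name estimate_dict_bytes and the lowercase random words are assumptions for illustration, sys.getsizeof counts the dictionary's own table plus each key string and nothing else, and the result varies with the Python version and platform.

```python
import random
import string
import sys


def estimate_dict_bytes(n_words, word_len=15, seed=0):
    """Rough size in bytes of a dict holding n_words distinct random keys."""
    rng = random.Random(seed)
    words = {}
    # Keep generating until we have n_words distinct keys.
    while len(words) < n_words:
        word = "".join(rng.choice(string.ascii_lowercase) for _ in range(word_len))
        words[word] = None
    table_bytes = sys.getsizeof(words)                  # the hash table itself
    key_bytes = sum(sys.getsizeof(w) for w in words)    # the key strings
    return table_bytes + key_bytes


# Small run to keep it quick; pass 1_000_000 to mirror the case discussed here.
print(estimate_dict_bytes(10_000) / 1e6, "MB (approximate)")
```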

> 
> I.



