Writing huge Sets() to disk
Paul McGuire
ptmcg at austin.rr._bogus_.com
Mon Jan 10 12:00:18 EST 2005
"Martin MOKREJĀ©" <mmokrejs at ribosome.natur.cuni.cz> wrote in message
news:mailman.448.1105373471.22381.python-list at python.org...
> Hi,
> I have sets.Set() objects having up to 20E20 items,
> each is composed of up to 20 characters. Keeping
> them in memory on !GB machine put's me quickly into swap.
> I don't want to use dictionary approach, as I don't see a sense
> to store None as a value. The items in a set are unique.
>
> How can I write them efficiently to disk? To be more exact,
> I have 20 sets. _set1 has 1E20 keys of size 1 character.
>
> alphabet = ('G', 'A', 'V', 'L', 'I', 'P', 'S', 'T', 'C', 'M', 'A', 'Q',
'F', 'Y', 'W', 'K', 'R', 'H', 'D', 'E')
> for aa1 in alphabet:
> # l = [aa1]
> #_set1.add(aa1)
> for aa2 in alphabet:
> # l.append(aa2)
> #_set2.add(''.join(l))
> [cut]
>
> The reason I went for sets instead of lists is the speed,
> availability of unique, common and other methods.
> What would you propose as an elegant solution?
> Actually, even those nested for loops take ages. :(
> M.
_set1 only has 19 keys of size 1 character - 'A' is duplicated.
Assuming you replace 'A' with another character (say 'Z'), then here is what
you get:
_set1 - 20 elements (20**1)
_set2 - 400 elements (20**2)
_set3 - 8000 elements (20**3)
...
_set20 - 20**20 ~ 10 ^ (1.301*20) or 1E26
If you have no duplicates in your alphabet, then you wont need to use sets,
every combination will be unique. In this case, just use a generator.
Here's a recursive generator approach that may save you a bunch of nested
editing (set maxDepth as high as you dare):
alphabet = ('G', 'A', 'V', 'L', 'I', 'P', 'S', 'T', 'C', 'M', 'Z', 'Q', 'F',
'Y', 'W', 'K', 'R', 'H', 'D', 'E')
maxDepth = 3
def genNextCombo(root=list(),depth=1):
for a in alphabet:
thisRoot = root + [a]
yield "".join( thisRoot )
if depth < maxDepth:
for c in genNextCombo(thisRoot, depth+1):
yield c
for c in genNextCombo():
print c # or write to file, or whatever
-- Paul
More information about the Python-list
mailing list