Persistent sets

Skip Montanaro skip at pobox.com
Thu Jun 5 23:33:30 EDT 2003


Related to the trigraph thread, I thought it would be nice if the Set
output by the three-character n-gram generator was persistent because it
takes awhile to generate.  I was pleasantly surprised to see how easy it is:

    import sets
    import shelve

    class PersistentSet(sets.Set):
        def __init__(self, file, iterable=None):
            self._data = shelve.open(file)
            if iterable is not None:
                self._update(iterable)

I haven't tried it for anything other than the little generator, and while
obviously slower to generate than if I had used a regular set, it saves a
lot of startup time when you have a large mostly read-only set.

    >>> s = psets.PersistentSet("trigraphs")
    >>> "abc" in s
    True
    >>> "xzz" in s
    False
    >>> "aaa" in s
    False

The above trigraph set contains 8106 elements and was generated from
/usr/share/dict/words, which on my current machine contains 234937 words.

Skip





More information about the Python-list mailing list