[Python-Dev] set() and frozenset()

Raymond Hettinger raymond.hettinger at verizon.net
Sat Nov 15 13:19:39 EST 2003


The C implementation of sets is now ready for your experimentation and
comment.  If fulfills most of Greg's proposal at:
http://www.python.org/peps/pep-0218.html

The files are at: nondist\sandbox\setobj 
Build it with:  python setup.py build -g install 
Test it with:  python_d test_set.py



The differences from sets.py are:

. The containers are now named set() and frozenset().  The lowercase
convention is used to make the names consistent with other builtins like
list() and dict().  User feedback said that the name ImmutableSet was
unwieldy (?sp), so frozenset() was chosen as a more succinct
alternative.

. There is no set.update() method because that duplicated
set.union_update().

. There is no automatic conversion from the mutable type to the
non-mutable type.  User feedback revealed that this was never needed.
Also, removing it simplified the code considerably.  The result is more
straight-forward and a lot less magical.  David Eppstein provided code
examples demonstrating that set() and frozenset() are just right for
implementing common graph algorithms and NFA/DFAs.  

. The __deepcopy__() method will be implemented in copy.py instead of
setmodule.c.  This is consistent with other builtin containers and keeps
all the deepcopying knowledge in one place.  Also, the code is much
simpler in pure python and I wanted avoid importing the copy module
inside setobject.c.

. The __getstate__() and __setstate__() methods were replaced by
__reduce__().  Pickle sizes were made much smaller by saving just the
keys instead of key:True pairs.

. There is no equivalent of BaseSet.  This saves adding another builtin
and it is not a burden to write isinstance(s, (set, frozenset)).



The difference from PEP 218 is:

. There is not a special syntax for constructing sets.  Once generator
expressions are implemented, special notations become superfluous.  It
is simple enough to write:  s = set(iterable).



Though the implementation is basically done and ready for you guys to
experiment with, I still have a few open items:

. Expand the unittests to include all of the applicable tests from the
existing test_sets.py

. Refactor the error exits to use goto and XDECREF.

. Do one more detailed (line-by-line review).

. Write the docs.

. Recast the extension module to be a builtin object.
 
. Note, the original sets.py will be left unchanged so that code written
for it will continue to run without modification.  For those interested
in speed and pickle size, it is simple enough to search-and-replace
"sets.Set" with "set".



Raymond Hettinger





More information about the Python-Dev mailing list