[Python-Dev] I think my set module is ready for prime time; comments?

Eric S. Raymond esr@thyrsus.com
Mon, 22 Jan 2001 12:41:59 -0500


\section{\module{set} ---
         Basic set algebra for Python}

\declaremodule{standard}{set}
\modulesynopsis{Basic set algebra operations on sequences.}
\moduleauthor{Eric S. Raymond}{esr@thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@thyrsus.com}

The \module{set} module defines functions for treating lists and other
sequences as mathematical sets, and defines a set class that uses
these operations natively and overloads Python's standard operator set.

The \module{set} functions work on any sequence type and return lists.
The set methods can take a set or any sequence type as an argument.
Set or sequence elements may be of any type and may be mutable.
Comparisons and membership tests of elements against sequence objects
are done using \keyword{in}, and so can be customized by supplying a 
suitable \method{__getattr__} method for the sequence type.

The running time of these functions is O(n**2) in the worst case
unless otherwise noted.  For cases that can be short-circuited by 
cardinality comparisons, this has been done.

\begin{funcdesc}{setify}{list1}
Returns a list of the argument sequence's elements with duplicates removed.
\end{funcdesc}

\begin{funcdesc}{union}{list1, list2}
Set union.  All elements of both sets or sequences are returned.
\end{funcdesc}

\begin{funcdesc}{intersection}{list1, list2}
Set intersection.  All elements common to both sets or sequences are returned.
\end{funcdesc}

\begin{funcdesc}{difference}{list1, list2}
Set difference.  All elements of the first set or sequence not present
in the second are returned.
\end{funcdesc}

\begin{funcdesc}{symmetric_difference}{list1, list2}
Set symmetric difference.  All elements present in one sequence or the other
but not in both are returned.
\end{funcdesc}

\begin{funcdesc}{cartesian}{list1, list2}
Returns a list of tuples consisting of all possible pairs of elements
from the first and second sequences or sets.
\end{funcdesc}

\begin{funcdesc}{equality}{list1, list2}
Set comparison.  Return 1 if the two sets or sequences contain exactly
the same elements, 0 or otherwise.
\end{funcdesc}

\begin{funcdesc}{subset}{list1, list2}
Set subset test.  Return 1 if all elements of the fiorst set or
sequence are members of the second, 0 otherwise.
\end{funcdesc}

\begin{funcdesc}{proper_subset}{list1, list2}
Set subset test, excluding equality.  Return 1 if the arguments fail a
set equality test, and all elements of the fiorst set or sequence are
members of the second, 0 otherwise.
\end{funcdesc}

\begin{funcdesc}{powerset}{list1}
Return the set of all subsets of the argument set or
sequence. Warning: this produces huge results from small arguments and
is O(2**n) in both running time and space requirements; you can
readily run yourself out of memory using it.
\end{funcdesc}

\subsection{set Objects \label{set-objects}}

A \class{set} instance uses the \module{set} module functions to
implement set semantics on the list it contains, and to support 
a full set of Python list methods and operaors.  Thus, the set
methods can take a set or any sequence type as an argument.  

A set object contains a single data member:

\begin{memberdesc}{elements}
List containing the elements of the set.  
\end{memberdesc}

Set objects can be treated as mutable sequences; they support the
special methods 
\method{__len__}, 
\method{__getattr__},
\method{__setattr__}, 
and \method{__delattr__}.  
Through
\method{__getattr__}, they support the memebership test via
\keyword{in}. All the standard mutable-sequence methods
\method{list}, 
\method{append}, 
\method{extend}, 
\method{count}, 
\method{index}, 
\method{insert} (the index argument is ignored), 
\method{pop}, 
\method{remove}, 
\method{reverse}, 
and \method{sort}
are also supported.  After method calls that add elements
(\method{setattr},
\method{append}, \method{extend}, \method{insert}), the
elements of the data member are re-setified, so it is not possible to
introduce duplicates.

Calling \function{repr()} on a set returns the result of calling
\function{repr} on its element list.  Calling \function{str()} returns
a representation resembling mathematical notation for the set; an
open set bracket, followed by a comma-separated list of \function{str()}
representations of the elements, followed by a close set brackets.

Set objects support the following Python operators:

\begin {tableiii}{l|l|l}{code}{Operator}{Function}{Description}
\lineiii{|,+}{union}{Union}
\lineiii{&}{intersection}{Intersection}
\lineiii{-}{difference}{Difference}
\lineiii{^}{symmetric_difference}{Symmetric differe}
\lineiii{*}{cartesian}{Cartesian product}
\lineiii{==}{equality}{Equality test}
\lineiii{!=,<>}{}{Inequality test}
\lineiii{<}{proper_subset}{Proper-subset test}
\lineiii{<=}{subset}{Subset test}
\lineiii{>}{}{Proper superset test}
\lineiii{>=}{}{Superset test}
\end {tableiii}

-- 
		<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Government is actually the worst failure of civilized man. There has
never been a really good one, and even those that are most tolerable
are arbitrary, cruel, grasping and unintelligent.
	-- H. L. Mencken