[Python-checkins] CVS: python/nondist/peps pep-0218.txt,1.4,1.5

Greg Wilson gvwilson@users.sourceforge.net
Mon, 07 May 2001 12:51:12 -0700


Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv28802

Modified Files:
	pep-0218.txt 
Log Message:
Updating PEP to reflect prototype implementation

Index: pep-0218.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0218.txt,v
retrieving revision 1.4
retrieving revision 1.5
diff -C2 -r1.4 -r1.5
*** pep-0218.txt	2000/12/14 17:11:17	1.4
--- pep-0218.txt	2001/05/07 19:51:10	1.5
***************
*** 2,9 ****
  Title: Adding a Built-In Set Object Type
  Version: $Revision$
! Author: gvwilson@nevex.com (Greg Wilson)
  Status: Draft
  Type: Standards Track
! Python-Version: 2.1
  Created: 31-Jul-2000
  Post-History: 
--- 2,9 ----
  Title: Adding a Built-In Set Object Type
  Version: $Revision$
! Author: gvwilson@ddj.com (Greg Wilson)
  Status: Draft
  Type: Standards Track
! Python-Version: 2.2
  Created: 31-Jul-2000
  Post-History: 
***************
*** 12,26 ****
  Introduction
  
!     This PEP proposes adding sets as a built-in type in Python.
  
  
  Rationale
  
-     One of Python's greatest strengths as a teaching language is its
-     clarity.  Its syntax and object model are so clean, and so simple,
-     that it can serve as "executable pseudocode".  Anything that makes
-     it even better suited for this role will help increase its use in
-     school and college courses.
- 
      Sets are a fundamental mathematical structure, and are very
      commonly used in algorithm specifications.  They are much less
--- 12,29 ----
  Introduction
  
!     This PEP proposes adding a Set module to the standard Python
!     library, and to then make sets a built-in Python type if that
!     module is widely used.  After explaining why sets are desirable,
!     and why the common idiom of using dictionaries in their place is
!     inadequate, we describe how we intend built-in sets to work, and
!     then how the preliminary Set module will behave.  The penultimate
!     section discusses the mutability (or otherwise) of sets and set
!     elements, and the solution which the Set module will implement.
!     The last section then looks at alternatives that were considered,
!     but discarded.
  
  
  Rationale
  
      Sets are a fundamental mathematical structure, and are very
      commonly used in algorithm specifications.  They are much less
***************
*** 43,59 ****
  
  
! Proposal
  
!     We propose adding a set type to Python.  This type will be an
!     unordered collection of unique values, just as a dictionary is an
!     unordered collection of key/value pairs.  Constant sets will be
!     represented using the usual mathematical notation, so that
!     "{1, 2, 3}" will be a set of three integers.
  
!     In order to avoid ambiguity, the empty set will be written "{,}",
      rather than "{}" (which is already used to represent empty
      dictionaries).  We feel that this notation is as reasonable as the
      use of "(3,)" to represent single-element tuples; a more radical
!     strategy is discussed in the "Alternatives" section.
  
      Iteration and comprehension will be implemented in the obvious
--- 46,64 ----
  
  
! Long-Term Proposal
  
!     The long-term goal of this PEP is to add a built-in set type to
!     Python.  This type will be an unordered collection of unique
!     values, just as a dictionary is an unordered collection of
!     key/value pairs.  Constant sets will be represented using the
!     usual mathematical notation, so that "{1, 2, 3}" will be a set of
!     three integers.
  
!     In order to avoid ambiguity, the empty set will be written "{-}",
      rather than "{}" (which is already used to represent empty
      dictionaries).  We feel that this notation is as reasonable as the
      use of "(3,)" to represent single-element tuples; a more radical
!     strategy is discussed in the "Alternatives" section, and more
!     readable than the earlier proposal "{,}".
  
      Iteration and comprehension will be implemented in the obvious
***************
*** 67,170 ****
  
      will produce a set containing the squares of all elements in S,
! 
!     Membership will be tested using "in" and "not in".
! 
!     The binary operators '|', '&', '-', and "^" will implement set
!     union, intersection, difference, and symmetric difference.  Their
!     in-place equivalents will have the obvious semantics.  (We feel
!     that it is more sensible to overload the bitwise operators '|' and
!     '&', rather than the arithmetic operators '+' and "*', because
!     there is no arithmetic equivalent of '^'.)
! 
!     The method "add" will add an element to a set.  This is different
!     from set union, as the following example shows:
! 
!         >>> {1, 2, 3} | {4, 5, 6}
!         {1, 2, 3, 4, 5, 6}
! 
!         >>> {1, 2, 3}.add({4, 5, 6})
!         {1, 2, 3, {4, 5, 6}}
! 
!     Note that we expect that items can also be added to sets using
!     in-place union of temporaries, i.e. "S |= {x}" instead of
!     "S.add(x)".
! 
!     Elements will be deleted from sets using a "remove" method, or
!     using "del":
! 
!         >>> S = {1, 2, 3}
!         >>> S.remove(3)
!         >>> S
!         {1, 2}
!         >>> del S[1]
!         >>> S
!         {2}
! 
!     The "KeyError" exception will be raised if an attempt is made to
!     remove an element which is not in a set.  This definition of "del"
!     is consistent with that used for dictionaries:
! 
!         >>> D = {1:2, 3:4}
!         >>> del D[1]
!         >>> D
!         {3:4}
! 
!     A new method "dict.keyset" will return the keys of a dictionary as
!     a set.  A corresponding method "dict.valueset" will return the
!     dictionary's values as a set.
! 
!     A built-in converter "set()" will convert any sequence type to a
!     set; converters such as "list()" and "tuple()" will be extended to
!     handle sets as input.
! 
! 
! Open Issues
! 
!     One major issue remains to be resolved: will sets be allowed to
!     contain mutable values, or will their values be required to
!     immutable (as dictionary keys are)?  The disadvantages of allowing
!     only immutable values are clear --- if nothing else, it would
!     prevent users from creating sets of sets.
! 
!     However, no efficient implementation of sets of mutable values has
!     yet been suggested.  Hashing approaches will obviously fail (which
!     is why mutable values are not allowed to be dictionary keys).
!     Even simple-minded implementations, such as storing the set's
!     values in a list, can give incorrect results, as the following
!     example shows:
! 
!         >>> a = [1, 2]
!         >>> b = [3, 4]
!         >>> S = [a, b]
!         >>> a[0:2] = [3, 4]
!         >>> S
!         [[3, 4], [3, 4]]
! 
!     One way to solve this problem would be to add observer/observable
!     functionality to every data structure in Python, so that
!     structures would know to update themselves when their contained
!     values mutated.  This is clearly impractical given the current
!     code base, and the performance penalties (in both memory and
!     execution time) would probably be unacceptable anyway.
  
  
  Alternatives
  
!     A more conservative alternative to this proposal would be to add a
!     new built-in class "Set", rather than adding new syntax for direct
!     expression of sets.  On the positive side, this would not require
!     any changes to the Python language definition.  On the negative
!     side, people would then not be able to write Python programs using
!     the same notation as they would use on a whiteboard.  We feel that
!     the more Python supports standard pre-existing notation, the
!     greater the chances of it being adopted as a teaching language.
! 
!     A radical alternative to the (admittedly clumsy) notation "{,}" is
!     to re-define "{}" to be the empty collection, rather than the
!     empty dictionary.  Operations which made this object non-empty
!     would silently convert it to either a dictionary or a set; it
!     would then retain that type for the rest of its existence.  This
!     idea was rejected because of its potential impact on existing
!     Python programs.  A similar proposal to modify "dict.keys" and
      "dict.values" to return sets, rather than lists, was rejected for
      the same reasons.
--- 72,188 ----
  
      will produce a set containing the squares of all elements in S,
!     Membership will be tested using "in" and "not in", and basic set
!     operations will be implemented by a mixture of overloaded
!     operators:
! 
!         |               union
!         &               intersection
!         ^               symmetric difference
!         -               asymmetric difference
! 
!     and methods:
! 
!         S.add(x)        Add "x" to the set.
! 
!         S.update(s)     Add all elements of sequence "s" to the set.
! 
!         S.remove(x)     Remove "x" from the set.  If "x" is not
!                         present, this method raises a LookupError
!                         exception.
! 
!         S.discard(x)    Remove "x" from the set if it is present, or
!                         do nothing if it is not.
! 
!         S.popitem()     Remove and return an arbitrary element,
!                         raising a LookupError if the element is not
!                         present.
! 
!     and one new built-in conversion function:
! 
!         set(x)          Create a set containing the elements of the
!                         collection "x".
! 
!     Notes:
! 
!     1. We propose using the bitwise operators "|&" for intersection
!        and union.  While "+" for union would be intuitive, "*" for
!        intersection is not (very few of the people asked guessed what
!        it did correctly).
! 
!     2. We considered using "+" to add elements to a set, rather than
!        "add".  However, Guido van Rossum pointed out that "+" is
!        symmetric for other built-in types (although "*" is not).  Use
!        of "add" will also avoid confusion between that operation and
!        set union.
! 
!     3. Sets raise "LookupError" exceptions, rather than "KeyError" or
!        "ValueError", because set elements are neither keys nor values.
! 
! Short-Term Proposal
! 
!     In order to determine whether there is enough demand for sets to
!     justify making them a built-in type, and to give users a chance to
!     try out the semantics we propose for sets, our short-term proposal
!     is to add a "Set" class to the standard Python library.  This
!     class will have the operators and methods described above; it will
!     also have named methods corresponding to all of the operations: a
!     "union" method for "|", and a "union_update" method for "|=", and
!     so on.
! 
!     This class will use a dictionary internally to contain set values.
!     In order to avoid having to duplicate values (e.g. for iteration
!     through the set), the class will rely on the iterators which are
!     scheduled to appear in Python 2.2.
! 
!     Tim Peters believes that the class's constructor should take a
!     single sequence as an argument, and populate the set with that
!     sequence's elements.  His argument is that in most cases,
!     programmers will be created sets from pre-existing sequences, so
!     that common case should be usual.  However, this would require
!     users to remember an extra set of parentheses when initializing a
!     set with known values:
! 
!     >>> Set((1, 2, 3, 4))       # case 1
! 
!     On the other hand, feedback from a small number of novice Python
!     users (all of whom were very experienced with other languages)
!     indicates that people will find a "parenthesis-free" syntax more
!     natural:
! 
!     >>> Set(1, 2, 3, 4)         # case 2
! 
!     On the other, other hand, if Python does adopt a dictionary-like
!     notation for sets in the future, then case 2 will become
!     redundant.  We have therefore adopted the first strategy, in which
!     the initializer takes a single sequence argument.
! 
! 
! Mutability
! 
!     The most difficult question to resolve in this proposal was
!     whether sets ought to be able to contain mutable elements.  A
!     dictionary's keys must be immutable in order to support fast,
!     reliable lookup.  While it would be easy to require set elements
!     to be immutable, this would preclude sets of sets (which are
!     widely used in graph algorithms and other applications).
! 
!     At Tim Peters' suggestion, we will implement the following
!     compromise.  A set may only contain immutable elements, but is
!     itself mutable *until* its hash code is calculated.  As soon as
!     that happens, the set is "frozen", i.e. becomes immutable.  Thus,
!     a set may be used as a dictionary key, or as a set element, but
!     cannot be updated after this is done.  Peters reports that this
!     behavior rarely causes problems in practice.
  
  
  Alternatives
  
!     An alternative to the notation "{-}" for the empty set would be to
!     re-define "{}" to be the empty collection, rather than the empty
!     dictionary.  Operations which made this object non-empty would
!     silently convert it to either a dictionary or a set; it would then
!     retain that type for the rest of its existence.  This idea was
!     rejected because of its potential impact on existing Python
!     programs.  A similar proposal to modify "dict.keys" and
      "dict.values" to return sets, rather than lists, was rejected for
      the same reasons.