[Python-3000] sets in P3K?

Mon Apr 24 19:48:27 CEST 2006

On 4/24/06, Alex Martelli <aleaxit at gmail.com> wrote:
> On 4/24/06, Greg Wilson <gvwilson at cs.utoronto.ca> wrote:
> > Interesting --- I think that being able to write down a data structure
> > using the same sort of notation you'd use on a whiteboard in a high school
> > math class is one of the great strengths of scripting languages, and one
> > of the things that makes it possible to use Python, Perl, and Ruby as
> > configuration languages (instead of the XML that Java/C# users have to put
> > up with).  I think most newcomers will find:
> >
> > x = {2, 3, 5, 7}
> >
> > more appealing than:
> >
> > x = set(2, 3, 5, 7)
> >
> > though I don't have any data to support that.
>
> It's exactly because we have no data that we can have a cool debate;-).
>
> Python doesn't mean to support set-theory notation -- it doesn't even
> have the most elementary operators, such as the epsilon-like thingy
> for membership test, the big-funky-U for union and reverse-that for
> intersection.  Anybody expecting to use set-theory notation will be
> disappointed nearly instantly, just as soon as they try to DO anything
> with their sets, they'll have to use words rather than funky mathlike
> glyphs.

I totally disagree. There are different conventions for set-theory
notation, and mapping the non-ASCII symbols on other operators is a
standard convention (hey, we use * to mean multiplication!).

Python has 'in' for <epsilon>, '&' for <reverse-U> and '|' for
<regular U>. That's a very sensible mapping (and I believe the latter
two are common in programming languages, not just for Booleans, but
for sets). I think that being able to write {1,2,3} instead of funky
set([1,2,3]) would be a decent addition to the list.
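A minimal sketch of that operator mapping, assuming Python 3 (or 2.7), where the {1,2,3} literal discussed here is valid syntax. The variable names are illustrative only.

```python
# Set-theory glyphs mapped onto Python's ASCII spellings.
a = {1, 2, 3}          # literal form proposed in this thread
b = set([2, 3, 4])     # constructor spelling it would supplement

assert a == set([1, 2, 3])  # both spellings build equal sets

is_member = 2 in a     # membership (epsilon): True
both = a & b           # intersection (reverse-U): {2, 3}
either = a | b         # union (U): {1, 2, 3, 4}
```

The same '&' and '|' also work on Booleans, which is the overlap with Boolean notation that the quoted message objects to.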

> So, since you have to code anyway "if z in <someset>" rather than "if
> z <epsilonthingy> <someset>", for example, then even for the "funky
> mathlike glyphlover" there's little added value if <someset> can be
> expressed with funky glyphs rather than spelled out into readable
> words. When it comes to operation, we support operator-glyphs such as
> & and |, which deviate from set-theoretical notation (and towards
> Boolean notation instead) in a way that I suspect might be even more
> irking to the set-theory-glyphs lover, and confusing to the high
> school student (who might get an F for confusing & with
> reverse-U-intersection, if his or her classes cover both set theory and
> Boolean logic -- though in the latter case they're more likely to have
> to use reverse-V for ``and'', and, once again, Python can't do that).
>
> IOW, it doesn't seem to me that the "high school whiteboard" test can
> be even very roughly approximated, so focusing on it for
> literals-notation only may do more harm than good; while introducing
> "nice readable words with a little punctuation to help out" from the
> very start (literals of most types, except numbers and strings) keeps
> things and expectations a bit more consistent with each other, as well
> as producing code that's easier to read out loud.

Reading out loud is a lost cause anyway -- you explicitly have to read
all the squiggly marks to be even remotely unambiguous.

Nobody said anything about high-school whiteboards.

It's simply that sets will remain second-class citizens as long as we
don't have a literal notation for them. {1,2,3} is certainly
unambiguously the best notation for set literals -- and it isn't even
ambiguous with dictionaries (nor hard to parse with our LL(1) parser).
The *only* point of contention (and AFAIK the only issue that Barry
commented on) is that there's an ambiguity for the empty set, which
*could* be resolved by continuing to use set() for that. I believe
that mathematicians use a crossed-out little oh to indicate the empty
set, so a notational discontinuity is not unheard of.
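A small sketch of how the two brace forms stay apart, assuming Python 3 (or 2.7) semantics: after the first element, a colon means dict and a comma or closing brace means set, so only the empty {} is ambiguous, and it remains a dict with set() as the empty-set spelling, as suggested above.

```python
empty_braces = {}      # still an empty dict
empty_set = set()      # the only spelling for an empty set
a_set = {1, 2, 3}      # no colons: a set literal
a_dict = {1: 'a'}      # colon after the first element: a dict

assert type(empty_braces) is dict
assert type(empty_set) is set
assert type(a_set) is set
assert type(a_dict) is dict
```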

OTOH mathematicians (whether in high school or not) write things like
{x | 2 < x < 10}, which is of course the origin of our list
comprehensions and generator expressions. Therefore I think it makes
sense that {F(x) for x in S if P(x)} ought to be valid syntax if we
support {1, 2, 3} -- IOW the form {<genexp>} should mean the same as
set(<genexp>).
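A sketch of the proposed equivalence, assuming Python 3 (or 2.7), where set comprehensions are valid syntax. F, P and S are hypothetical stand-ins for the placeholders in the text.

```python
S = range(20)              # hypothetical source sequence

def F(x):                  # hypothetical mapping function
    return x * x

def P(x):                  # hypothetical filter predicate
    return x % 3 == 0

comp = {F(x) for x in S if P(x)}       # the {<genexp>} form
gen = set(F(x) for x in S if P(x))     # set(<genexp>)
assert comp == gen

# The mathematician's {x | 2 < x < 10}, with x drawn from integers.
between = {x for x in range(100) if 2 < x < 10}
```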

--
--Guido van Rossum (home page: http://www.python.org/~guido/)