Re: [Python-Dev] I think my set module is ready for prime time; comments?

Greg Wilson: Meta-question: do people want to continue to discuss sets on the general python-dev list, or take it out-of-line (e.g. to an egroups list)? I'm finding all of the discussion very useful, but I realize that many readers might prefer to concentrate on the 2.1 release...
Jeremy Hylton <jeremy@alum.mit.edu>:
The tests showed that dictionary-based sets were always faster. small tests (3 operations), the difference was about 10 percent. larger tests (88 operations), the difference ranged from 180 to almost 700 percent.
Eric Raymond <esr@thyrsus.com>: Not surprising. 88 elements is getting pretty large.
Greg Wilson: Really? I was testing my implementation with sets of email addresses grep'd out of old mail folders --- typical sizes were several thousand elements.
From: Christopher Petrilli <petrilli@amber.org> Unfortunately, for me, a Python implementation of Sets is only interesting academicaly. Any time I've needed to work with them at a large scale, I've needed them *much* faster than Python could achieve without a C extension.
Greg Wilson: I had been expecting to implement this in C, not in pure Python, for performance.
From: Christopher Petrilli <petrilli@amber.org> In the "scripting" problem domain, I would agree that Sets would rarely reach large sizes, and so a algorithm which performed in quadratic time might be fine,
Greg Wilson: I strongly disagree (see the email address example above --- it was the first thing that occurred to me to try). I am still hoping to find a sub-quadratic (preferably sub-linear) implementation. I can do it in C++ with observer/observable (contained items notify containers of changes in value, sets store all equivalent items in the same bucket), but that doesn't really help...
From: Ka-Ping Yee <ping@lfw.org> The only change that needs to be made to support sets of immutable elements is to provide "in" on dictionaries...
and:
From: Neil Schemenauer <nas@arctrix.com> ...if sets are added to the core...we would use the implementation of PyDict but drop the values.
Unfortunately, if values are required to be immutable, then sets of sets aren't possible... :-( Thanks, everyone, Greg

[Greg Wilson]
... Unfortunately, if values are required to be immutable, then sets of sets aren't possible... :-(
Sure they are. I wrote about how before, and Moshe put up a simple implementation as a SourceForge patch. Not bulletproof, though: "consentng adults". No matter *what* you implement, I'll find *some* way to trick it into believing my sets are immutable <wink>, so don't worry about that. Bulletproof is very hard, and is a minority distraction at best. IIRC, SETL had "by value" semantics when inserting a set into another set as an element, and had some exceedingly hairy copy-on-write scheme under the covers to make that bearably quick. That may be wrong, though. Herman Venter's Slim (Sets, Lists and Maps) language does work that way (Guido, Herman was a friend of the departed Stoffel Erasmus, who you may recall fondly from Python's very early days -- if *that* doesn't make sets attractive to you, nothing will <wink>). Ah! Meant to post this before: http://birch.eecs.lehigh.edu/~bacon/setlprog.ps.gz That's a readable and very good intro to SETL Classic. People pondering computerized sets should at least catch up with what was common knowledge 30 years ago <wink>.
participants (2)
-
Greg Wilson
-
Tim Peters