I think that at this point the possibilities for doing sets
come down to four options:
1. use lists
visible changes: new methods l.include, l.exclude
invisible changes: faster 'in'
usage: s = [1, 2], s.include(3), s.exclude(3),
if item in s, for item in s
2. use dicts
visible changes: for/if x in dict means keys
accept dicts without values (e.g. {1, 2})
new special non-printing value ": Present"
new method d.insert(x) means d[x] = Present
invisible changes: none
usage: s = {1, 2}, s.insert(3), del s[3],
if item in s, for item in s
3. new type
visible changes: set() built-in
new
On Mon, 20 Mar 2000, Ka-Ping Yee wrote:
I think that at this point the possibilities for doing sets come down to four options:
1. use lists 2. use dicts 3. new type 4. do nothing
5. new Python module with a class "Set"
(The issues are similar to #3, but this has the advantage of not changing
the interpreter)
--
Moshe Zadka
Moshe Zadka wrote:
On Mon, 20 Mar 2000, Ka-Ping Yee wrote:
I think that at this point the possibilities for doing sets come down to four options:
1. use lists 2. use dicts 3. new type 4. do nothing
5. new Python module with a class "Set" (The issues are similar to #3, but this has the advantage of not changing the interpreter)
Perhaps someone could take Aaron's kjbuckets and write a Python emulation for it (I think he's even already done something like this for gadfly). Then the emulation could go into the core and if people want speed they can install his extension (the emulation would have to detect this and use the real thing then). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
"MAL" == M -A Lemburg
writes:
MAL> Perhaps someone could take Aaron's kjbuckets and write a Python MAL> emulation for it (I think he's even already done something like MAL> this for gadfly). Then the emulation could go into the core and MAL> if people want speed they can install his extension (the MAL> emulation would have to detect this and use the real thing MAL> then). I've been waiting for Tim Peters to say something about sets, but I'll chime in with what I recall him saying last time a discussion like this came up on c.l.py. (I may misremember, in which case I'll at least draw him into the discussion in order to correct me <0.5 wink>.) The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example. Each approach is appropriate for some applications, but not for every one. A set is pretty simple to build from a list or a dictionary, so we leave it to application writers to write the one that is appropriate for their application. Jeremy
Jeremy wrote:
The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example.
Nah. Sets are pretty unambiguous. They're also easy, and boring. The interesting stuff is graphs and operations like composition, closure and transpositions. That's also where stuff gets ambiguous. E.g., what's the right behavior when you invert {'a':1,'b':1}? Hint: any answer you give will be met by the wrath of God. I would love this stuff, and as a faithful worshipper of Our Lady of Corrugated Ironism, I could probably live with whatever rules are arrived at; but I'm afraid I would have to considerably enlarge my kill file. - Gordon
On Tue, 21 Mar 2000, Gordon McMillan wrote:
E.g., what's the right behavior when you invert {'a':1,'b':1}? Hint: any answer you give will be met by the wrath of God.
Isn't "wrath of God" translated into Python is "an exception"?
raise ValueError("dictionary is not 1-1")
seems fine to me.
--
Moshe Zadka
On Tue, 21 Mar 2000, Jeremy Hylton wrote:
"MAL" == M -A Lemburg
writes: MAL> Perhaps someone could take Aaron's kjbuckets and write a Python MAL> emulation for it (I think he's even already done something like MAL> this for gadfly). Then the emulation could go into the core and MAL> if people want speed they can install his extension (the MAL> emulation would have to detect this and use the real thing MAL> then). I've been waiting for Tim Peters to say something about sets, but I'll chime in with what I recall him saying last time a discussion like this came up on c.l.py. (I may misremember, in which case I'll at least draw him into the discussion in order to correct me <0.5 wink>.)
The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example. Each approach is appropriate for some applications, but not for every one. A set is pretty simple to build from a list or a dictionary, so we leave it to application writers to write the one that is appropriate for their application.
Yah... +1 on what Jeremy said. Leave them out of the distro since we can't do them Right for all people. Cheers, -g -- Greg Stein, http://www.lyra.org/
Marc> Perhaps someone could take Aaron's kjbuckets and write a Python Marc> emulation for it ... Any reason why kjbuckets and friends have never been placed in the core? If, as it seems from the discussion, a set type is a good thing to add to the core, it seems to me that Aaron's code would be a good candidate implementation/foundation. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/
"SM" == Skip Montanaro
writes:
SM> Any reason why kjbuckets and friends have never been placed in SM> the core? If, as it seems from the discussion, a set type is SM> a good thing to add to the core, it seems to me that Aaron's SM> code would be a good candidate implementation/foundation. It would seem to me that distutils is a better way to go for kjbuckets. The core already has basic sets (via dictionaries). We're pretty much just quibbling about efficiency, API, and syntax, aren't we? -Barry
Jeremy Hylton wrote:
The problem with a set module is that there are a number of different ways to implement them -- in C using kjbuckets is one example. Each approach is appropriate for some applications, but not for every one.
For me, anyway, this is not about trying to engineer a universally perfect solution into Python -- it's about providing some simple, basic, easy-to-understand functionality that takes care of the common case. For example, dictionaries are simple, their workings are easy enough to understand, and they aren't written to efficiently support things like inversion and composition because most of the time no one needs to do these things. The same holds true for sets. All i would want is something i can put things into, and take things out of, and ask about what's inside. Barry Warsaw wrote:
It would seem to me that distutils is a better way to go for kjbuckets. The core already has basic sets (via dictionaries). We're pretty much just quibbling about efficiency, API, and syntax, aren't we?
Efficiency: Hashtables have proven quite adequate for dicts, so i think they're quite adequate for sets. API and syntax: I believe the goal is obvious, because Python already has very nice notation ("in", "not in") -- it just doesn't work quite the way one would want. It works semantically right on lists, but they're a little slow. It doesn't work on dicts, but we can make it so. Here is where my "explanation metric" comes into play. How much additional explaining do you have to do in each case to answer the question "what do i do when i need a set"? 1. Use lists. Explain that "include()" means "append if not already present", and "exclude()" means "remove if present". You are done. 2. Use dicts. Explain that "for x in dict" iterates over the keys, and "if x in dict" looks for a key. Explain what happens when you write "{1, 2, 3}", and the special non-printing value constant. Explain how to add elements to a set and remove elements from a set. 3. Create a new type. Explain that there exists another type "set" with methods "insert" and "remove". Explain how to construct sets. Explain how "in" and "not in" work, where this type fits in with the other types, and when to choose this type over other types. 4. Do nothing. Explain that dictionaries can be used as sets if you assign keys a dummy value, use "del" to remove keys, iterate over "dict.keys()", and use "dict.has_key()" to test membership. This is what motivated my proposal for using lists: it requires by far the least explanation. This is no surprise because a lot of things about lists have been explained already. My preference in terms of elegance is about equal for 1, 2, 3, with 4 distinctly behind; but my subjective ranking of "explanation complexity" (as in "how to get there from here") is 1 < 4 < 3 < 2. -- ?!ng
BAW> It would seem to me that distutils is a better way to go for BAW> kjbuckets. The core already has basic sets (via dictionaries). BAW> We're pretty much just quibbling about efficiency, API, and syntax, BAW> aren't we? Yes (though I would quibble with your use of the word "quibbling" ;-). If new syntax is in the offing as some have proposed, why not go for a more efficient implementation at the same time? I believe Aaron has maintained that kjbuckets is generally more efficient than Python's dictionary object. Skip
On Tue, 21 Mar 2000, Skip Montanaro wrote:
BAW> It would seem to me that distutils is a better way to go for BAW> kjbuckets. The core already has basic sets (via dictionaries). BAW> We're pretty much just quibbling about efficiency, API, and syntax, BAW> aren't we?
If new syntax is in the offing as some have proposed,
FWIW, I'm against new syntax. The core-language has changed quite a lot between 1.5.2 and 1.6 -- * strings have grown methods * there are unicode strings * "in" operator overloadable The second change even includes a syntax change (u"some string") whose variants I'm still not familiar enough to comment on (ru"some\string"? ur"some\string"? Both legal?). I feel too many changes destabilize the language (this might seem a bit extreme, considering I pushed towards one of the changes), and we should try to improve on things other then the core -- one of these is a more hierarchical standard library, and a standard distribution mechanism, to rival CPAN -- then anyone could import data.sets.kjbuckets With only a trivial
import dist dist.install("data.sets.kjbuckets")
why not go for a more efficient implementation at the same time?
Because Python dicts are "pretty efficient", and it is not a trivial
question to check optimiality in this area: tests can be rigged to prove
almost anything with the right test-cases, and there's no promise we'll
choose the "right ones".
--
Moshe Zadka
Skip> If new syntax is in the offing as some have proposed, Moshe> FWIW, I'm against new syntax. The core-language has changed quite Moshe> a lot between 1.5.2 and 1.6 -- I thought we were talking about Py3K, where syntax changes are somewhat more expected. Just to make things clear, the syntax change I was referring to was the value-less dict syntax that someone proposed a few days ago: myset = {"a", "b", "c"} Note that I wasn't necessarily supporting the proposal, only acknowledging that it had been made. In general, I think we need to keep straight where people feel various proposals are going to fit. When a thread goes for more than a few messages it's easy to forget. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/
On Tue, 21 Mar 2000, Skip Montanaro wrote:
Skip> If new syntax is in the offing as some have proposed,
Moshe> FWIW, I'm against new syntax. The core-language has changed quite Moshe> a lot between 1.5.2 and 1.6 --
I thought we were talking about Py3K
My argument was strictly a 1.x argument. I'm hoping to get sets it in 1.7 or 1.8.
In general, I think we need to keep straight where people feel various proposals are going to fit.
You're right. I'll start prefixing my posts accordingally.
--
Moshe Zadka
participants (8)
-
Barry A. Warsaw
-
Gordon McMillan
-
Greg Stein
-
Jeremy Hylton
-
Ka-Ping Yee
-
M.-A. Lemburg
-
Moshe Zadka
-
Skip Montanaro