[Python-Dev] Re: PEP 218 (sets); moving set.py to Lib
Guido van Rossum
guido@python.org
Sun, 25 Aug 2002 17:56:46 -0400
[François]
> Allow me some random thoughts. (Aren't they always random, anyway? :-)
Maybe you could contribute some not-so-random code? Words we've got
enough. :-)
> When I saw some of the suggestions on this list for "generating"
> elements of a cartesian product, despite sometimes elegant, I
> thought "Too much done, too soon.". But the truth is that I did not
> give the thing a serious try, I'm not sure I would be able to offer
> anything better.
>
> One nice thing, with a dict or a set, is that we can quickly access
> how many entries there are in there. Is there some internal way to
> efficiently fetch the N'th element, from the order in which the keys
> would be naturally listed? If not, one could always pay some extra
> temporary memory to build a list of these keys first. If you have
> to "generate" a cartesian product for N sets, you could set up a
> compound counter as a list of N indices, the K'th meant to run from
> 0 up to the cardinality C[K] of the K'th set, and devise simple
> recipes to yield the element of the product represented by the
> counter, and to bump it. Moreover, it would be trivial to equip
> this generator with a `__len__' function able to predict the
> cardinality CCC of the whole result, and quite easy being able to
> transform any KKK between 0 and NNN into an equivalent compound
> counter, and from there, access any member of the cartesian product
> at constant speed, without generating it all.
Since the user can easily multiply the length of the input sets
together, what's the importance of the __len__?
And what's the use case for randomly accessing the members of a
cartesian product? IMO, the Cartesian product is mostly useful for
abstract matehematical though, not for solving actual programming
problems.
> All the above is pretty simple, and meant to introduce a few
> suggestions that might solve once and for all, if we could do it
> well enough, a re-occurring request on the Python list about how to
> produce permutations and al. We might try to rewrite the recipes
> behind a "generating" cartesian product of many sets, illustrated
> above, into a similar generating function able to produce all
> permutations of a single set. So let's say:
>
> Set([1, 2, 3]).permutations()
>
> would lazily produce the equivalent of:
>
> Set([(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)])
>
> That generator could offer a `__len__' function predicting the
> cardinality CCC of the result, and some trickery could be used to
> map integers from 0 to CCC into various permutations. Once on the
> road, it would be worth offering combinations and arrangements just
> as well, and maybe also offering the "power" set, I mean here the
> set of all subsets of a given set, all with predictable cardinality
> and constant speed access to any member by index.
Obviously you were inspired here by Eric Raymond's implementation of
the powerset generator...
But again I ask, what's the practical use of random access to
permutations? (I know there are plenty of uses of permutations, but I
doubt the need for random access.)
> Yet, many questions or objections may rise. Using tuples to
> represent each element of a cartesian product of many sets is pretty
> natural, but it is slightly less clear that a tuple is the best fit
> for representing an ordered set as in permutations and arrangements,
> as tuples may allow elements to be repeated, while an ordered set
> does not. I think that sets are to be preferred over tuples for
> returning combinations or subsets.
You must have temporarily stopped thinking clearly there. Seen as
sets, all permutations of a set's elements are the same! The proper
output for a generator of permutations is a list; for generality, its
input should be any iterable (which includes sets). If the input
contains duplicates, well, that's the caller's problem.
For combinations, sets are suitable as output, but again, I think it
would be just a suitable to take a list and generate lists -- after
all the lists are trivially turned into sets.
> While it is natural to speak and think of subsets of a set, or
> permutations, arrangements and combinations of a set, some people
> might prefer to stay closer to an underlying implementations with
> lists (sublists of a list, permutations, arrangements or
> combinations of a list), and would feel that going through sets is
> an unwelcome detour for their applications. Indeed, what's the real
> use and justficiation for hashing keys and such things, when one
> wants nothing else than arrangements from a list?
Right.
> Another aspect worth some thinking is that permutations, in
> particular, are mathematical objects in themselves: we can notably
> multiply permutations or take the inverse of a permutation.
That would be a neat class indeed. How useful it would be in practice
remains to be seen. Do you do much ad-hoc permutation calculations?
> Arrangements are in fact permutations over combinations elements.
> Some thought is surely needed for properly reflecting mathematical
> elegance into how the set API is extended for the above, and not
> merely burying that elegance under practical concerns.
And, on the other hand, practicality beats purity.
> Some people may think that these are all problems which are
> orthogonal to the design of a basic set feature, and which should be
> addressed in separate Python modules. On the other hand, I think
> that a better and nicer integration might result if all these things
> were thought together, and thought sooner than later. Moreover, my
> intuition tells me that with some care and luck (both are needed),
> these extra set features could be small enough additions to the
> `sets' module to not be worth another one. Besides, if appropriate,
> such facilities would surely add a lot of zest and punch into the
> interest raised by the `sets' module when it gets published.
I'd rather see the zest added to Python as a whole -- sets are a tiny
part, and if you read PEP 218, you'll see that the sets module is only
a modest first step of that PEP's program.
--Guido van Rossum (home page: http://www.python.org/~guido/)