[Python-Dev] Re: PEP 218 (sets); moving set.py to Lib

François Pinard pinard@iro.umontreal.ca
24 Aug 2002 14:41:16 -0400

[Magnus Lie Hetland]

> I guess the idea was to use lazy sets for some such operations. Then
> you could build complex expressions through cartesian products, unions,
> intersections, set differences, set comprehensions etc. without actually
> constructing the full set.

Allow me some random thoughts.  (Aren't they always random, anyway? :-)

When I saw some of the suggestions on this list for "generating" elements of
a cartesian product, despite sometimes elegant, I thought "Too much done,
too soon.".  But the truth is that I did not give the thing a serious try,
I'm not sure I would be able to offer anything better.

One nice thing, with a dict or a set, is that we can quickly access how many
entries there are in there.  Is there some internal way to efficiently fetch
the N'th element, from the order in which the keys would be naturally listed?
If not, one could always pay some extra temporary memory to build a list
of these keys first.  If you have to "generate" a cartesian product for
N sets, you could set up a compound counter as a list of N indices, the
K'th meant to run from 0 up to the cardinality C[K] of the K'th set, and
devise simple recipes to yield the element of the product represented by
the counter, and to bump it.  Moreover, it would be trivial to equip this
generator with a `__len__' function able to predict the cardinality CCC of
the whole result, and quite easy being able to transform any KKK between
0 and NNN into an equivalent compound counter, and from there, access any
member of the cartesian product at constant speed, without generating it all.

All the above is pretty simple, and meant to introduce a few suggestions
that might solve once and for all, if we could do it well enough, a
re-occurring request on the Python list about how to produce permutations
and al.  We might try to rewrite the recipes behind a "generating" cartesian
product of many sets, illustrated above, into a similar generating function
able to produce all permutations of a single set.  So let's say:

   Set([1, 2, 3]).permutations()

would lazily produce the equivalent of:

   Set([(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)])

That generator could offer a `__len__' function predicting the cardinality
CCC of the result, and some trickery could be used to map integers from 0 to
CCC into various permutations.  Once on the road, it would be worth offering
combinations and arrangements just as well, and maybe also offering the
"power" set, I mean here the set of all subsets of a given set, all with
predictable cardinality and constant speed access to any member by index.

Yet, many questions or objections may rise.  Using tuples to represent
each element of a cartesian product of many sets is pretty natural, but
it is slightly less clear that a tuple is the best fit for representing
an ordered set as in permutations and arrangements, as tuples may allow
elements to be repeated, while an ordered set does not.  I think that sets
are to be preferred over tuples for returning combinations or subsets.

While it is natural to speak and think of subsets of a set, or permutations,
arrangements and combinations of a set, some people might prefer to stay
closer to an underlying implementations with lists (sublists of a list,
permutations, arrangements or combinations of a list), and would feel that
going through sets is an unwelcome detour for their applications.  Indeed,
what's the real use and justficiation for hashing keys and such things,
when one wants nothing else than arrangements from a list?

Another aspect worth some thinking is that permutations, in particular, are
mathematical objects in themselves: we can notably multiply permutations or
take the inverse of a permutation.  Arrangements are in fact permutations
over combinations elements.  Some thought is surely needed for properly
reflecting mathematical elegance into how the set API is extended for the
above, and not merely burying that elegance under practical concerns.

Some people may think that these are all problems which are orthogonal
to the design of a basic set feature, and which should be addressed in
separate Python modules.  On the other hand, I think that a better and
nicer integration might result if all these things were thought together,
and thought sooner than later.  Moreover, my intuition tells me that with
some care and luck (both are needed), these extra set features could be
small enough additions to the `sets' module to not be worth another one.
Besides, if appropriate, such facilities would surely add a lot of zest and
punch into the interest raised by the `sets' module when it gets published.

François Pinard   http://www.iro.umontreal.ca/~pinard