[Python-Dev] Re: PEP 218 (sets); moving set.py to Lib

François Pinard pinard@iro.umontreal.ca
26 Aug 2002 10:09:45 -0400

[Guido van Rossum]

> Since the user can easily multiply the length of the input sets together,
> what's the importance of the __len__?  And what's the use case for
> randomly accessing the members of a cartesian product?

For cartesian product, permutations, combinations, arrangements and power
sets, I do not see real use to __len__ or random accessing of members
besides explicit looping (or maybe saving an indice for later resumption).
In my own case, I guess iterators (or generators) would leave me happy enough
that I do not really need more.

> IMO, the Cartesian product is mostly useful for abstract matehematical
> though, not for solving actual programming problems.

One practical application pops to mind.  People might progressively use
and abuse the paradigm of looping over the members of a cartesian product,
instead of relying on nests of embedded loops over each of the set members.

> > So let's say:
> >    Set([1, 2, 3]).permutations()
> > would lazily produce the equivalent of:
> >    Set([(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)])
> > 
> > That generator could offer a `__len__' function predicting the
> > cardinality CCC of the result, and some trickery could be used to map
> > integers from 0 to CCC into various permutations.  Once on the road,
> > it would be worth offering combinations and arrangements just as well,
> > and maybe also offering the "power" set, I mean here the set of all
> > subsets of a given set, all with predictable cardinality and constant
> > speed access to any member by index.

> Obviously you were inspired here by Eric Raymond's implementation of
> the powerset generator...

I do not think so.  I was inspired by the remembering I have from the SSP
package (something like Social Science Package - a FORTRAN library - this
was before SPSS).  It was especially clever about enumerating permutations.

> But again I ask, what's the practical use of random access to permutations?

The only one I see is for resuming an interrupted enumeration.

> > Using tuples to represent each element of a cartesian product of many
> > sets is pretty natural, but it is slightly less clear that a tuple
> > is the best fit for representing an ordered set as in permutations
> > and arrangements, as tuples may allow elements to be repeated, while
> > an ordered set does not.  I think that sets are to be preferred over
> > tuples for returning combinations or subsets.

> You must have temporarily stopped thinking clearly there.

Maybe not, but I do not always express myself clearly.  Sorry.

> Seen as sets, all permutations of a set's elements are the same!

It's no use seeing each individual permutation as a set, of course.  But the
_set_ of all permutations, each of which being a tuple, is meaningful,
at least from the fact that permutations are conceptually unordered.
However, it might be (sometimes, maybe not that often) useful to enumerate
permutations in some canonical order.

> For combinations, sets are suitable as output, but again, I think it
> would be just a suitable to take a list and generate lists -- after
> all the lists are trivially turned into sets.

Quite agreed.

> > Another aspect worth some thinking is that permutations, in
> > particular, are mathematical objects in themselves: we can notably
> > multiply permutations or take the inverse of a permutation.

> That would be a neat class indeed.  How useful it would be in practice
> remains to be seen.  Do you do much ad-hoc permutation calculations?

A few common algorithms also make use of permutations without naming them.
`sort' applies a precise permutation, and it is often convenient to inverse
that permutation to recover the original order after having enriched the
resulting structure, say.  I quite often resorted to the above trick.
Notice that `string.translate' applies a permutation.

In another life, I wrote a program (unknown outside CDC Cyber space) that
was efficiently comparing possibly big files, and I remember I had to work
a lot with permutation arithmetic.  This was a bit specialised, however.

I once wrote a C application named `recode' that mainly does charset
conversions.  It does some arithmetic on permutations at a few places,
either for optimisation or while seeking reversibility, and this is really
nothing far stretched or un-natural.  I sometimes plan to parallel `recode'
with a Python implementation, because from experience, prototyping ideas in
C is rather painful, while I foresee it would be far lot easier in Python.
Undoubtedly then, I would formalise permutations.

> > Some thought is surely needed for properly reflecting mathematical
> > elegance into how the set API is extended for the above, and not
> > merely burying that elegance under practical concerns.

> And, on the other hand, practicality beats purity.

Only when both conflict without any hope of resolution.  But when
practicality and purity can coexist, that's even better.  So much better
in fact, that it's always worth very seriously trying to seek coexistence.

François Pinard   http://www.iro.umontreal.ca/~pinard