Executive summary:  Thanks for discussion everyone.  I'm now convinced that itertools.permutations is fine as it is.  I am not totally convinced that multiset_permutations doesn't belong in itertools, or else there should be a standard combinatorics library.

On Sun, Oct 13, 2013 at 5:27 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 13 October 2013 17:38, Neil Girdhar <mistersheik@gmail.com> wrote:
> My intuition is that we want Python to be "complete".  Many other languages
> can find the permutations of a multiset.  Python has a permutations
> function.  Many people on stackoverflow expected that function to be able to
> find those permutations.

Nope, we expressly *don't* want the standard library to be "complete",
because that would mean growing to the size of PyPI (or larger).
There's always going to be scope for applications to adopt new domain
specific dependencies with more in-depth support than that provided by
the standard library.

By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath.  When I saw the permutation function in itertools, I expected that it would work on both sets and multisets, or else there would be another function that did.  

Many standard library modules are in fact deliberately designed as
"stepping stone" modules that will meet the needs of code which have
an incidental relationship to that task, but will need to be replaced
with something more sophisticated for code directly related to that
domain. Many times, that means they will ignore as irrelevant
distinctions that are critical in certain contexts, simply because
they don't come up all that often outside those specific domains, and
addressing them involves making the core module more complicated to
use for more typical cases.

Good point. 

In this case, the proposed alternate permutations mechanism only makes
a difference when:

1. The data set contains equivalent values
2. Input order is not considered significant, so exchanging equivalent
values should *not* create a new permutation (i.e. multiset
permutations rather than sequence permutations).

If users aren't likely to encounter situations where that makes a
difference, then providing both in the standard library isn't being
helpful, it's being actively user hostile by asking them to make a
decision they're not yet qualified to make for the sake of the few
experts that specifically need . Hence Raymond's request for data
modelling problems outside the "learning or studying combinatorics"
context to make the case for standard library inclusion.

Interestingly, I just found another language which has the equivalent
of the currrent behaviour of itertools.permutations: Haskell has it as
Data.List.permutations. As far as I can tell, Haskell doesn't offer
support for multiset permutations in the core, you need an additional
package like Math.Combinatorics (see:
http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatorics-Multiset.html#g:4).

Since iterator based programming in Python is heavily inspired by
Haskell, this suggests that the current behaviour of
itertools.permutations is appropriate and that Raymond is right to be
dubious about including multiset permutations support directly in the
standard library.


You've convinced me that itertools permutations is doing the right thing :)  I'm not sure if multiset permutations should be in the standard library or not.  It is very useful.
 
Those interested in improving the experience of writing combinatorics
code in Python may wish to look into helping out with the
combinatorics package on PyPI:
http://phillipmfeldman.org/Python/for_developers.html (For example,
politely approach Phillip to see if he is interested in hosting it on
GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving
the PyPI metadata, etc - note I have no experience with this package,
it's just the first hit for "python combinatorics")

> One suggestion: Why not make it so that itertools.permutations checks if its
> argument is an instance of collections.Mapping?  If it is, we could
> interpret the items as a mapping from elements to positive integers, which
> is a compact representation of a multiset.  Then, it could do the right
> thing for that case.

If you want to go down the path of only caring about hashable values,
you may want to argue for a permutations method on collections.Counter
(it's conceivable that approach has the potential to be even faster
than an approach based on accepting and processing an arbitrary
iterable, since it can avoid generating repeated values in the first
place).

A Counter based multiset permutation algorithm was actually posted to
python-list back in 2009, just after collections.Counter was
introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html


Nice find!  
 
I just created an updated version of that recipe and posted it as
https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py


  Why not just define multiset_permutations to accept a dict (dict is a base class of Counter)?  Since you're going to convert from an iterable (with duplicates) to a dict (via Counter) anyway, why not accept it as such.  Users who want an interface similar to itertools.permutations can pass their iterable through Counter first.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia