Executive summary: Thanks for discussion everyone. I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools, or else there should be a standard combinatorics library. On Sun, Oct 13, 2013 at 5:27 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 13 October 2013 17:38, Neil Girdhar <mistersheik@gmail.com> wrote:
My intuition is that we want Python to be "complete". Many other languages can find the permutations of a multiset. Python has a permutations function. Many people on stackoverflow expected that function to be able to find those permutations.
Nope, we expressly *don't* want the standard library to be "complete", because that would mean growing to the size of PyPI (or larger). There's always going to be scope for applications to adopt new domain specific dependencies with more in-depth support than that provided by the standard library.
By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath. When I saw the permutation function in itertools, I expected that it would work on both sets and multisets, or else there would be another function that did.
Many standard library modules are in fact deliberately designed as "stepping stone" modules that will meet the needs of code which have an incidental relationship to that task, but will need to be replaced with something more sophisticated for code directly related to that domain. Many times, that means they will ignore as irrelevant distinctions that are critical in certain contexts, simply because they don't come up all that often outside those specific domains, and addressing them involves making the core module more complicated to use for more typical cases.
Good point.
In this case, the proposed alternate permutations mechanism only makes a difference when:
1. The data set contains equivalent values 2. Input order is not considered significant, so exchanging equivalent values should *not* create a new permutation (i.e. multiset permutations rather than sequence permutations).
If users aren't likely to encounter situations where that makes a difference, then providing both in the standard library isn't being helpful, it's being actively user hostile by asking them to make a decision they're not yet qualified to make for the sake of the few experts that specifically need . Hence Raymond's request for data modelling problems outside the "learning or studying combinatorics" context to make the case for standard library inclusion.
Interestingly, I just found another language which has the equivalent of the currrent behaviour of itertools.permutations: Haskell has it as Data.List.permutations. As far as I can tell, Haskell doesn't offer support for multiset permutations in the core, you need an additional package like Math.Combinatorics (see:
http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatori... ).
Since iterator based programming in Python is heavily inspired by Haskell, this suggests that the current behaviour of itertools.permutations is appropriate and that Raymond is right to be dubious about including multiset permutations support directly in the standard library.
You've convinced me that itertools permutations is doing the right thing :) I'm not sure if multiset permutations should be in the standard library or not. It is very useful.
Those interested in improving the experience of writing combinatorics code in Python may wish to look into helping out with the combinatorics package on PyPI: http://phillipmfeldman.org/Python/for_developers.html (For example, politely approach Phillip to see if he is interested in hosting it on GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving the PyPI metadata, etc - note I have no experience with this package, it's just the first hit for "python combinatorics")
One suggestion: Why not make it so that itertools.permutations checks if its argument is an instance of collections.Mapping? If it is, we could interpret the items as a mapping from elements to positive integers, which is a compact representation of a multiset. Then, it could do the right thing for that case.
If you want to go down the path of only caring about hashable values, you may want to argue for a permutations method on collections.Counter (it's conceivable that approach has the potential to be even faster than an approach based on accepting and processing an arbitrary iterable, since it can avoid generating repeated values in the first place).
A Counter based multiset permutation algorithm was actually posted to python-list back in 2009, just after collections.Counter was introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html
Nice find!
I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py
Why not just define multiset_permutations to accept a dict (dict is a base class of Counter)? Since you're going to convert from an iterable (with duplicates) to a dict (via Counter) anyway, why not accept it as such. Users who want an interface similar to itertools.permutations can pass their iterable through Counter first. Cheers,
Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia