Extremely weird itertools.permutations

"It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations." — http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permut.... Should we consider fixing itertools.permutations and to output only unique permutations (if possible, although I realize that would break code). It is completely non-obvious to have permutations returning duplicates. For a non-breaking compromise what about adding a flag? Best, Neil

Note that if permutations is made to return only unique permutations, the behaviour of defining unique elements by index can be recovered using:

    ([it[index] for index in indexes] for indexes in itertools.permutations(range(len(it))))

On Friday, October 11, 2013 2:38:33 PM UTC-4, Neil Girdhar wrote:
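The index trick can be sanity-checked directly (a small illustrative sketch, not from the thread; the names are mine):

```python
from itertools import permutations

it = ['a', 'a', 'b']
# Permuting *indices* recovers the current position-based behaviour,
# even if an element-level function were to deduplicate by value.
by_index = [tuple(it[i] for i in idxs)
            for idxs in permutations(range(len(it)))]
assert len(by_index) == 6        # 3! tuples: value-duplicates are kept
assert len(set(by_index)) == 3   # but only 3 are distinct by value
```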

On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
I dispute this entire premise. Take a simple (and stereotypical) example, picking balls from an urn. Say that you have three red and two black balls, and randomly select without replacement. If you count only unique permutations, you get only four possibilities:

py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
{'BR', 'RB', 'RR', 'BB'}

which implies that drawing RR is no more likely than drawing BB, which is incorrect. The right way to model this experiment is not to count distinct permutations, but actual permutations:

py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']

which makes it clear that there are two ways of drawing BB compared to six ways of drawing RR. If that's not obvious enough, consider the case where you have two thousand red balls and two black balls -- do you really conclude that there are the same number of ways to pick RR as BB?

So I disagree that counting only distinct permutations is the most useful or common convention. If you're permuting a collection of non-distinct values, you should expect non-distinct permutations. I'm trying to think of a realistic, physical situation where you would only want distinct permutations, and I can't.
Should we consider fixing itertools.permutations and to output only unique permutations (if possible, although I realize that would break code).
Absolutely not. Even if you were right that it should return unique permutations, and I strongly disagree that you were, the fact that it would break code is a deal-breaker. -- Steven

I think it's pretty indisputable that permutations are formally defined this way (and I challenge you to find a source that doesn't agree with that). I'm sure you know that your idea of using permutations to evaluate a multinomial distribution is not efficient. A nicer way to evaluate probabilities is to pass your set through a collections.Counter, and then use the resulting dictionary with scipy.stats.multinomial (if it exists yet). I believe most people will be surprised that len(list(permutations(iterable))) does not count unique permutations.

Best, Neil

On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve@pearwood.info> wrote:

Related to, but not quite the same as Steven D'Aprano's point, I would find it very strange for itertools.permutations() to return a list that was narrowed to equal-but-not-identical items. This is why I've raised the example of 'items=[Fraction(3,1), Decimal(3.0), 3.0]' several times. I've created the Fraction, Decimal, and float for distinct reasons, to get different behaviors and available methods. When I want to look for the permutations of those, I don't want "any old random choice of equal values", since presumably I've given them a type for a reason.

On the other hand, I can see a little bit of sense that 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me a list of 7! == 5040 things that are exactly the same as each other. Then again, I don't know how to generalize that, since my feeling is far less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's redundancy, but there's also important information in the probability and count of specific sequences.

My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.

On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

I honestly think that Python should stick to the mathematical definition of permutations rather than some kind of consensus of the tiny minority of people here. When next_permutation was added to C++, I believe the whole standards committee discussed it, and they came up with the thing that makes the most sense. The fact that dict and set use equality is, I think, the reason that permutations should use equality.

Neil

On Fri, Oct 11, 2013 at 10:48 PM, David Mertz <mertz@gnosis.cx> wrote:

On 12 Oct 2013 12:56, "Neil Girdhar" <mistersheik@gmail.com> wrote:
I honestly think that Python should stick to the mathematical definition
of permutations rather than some kind of consensus of the tiny minority of people here. When next_permutation was added to C++, I believe the whole standards committee discussed it and they came up with the thing that makes the most sense. The fact that dict and set use equality is I think the reason that permutations should use equality.

Why should the behaviour of hash based containers limit the behaviour of itertools? Python required a permutation solution that is memory efficient and works with arbitrary objects, so that's what itertools provides. However, you'd also like a memory efficient iterator for *mathematical* permutations that pays attention to object values and filters out equivalent results. I *believe* the request is equivalent to giving a name to the following genexp:

    (k for k, grp in groupby(permutations(sorted(input))))

That's a reasonable enough request (although perhaps more suited to the recipes section in the itertools docs), but conflating it with complaints about the way the existing iterator works is a good way to get people to ignore you (especially now that the language-specific reasons for the current behaviour have been pointed out, along with confirmation of the fact that backwards compatibility requirements would prohibit changing it even if we wanted to).

Cheers, Nick.

What you propose, Nick, is definitely different from the several functions that have been bandied about here. I.e.
def nick_permutations(items, r=None):
    return (k for k, grp in groupby(permutations(sorted(items), r)))

["".join(p) for p in nonredundant_permutations('aaabb', 3)]
['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
If I'm thinking of this right, what you give is equivalent to the initial flawed version of 'nonredundant_permutations()' that I suggested, which used '!=' rather than the correct '>' in comparing to the 'last' tuple.

FWIW, I deliberately chose the name 'nonredundant_permutations' rather than MRAB's choice of 'unique_permutations' because I think what the filtering does is precisely NOT to give unique ones. Or rather, not to give ALL unique ones, but only those defined by equivalence (i.e. rather than identity). My name is ugly, and if there were to be a function like it in itertools, a better name should be found. But such a name should emphasize that it is "filter by equivalence classes" ... actually, maybe this suggests another function which is instead "filter by identity of tuples", potentially also added to itertools.

On Fri, Oct 11, 2013 at 9:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
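A hedged sketch of the two variants being contrasted here (my own reconstruction, not the thread's original code): the groupby genexp collapses only *adjacent* duplicates, so it misses repeats that are separated in the output stream, while comparing each permutation against the last one *yielded* with '>' catches them, provided the input is sorted and the elements are orderable.

```python
from itertools import groupby, permutations

def nick_permutations(items, r=None):
    # Collapses only adjacent duplicates -- equivalent to filtering
    # with '!=' against the immediately preceding tuple.
    return (k for k, grp in groupby(permutations(sorted(items), r)))

def nonredundant_permutations(items, r=None):
    # Yield a permutation only if it is strictly greater than the last
    # one yielded; for sorted input this keeps exactly one copy of each
    # value-distinct permutation, in lexicographic order.
    last = None
    for perm in permutations(sorted(items), r):
        if last is None or perm > last:
            last = perm
            yield perm

# permutations('aab') emits aab, aba, aab, aba, baa, baa:
assert len(list(nick_permutations('aab'))) == 5   # duplicates survive groupby
assert [''.join(p) for p in nonredundant_permutations('aab')] == \
    ['aab', 'aba', 'baa']
assert [''.join(p) for p in nonredundant_permutations('aaabb', 3)] == \
    ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
```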

Hi Nick,

Rereading my messages, I feel like I haven't been as diplomatic as I wanted. Like everyone here, I care a lot about Python and I want to see it become as perfect as it can be made. If my wording has been too strong, it's only out of passion for Python.

I acknowledged in my initial request that it would be impossible to change the default behaviour of itertools.permutations. I understand that that ship has sailed. I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather than one of the recipes proposed, to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.) I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.

Best, Neil

On Sat, Oct 12, 2013 at 12:35 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Oct 11, 2013, at 23:55, Neil Girdhar <mistersheik@gmail.com> wrote:
I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather than one of the recipes proposed, to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.)
I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.
I think getting something onto PyPI (whether as part of more-itertools or elsewhere) and/or the ActiveState recipes (and maybe StackOverflow and CodeReview) is the best way to get from here to there. Continuing to discuss it here, you've only got the half dozen or so people who are on this list and haven't tuned out this thread to come up with the most efficient implementation. Put it out in the world and people will begin giving you comments/bug reports/rants calling you an idiot for missing the obvious more efficient way to do it, and then you can use their code. And then, when you're satisfied with it, you have a concrete proposal for something to add to itertools in python X.Y+1 instead of some implementation to be named later to add one day. I was also going to suggest that you drop the argument about whether this is the one true definition of sequence permutation and just focus on whether it's a useful thing to have, but it looks like you're way ahead of me there, so never mind.

Neil Girdhar writes:
Is there an agreed mathematical definition of permutations of *sequences*? Every definition I can find refers to permutations of *sets*. I think any categorist would agree that there are a large number of maps of _Sequence_ to _Set_, in particular the two obviously useful ones[1]: the one that takes each element of the sequence to a *different* element of the corresponding set, and the one that takes equal elements of the sequence to the *same* element of the corresponding set. The corresponding set need not be the underlying set of the sequence, and which one is appropriate presumably depends on applications.
To the negligible (in several senses of the word) fraction of humanity that participates in C++ standardization. Python is not C++ (thanking all the Roman and Greek gods, and refusing to identify Zeus with Jupiter, nor Aphrodite with Venus<wink/>).
The fact that dict and set use equality is I think the reason that permutations should use equality.
Sequences are not sets, and dict is precisely the wrong example for you to use, since it makes exactly the point that values that are identical may be bound to several different keys. We don't unify keys in a dict just because the values are identical (or equal). Similarly, in representing a sequence as a set, we use a set of ordered pairs, with the first component a unique integer indicating position, and the second the sequence element. Since there are several useful mathematical ways to convert sequences to sets, and in particular one very similar, if not identical, to the one you like is enshrined in the very convenient constructor set(), I think it's useful to leave it as it is.
It is universally agreed that a list of n distinct symbols has n! permutations.
But that's because there's really no sensible definition of "underlying set" for such a list except the set containing exactly the same elements as the list.[2] But there is no universal agreement that "permutations of a list" is a sensible phrase. For example, although the Wikipedia article Permutation refers to lists of permutations, linked list representations of data, to the "list of objects" for use in Cauchy's notation, and to the cycle representation as a list of sequences, it doesn't once refer to permutation of a list. They're obviously not averse to discussing lists, but the word used for the entity being permuted is invariably "set".

Footnotes:
[1] And some maps not terribly useful for our purposes, such as one that maps all sequences to a singleton.
[2] A categorist would disagree, but that's not interesting.

I'm sorry, but I can't find a reference supporting the statement that the current permutations function is consistent with the mathematical definition. Perhaps you would like to find a reference? A quick search yielded the book "The Combinatorics of Permutations": http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1 Please look in the chapter "Permutation of multisets".

Best, Neil

On Sat, Oct 12, 2013 at 2:34 AM, Steven D'Aprano <steve@pearwood.info> wrote:

On 12 Oct 2013 17:18, "Neil Girdhar" <mistersheik@gmail.com> wrote:
I'm sorry, but I can't find a reference supporting the statement that the
current permutations function is consistent with the mathematical definition. Perhaps you would like to find a reference? A quick search yielded the book "The Combinatorics of Permutations": http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1 Please look in the chapter "Permutation of multisets".

Itertools effectively produces the permutations of (index, value) pairs. Hence Steven's point about the permutations of a list not being mathematically defined: you have to decide what set to map it to in order to decide what counts as a unique value. The mapping itertools uses considers position in the iterable relevant, so exchanging two values that are themselves equivalent is still considered a distinct permutation, since their original position is taken into account. Like a lot of mathematics, it's a matter of paying close attention to which entities are actually being manipulated and how the equivalence classes are being defined :)

Hence the current proposal amounts to adding another variant that provides the permutations of an unordered multiset instead of those of a set of (index, value) 2-tuples (with the indices stripped from the results).

One interesting point is that combining collections.Counter.elements() with itertools.permutations() currently does the wrong thing, since itertools.permutations() *always* considers iterable order significant, while for collections.Counter.elements() it's explicitly arbitrary.

Cheers, Nick.
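Nick's "(index, value) pairs" framing can be checked directly (a small illustrative sketch, not from the thread):

```python
from itertools import permutations

# itertools.permutations effectively permutes (index, value) pairs:
# equal values at different positions are still distinct elements.
pairs = list(enumerate('aab'))   # [(0, 'a'), (1, 'a'), (2, 'b')]
assert len(list(permutations('aab'))) == len(list(permutations(pairs))) == 6
# Stripping the indices collapses the results to the 3 multiset-distinct ones.
assert len({tuple(v for _, v in p) for p in permutations(pairs)}) == 3
```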

On Sat, Oct 12, 2013 at 8:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I hadn't thought about it, but as I read the docs for 3.4 (and it's the same back through 2.7), not only would both of these be permissible in a Python implementation:
list(collections.Counter({'a':2,'b':1}).elements())
['a', 'a', 'b']
Or:
list(collections.Counter({'a':2,'b':1}).elements())
['b', 'a', 'a']
But even this would be per documentation (although really unlikely as an implementation):
list(collections.Counter({'a':2,'b':1}).elements())
['a', 'b', 'a']

Yes, you're right, and I understand what's been done, although, like the 30 upvoters to the linked stackoverflow question, I find the current behaviour surprising and would like to see a distinct_permutations function. How do I start to submit a patch?

Neil

On Sat, Oct 12, 2013 at 11:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Oct 12, 2013, at 11:56 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
, I find the current behaviour surprising and would like to see a distinct_permutations function. How do I start to submit a patch?
You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer). That said, the odds of it being accepted are slim. There are many ways to write combinatoric functions (Knuth has a whole book on the subject) and I don't aspire to include multiple variants unless there are strong motivating use cases. In general, if someone wants to eliminate duplicates from the population, they can do so easily with:

    permutations(set(population), n)

The current design solves the most common use cases and it has some nice properties such as:

* permutations is a subsequence of product
* no assumptions are made about the comparability or orderability of members of the population
* len(list(permutations(range(n), r))) == n! / (n-r)! just like you were taught in school
* it is fast

For more exotic needs, I think it is appropriate to look outside the standard library to more full-featured combinatoric libraries (there are several listed at pypi.python.org).

Raymond
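Raymond's third bullet is easy to spot-check (a quick illustrative sketch):

```python
from itertools import permutations
from math import factorial

# Verify len(list(permutations(range(n), r))) == n! / (n-r)!
# for a few (n, r) pairs, including the r == 0 edge case.
for n, r in [(5, 2), (6, 3), (7, 0)]:
    assert len(list(permutations(range(n), r))) == \
        factorial(n) // factorial(n - r)
```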

On 10/12/2013 05:44 PM, Raymond Hettinger wrote:
+1 About the only improvement I can see would be a footnote in the itertools doc table that lists the different combinatorics. Being a naive permutations user myself I would have made the mistake of thinking that "r-length tuples, all possible orderings, no repeated elements" meant no repeated values. The longer text for permutations makes it clear how it works. My rst-foo is not good enough to link from the table down into the permutation text where the distinction is made clear. If no one beats me to a proposed patch I'll see if I can figure it out. -- ~Ethan~

Hi Raymond,

I agree with you on the consistency point with itertools.product. That's a great point. However, permutations(set(population)) is not the correct way to take the permutations of a multiset. Please take a look at how permutations are taken from a multiset in any of the papers I linked, or in any paper that you can find on the internet. The number of permutations of a multiset is n! / \prod_i a_i!, where the a_i are the element counts -- just like I was taught in school. There is currently no fast way to find these permutations of a multiset, and it is a common operation for solving problems. What is needed, I think, is a function multiset_permutations that accepts an iterable.

Best, Neil

On Sat, Oct 12, 2013 at 8:44 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
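The multiset count n! / \prod_i a_i! can be checked against a brute-force dedup of itertools output (a small sketch; the helper name is mine):

```python
from collections import Counter
from itertools import permutations
from math import factorial

def multiset_permutation_count(iterable):
    # n! divided by the factorial of each element's multiplicity.
    counts = Counter(iterable)
    total = factorial(sum(counts.values()))
    for a in counts.values():
        total //= factorial(a)
    return total

assert multiset_permutation_count('RRRBB') == 10   # 5! / (3! * 2!)
# Agrees with brute-force deduplication of the itertools stream:
assert multiset_permutation_count('aaabb') == len(set(permutations('aaabb')))
```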

On Sat, Oct 12, 2013 at 05:44:38PM -0700, Raymond Hettinger wrote:
In fairness Raymond, the proposal is not to eliminate duplicates from the population, but from the permutations themselves. Consider the example I gave earlier, where you're permuting "RRRBB" two items at a time. There are 20 permutations including duplicates, but sixteen of them are repeated:

py> list(''.join(t) for t in permutations("RRRBB", 2))
['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
py> set(''.join(t) for t in permutations("RRRBB", 2))
{'BR', 'RB', 'RR', 'BB'}

But if you eliminate duplicates from the population first, you get only two permutations:

py> list(''.join(t) for t in permutations(set("RRRBB"), 2))
['BR', 'RB']

If it were just a matter of calling set() on the output of permutations, that would be trivial enough. But you might care about order, or elements might not be hashable, or you might have a LOT of permutations to generate before discarding:

population = "R"*1000 + "B"*500
set(''.join(t) for t in permutations(population, 2))  # takes a while...

In my opinion, if unique_permutations is no more efficient than calling set on the output of permutations, it's not worth it. But if somebody can come up with an implementation which is significantly more efficient, without making unreasonable assumptions about orderability, hashability or even comparability, then in my opinion that might be worthwhile.

-- Steven

On Oct 12, 2013, at 6:47 PM, Steven D'Aprano <steve@pearwood.info> wrote:
the proposal is not to eliminate duplicates from the population, but from the permutations themselves.
I'm curious about the use cases for this. Other than red/blue marble examples and some puzzle problems, does this come up in any real problems? Do we actually need this?

Raymond

My intuition is that we want Python to be "complete". Many other languages can find the permutations of a multiset. Python has a permutations function. Many people on stackoverflow expected that function to be able to find those permutations. One suggestion: why not make it so that itertools.permutations checks if its argument is an instance of collections.Mapping? If it is, we could interpret the items as a mapping from elements to positive integers, which is a compact representation of a multiset. Then it could do the right thing for that case.

Best, Neil

On Sat, Oct 12, 2013 at 11:03 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:

On 13 October 2013 17:38, Neil Girdhar <mistersheik@gmail.com> wrote:
Nope, we expressly *don't* want the standard library to be "complete", because that would mean growing to the size of PyPI (or larger). There's always going to be scope for applications to adopt new domain-specific dependencies with more in-depth support than that provided by the standard library. Many standard library modules are in fact deliberately designed as "stepping stone" modules that will meet the needs of code which has an incidental relationship to that task, but will need to be replaced with something more sophisticated for code directly related to that domain. Many times, that means they will ignore as irrelevant distinctions that are critical in certain contexts, simply because they don't come up all that often outside those specific domains, and addressing them involves making the core module more complicated to use for more typical cases.

In this case, the proposed alternate permutations mechanism only makes a difference when:

1. The data set contains equivalent values
2. Input order is not considered significant, so exchanging equivalent values should *not* create a new permutation (i.e. multiset permutations rather than sequence permutations)

If users aren't likely to encounter situations where that makes a difference, then providing both in the standard library isn't being helpful, it's being actively user hostile by asking them to make a decision they're not yet qualified to make for the sake of the few experts that specifically need it. Hence Raymond's request for data modelling problems outside the "learning or studying combinatorics" context to make the case for standard library inclusion.

Interestingly, I just found another language which has the equivalent of the current behaviour of itertools.permutations: Haskell has it as Data.List.permutations.
As far as I can tell, Haskell doesn't offer support for multiset permutations in the core, you need an additional package like Math.Combinatorics (see: http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatori...). Since iterator based programming in Python is heavily inspired by Haskell, this suggests that the current behaviour of itertools.permutations is appropriate and that Raymond is right to be dubious about including multiset permutations support directly in the standard library. Those interested in improving the experience of writing combinatorics code in Python may wish to look into helping out with the combinatorics package on PyPI: http://phillipmfeldman.org/Python/for_developers.html (For example, politely approach Phillip to see if he is interested in hosting it on GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving the PyPI metadata, etc - note I have no experience with this package, it's just the first hit for "python combinatorics")
If you want to go down the path of only caring about hashable values, you may want to argue for a permutations method on collections.Counter (it's conceivable that approach has the potential to be even faster than an approach based on accepting and processing an arbitrary iterable, since it can avoid generating repeated values in the first place).

A Counter based multiset permutation algorithm was actually posted to python-list back in 2009, just after collections.Counter was introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
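A minimal recursive sketch in the spirit of the Counter-based recipe Nick links (my own reconstruction, not his code; it assumes hashable elements). Because it decrements counts rather than permuting positions, it never generates a duplicate in the first place:

```python
from collections import Counter

def multiset_permutations(counts, prefix=()):
    # counts maps element -> remaining multiplicity; yields each
    # value-distinct permutation exactly once.
    if not any(counts.values()):
        yield prefix
        return
    for elem in counts:
        if counts[elem]:
            counts[elem] -= 1
            yield from multiset_permutations(counts, prefix + (elem,))
            counts[elem] += 1

result = list(multiset_permutations(Counter('aab')))
assert len(result) == 3   # 3! / 2! = 3 -- no duplicates generated
assert set(result) == {('a', 'a', 'b'), ('a', 'b', 'a'), ('b', 'a', 'a')}
```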

Executive summary: Thanks for the discussion, everyone. I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools, or else that there should be a standard combinatorics library.

On Sun, Oct 13, 2013 at 5:27 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
By "complete" I meant that, just as if you were to add the error function erf to math, you would want to add an equivalent version to cmath. When I saw the permutation function in itertools, I expected that it would work on both sets and multisets, or else that there would be another function that did.
Good point.
You've convinced me that itertools permutations is doing the right thing :) I'm not sure if multiset permutations should be in the standard library or not. It is very useful.
Nice find!
I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py
Why not just define multiset_permutations to accept a dict (dict is a base class of Counter)? Since you're going to convert from an iterable (with duplicates) to a dict (via Counter) anyway, why not accept it as such. Users who want an interface similar to itertools.permutations can pass their iterable through Counter first. Cheers,

On Sun, Oct 13, 2013 at 9:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath.
An interesting choice of example. *Why* would you want to do so? Since you bring this up, I assume you're already aware that math.erf exists but cmath.erf does not. I believe there are good, practical reasons *not* to add cmath.erf, in spite of the existence of math.erf. Not least of these is that cmath.erf would be significantly more complicated to implement and of significantly less interest to users. And perhaps there's a parallel with itertools.permutations and the proposed itertools.multiset_permutations here... -- Mark

Actually I didn't notice that. It seems weird to find erf in math, but erf for complex numbers in scipy.special. It's just about organization and user discovery. I realize that from the developer's point of view, erf for complex numbers is complicated, but why does the user care? On Mon, Oct 14, 2013 at 8:29 AM, Mark Dickinson <dickinsm@gmail.com> wrote:

On 14 October 2013 13:37, Neil Girdhar <mistersheik@gmail.com> wrote:
This is the first time I've seen a suggestion that there should be cmath.erf. So I would say that most users don't care about having a complex error function. Whoever would take the time to implement the complex error function might instead spend that time implementing and maintaining something that users do care about. Oscar

On Mon, Oct 14, 2013 at 08:37:59AM -0400, Neil Girdhar wrote:
99% of users don't care about math.erf at all. Of those who do, 99% don't care about cmath.erf. I'd like to see cmath.erf because I'm a maths junkie, but if I were responsible for *actually doing the work* I'd make the same decision to leave cmath.erf out and leave it for a larger, more complete library like scipy. There are an infinitely large number of potential programs which could in principle be added to Python's std lib, and only a finite number of person-hours to do the work. And there are costs to adding software to the std lib, not just benefits.

-- Steven

On 15 October 2013 11:29, Neil Girdhar <mistersheik@gmail.com> wrote:
You make a good point. It was just a random example to illustrate that desire for completeness.
The desire for conceptual purity and consistency is a good one, it just needs to be balanced against the practical constraints of writing, maintaining, documenting, teaching and learning the standard library. "It isn't worth the hassle" is the answer to a whole lot of "Why not X?" questions in software development :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Oct 14, 2013 at 6:39 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
"It isn't worth the hassle" is the answer to a whole lot of "Why not X?" questions in software development :)
Sometimes it's not worth the hassle on either side. If this is added to the standard library and I write code that uses it, my code won't be backwards compatible with older versions of Python. So I'll either have to not support older Python versions or use an alternative implementation. If this is on pypi then that's not an issue. Not everything useful should be in the standard library. If I had come across a need for this, I'd have just used unique_everseen(permutations(...)) until performance became an issue.

--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

On Oct 13, 2013, at 1:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools,
Now that we have a good algorithm, I'm open to adding this to itertools, but it would need to have a name that didn't create any confusion with respect to the existing tools, perhaps something like:

    anagrams(population, r)

        Return an iterator over all distinct r-length permutations
        of the population.

        Unlike permutations(), element uniqueness is determined
        by value rather than by position. Also, anagrams() makes
        no guarantees about the order the tuples are generated.

Raymond

Excellent! My top two names are:

1. multiset_permutations (reflects the mathematical name)
2. anagrams

Note that we may also want to add multiset_combinations. It hasn't been part of this discussion, but it may be part of another discussion and I wanted to point this out as I know many of you are future-conscious.

We seem to be all agreed that we want to accept "r", the length of the permutation desired. With permutations, the *set* is passed in as an iterable representing distinct elements. With multiset_permutations, there are three ways to pass in the *multiset*:

1. an iterable whose elements (or an optional key function applied to which) are compared using __eq__
2. a dict (of which collections.Counter is a subclass)
3. an iterable whose elements are key-value pairs and whose values are counts

Example uses:

1. multiset_permutations(word)
2. multiset_permutations(Counter(word))
3. multiset_permutations(Counter(word).items())

From a dictionary:

1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in d.items()))
2. multiset_permutations(d)
3. multiset_permutations(d.items())

From an iterable of key-value pairs:

1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in it))
2. multiset_permutations({k: v for k, v in it})
3. multiset_permutations(it)

The advantage of 2 is that no elements are compared by multiset_permutations (so it is simpler and faster). The advantage of 3 is that no elements are compared, and they need not be comparable or hashable. This version is truly a generalization of the "permutations" function: for any input "it" you could pass to permutations, you could equivalently pass zip(it, itertools.repeat(1)) to multiset_permutations.

Comments?
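Neil's conversion expressions between the three input forms can be sanity-checked without assuming any particular multiset_permutations signature (the function itself is hypothetical; this sketch only exercises the conversions):

```python
from collections import Counter
from itertools import chain, repeat

d = {'a': 2, 'b': 1}
# Form 1 (iterable with duplicates) reconstructed from form 2 (dict):
expanded = list(chain.from_iterable(repeat(k, v) for k, v in d.items()))
assert Counter(expanded) == Counter(d)   # same multiset either way
# Form 3 (key-value pairs) round-trips through a dict comprehension:
assert {k: v for k, v in d.items()} == d
```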
Neil

On Mon, Oct 14, 2013 at 1:56 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
>
> On Oct 13, 2013, at 1:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
>
> I'm now convinced that itertools.permutations is fine as it is. I am not
> totally convinced that multiset_permutations doesn't belong in itertools,
>
> [...]

[Raymond Hettinger]
Now that we have a good algorithm, I'm open to adding this to itertools,
I remain reluctant, because I still haven't seen a compelling use case. Yes, it generates all distinct r-letter anagrams - but so what? LOL ;-) Seriously, I've written anagram programs several times in my life, and generating "all possible" never occurred to me because it's so crushingly inefficient.
"anagrams" is great! Inspired :-) What about an optional argument to define what the _user_ means by "equality"? The algorithm I posted had an optional `equal=operator.__eq__` argument. Else you're going to be pushed to add a clumsy `TransformAnagrams` later <0.4 wink>.
Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the elements support a total order, and appear in the iterable in non-decreasing order, then the anagrams are generated in non-decreasing lexicographic order. And that may be a useful guarantee (hard to tell without a real use case, though!). There's another ambiguity I haven't seen addressed explicitly. Consider this:
    (3, 3.0, Fraction(3, 1))

All the algorithms posted here show all 3 elements in this case. But why? If the elements are all equal, then other outputs "should be" acceptable too, like (3, 3, 3) or (3.0, Fraction(3, 1), 3.0), etc. All those outputs compare equal! This isn't visible if, e.g., the iterable's elements are letters (where a == b if and only if str(a) == str(b), so the output looks the same no matter what). At least "my" algorithm could be simplified materially if it only saved (and iterated over) a (single) canonical representative for each equivalence class, instead of saving entire equivalence classes and then jumping through hoops to cycle through each equivalence class's elements. But, for some reason, output (3, 3, 3) just "looks wrong" above. I'm not sure why.

On 15/10/2013 01:48, Tim Peters wrote:
[snip] I can see that one disadvantage of my algorithm is that the worst-case storage requirement is O(n^2) (I think). This is because the set of first items could have N members, the set of second items could have N-1 members, etc. On the other hand, IMHO, the sheer number of permutations will become a problem long before the memory requirement does! :-)

On 13/10/2013 08:38, Neil Girdhar wrote:
My intuition is that we want Python to be "complete".
No thank you. I much prefer "Python in a Nutshell" the size it is now, I'm not interested in competing with (say) "Java in a Nutshell". -- Roses are red, Violets are blue, Most poems rhyme, But this one doesn't. Mark Lawrence

One example of prior art: Maxima, which I use in its wxMaxima incarnation. """ Function: permutations(a) Returns a set of all distinct permutations of the members of the list or set a. Each permutation is a list, not a set. When a is a list, duplicate members of a are included in the permutations """ Examples from a Maxima shell:
    permutations([1, 2, 3]);
    {[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]}
    permutations({1, 1.0, 1, 1.0});
    {[1,1.0],[1.0,1]}
That last one may be surprising at first, but note that it's the first example where I passed a _set_ (instead of a list). And:
    {1, 1.0, 1, 1.0};
    {1,1.0}
Best I can tell, Maxima has no builtin function akin to our permutations(it, r) when r < len(it). But Maxima has a huge number of builtin functions, and I often struggle to find ones I _want_ in its docs ;-)

On Oct 11, 2013, at 19:48, David Mertz <mertz@gnosis.cx> wrote:
My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
I agree with the rest of your message, but I still think you're wrong here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating the two values the same would be equally surprised by {3.0, 3} having only one member. Or by groupby([(3.0, 'a'), (3, 'b')], key=itemgetter(0)) having only one group. And so on. In Python, sets, dict keys, groups, etc. work by ==. That was a choice that could have been made differently, but Python made that choice long ago, and has applied it completely consistently, and it would be very strange to choose differently in this case.
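Andrew's point is easy to check in an interpreter; all of these built-in mechanisms compare by ==:

```python
from fractions import Fraction
from itertools import groupby

# Sets collapse numerically equal values of different types.
s = {3, 3.0, Fraction(3, 1)}
assert len(s) == 1

# Dict keys do the same: the second assignment reuses the key 3.
d = {3: 'int', 3.0: 'float'}
assert len(d) == 1
assert d[3] == 'float'

# And groupby() puts adjacent equal values into one group.
groups = [k for k, _ in groupby([3.0, 3])]
assert len(groups) == 1
```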

Hi Andrew,

I've sort of said as much in my last reply to Nick. But maybe I can clarify further. I can imagine *someone* wanting a filtering of permutations by either identity or equality. Maybe, in fact, by other comparisons also, for generality. This might suggest an API like the following:

    equal_perms = distinct_permutations(items, r, filter_by=operator.eq)
    ident_perms = distinct_permutations(items, r, filter_by=operator.is_)

Or even perhaps, in some use-case that isn't clear to me, e.g.:

    start_same_perms = distinct_permutations(items, r, filter_by=lambda a, b: a[0] == b[0])

Or perhaps more plausibly, some predicate that, e.g., tests if two returned tuples are the same under case normalization of the strings within them. I guess the argument then would be what the default value of 'filter_by' might be... but that seems less important to me if there were an option to pass a predicate as you liked.

On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Btw. my implementation of nonredundant_permutations *IS* guaranteed to work by the docs for Python 3.4. Actually for Python 2.7+. That is, it's not just an implementation accident (as I thought before I checked), but a promised API of itertools.permutations that:

    Permutations are emitted in lexicographic sort order. So, if the
    input iterable is sorted, the permutation tuples will be produced
    in sorted order.

As long as that holds, my function will indeed behave correctly (but of course, with the limitation that it blows up if different items in the argument iterable cannot be compared using operator.lt()).

On Fri, Oct 11, 2013 at 10:38 PM, David Mertz <mertz@gnosis.cx> wrote:
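For reference, the kind of implementation being discussed looks roughly like this (a sketch of the approach described in the thread; David's actual posted code may differ in details):

```python
from itertools import permutations

def nonredundant_permutations(iterable, r=None):
    # Sort the input, walk itertools.permutations, and keep a tuple
    # only when it is strictly greater than the last one kept.
    # '>' rather than '!=' matters: with repeated input elements,
    # equal tuples are not always adjacent in the stream.
    # Items must support '<', per the caveat above.
    last = None
    for perm in permutations(sorted(iterable), r):
        if last is None or perm > last:
            yield perm
            last = perm
```

For example, `["".join(p) for p in nonredundant_permutations('aaabb', 3)]` yields the seven distinct three-letter arrangements, in sorted order.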

On Sat, Oct 12, 2013 at 12:02 AM, Neil Girdhar <mistersheik@gmail.com>wrote:
Why not just use the standard python way to generalize this: "key" rather than the nonstandard "filter_by".
Yes, 'key' is a much better name than what I suggested. I'm still not quite sure how best to implement this. I guess MRAB's recursive approach should work, even though I like the simplicity of my style that takes full advantage of the existing itertools.permutations() (and uses 1/3 as many lines of, I think, clearer code). His has the advantage that it doesn't require operator.lt() to work; however, without benchmarking, I have a pretty strong feeling that my suggestion will be faster since it avoids all that recursive call overhead. Maybe I'm wrong about that though.

On 10/12/2013 02:09 AM, David Mertz wrote:
> I'd like to see some nice examples of how it's used.

It seems to me there is some mixing of combination/permutation concepts.

    import itertools as it

    def filter_repeats(itr):
        seen = set()
        for i in itr:
            if i in seen:
                continue
            seen.add(i)
            yield i

    def unique_combinations(itr, length=None):
        if length is None:
            length = len(itr)
        return it.product(filter_repeats(itr), repeat=length)

This one isn't the same as yours, but it's an example of filter_repeats. While that isn't the most efficient way in some cases, filtering repeat items out of things is a fairly common problem.

Cheers, Ron

On Fri, Oct 11, 2013 at 10:37:02PM -0400, Neil Girdhar wrote:
If by "this way" you mean "unique permutations only", then yes, it is *completely* disputable, and I am doing so right now. I'm not arguing one way or the other for a separate "unique_permutations" generator, just that the existing permutations generator does the right thing. If you're satisfied with that answer, you can stop reading now, because the rest of my post is going to be rather long.

TL;DR: If you want a unique_permutations generator, that's a reasonable request. If you insist on changing permutations, that's unreasonable, firstly because the current behaviour is correct, and secondly because backwards compatibility would constrain it to keep the existing behaviour even if it were wrong.

. . .

Still here? Okay then, let me justify why I say the current behaviour is correct.

Speaking as a math tutor who has taught High School level combinatorics for 20+ years, I've never come across any text book or source that defines permutations in terms of unique permutations only. In every case that I can remember, or that I still have access to, unique permutations is considered a different kind of operation ("permutations ignoring duplicates", if you like) rather than the default. E.g. "Modern Mathematics 6" by Fitzpatrick and Galbraith has a separate section for permutations with repetition, gives the example of taking permutations from the word "MAMMAL", and explicitly contrasts situations where you consider the three letters M as "different" from those where you consider them "the same".

But in all such cases, such a situation is discussed as a restriction on permutations, not an expansion. That is:

* there are permutations;
* sometimes you want to consider only unique permutations;

rather than:

* there are permutations, which are always unique;
* sometimes you want to consider things which are like permutations except they're not necessarily unique.

I'd even turn this around and challenge you to find a source that *does* define them as always unique.
Here's a typical example, from the Collins Dictionary of Mathematics:

[quote]
**permutation** or **ordered arrangement** n.

1. an ordered arrangement of a specified number of objects selected from a set. The number of distinct permutations of r objects from n is n!/(n-r)!, usually written ₙPᵣ or ⁿPᵣ. For example, there are six distinct permutations of two objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>. Compare COMBINATION.

2. any rearrangement of all the elements of a finite sequence, such as (1,3,2) and (3,1,2). It is *odd* or *even* according as the number of exchanges of position yielding it from the original order is odd or even. It is a *cyclic permutation* if it merely advances all the elements a fixed number of places; that is, if it is a CYCLE of maximal LENGTH. A *transposition* is a cycle of degree two, and all permutations factor as products of transpositions. See also SIGNATURE.

3. any BIJECTION of a set to itself, where the set may be finite or infinite.
[end quote]

The definition makes no comment about how to handle duplicate elements, but we can derive an answer for that:

1) We're told how many permutations there are. Picking r elements out of n gives us n!/(n-r)!. If you throw away duplicate permutations, you will fall short.

2) The number of permutations shouldn't depend on the specific entities being permuted. Permutations of (1, 2, 3, 4) and (A, B, C, D) should be identical. If your set of elements contains duplicates, such as (Red ball, Red ball, Red ball, Black ball, Black ball), we can put the balls into 1:1 correspondence with the integers (1, 2, 3, 4, 5), permute the integers, then reverse the mapping to get balls again. If we do this, we ought to get the same result as just permuting the balls directly.

(That's not to say that there are never cases where we don't care to distinguish between one red ball and another. But in general we do distinguish between them.)
I think this argument may hinge on what you consider *distinct*. In this context, if I permute the string "RRRBB", I consider all five characters to be distinct. Object identity is an implementation detail (not all programming languages have "objects"); even equality is an irrelevant detail. If I'm choosing to permute "RRRBB" rather than "RB", then clearly *to me* there must be some distinguishing factor between the three Rs and two Bs.

Another source is Wolfram MathWorld:

http://mathworld.wolfram.com/Permutation.html

which likewise says nothing about discarding repeated permutations when there are repeated elements. See also their page on "Ball Picking":

http://mathworld.wolfram.com/BallPicking.html

Last but not least, here's a source which clearly distinguishes permutations from "permutations with duplicates":

http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html

and even gives a distinct formula for calculating the number of permutations. Neither Wolfram MathWorld nor the Collins Dictionary of Mathematics considers this formula important enough to mention, which suggests strongly that it should be considered separate from the default permutations. (A little like cyclic permutations, which are different again.)

-- Steven
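Both of Steven's numbered points above can be checked numerically (an illustrative check, not from the original email):

```python
from itertools import permutations
from math import factorial

# Point 1: picking r of n objects gives n!/(n-r)! permutations.
n, r = 5, 2
assert len(list(permutations(range(n), r))) == factorial(n) // factorial(n - r) == 20

# Point 2: the count is unchanged when values are duplicated, because
# itertools.permutations() distinguishes elements by position, just as
# numbering the balls would.
assert len(list(permutations('RRRBB', 2))) == 20
```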

On 10/12/2013 10:35 AM, Steven D'Aprano wrote:
I agree that backwards compatibility should be kept, but the current behaviour of itertools.permutations is (IMHO) surprising. So here are my 2c: until I tried it myself, I was sure that it would be like the corresponding permutations functions in Sage:

    sage: list(Permutations("aba"))
    [['a', 'a', 'b'], ['a', 'b', 'a'], ['b', 'a', 'a']]

or Mathematica:

http://www.wolframalpha.com/input/?i=permutations+of+{a%2C+b%2C+a}

Currently the docstring of itertools.permutations just says "Return successive r-length permutations of elements in the iterable", without telling what happens with input of repeated elements. The full doc in the reference manual is better in that regard, but I think at least one example with repeated elements would be nice.

Regards, TB
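For example, an "example with repeated elements" of the kind TB asks for might look like this:

```python
from itertools import permutations

# itertools treats the two 'a's as distinct (by position): 3! = 6 tuples.
assert len(list(permutations('aba'))) == 6

# Sage's Permutations("aba") reports only the 3 value-distinct
# arrangements, which here corresponds to deduplicating the tuples.
assert len(set(permutations('aba'))) == 3
```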

On 10/12/2013 3:35 AM, Steven D'Aprano wrote:
The items of a set are, by definition of a set, distinct, so the question of different but equal permutations does not arise.
The items of a sequence may be duplicates. But in the treatments of permutations I have seen (admittedly not all of them), they are considered to be distinguished by position, so that one may replace the item by counts 1 to n and vice versa.
Back to a set of distinct items again. You are correct that itertools.permutations does the right thing by standard definition.
The question is whether this particular variation is important enough to put in itertools. It is not a combinatorics module and did not start with permutations. -- Terry Jan Reedy


On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
Should we consider fixing itertools.permutations and to output only unique permutations (if possible, although I realize that would break code).
Absolutely not. Even if you were right that it should return unique permutations, and I strongly disagree that you were, the fact that it would break code is a deal-breaker. -- Steven

I think it's pretty indisputable that permutations are formally defined this way (and I challenge you to find a source that doesn't agree with that). I'm sure you know that your idea of using permutations to evaluate a multinomial distribution is not efficient. A nicer way to evaluate probabilities is to pass your set through a collections.Counter, and then use the resulting dictionary with scipy.stats.multinomial (if it exists yet). I believe most people will be surprised that len(list(permutations(iterable))) does not count unique permutations.

Best,

Neil

On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve@pearwood.info> wrote:
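For what it's worth, the counts-based computation Neil alludes to can be sketched without scipy at all. Here, draw_probability is a hypothetical helper, not an existing function, and it computes ordered draw-without-replacement probabilities straight from the counts, without enumerating any permutations:

```python
from collections import Counter
from fractions import Fraction

def draw_probability(population, sequence):
    # Probability of drawing `sequence`, in order, without replacement,
    # from the multiset `population`.
    counts = Counter(population)
    remaining = sum(counts.values())
    p = Fraction(1)
    for item in sequence:
        p *= Fraction(counts[item], remaining)
        counts[item] -= 1
        remaining -= 1
    return p

# Steven's urn: 6 of the 20 orderings give RR, 2 of 20 give BB.
assert draw_probability('RRRBB', 'RR') == Fraction(3, 10)
assert draw_probability('RRRBB', 'BB') == Fraction(1, 10)
```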

Related to, but not quite the same as, Steven D'Aprano's point: I would find it very strange for itertools.permutations() to return a list that was narrowed to equal-but-not-identical items. This is why I've raised the example of 'items=[Fraction(3,1), Decimal(3.0), 3.0]' several times. I've created the Fraction, Decimal, and float for distinct reasons to get different behaviors and available methods. When I want to look for the permutations of those I don't want "any old random choice of equal values", since presumably I've given them a type for a reason.

On the other hand, I can see a little bit of sense that 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me a list of 7! == 5040 things that are exactly the same as each other. Then again, I don't know how to generalize that, since my feeling is far less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's redundancy, but there's also important information in the probability and count of specific sequences.

My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.

On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik@gmail.com> wrote:

I honestly think that Python should stick to the mathematical definition of permutations rather than some kind of consensus of the tiny minority of people here. When next_permutation was added to C++, I believe the whole standards committee discussed it and they came up with the thing that makes the most sense. The fact that dict and set use equality is I think the reason that permutations should use equality. Neil On Fri, Oct 11, 2013 at 10:48 PM, David Mertz <mertz@gnosis.cx> wrote:

On 12 Oct 2013 12:56, "Neil Girdhar" <mistersheik@gmail.com> wrote:
> I honestly think that Python should stick to the mathematical definition
> of permutations rather than some kind of consensus of the tiny minority
> of people here. When next_permutation was added to C++, I believe the
> whole standards committee discussed it and they came up with the thing
> that makes the most sense. The fact that dict and set use equality is I
> think the reason that permutations should use equality.

Why should the behaviour of hash based containers limit the behaviour of itertools? Python required a permutation solution that is memory efficient and works with arbitrary objects, so that's what itertools provides.

However, you'd also like a memory efficient iterator for *mathematical* permutations that pays attention to object values and filters out equivalent results. I *believe* the request is equivalent to giving a name to the following genexp:

    (k for k, grp in groupby(permutations(sorted(input))))

That's a reasonable enough request (although perhaps more suited to the recipes section in the itertools docs), but conflating it with complaints about the way the existing iterator works is a good way to get people to ignore you (especially now that the language specific reasons for the current behaviour have been pointed out, along with confirmation of the fact that backwards compatibility requirements would prohibit changing it even if we wanted to).

Cheers, Nick.

What you propose, Nick, is definitely different from the several functions that have been bandied about here. I.e.
    def nick_permutations(items, r=None):
        return (k for k, grp in groupby(permutations(sorted(items), r)))
["".join(p) for p in nonredundant_permutations('aaabb', 3)] ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
If I'm thinking of this right, what you give is equivalent to the initial flawed version of 'nonredundant_permutations()' that I suggested, which used '!=' rather than the correct '>' in comparing to the 'last' tuple. FWIW, I deliberately chose the name 'nonredundant_permutations' rather than MRAB's choice of 'unique_permutations' because I think what the filtering does is precisely NOT to give unique ones. Or rather, not to give ALL unique ones, but only those defined by equivalence (i.e. rather than identity). My name is ugly, and if there were to be a function like it in itertools, a better name should be found. But such a name should emphasize that it is "filter by equivalence classes" ... actually, maybe this suggests another function which is instead "filter by identity of tuples", potentially also added to itertools. On Fri, Oct 11, 2013 at 9:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
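David's suspicion can be checked directly. groupby() merges only *adjacent* equal tuples, and with repeated input elements the stream from permutations() is not globally sorted by value, so duplicates slip through:

```python
from itertools import groupby, permutations

def nick_permutations(items, r=None):
    return (k for k, grp in groupby(permutations(sorted(items), r)))

# permutations('aab') emits: aab, aba, aab, aba, baa, baa.
# groupby() collapses only the final adjacent pair, leaving duplicates,
# which is exactly the behaviour of the flawed '!=' filter.
result = ["".join(p) for p in nick_permutations('aab')]
assert result == ['aab', 'aba', 'aab', 'aba', 'baa']
```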

Hi Nick,

Rereading my messages, I feel like I haven't been as diplomatic as I wanted. Like everyone here, I care a lot about Python and I want to see it become as perfect as it can be made. If my wording has been too strong, it's only out of passion for Python. I acknowledged in my initial request that it would be impossible to change the default behaviour of itertools.permutations. I understand that that ship has sailed.

I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather than one of the recipes proposed, to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.) I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.

Best,

Neil

On Sat, Oct 12, 2013 at 12:35 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Oct 11, 2013, at 23:55, Neil Girdhar <mistersheik@gmail.com> wrote:
I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather one of the recipes proposed to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.)
I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.
I think getting something onto PyPI (whether as part of more-itertools or elsewhere) and/or the ActiveState recipes (and maybe StackOverflow and CodeReview) is the best way to get from here to there. Continuing to discuss it here, you've only got the half dozen or so people who are on this list and haven't tuned out this thread to come up with the most efficient implementation. Put it out in the world and people will begin giving you comments/bug reports/rants calling you an idiot for missing the obvious more efficient way to do it, and then you can use their code. And then, when you're satisfied with it, you have a concrete proposal for something to add to itertools in python X.Y+1 instead of some implementation to be named later to add one day. I was also going to suggest that you drop the argument about whether this is the one true definition of sequence permutation and just focus on whether it's a useful thing to have, but it looks like you're way ahead of me there, so never mind.

Neil Girdhar writes:
Is there an agreed mathematical definition of permutations of *sequences*? Every definition I can find refers to permutations of *sets*. I think any categorist would agree that there are a large number of maps of _Sequence_ to _Set_, in particular the two obviously useful ones[1]: the one that takes each element of the sequence to a *different* element of the corresponding set, and the one that takes equal elements of the sequence to the *same* element of the corresponding set. The corresponding set need not be the underlying set of the sequence, and which one is appropriate presumably depends on applications.
To the negligible (in several senses of the word) fraction of humanity that participates in C++ standardization. Python is not C++ (thanking all the Roman and Greek gods, and refusing to identify Zeus with Jupiter, nor Aphrodite with Venus<wink/>).
The fact that dict and set use equality is I think the reason that permutations should use equality.
Sequences are not sets, and dict is precisely the wrong example for you to use, since it makes exactly the point that values that are identical may be bound to several different keys. We don't unify keys in a dict just because the values are identical (or equal). Similarly, in representing a sequence as a set, we use a set of ordered pairs, with the first component a unique integer indicating position, and the second the sequence element. Since there are several useful mathematical ways to convert sequences to sets, and in particular one very similar, if not identical, to the one you like is enshrined in the very convenient constructor set(), I think it's useful to leave it as it is.
It is universally agreed that a list of n distinct symbols has n! permutations.
But that's because there's really no sensible definition of "underlying set" for such a list except the set containing exactly the same elements as the list.[2] But there is no universal agreement that "permutations of a list" is a sensible phrase. For example, although the Wikipedia article Permutation refers to lists of permutations, linked list representations of data, to the "list of objects" for use in Cauchy's notation, and to the cycle representation as a list of sequences, it doesn't once refer to permutation of a list. They're obviously not averse to discussing lists, but the word used for the entity being permuted is invariably "set".

Footnotes:
[1] And some maps not terribly useful for our purposes, such as one that maps all sequences to a singleton.
[2] A categorist would disagree, but that's not interesting.

I'm sorry, but I can't find a reference supporting the statement that the current permutations function is consistent with the mathematical definition. Perhaps you would like to find a reference? A quick search yielded the book "the Combinatorics of Permutations": http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1 Please look in the chapter "Permutation of multisets". Best, Neil On Sat, Oct 12, 2013 at 2:34 AM, Steven D'Aprano <steve@pearwood.info>wrote:

On 12 Oct 2013 17:18, "Neil Girdhar" <mistersheik@gmail.com> wrote:
> I'm sorry, but I can't find a reference supporting the statement that the
> current permutations function is consistent with the mathematical
> definition. Perhaps you would like to find a reference? A quick search
> yielded the book "The Combinatorics of Permutations":
> http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1
> Please look in the chapter "Permutation of multisets".

Itertools effectively produces the permutations of (index, value) pairs. Hence Steven's point about the permutations of a list not being mathematically defined: you have to decide what set to map it to in order to decide what counts as a unique value. The mapping itertools uses considers position in the iterable relevant, so exchanging two values that are themselves equivalent is still considered a distinct permutation, since their original position is taken into account. Like a lot of mathematics, it's a matter of paying close attention to which entities are actually being manipulated and how the equivalence classes are being defined :)

Hence the current proposal amounts to adding another variant that provides the permutations of an unordered multiset instead of those of a set of (index, value) 2-tuples (with the indices stripped from the results).

One interesting point is that combining collections.Counter.elements() with itertools.permutations() currently does the wrong thing, since itertools.permutations() *always* considers iterable order significant, while for collections.Counter.elements() it's explicitly arbitrary.

Cheers, Nick.
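Nick's "(index, value) pairs" description can be checked directly: tagging each element with its position, permuting the pairs, and stripping the indices reproduces permutations() on the bare values exactly, duplicates and all.

```python
from itertools import permutations

word = 'aab'
# Position, not value, is the identity that itertools uses, so the
# two constructions below agree element for element.
tagged = [tuple(v for _, v in p) for p in permutations(list(enumerate(word)))]
assert tagged == list(permutations(word))
```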

On Sat, Oct 12, 2013 at 8:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I hadn't thought about it, but as I read the docs for 3.4 (and it's the same back through 2.7), not only would both of these be permissible in a Python implementation:
    >>> list(collections.Counter({'a':2,'b':1}).elements())
    ['a', 'a', 'b']
Or:
    >>> list(collections.Counter({'a':2,'b':1}).elements())
    ['b', 'a', 'a']
But even this would be per documentation (although really unlikely as an implementation):
    >>> list(collections.Counter({'a':2,'b':1}).elements())
    ['a', 'b', 'a']

Yes, you're right and I understand what's been done although like the 30 upvoters to the linked stackoverflow question, I find the current behaviour surprising and would like to see a distinct_permutations function. How do I start to submit a patch? Neil On Sat, Oct 12, 2013 at 11:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Oct 12, 2013, at 11:56 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
, I find the current behaviour surprising and would like to see a distinct_permutations function. How do I start to submit a patch?
You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer). That said, the odds of it being accepted are slim. There are many ways to write combinatoric functions (Knuth has a whole book on the subject) and I don't aspire to include multiple variants unless there are strong motivating use cases. In general, if someone wants to eliminate duplicates from the population, they can do so easily with:

permutations(set(population), n)

The current design solves the most common use cases and it has some nice properties such as:

* permutations is a subsequence of product
* no assumptions are made about the comparability or orderability of members of the population
* len(list(permutations(range(n), r))) == n! / (n-r)! just like you were taught in school
* it is fast

For more exotic needs, I think it is appropriate to look outside the standard library to more full-featured combinatoric libraries (there are several listed at pypi.python.org). Raymond
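Raymond's listed properties are easy to verify; this check is mine, not part of his message:

```python
from itertools import permutations, product
from math import factorial

n, r = 4, 2
perms = list(permutations(range(n), r))
prods = list(product(range(n), repeat=r))

# permutations is a subsequence of product: a consuming-iterator
# membership test confirms the tuples appear in the same relative order.
it_prods = iter(prods)
assert all(p in it_prods for p in perms)

# the count matches n! / (n-r)!
assert len(perms) == factorial(n) // factorial(n - r)
```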

On 10/12/2013 05:44 PM, Raymond Hettinger wrote:
+1 About the only improvement I can see would be a footnote in the itertools doc table that lists the different combinatorics. Being a naive permutations user myself I would have made the mistake of thinking that "r-length tuples, all possible orderings, no repeated elements" meant no repeated values. The longer text for permutations makes it clear how it works. My rst-foo is not good enough to link from the table down into the permutation text where the distinction is made clear. If no one beats me to a proposed patch I'll see if I can figure it out. -- ~Ethan~

Hi Raymond, I agree with you on the consistency point with itertools.product. That's a great point. However, permutations(set(population)) is not the correct way to take the permutations of a multiset. Please take a look at how permutations are taken from a multiset in any of the papers I linked or any paper that you can find on the internet. The number of permutations of a multiset is n! / \prod a_i!, where the a_i are the element counts — just like I was taught in school. There is currently no fast way to find these permutations of a multiset and it is a common operation for solving problems. What is needed, I think, is a function multiset_permutations that accepts an iterable. Best, Neil On Sat, Oct 12, 2013 at 8:44 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
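The count Neil cites can be computed without enumerating anything. A sketch; the name multiset_permutation_count is mine, not from the thread:

```python
from collections import Counter
from itertools import permutations
from math import factorial

def multiset_permutation_count(iterable):
    """n! / (a_1! * a_2! * ...) where the a_i are the element counts."""
    counts = Counter(iterable)
    n = sum(counts.values())
    total = factorial(n)
    for a in counts.values():
        total //= factorial(a)   # exact integer division: each a_i! divides n!
    return total

# 'RRRBB': 5! / (3! * 2!) == 10 distinct full-length orderings
assert multiset_permutation_count('RRRBB') == 10
assert len(set(permutations('RRRBB'))) == 10
```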

On Sat, Oct 12, 2013 at 05:44:38PM -0700, Raymond Hettinger wrote:
In fairness Raymond, the proposal is not to eliminate duplicates from the population, but from the permutations themselves. Consider the example I gave earlier, where you're permuting "RRRBB" two items at a time. There are 20 permutations including duplicates, but sixteen of them are repeated: py> list(''.join(t) for t in permutations("RRRBB", 2)) ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] py> set(''.join(t) for t in permutations("RRRBB", 2)) {'BR', 'RB', 'RR', 'BB'} But if you eliminate duplicates from the population first, you get only two permutations: py> list(''.join(t) for t in permutations(set("RRRBB"), 2)) ['BR', 'RB'] If it were just a matter of calling set() on the output of permutations, that would be trivial enough. But, you might care about order, or elements might not be hashable, or you might have a LOT of permutations to generate before discarding: population = "R"*1000 + "B"*500 set(''.join(t) for t in permutations(population, 2)) # takes a while... In my opinion, if unique_permutations is no more efficient than calling set on the output of permutations, it's not worth it. But if somebody can come up with an implementation which is significantly more efficient, without making unreasonable assumptions about orderability, hashability or even comparability, then in my opinion that might be worthwhile. -- Steven

On Oct 12, 2013, at 6:47 PM, Steven D'Aprano <steve@pearwood.info> wrote:
the proposal is not to eliminate duplicates from the population, but from the permutations themselves.
I'm curious about the use cases for this. Other than red/blue marble examples and some puzzle problems, does this come up in any real problems? Do we actually need this? Raymond

My intuition is that we want Python to be "complete". Many other languages can find the permutations of a multiset. Python has a permutations function. Many people on stackoverflow expected that function to be able to find those permutations. One suggestion: Why not make it so that itertools.permutations checks if its argument is an instance of collections.Mapping? If it is, we could interpret the items as a mapping from elements to positive integers, which is a compact representation of a multiset. Then, it could do the right thing for that case. Best, Neil On Sat, Oct 12, 2013 at 11:03 PM, Raymond Hettinger < raymond.hettinger@gmail.com> wrote:

On 13 October 2013 17:38, Neil Girdhar <mistersheik@gmail.com> wrote:
Nope, we expressly *don't* want the standard library to be "complete", because that would mean growing to the size of PyPI (or larger). There's always going to be scope for applications to adopt new domain specific dependencies with more in-depth support than that provided by the standard library. Many standard library modules are in fact deliberately designed as "stepping stone" modules that will meet the needs of code which has an incidental relationship to that task, but will need to be replaced with something more sophisticated for code directly related to that domain. Many times, that means they will ignore as irrelevant distinctions that are critical in certain contexts, simply because they don't come up all that often outside those specific domains, and addressing them involves making the core module more complicated to use for more typical cases. In this case, the proposed alternate permutations mechanism only makes a difference when:

1. The data set contains equivalent values
2. Input order is not considered significant, so exchanging equivalent values should *not* create a new permutation (i.e. multiset permutations rather than sequence permutations).

If users aren't likely to encounter situations where that makes a difference, then providing both in the standard library isn't being helpful, it's being actively user hostile by asking them to make a decision they're not yet qualified to make for the sake of the few experts that specifically need it. Hence Raymond's request for data modelling problems outside the "learning or studying combinatorics" context to make the case for standard library inclusion. Interestingly, I just found another language which has the equivalent of the current behaviour of itertools.permutations: Haskell has it as Data.List.permutations.
As far as I can tell, Haskell doesn't offer support for multiset permutations in the core, you need an additional package like Math.Combinatorics (see: http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatori...). Since iterator based programming in Python is heavily inspired by Haskell, this suggests that the current behaviour of itertools.permutations is appropriate and that Raymond is right to be dubious about including multiset permutations support directly in the standard library. Those interested in improving the experience of writing combinatorics code in Python may wish to look into helping out with the combinatorics package on PyPI: http://phillipmfeldman.org/Python/for_developers.html (For example, politely approach Phillip to see if he is interested in hosting it on GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving the PyPI metadata, etc - note I have no experience with this package, it's just the first hit for "python combinatorics")
If you want to go down the path of only caring about hashable values, you may want to argue for a permutations method on collections.Counter (it's conceivable that approach has the potential to be even faster than an approach based on accepting and processing an arbitrary iterable, since it can avoid generating repeated values in the first place). A Counter based multiset permutation algorithm was actually posted to python-list back in 2009, just after collections.Counter was introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
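The Counter-based idea Nick mentions can be sketched as a recursive generator that avoids producing duplicates in the first place. This is my own sketch of the approach, not the linked recipe:

```python
from collections import Counter

def multiset_permutations(iterable, r=None):
    """Yield the distinct r-length permutations of a multiset.

    Duplicates never arise because we branch on distinct *values*
    (tracked in a Counter) rather than on positions.
    """
    counts = Counter(iterable)
    if r is None:
        r = sum(counts.values())

    def gen(prefix, remaining):
        if remaining == 0:
            yield tuple(prefix)
            return
        for elem in counts:          # one branch per distinct value
            if counts[elem] > 0:
                counts[elem] -= 1
                prefix.append(elem)
                yield from gen(prefix, remaining - 1)
                prefix.pop()
                counts[elem] += 1

    return gen([], r)

assert sorted(multiset_permutations('RRB', 2)) == [
    ('B', 'R'), ('R', 'B'), ('R', 'R')]
```

Because the branching is per distinct value, unhashable-element inputs are not supported, which is exactly why a Counter (or dict) based API is a natural fit.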

Executive summary: Thanks for discussion everyone. I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools, or else there should be a standard combinatorics library. On Sun, Oct 13, 2013 at 5:27 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath. When I saw the permutation function in itertools, I expected that it would work on both sets and multisets, or else there would be another function that did.
Good point.
You've convinced me that itertools permutations is doing the right thing :) I'm not sure if multiset permutations should be in the standard library or not. It is very useful.
Nice find!
I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py
Why not just define multiset_permutations to accept a dict (dict is a base class of Counter)? Since you're going to convert from an iterable (with duplicates) to a dict (via Counter) anyway, why not accept it as such. Users who want an interface similar to itertools.permutations can pass their iterable through Counter first. Cheers,

On Sun, Oct 13, 2013 at 9:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath.
An interesting choice of example. *Why* would you want to do so? Since you bring this up, I assume you're already aware that math.erf exists but cmath.erf does not. I believe there are good, practical reasons *not* to add cmath.erf, in spite of the existence of math.erf. Not least of these is that cmath.erf would be significantly more complicated to implement and of significantly less interest to users. And perhaps there's a parallel with itertools.permutations and the proposed itertools.multiset_permutations here... -- Mark

Actually I didn't notice that. It seems weird to find erf in math, but erf for complex numbers in scipy.special. It's just about organization and user discovery. I realize that from the developer's point of view, erf for complex numbers is complicated, but why does the user care? On Mon, Oct 14, 2013 at 8:29 AM, Mark Dickinson <dickinsm@gmail.com> wrote:

On 14 October 2013 13:37, Neil Girdhar <mistersheik@gmail.com> wrote:
This is the first time I've seen a suggestion that there should be cmath.erf. So I would say that most users don't care about having a complex error function. Whoever would take the time to implement the complex error function might instead spend that time implementing and maintaining something that users do care about. Oscar

On Mon, Oct 14, 2013 at 08:37:59AM -0400, Neil Girdhar wrote:
99% of users don't care about math.erf at all. Of those who do, 99% don't care about cmath.erf. I'd like to see cmath.erf because I'm a maths junkie, but if I were responsible for *actually doing the work* I'd make the same decision to leave cmath.erf out and leave it for a larger, more complete library like scipy. There are an infinitely large number of potential programs which could in principle be added to Python's std lib, and only a finite number of person-hours to do the work. And there are costs to adding software to the std lib, not just benefits. -- Steven

On 15 October 2013 11:29, Neil Girdhar <mistersheik@gmail.com> wrote:
You make a good point. It was just a random example to illustrate that desire for completeness.
The desire for conceptual purity and consistency is a good one, it just needs to be balanced against the practical constraints of writing, maintaining, documenting, teaching and learning the standard library. "It isn't worth the hassle" is the answer to a whole lot of "Why not X?" questions in software development :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Oct 14, 2013 at 6:39 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
"It isn't worth the hassle" is the answer to a whole lot of "Why not X?" questions in software development :)
Sometimes it's not worth the hassle on either side. If this is added to the standard library and I write code that uses it, my code won't be backwards compatible with older versions of Python. So I'll either have to not support older Python versions or use an alternative implementation. If this is on pypi then that's not an issue. Not everything useful should be in the standard library. If I had come across a need for this, I'd have just used unique_everseen(permuations(...)) until performance became an issue. --- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security

On Oct 13, 2013, at 1:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools,
Now that we have a good algorithm, I'm open to adding this to itertools, but it would need to have a name that didn't create any confusion with respect to the existing tools, perhaps something like:

anagrams(population, r)

Return an iterator over all distinct r-length permutations of the population. Unlike permutations(), element uniqueness is determined by value rather than by position. Also, anagrams() makes no guarantees about the order in which the tuples are generated. Raymond

Excellent! My top two names are:

1. multiset_permutations (reflects the mathematical name)
2. anagrams

Note that we may also want to add multiset_combinations. It hasn't been part of this discussion, but it may be part of another discussion and I wanted to point this out as I know many of you are future-conscious. We seem to be all agreed that we want to accept "r", the length of the permutation desired. With permutations, the *set* is passed in as an iterable representing distinct elements. With multiset_permutations, there are three ways to pass in the *multiset*:

1. an iterable whose elements (or an optional key function applied to which) are compared using __eq__
2. a dict (of which collections.Counter is a subclass)
3. an iterable whose elements are key-value pairs and whose values are counts

Example uses:

1. multiset_permutations(word)
2. multiset_permutations(Counter(word))
3. multiset_permutations(Counter(word).items())

From a dictionary:

1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in d.items()))
2. multiset_permutations(d)
3. multiset_permutations(d.items())

From an iterable of key-value pairs:

1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in it))
2. multiset_permutations({k: v for k, v in it})
3. multiset_permutations(it)

The advantage of 2 is that no elements are compared by multiset_permutations (so it is simpler and faster). The advantage of 3 is that no elements are compared, and they need not be comparable or hashable. This version is truly a generalization of the "permutations" function. This way, for any input "it" you could pass to permutations, you could equivalently pass zip(it, itertools.repeat(1)) to multiset_permutations. Comments?
Neil On Mon, Oct 14, 2013 at 1:56 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:

[Raymond Hettinger]
Now that we have a good algorithm, I'm open to adding this to itertools,
I remain reluctant, because I still haven't seen a compelling use case. Yes, it generates all distinct r-letter anagrams - but so what? LOL ;-) Seriously, I've written anagram programs several times in my life, and generating "all possible" never occurred to me because it's so crushingly inefficient.
"anagrams" is great! Inspired :-) What about an optional argument to define what the _user_ means by "equality"? The algorithm I posted had an optional `equal=operator.__eq__` argument. Else you're going to be pushed to add a clumsy `TransformAnagrams` later <0.4 wink>.
Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the elements support a total order, and appear in the iterable in non-decreasing order, then the anagrams are generated in non-decreasing lexicographic order. And that may be a useful guarantee (hard to tell without a real use case, though!). There's another ambiguity I haven't seen addressed explicitly. Consider this:
(3, 3.0, Fraction(3, 1))

All the algorithms posted here work to show all 3 elements in this case. But why? If the elements are all equal, then other outputs "should be" acceptable too. Like (3, 3, 3) or (3.0, Fraction(3, 1), 3.0) etc. All those outputs compare equal! This isn't visible if, e.g., the iterable's elements are letters (where a == b if and only if str(a) == str(b), so the output looks the same no matter what). At least "my" algorithm could be simplified materially if it only saved (and iterated over) a (single) canonical representative for each equivalence class, instead of saving entire equivalence classes and then jumping through hoops to cycle through each equivalence class's elements. But, for some reason, output (3, 3, 3) just "looks wrong" above. I'm not sure why.

On 15/10/2013 01:48, Tim Peters wrote:
[snip] I can see that one disadvantage of my algorithm is that the worst-case storage requirement is O(n^2) (I think). This is because the set of first items could have N members, the set of second items could have N-1 members, etc. On the other hand, IMHO, the sheer number of permutations will become a problem long before the memory requirement does! :-)

On 13/10/2013 08:38, Neil Girdhar wrote:
My intuition is that we want Python to be "complete".
No thank you. I much prefer "Python in a Nutshell" the size it is now, I'm not interested in competing with (say) "Java in a Nutshell". -- Roses are red, Violets are blue, Most poems rhyme, But this one doesn't. Mark Lawrence

One example of prior art: Maxima, which I use in its wxMaxima incarnation. """ Function: permutations(a) Returns a set of all distinct permutations of the members of the list or set a. Each permutation is a list, not a set. When a is a list, duplicate members of a are included in the permutations """ Examples from a Maxima shell:
permutations([1, 2, 3]);
{[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]}
permutations({1, 1.0, 1, 1.0});
{[1,1.0],[1.0,1]}
That last one may be surprising at first, but note that it's the first example where I passed a _set_ (instead of a list). And:
{1, 1.0, 1, 1.0};
{1,1.0}
Best I can tell, Maxima has no builtin function akin to our permutations(it, r) when r < len(it). But Maxima has a huge number of builtin functions, and I often struggle to find ones I _want_ in its docs ;-)

On Oct 11, 2013, at 19:48, David Mertz <mertz@gnosis.cx> wrote:
My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
I agree with the rest of your message, but I still think you're wrong here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating the two values the same would be equally surprised by {3.0, 3} having only one member. Or by groupby([3.0, 3]) only having one group. And so on. In Python, sets, dict keys, groups, etc. work by ==. That was a choice that could have been made differently, but Python made that choice long ago, and has applied it completely consistently, and it would be very strange to choose differently in this case.
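Andrew's point about == semantics is easy to demonstrate; a minimal illustration:

```python
# Sets and dict keys compare by ==, so 3 and 3.0 collapse to one entry:
assert {3.0, 3} == {3}
assert len({3.0, 3}) == 1

d = {3.0: 'a'}
d[3] = 'b'          # 3 == 3.0, so this overwrites rather than adds a key
assert d == {3.0: 'b'} and len(d) == 1
```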

Hi Andrew, I've sort of said as much in my last reply to Nick. But maybe I can clarify further. I can imagine *someone* wanting a filtering of permutations by either identity or equality. Maybe, in fact, by other comparisons also for generality. This might suggest an API like the following:

equal_perms = distinct_permutations(items, r, filter_by=operator.eq)
ident_perms = distinct_permutations(items, r, filter_by=operator.is_)

Or even perhaps, in some use-case that isn't clear to me, e.g.:

start_same_perms = distinct_permutations(items, r, filter_by=lambda a, b: a[0] == b[0])

Or perhaps more plausibly, some predicate that, e.g., tests if two returned tuples are the same under case normalization of the strings within them. I guess the argument then would be what the default value of 'filter_by' might be... but that seems less important to me if there were an option to pass a predicate as you liked. On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert <abarnert@yahoo.com> wrote:

Btw. My implementation of nonredundant_permutations *IS* guaranteed to work by the docs for Python 3.4. Actually for Python 2.7+. That is, it's not just an implementation accident (as I thought before I checked), but a promised API of itertools.permutations that: Permutations are emitted in lexicographic sort order. So, if the input iterable is sorted, the permutation tuples will be produced in sorted order. As long as that holds, my function will indeed behave correctly (but of course, with the limitation that it blows up if different items in the argument iterable cannot be compared using operator.lt()). On Fri, Oct 11, 2013 at 10:38 PM, David Mertz <mertz@gnosis.cx> wrote:
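David's function itself isn't shown in this chunk of the thread, but the technique he describes can be sketched like this. This is my reconstruction, assuming a keep-the-running-maximum filter; it relies on the observation that with sorted input, the *first* occurrence of each distinct tuple appears in increasing lexicographic order:

```python
from itertools import permutations

def nonredundant_permutations(iterable, r=None):
    """Yield distinct permutations by filtering the ordinary stream.

    Requires elements comparable with <; duplicates never exceed the
    running maximum, so a single comparison per tuple discards them.
    """
    last = None
    for perm in permutations(sorted(iterable), r):
        if last is None or perm > last:
            yield perm
            last = perm

assert list(nonredundant_permutations('aab', 2)) == [
    ('a', 'a'), ('a', 'b'), ('b', 'a')]
```

Unlike a set-based approach, this uses O(1) extra memory and needs no hashability, at the cost of requiring a total order on the elements.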

On Sat, Oct 12, 2013 at 12:02 AM, Neil Girdhar <mistersheik@gmail.com>wrote:
Why not just use the standard python way to generalize this: "key" rather than the nonstandard "filter_by".
Yes, 'key' is a much better name than what I suggested. I'm not quite sure how best to implement this still. I guess MRAB's recursive approach should work, even though I like the simplicity of my style that takes full advantage of the existing itertools.permutations() (and uses 1/3 as many lines of--I think clearer--code). His has the advantage, however, that it doesn't require operator.lt() to work... however, without benchmarking, I have a pretty strong feeling that my suggestion will be faster since it avoids all that recursive call overhead. Maybe I'm wrong about that though.

On 10/12/2013 02:09 AM, David Mertz wrote:
I'd like to see some nice examples of how it's used. It seems to me there is some mixing of combination/permutation concepts.

import itertools as it

def filter_repeats(itr):
    seen = set()
    for i in itr:
        if i in seen:
            continue
        seen.add(i)
        yield i

def unique_combinations(itr, length=None):
    if length is None:
        length = len(itr)
    return it.product(filter_repeats(itr), repeat=length)

This one isn't the same as yours, but it's an example of filter_repeats. While that isn't the most efficient way in some cases, filtering repeat items out of things is a fairly common problem. Cheers, Ron

On Fri, Oct 11, 2013 at 10:37:02PM -0400, Neil Girdhar wrote:
If by "this way" you mean "unique permutations only", then yes, it is *completely* disputable, and I am doing so right now. I'm not arguing one way or the other for a separate "unique_permutations" generator, just that the existing permutations generator does the right thing. If you're satisfied with that answer, you can stop reading now, because the rest of my post is going to be rather long.

TL;DR: If you want a unique_permutations generator, that's a reasonable request. If you insist on changing permutations, that's unreasonable, firstly because the current behaviour is correct, and secondly because backwards compatibility would constrain it to keep the existing behaviour even if it were wrong.

. . .

Still here? Okay then, let me justify why I say the current behaviour is correct. Speaking as a math tutor who has taught High School level combinatorics for 20+ years, I've never come across any text book or source that defines permutations in terms of unique permutations only. In every case that I can remember, or that I still have access to, unique permutations is considered a different kind of operation ("permutations ignoring duplicates", if you like) rather than the default. E.g. "Modern Mathematics 6" by Fitzpatrick and Galbraith has a separate section for permutations with repetition, gives the example of taking permutations from the word "MAMMAL", and explicitly contrasts situations where you consider the three letters M as "different" from when you consider them "the same". But in all such cases, such a situation is discussed as a restriction on permutations, not an expansion, that is:

* there are permutations;
* sometimes you want to only consider unique permutations;

rather than:

* there are permutations, which are always unique;
* sometimes you want to consider things which are like permutations except they're not necessarily unique.

I'd even turn this around and challenge you to find a source that *does* define them as always unique.
Here's a typical example, from the Collins Dictionary of Mathematics:

[quote]
**permutation** or **ordered arrangement** n.

1. an ordered arrangement of a specified number of objects selected from a set. The number of distinct permutations of r objects from n is n!/(n-r)!, usually written nPr (with the n and r set as subscripts or superscripts). For example there are six distinct permutations of two objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>. Compare COMBINATION.

2. any rearrangement of all the elements of a finite sequence, such as (1,3,2) and (3,1,2). It is *odd* or *even* according as the number of exchanges of position yielding it from the original order is odd or even. It is a *cyclic permutation* if it merely advances all the elements a fixed number of places; that is, if it is a CYCLE of maximal LENGTH. A *transposition* is a cycle of degree two, and all permutations factor as products of transpositions. See also SIGNATURE.

3. any BIJECTION of a set to itself, where the set may be finite or infinite.
[end quote]

The definition makes no comment about how to handle duplicate elements, but we can derive an answer for that:

1) We're told how many permutations there are. Picking r elements out of n gives us n!/(n-r)!. If you throw away duplicate permutations, you will fall short.

2) The number of permutations shouldn't depend on the specific entities being permuted. Permutations of (1, 2, 3, 4) and (A, B, C, D) should be identical. If your set of elements contains duplicates, such as (Red ball, Red ball, Red ball, Black ball, Black ball), we can put the balls into 1:1 correspondence with integers (1, 2, 3, 4, 5), permute the integers, then reverse the mapping to get balls again. If we do this, we ought to get the same result as just permuting the balls directly. (That's not to say that there are never cases where we don't care to distinguish between one red ball and another. But in general we do distinguish between them.)
I think this argument may hinge on what you consider *distinct*. In this context, if I permute the string "RRRBB", I consider all three characters to be distinct. Object identity is an implementation detail (not all programming languages have "objects"); even equality is an irrelevant detail. If I'm choosing to permute "RRRBB" rather than "RB", then clearly *to me* there must be some distinguishing factor between the three Rs and two Bs. Another source is Wolfram Mathworld: http://mathworld.wolfram.com/Permutation.html which likewise says nothing about discarding repeated permutations when there are repeated elements. See also their page on "Ball Picking": http://mathworld.wolfram.com/BallPicking.html Last but not least, here's a source which clearly distinguishes permutations from "permutations with duplicates": http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html and even gives a distinct formula for calculating the number of permutations. Neither Wolfram Mathworld nor the Collins Dictionary of Maths consider this formula important enough to mention, which suggests strongly that it should be considered separate from the default permutations. (A little like cyclic permutations, which are different again.) -- Steven

On 10/12/2013 10:35 AM, Steven D'Aprano wrote:
I agree that backwards compatibility should be kept, but the current behaviour of itertools.permutations is (IMHO) surprising. So here are my 2c: Until I tried it myself, I was sure that it would be like the corresponding permutations functions in Sage:

sage: list(Permutations("aba"))
[['a', 'a', 'b'], ['a', 'b', 'a'], ['b', 'a', 'a']]

or Mathematica: http://www.wolframalpha.com/input/?i=permutations+of+{a%2C+b%2C+a} Currently the docstring of itertools.permutations just says "Return successive r-length permutations of elements in the iterable", without telling what happens with input of repeated elements. The full doc in the reference manual is better in that regard, but I think at least one example with repeated elements would be nice. Regards, TB

On 10/12/2013 3:35 AM, Steven D'Aprano wrote:
The items of a set are, by definition of a set, distinct, so the question of different but equal permutations does not arise.
The items of a sequence may be duplicates. But in the treatments of permutations I have seen (admittedly not all of them), they are considered to be distinguished by position, so that one may replace the item by counts 1 to n and vice versa.
Back to a set of distinct items again. You are correct that itertools.permutations does the right thing by standard definition.
The question is whether this particular variation is important enough to put in itertools. It is not a combinatorics module and did not start with permutations. -- Terry Jan Reedy
participants (18)
- Andrew Barnert
- Bruce Leban
- David Mertz
- Ethan Furman
- Mark Dickinson
- Mark Lawrence
- MRAB
- Neil Girdhar
- Nick Coghlan
- Oscar Benjamin
- Raymond Hettinger
- Ron Adam
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
- TB
- Terry Reedy
- Tim Peters