Extremely weird itertools.permutations

"It is universally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only distinct permutations." — http://stackoverflow.com/questions/6534430/why-does-pythons-itertools-permut.... Should we consider fixing itertools.permutations and to output only unique permutations (if possible, although I realize that would break code). It is completely non-obvious to have permutations returning duplicates. For a non-breaking compromise what about adding a flag? Best, Neil

Note that if permutations is made to return only unique permutations, the behaviour of defining unique elements by index can be recovered using:

    ([it[index] for index in indexes] for indexes in itertools.permutations(range(len(it))))

On Friday, October 11, 2013 2:38:33 PM UTC-4, Neil Girdhar wrote:
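The index trick can be sanity-checked directly (a small illustrative sketch, not from the thread; the names are mine):

```python
from itertools import permutations

it = ['a', 'a', 'b']
# Permuting *indices* recovers the current position-based behaviour,
# even if an element-level function were to deduplicate by value.
by_index = [tuple(it[i] for i in idxs)
            for idxs in permutations(range(len(it)))]
assert len(by_index) == 6        # 3! tuples: value-duplicates are kept
assert len(set(by_index)) == 3   # but only 3 are distinct by value
```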

On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
I dispute this entire premise. Take a simple (and stereotypical) example, picking balls from an urn. Say that you have three red and two black balls, and randomly select without replacement. If you count only unique permutations, you get only four possibilities:

py> set(''.join(t) for t in itertools.permutations('RRRBB', 2))
{'BR', 'RB', 'RR', 'BB'}

which implies that drawing RR is no more likely than drawing BB, which is incorrect. The right way to model this experiment is not to count distinct permutations, but actual permutations:

py> list(''.join(t) for t in itertools.permutations('RRRBB', 2))
['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']

which makes it clear that there are two ways of drawing BB compared to six ways of drawing RR. If that's not obvious enough, consider the case where you have two thousand red balls and two black balls -- do you really conclude that there are the same number of ways to pick RR as BB?

So I disagree that counting only distinct permutations is the most useful or common convention. If you're permuting a collection of non-distinct values, you should expect non-distinct permutations. I'm trying to think of a realistic, physical situation where you would only want distinct permutations, and I can't.
Should we consider fixing itertools.permutations and to output only unique permutations (if possible, although I realize that would break code).
Absolutely not. Even if you were right that it should return unique permutations, and I strongly disagree that you were, the fact that it would break code is a deal-breaker. -- Steven

I think it's pretty indisputable that permutations are formally defined this way (and I challenge you to find a source that doesn't agree with that). I'm sure you know that your idea of using permutations to evaluate a multinomial distribution is not efficient. A nicer way to evaluate probabilities is to pass your set through a collections.Counter, and then use the resulting dictionary with scipy.stats.multinomial (if it exists yet). I believe most people will be surprised that len(list(permutations(iterable))) does not count unique permutations.

Best, Neil

On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve@pearwood.info> wrote:

Related to, but not quite the same as Steven D'Aprano's point, I would find it very strange for itertools.permutations() to return a list that was narrowed to equal-but-not-identical items. This is why I've raised the example of 'items=[Fraction(3,1), Decimal(3.0), 3.0]' several times. I've created the Fraction, Decimal, and float for distinct reasons, to get different behaviors and available methods. When I want to look for the permutations of those, I don't want "any old random choice of equal values", since presumably I've given them a type for a reason.

On the other hand, I can see a little bit of sense that 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me a list of 7! == 5040 things that are exactly the same as each other. Then again, I don't know how to generalize that, since my feeling is far less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's redundancy, but there's also important information in the probability and count of specific sequences.

My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.

On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

I honestly think that Python should stick to the mathematical definition of permutations rather than some kind of consensus of the tiny minority of people here. When next_permutation was added to C++, I believe the whole standards committee discussed it, and they came up with the thing that makes the most sense. The fact that dict and set use equality is, I think, the reason that permutations should use equality.

Neil

On Fri, Oct 11, 2013 at 10:48 PM, David Mertz <mertz@gnosis.cx> wrote:

On 12 Oct 2013 12:56, "Neil Girdhar" <mistersheik@gmail.com> wrote:
I honestly think that Python should stick to the mathematical definition
of permutations rather than some kind of consensus of the tiny minority of people here. When next_permutation was added to C++, I believe the whole standards committee discussed it and they came up with the thing that makes the most sense. The fact that dict and set use equality is I think the reason that permutations should use equality.

Why should the behaviour of hash based containers limit the behaviour of itertools? Python required a permutation solution that is memory efficient and works with arbitrary objects, so that's what itertools provides. However, you'd also like a memory efficient iterator for *mathematical* permutations that pays attention to object values and filters out equivalent results. I *believe* the request is equivalent to giving a name to the following genexp:

    (k for k, grp in groupby(permutations(sorted(input))))

That's a reasonable enough request (although perhaps more suited to the recipes section in the itertools docs), but conflating it with complaints about the way the existing iterator works is a good way to get people to ignore you (especially now that the language-specific reasons for the current behaviour have been pointed out, along with confirmation of the fact that backwards compatibility requirements would prohibit changing it even if we wanted to).

Cheers, Nick.

What you propose, Nick, is definitely different from the several functions that have been bandied about here. I.e.
def nick_permutations(items, r=None):
    return (k for k, grp in groupby(permutations(sorted(items), r)))

["".join(p) for p in nonredundant_permutations('aaabb', 3)]
['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
If I'm thinking of this right, what you give is equivalent to the initial flawed version of 'nonredundant_permutations()' that I suggested, which used '!=' rather than the correct '>' in comparing to the 'last' tuple.

FWIW, I deliberately chose the name 'nonredundant_permutations' rather than MRAB's choice of 'unique_permutations' because I think what the filtering does is precisely NOT to give unique ones. Or rather, not to give ALL unique ones, but only those defined by equivalence (i.e. rather than identity). My name is ugly, and if there were to be a function like it in itertools, a better name should be found. But such a name should emphasize that it is "filter by equivalence classes" ... actually, maybe this suggests another function which is instead "filter by identity of tuples", potentially also added to itertools.

On Fri, Oct 11, 2013 at 9:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
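A hedged sketch of the two variants being contrasted here (my own reconstruction, not the thread's original code): the groupby genexp collapses only *adjacent* duplicates, so it misses repeats that are separated in the output stream, while comparing each permutation against the last one *yielded* with '>' catches them, provided the input is sorted and the elements are orderable.

```python
from itertools import groupby, permutations

def nick_permutations(items, r=None):
    # Collapses only adjacent duplicates -- equivalent to filtering
    # with '!=' against the immediately preceding tuple.
    return (k for k, grp in groupby(permutations(sorted(items), r)))

def nonredundant_permutations(items, r=None):
    # Yield a permutation only if it is strictly greater than the last
    # one yielded; for sorted input this keeps exactly one copy of each
    # value-distinct permutation, in lexicographic order.
    last = None
    for perm in permutations(sorted(items), r):
        if last is None or perm > last:
            last = perm
            yield perm

# permutations('aab') emits aab, aba, aab, aba, baa, baa:
assert len(list(nick_permutations('aab'))) == 5   # duplicates survive groupby
assert [''.join(p) for p in nonredundant_permutations('aab')] == \
    ['aab', 'aba', 'baa']
assert [''.join(p) for p in nonredundant_permutations('aaabb', 3)] == \
    ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
```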

Hi Nick,

Rereading my messages, I feel like I haven't been as diplomatic as I wanted. Like everyone here, I care a lot about Python and I want to see it become as perfect as it can be made. If my wording has been too strong, it's only out of passion for Python.

I acknowledged in my initial request that it would be impossible to change the default behaviour of itertools.permutations. I understand that that ship has sailed. I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather than one of the recipes proposed, to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.) I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.

Best, Neil

On Sat, Oct 12, 2013 at 12:35 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Oct 11, 2013, at 23:55, Neil Girdhar <mistersheik@gmail.com> wrote:
I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather than one of the recipes proposed, to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.)
I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.
I think getting something onto PyPI (whether as part of more-itertools or elsewhere) and/or the ActiveState recipes (and maybe StackOverflow and CodeReview) is the best way to get from here to there. Continuing to discuss it here, you've only got the half dozen or so people who are on this list and haven't tuned out this thread to come up with the most efficient implementation. Put it out in the world and people will begin giving you comments/bug reports/rants calling you an idiot for missing the obvious more efficient way to do it, and then you can use their code. And then, when you're satisfied with it, you have a concrete proposal for something to add to itertools in python X.Y+1 instead of some implementation to be named later to add one day. I was also going to suggest that you drop the argument about whether this is the one true definition of sequence permutation and just focus on whether it's a useful thing to have, but it looks like you're way ahead of me there, so never mind.

Neil Girdhar writes:
Is there an agreed mathematical definition of permutations of *sequences*? Every definition I can find refers to permutations of *sets*. I think any categorist would agree that there are a large number of maps of _Sequence_ to _Set_, in particular the two obviously useful ones[1]: the one that takes each element of the sequence to a *different* element of the corresponding set, and the one that takes equal elements of the sequence to the *same* element of the corresponding set. The corresponding set need not be the underlying set of the sequence, and which one is appropriate presumably depends on applications.
To the negligible (in several senses of the word) fraction of humanity that participates in C++ standardization. Python is not C++ (thanking all the Roman and Greek gods, and refusing to identify Zeus with Jupiter, nor Aphrodite with Venus<wink/>).
The fact that dict and set use equality is I think the reason that permutations should use equality.
Sequences are not sets, and dict is precisely the wrong example for you to use, since it makes exactly the point that values that are identical may be bound to several different keys. We don't unify keys in a dict just because the values are identical (or equal). Similarly, in representing a sequence as a set, we use a set of ordered pairs, with the first component a unique integer indicating position, and the second the sequence element. Since there are several useful mathematical ways to convert sequences to sets, and in particular one very similar, if not identical, to the one you like is enshrined in the very convenient constructor set(), I think it's useful to leave it as it is.
It is universally agreed that a list of n distinct symbols has n! permutations.
But that's because there's really no sensible definition of "underlying set" for such a list except the set containing exactly the same elements as the list.[2] But there is no universal agreement that "permutations of a list" is a sensible phrase. For example, although the Wikipedia article Permutation refers to lists of permutations, linked list representations of data, to the "list of objects" for use in Cauchy's notation, and to the cycle representation as a list of sequences, it doesn't once refer to permutation of a list. They're obviously not averse to discussing lists, but the word used for the entity being permuted is invariably "set".

Footnotes:
[1] And some maps not terribly useful for our purposes, such as one that maps all sequences to a singleton.
[2] A categorist would disagree, but that's not interesting.

I'm sorry, but I can't find a reference supporting the statement that the current permutations function is consistent with the mathematical definition. Perhaps you would like to find a reference? A quick search yielded the book "The Combinatorics of Permutations": http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1 Please look in the chapter "Permutation of multisets".

Best, Neil

On Sat, Oct 12, 2013 at 2:34 AM, Steven D'Aprano <steve@pearwood.info> wrote:

On 12 Oct 2013 17:18, "Neil Girdhar" <mistersheik@gmail.com> wrote:
I'm sorry, but I can't find a reference supporting the statement that the
current permutations function is consistent with the mathematical definition. Perhaps you would like to find a reference? A quick search yielded the book "The Combinatorics of Permutations": http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1 Please look in the chapter "Permutation of multisets".

Itertools effectively produces the permutations of (index, value) pairs. Hence Steven's point about the permutations of a list not being mathematically defined: you have to decide what set to map it to in order to decide what counts as a unique value. The mapping itertools uses considers position in the iterable relevant, so exchanging two values that are themselves equivalent is still considered a distinct permutation, since their original position is taken into account. Like a lot of mathematics, it's a matter of paying close attention to which entities are actually being manipulated and how the equivalence classes are being defined :)

Hence the current proposal amounts to adding another variant that provides the permutations of an unordered multiset instead of those of a set of (index, value) 2-tuples (with the indices stripped from the results).

One interesting point is that combining collections.Counter.elements() with itertools.permutations() currently does the wrong thing, since itertools.permutations() *always* considers iterable order significant, while for collections.Counter.elements() it's explicitly arbitrary.

Cheers, Nick.
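Nick's "(index, value) pairs" framing can be checked directly (a small illustrative sketch, not from the thread):

```python
from itertools import permutations

# itertools.permutations effectively permutes (index, value) pairs:
# equal values at different positions are still distinct elements.
pairs = list(enumerate('aab'))   # [(0, 'a'), (1, 'a'), (2, 'b')]
assert len(list(permutations('aab'))) == len(list(permutations(pairs))) == 6
# Stripping the indices collapses the results to the 3 multiset-distinct ones.
assert len({tuple(v for _, v in p) for p in permutations(pairs)}) == 3
```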

On Sat, Oct 12, 2013 at 8:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I hadn't thought about it, but as I read the docs for 3.4 (and it's the same back through 2.7), not only would both of these be permissible in a Python implementation:
list(collections.Counter({'a':2,'b':1}).elements())
['a', 'a', 'b']
Or:
list(collections.Counter({'a':2,'b':1}).elements())
['b', 'a', 'a']
But even this would be per documentation (although really unlikely as an implementation):
list(collections.Counter({'a':2,'b':1}).elements())
['a', 'b', 'a']

Yes, you're right, and I understand what's been done, although, like the 30 upvoters to the linked stackoverflow question, I find the current behaviour surprising and would like to see a distinct_permutations function. How do I start to submit a patch?

Neil

On Sat, Oct 12, 2013 at 11:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Oct 12, 2013, at 11:56 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
, I find the current behaviour surprising and would like to see a distinct_permutations function. How do I start to submit a patch?
You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer). That said, the odds of it being accepted are slim. There are many ways to write combinatoric functions (Knuth has a whole book on the subject) and I don't aspire to include multiple variants unless there are strong motivating use cases. In general, if someone wants to eliminate duplicates from the population, they can do so easily with:

    permutations(set(population), n)

The current design solves the most common use cases and it has some nice properties such as:

* permutations is a subsequence of product
* no assumptions are made about the comparability or orderability of members of the population
* len(list(permutations(range(n), r))) == n! / (n-r)! just like you were taught in school
* it is fast

For more exotic needs, I think it is appropriate to look outside the standard library to more full-featured combinatoric libraries (there are several listed at pypi.python.org).

Raymond
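Raymond's third bullet is easy to spot-check (a quick illustrative sketch):

```python
from itertools import permutations
from math import factorial

# Verify len(list(permutations(range(n), r))) == n! / (n-r)!
# for a few (n, r) pairs, including the r == 0 edge case.
for n, r in [(5, 2), (6, 3), (7, 0)]:
    assert len(list(permutations(range(n), r))) == \
        factorial(n) // factorial(n - r)
```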

On 10/12/2013 05:44 PM, Raymond Hettinger wrote:
+1 About the only improvement I can see would be a footnote in the itertools doc table that lists the different combinatorics. Being a naive permutations user myself I would have made the mistake of thinking that "r-length tuples, all possible orderings, no repeated elements" meant no repeated values. The longer text for permutations makes it clear how it works. My rst-foo is not good enough to link from the table down into the permutation text where the distinction is made clear. If no one beats me to a proposed patch I'll see if I can figure it out. -- ~Ethan~

Hi Raymond,

I agree with you on the consistency point with itertools.product. That's a great point. However, permutations(set(population)) is not the correct way to take the permutations of a multiset. Please take a look at how permutations are taken from a multiset in any of the papers I linked, or in any paper that you can find on the internet. The number of permutations of a multiset is n! / \prod_i a_i!, where the a_i are the element counts -- just like I was taught in school. There is currently no fast way to find these permutations of a multiset, and it is a common operation for solving problems. What is needed, I think, is a function multiset_permutations that accepts an iterable.

Best, Neil

On Sat, Oct 12, 2013 at 8:44 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
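The multiset count n! / \prod_i a_i! can be checked against a brute-force dedup of itertools output (a small sketch; the helper name is mine):

```python
from collections import Counter
from itertools import permutations
from math import factorial

def multiset_permutation_count(iterable):
    # n! divided by the factorial of each element's multiplicity.
    counts = Counter(iterable)
    total = factorial(sum(counts.values()))
    for a in counts.values():
        total //= factorial(a)
    return total

assert multiset_permutation_count('RRRBB') == 10   # 5! / (3! * 2!)
# Agrees with brute-force deduplication of the itertools stream:
assert multiset_permutation_count('aaabb') == len(set(permutations('aaabb')))
```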

On Sat, Oct 12, 2013 at 05:44:38PM -0700, Raymond Hettinger wrote:
In fairness Raymond, the proposal is not to eliminate duplicates from the population, but from the permutations themselves. Consider the example I gave earlier, where you're permuting "RRRBB" two items at a time. There are 20 permutations including duplicates, but sixteen of them are repeated:

py> list(''.join(t) for t in permutations("RRRBB", 2))
['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB']
py> set(''.join(t) for t in permutations("RRRBB", 2))
{'BR', 'RB', 'RR', 'BB'}

But if you eliminate duplicates from the population first, you get only two permutations:

py> list(''.join(t) for t in permutations(set("RRRBB"), 2))
['BR', 'RB']

If it were just a matter of calling set() on the output of permutations, that would be trivial enough. But you might care about order, or elements might not be hashable, or you might have a LOT of permutations to generate before discarding:

population = "R"*1000 + "B"*500
set(''.join(t) for t in permutations(population, 2))  # takes a while...

In my opinion, if unique_permutations is no more efficient than calling set on the output of permutations, it's not worth it. But if somebody can come up with an implementation which is significantly more efficient, without making unreasonable assumptions about orderability, hashability or even comparability, then in my opinion that might be worthwhile.

-- Steven

On Oct 12, 2013, at 6:47 PM, Steven D'Aprano <steve@pearwood.info> wrote:
the proposal is not to eliminate duplicates from the population, but from the permutations themselves.
I'm curious about the use cases for this. Other than red/blue marble examples and some puzzle problems, does this come up in any real problems? Do we actually need this?

Raymond

My intuition is that we want Python to be "complete". Many other languages can find the permutations of a multiset. Python has a permutations function. Many people on stackoverflow expected that function to be able to find those permutations. One suggestion: why not make it so that itertools.permutations checks if its argument is an instance of collections.Mapping? If it is, we could interpret the items as a mapping from elements to positive integers, which is a compact representation of a multiset. Then it could do the right thing for that case.

Best, Neil

On Sat, Oct 12, 2013 at 11:03 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:

On 13 October 2013 17:38, Neil Girdhar <mistersheik@gmail.com> wrote:
Nope, we expressly *don't* want the standard library to be "complete", because that would mean growing to the size of PyPI (or larger). There's always going to be scope for applications to adopt new domain-specific dependencies with more in-depth support than that provided by the standard library. Many standard library modules are in fact deliberately designed as "stepping stone" modules that will meet the needs of code which has an incidental relationship to that task, but will need to be replaced with something more sophisticated for code directly related to that domain. Many times, that means they will ignore as irrelevant distinctions that are critical in certain contexts, simply because they don't come up all that often outside those specific domains, and addressing them involves making the core module more complicated to use for more typical cases.

In this case, the proposed alternate permutations mechanism only makes a difference when:

1. The data set contains equivalent values
2. Input order is not considered significant, so exchanging equivalent values should *not* create a new permutation (i.e. multiset permutations rather than sequence permutations)

If users aren't likely to encounter situations where that makes a difference, then providing both in the standard library isn't being helpful, it's being actively user hostile by asking them to make a decision they're not yet qualified to make for the sake of the few experts that specifically need it. Hence Raymond's request for data modelling problems outside the "learning or studying combinatorics" context to make the case for standard library inclusion.

Interestingly, I just found another language which has the equivalent of the current behaviour of itertools.permutations: Haskell has it as Data.List.permutations.
As far as I can tell, Haskell doesn't offer support for multiset permutations in the core, you need an additional package like Math.Combinatorics (see: http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatori...). Since iterator based programming in Python is heavily inspired by Haskell, this suggests that the current behaviour of itertools.permutations is appropriate and that Raymond is right to be dubious about including multiset permutations support directly in the standard library. Those interested in improving the experience of writing combinatorics code in Python may wish to look into helping out with the combinatorics package on PyPI: http://phillipmfeldman.org/Python/for_developers.html (For example, politely approach Phillip to see if he is interested in hosting it on GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving the PyPI metadata, etc - note I have no experience with this package, it's just the first hit for "python combinatorics")
If you want to go down the path of only caring about hashable values, you may want to argue for a permutations method on collections.Counter (it's conceivable that approach has the potential to be even faster than an approach based on accepting and processing an arbitrary iterable, since it can avoid generating repeated values in the first place).

A Counter based multiset permutation algorithm was actually posted to python-list back in 2009, just after collections.Counter was introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
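A minimal recursive sketch in the spirit of the Counter-based recipe Nick links (my own reconstruction, not his code; it assumes hashable elements). Because it decrements counts rather than permuting positions, it never generates a duplicate in the first place:

```python
from collections import Counter

def multiset_permutations(counts, prefix=()):
    # counts maps element -> remaining multiplicity; yields each
    # value-distinct permutation exactly once.
    if not any(counts.values()):
        yield prefix
        return
    for elem in counts:
        if counts[elem]:
            counts[elem] -= 1
            yield from multiset_permutations(counts, prefix + (elem,))
            counts[elem] += 1

result = list(multiset_permutations(Counter('aab')))
assert len(result) == 3   # 3! / 2! = 3 -- no duplicates generated
assert set(result) == {('a', 'a', 'b'), ('a', 'b', 'a'), ('b', 'a', 'a')}
```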

Executive summary: Thanks for the discussion, everyone. I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools, or else that there should be a standard combinatorics library.

On Sun, Oct 13, 2013 at 5:27 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
By "complete" I meant that, just as if you were to add the error function erf to math, you would want to add an equivalent version to cmath. When I saw the permutation function in itertools, I expected that it would work on both sets and multisets, or else that there would be another function that did.
Good point.
You've convinced me that itertools permutations is doing the right thing :) I'm not sure if multiset permutations should be in the standard library or not. It is very useful.
Nice find!
I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py
Why not just define multiset_permutations to accept a dict (dict is a base class of Counter)? Since you're going to convert from an iterable (with duplicates) to a dict (via Counter) anyway, why not accept it as such. Users who want an interface similar to itertools.permutations can pass their iterable through Counter first. Cheers,

On Sun, Oct 13, 2013 at 9:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath.
An interesting choice of example. *Why* would you want to do so? Since you bring this up, I assume you're already aware that math.erf exists but cmath.erf does not. I believe there are good, practical reasons *not* to add cmath.erf, in spite of the existence of math.erf. Not least of these is that cmath.erf would be significantly more complicated to implement and of significantly less interest to users. And perhaps there's a parallel with itertools.permutations and the proposed itertools.multiset_permutations here... -- Mark

Actually I didn't notice that. It seems weird to find erf in math, but erf for complex numbers in scipy.special. It's just about organization and user discovery. I realize that from the developer's point of view, erf for complex numbers is complicated, but why does the user care? On Mon, Oct 14, 2013 at 8:29 AM, Mark Dickinson <dickinsm@gmail.com> wrote:

On 14 October 2013 13:37, Neil Girdhar <mistersheik@gmail.com> wrote:
This is the first time I've seen a suggestion that there should be cmath.erf. So I would say that most users don't care about having a complex error function. Whoever would take the time to implement the complex error function might instead spend that time implementing and maintaining something that users do care about. Oscar

On Mon, Oct 14, 2013 at 08:37:59AM -0400, Neil Girdhar wrote:
99% of users don't care about math.erf at all. Of those who do, 99% don't care about cmath.erf. I'd like to see cmath.erf because I'm a maths junkie, but if I were responsible for *actually doing the work* I'd make the same decision to leave cmath.erf out and leave it for a larger, more complete library like scipy. There are an infinitely large number of potential programs which could in principle be added to Python's std lib, and only a finite number of person-hours to do the work. And there are costs to adding software to the std lib, not just benefits.

-- Steven

On 15 October 2013 11:29, Neil Girdhar <mistersheik@gmail.com> wrote:
You make a good point. It was just a random example to illustrate that desire for completeness.
The desire for conceptual purity and consistency is a good one, it just needs to be balanced against the practical constraints of writing, maintaining, documenting, teaching and learning the standard library. "It isn't worth the hassle" is the answer to a whole lot of "Why not X?" questions in software development :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Oct 14, 2013 at 6:39 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
"It isn't worth the hassle" is the answer to a whole lot of "Why not X?" questions in software development :)
Sometimes it's not worth the hassle on either side. If this is added to the standard library and I write code that uses it, my code won't be backwards compatible with older versions of Python. So I'll either have to not support older Python versions or use an alternative implementation. If this is on pypi then that's not an issue. Not everything useful should be in the standard library. If I had come across a need for this, I'd have just used unique_everseen(permutations(...)) until performance became an issue.

--- Bruce
I'm hiring: http://www.cadencemd.com/info/jobs
Latest blog post: Alice's Puzzle Page http://www.vroospeak.com
Learn how hackers think: http://j.mp/gruyere-security

On Oct 13, 2013, at 1:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools,
Now that we have a good algorithm, I'm open to adding this to itertools, but it would need to have a name that didn't create any confusion with respect to the existing tools, perhaps something like:

    anagrams(population, r)

        Return an iterator over all distinct r-length permutations
        of the population.

        Unlike permutations(), element uniqueness is determined
        by value rather than by position. Also, anagrams() makes
        no guarantees about the order the tuples are generated.

Raymond

Excellent! My top two names are:

1. multiset_permutations (reflects the mathematical name)
2. anagrams

Note that we may also want to add multiset_combinations. It hasn't been part of this discussion, but it may be part of another discussion and I wanted to point this out as I know many of you are future-conscious.

We seem to be all agreed that we want to accept "r", the length of the permutation desired. With permutations, the *set* is passed in as an iterable representing distinct elements. With multiset_permutations, there are three ways to pass in the *multiset*:

1. an iterable whose elements (or an optional key function applied to which) are compared using __eq__
2. a dict (of which collections.Counter is a subclass)
3. an iterable whose elements are key-value pairs and whose values are counts

Example uses:

1. multiset_permutations(word)
2. multiset_permutations(Counter(word))
3. multiset_permutations(Counter(word).items())

From a dictionary:

1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in d.items()))
2. multiset_permutations(d)
3. multiset_permutations(d.items())

From an iterable of key-value pairs:

1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in it))
2. multiset_permutations({k: v for k, v in it})
3. multiset_permutations(it)

The advantage of 2 is that no elements are compared by multiset_permutations (so it is simpler and faster). The advantage of 3 is that no elements are compared, and they need not be comparable or hashable. This version is truly a generalization of the "permutations" function: for any input "it" you could pass to permutations, you could equivalently pass zip(it, itertools.repeat(1)) to multiset_permutations.

Comments?
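Neil's conversion expressions between the three input forms can be sanity-checked without assuming any particular multiset_permutations signature (the function itself is hypothetical; this sketch only exercises the conversions):

```python
from collections import Counter
from itertools import chain, repeat

d = {'a': 2, 'b': 1}
# Form 1 (iterable with duplicates) reconstructed from form 2 (dict):
expanded = list(chain.from_iterable(repeat(k, v) for k, v in d.items()))
assert Counter(expanded) == Counter(d)   # same multiset either way
# Form 3 (key-value pairs) round-trips through a dict comprehension:
assert {k: v for k, v in d.items()} == d
```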
Neil

On Mon, Oct 14, 2013 at 1:56 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
>
> On Oct 13, 2013, at 1:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
>
> I'm now convinced that itertools.permutations is fine as it is. I am not
> totally convinced that multiset_permutations doesn't belong in itertools,
>
> [...]

[Raymond Hettinger]
Now that we have a good algorithm, I'm open to adding this to itertools,
I remain reluctant, because I still haven't seen a compelling use case. Yes, it generates all distinct r-letter anagrams - but so what? LOL ;-) Seriously, I've written anagram programs several times in my life, and generating "all possible" never occurred to me because it's so crushingly inefficient.
"anagrams" is great! Inspired :-) What about an optional argument to define what the _user_ means by "equality"? The algorithm I posted had an optional `equal=operator.__eq__` argument. Else you're going to be pushed to add a clumsy `TransformAnagrams` later <0.4 wink>.
Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the elements support a total order, and appear in the iterable in non-decreasing order, then the anagrams are generated in non-decreasing lexicographic order. And that may be a useful guarantee (hard to tell without a real use case, though!). There's another ambiguity I haven't seen addressed explicitly. Consider this:
    (3, 3.0, Fraction(3, 1))

All the algorithms posted here show all 3 elements in this case. But why? If the elements are all equal, then other outputs "should be" acceptable too, like (3, 3, 3) or (3.0, Fraction(3, 1), 3.0), etc. All those outputs compare equal! This isn't visible if, e.g., the iterable's elements are letters (where a == b if and only if str(a) == str(b), so the output looks the same no matter what). At least "my" algorithm could be simplified materially if it only saved (and iterated over) a (single) canonical representative for each equivalence class, instead of saving entire equivalence classes and then jumping through hoops to cycle through each equivalence class's elements. But, for some reason, output (3, 3, 3) just "looks wrong" above. I'm not sure why.

On 15/10/2013 01:48, Tim Peters wrote:
[snip] I can see that one disadvantage of my algorithm is that the worst-case storage requirement is O(n^2) (I think). This is because the set of first items could have N members, the set of second items could have N-1 members, etc. On the other hand, IMHO, the sheer number of permutations will become a problem long before the memory requirement does! :-)

On 13/10/2013 08:38, Neil Girdhar wrote:
My intuition is that we want Python to be "complete".
No thank you. I much prefer "Python in a Nutshell" the size it is now, I'm not interested in competing with (say) "Java in a Nutshell". -- Roses are red, Violets are blue, Most poems rhyme, But this one doesn't. Mark Lawrence

One example of prior art: Maxima, which I use in its wxMaxima incarnation. """ Function: permutations(a) Returns a set of all distinct permutations of the members of the list or set a. Each permutation is a list, not a set. When a is a list, duplicate members of a are included in the permutations """ Examples from a Maxima shell:
    permutations([1, 2, 3]);
    {[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]}
    permutations({1, 1.0, 1, 1.0});
    {[1,1.0],[1.0,1]}
That last one may be surprising at first, but note that it's the first example where I passed a _set_ (instead of a list). And:
    {1, 1.0, 1, 1.0};
    {1,1.0}
Best I can tell, Maxima has no builtin function akin to our permutations(it, r) when r < len(it). But Maxima has a huge number of builtin functions, and I often struggle to find ones I _want_ in its docs ;-)

On Oct 11, 2013, at 19:48, David Mertz <mertz@gnosis.cx> wrote:
My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
I agree with the rest of your message, but I still think you're wrong here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating the two values the same would be equally surprised by {3.0, 3} having only one member. Or by groupby([(3.0, 'a'), (3, 'b')], key=itemgetter(0)) having only one group. And so on. In Python, sets, dict keys, groups, etc. work by ==. That was a choice that could have been made differently, but Python made that choice long ago, and has applied it completely consistently, and it would be very strange to choose differently in this case.
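Andrew's point is easy to check in an interpreter; all of these built-in mechanisms compare by ==:

```python
from fractions import Fraction
from itertools import groupby

# Sets collapse numerically equal values of different types.
s = {3, 3.0, Fraction(3, 1)}
assert len(s) == 1

# Dict keys do the same: the second assignment reuses the key 3.
d = {3: 'int', 3.0: 'float'}
assert len(d) == 1
assert d[3] == 'float'

# And groupby() puts adjacent equal values into one group.
groups = [k for k, _ in groupby([3.0, 3])]
assert len(groups) == 1
```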

Hi Andrew,

I've sort of said as much in my last reply to Nick. But maybe I can clarify further. I can imagine *someone* wanting a filtering of permutations by either identity or equality. Maybe, in fact, by other comparisons also, for generality. This might suggest an API like the following:

    equal_perms = distinct_permutations(items, r, filter_by=operator.eq)
    ident_perms = distinct_permutations(items, r, filter_by=operator.is_)

Or even perhaps, in some use-case that isn't clear to me, e.g.:

    start_same_perms = distinct_permutations(items, r, filter_by=lambda a, b: a[0] == b[0])

Or perhaps more plausibly, some predicate that, e.g., tests if two returned tuples are the same under case normalization of the strings within them. I guess the argument then would be what the default value of 'filter_by' might be... but that seems less important to me if there were an option to pass a predicate as you liked.

On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Btw. my implementation of nonredundant_permutations *IS* guaranteed to work by the docs for Python 3.4. Actually for Python 2.7+. That is, it's not just an implementation accident (as I thought before I checked), but a promised API of itertools.permutations that:

    Permutations are emitted in lexicographic sort order. So, if the
    input iterable is sorted, the permutation tuples will be produced
    in sorted order.

As long as that holds, my function will indeed behave correctly (but of course, with the limitation that it blows up if different items in the argument iterable cannot be compared using operator.lt()).

On Fri, Oct 11, 2013 at 10:38 PM, David Mertz <mertz@gnosis.cx> wrote:
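For reference, the kind of implementation being discussed looks roughly like this (a sketch of the approach described in the thread; David's actual posted code may differ in details):

```python
from itertools import permutations

def nonredundant_permutations(iterable, r=None):
    # Sort the input, walk itertools.permutations, and keep a tuple
    # only when it is strictly greater than the last one kept.
    # '>' rather than '!=' matters: with repeated input elements,
    # equal tuples are not always adjacent in the stream.
    # Items must support '<', per the caveat above.
    last = None
    for perm in permutations(sorted(iterable), r):
        if last is None or perm > last:
            yield perm
            last = perm
```

For example, `["".join(p) for p in nonredundant_permutations('aaabb', 3)]` yields the seven distinct three-letter arrangements, in sorted order.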

On Sat, Oct 12, 2013 at 12:02 AM, Neil Girdhar <mistersheik@gmail.com>wrote:
Why not just use the standard python way to generalize this: "key" rather than the nonstandard "filter_by".
Yes, 'key' is a much better name than what I suggested. I'm still not quite sure how best to implement this. I guess MRAB's recursive approach should work, even though I like the simplicity of my style that takes full advantage of the existing itertools.permutations() (and uses 1/3 as many lines of, I think, clearer code). His has the advantage that it doesn't require operator.lt() to work; however, without benchmarking, I have a pretty strong feeling that my suggestion will be faster since it avoids all that recursive call overhead. Maybe I'm wrong about that though.

On 10/12/2013 02:09 AM, David Mertz wrote:
> I'd like to see some nice examples of how it's used.

It seems to me there is some mixing of combination/permutation concepts.

    import itertools as it

    def filter_repeats(itr):
        seen = set()
        for i in itr:
            if i in seen:
                continue
            seen.add(i)
            yield i

    def unique_combinations(itr, length=None):
        if length is None:
            length = len(itr)
        return it.product(filter_repeats(itr), repeat=length)

This one isn't the same as yours, but it's an example of filter_repeats. While that isn't the most efficient way in some cases, filtering repeat items out of things is a fairly common problem.

Cheers, Ron

On Fri, Oct 11, 2013 at 10:37:02PM -0400, Neil Girdhar wrote:
If by "this way" you mean "unique permutations only", then yes, it is *completely* disputable, and I am doing so right now. I'm not arguing one way or the other for a separate "unique_permutations" generator, just that the existing permutations generator does the right thing. If you're satisfied with that answer, you can stop reading now, because the rest of my post is going to be rather long.

TL;DR: If you want a unique_permutations generator, that's a reasonable request. If you insist on changing permutations, that's unreasonable, firstly because the current behaviour is correct, and secondly because backwards compatibility would constrain it to keep the existing behaviour even if it were wrong.

. . .

Still here? Okay then, let me justify why I say the current behaviour is correct.

Speaking as a math tutor who has taught High School level combinatorics for 20+ years, I've never come across any text book or source that defines permutations in terms of unique permutations only. In every case that I can remember, or that I still have access to, unique permutations is considered a different kind of operation ("permutations ignoring duplicates", if you like) rather than the default. E.g. "Modern Mathematics 6" by Fitzpatrick and Galbraith has a separate section for permutations with repetition, gives the example of taking permutations from the word "MAMMAL", and explicitly contrasts situations where you consider the three letters M as "different" from those where you consider them "the same".

But in all such cases, such a situation is discussed as a restriction on permutations, not an expansion. That is:

* there are permutations;
* sometimes you want to consider only unique permutations;

rather than:

* there are permutations, which are always unique;
* sometimes you want to consider things which are like permutations except they're not necessarily unique.

I'd even turn this around and challenge you to find a source that *does* define them as always unique.
Here's a typical example, from the Collins Dictionary of Mathematics:

[quote]
**permutation** or **ordered arrangement** n.

1. an ordered arrangement of a specified number of objects selected from a set. The number of distinct permutations of r objects from n is n!/(n-r)!, usually written ₙPᵣ or ⁿPᵣ. For example, there are six distinct permutations of two objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>. Compare COMBINATION.

2. any rearrangement of all the elements of a finite sequence, such as (1,3,2) and (3,1,2). It is *odd* or *even* according as the number of exchanges of position yielding it from the original order is odd or even. It is a *cyclic permutation* if it merely advances all the elements a fixed number of places; that is, if it is a CYCLE of maximal LENGTH. A *transposition* is a cycle of degree two, and all permutations factor as products of transpositions. See also SIGNATURE.

3. any BIJECTION of a set to itself, where the set may be finite or infinite.
[end quote]

The definition makes no comment about how to handle duplicate elements, but we can derive an answer for that:

1) We're told how many permutations there are. Picking r elements out of n gives us n!/(n-r)!. If you throw away duplicate permutations, you will fall short.

2) The number of permutations shouldn't depend on the specific entities being permuted. Permutations of (1, 2, 3, 4) and (A, B, C, D) should be identical. If your set of elements contains duplicates, such as (Red ball, Red ball, Red ball, Black ball, Black ball), we can put the balls into 1:1 correspondence with the integers (1, 2, 3, 4, 5), permute the integers, then reverse the mapping to get balls again. If we do this, we ought to get the same result as just permuting the balls directly.

(That's not to say that there are never cases where we don't care to distinguish between one red ball and another. But in general we do distinguish between them.)
I think this argument may hinge on what you consider *distinct*. In this context, if I permute the string "RRRBB", I consider all five characters to be distinct. Object identity is an implementation detail (not all programming languages have "objects"); even equality is an irrelevant detail. If I'm choosing to permute "RRRBB" rather than "RB", then clearly *to me* there must be some distinguishing factor between the three Rs and two Bs.

Another source is Wolfram MathWorld:

http://mathworld.wolfram.com/Permutation.html

which likewise says nothing about discarding repeated permutations when there are repeated elements. See also their page on "Ball Picking":

http://mathworld.wolfram.com/BallPicking.html

Last but not least, here's a source which clearly distinguishes permutations from "permutations with duplicates":

http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html

and even gives a distinct formula for calculating the number of permutations. Neither Wolfram MathWorld nor the Collins Dictionary of Mathematics considers this formula important enough to mention, which suggests strongly that it should be considered separate from the default permutations. (A little like cyclic permutations, which are different again.)

-- Steven
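Both of Steven's numbered points above can be checked numerically (an illustrative check, not from the original email):

```python
from itertools import permutations
from math import factorial

# Point 1: picking r of n objects gives n!/(n-r)! permutations.
n, r = 5, 2
assert len(list(permutations(range(n), r))) == factorial(n) // factorial(n - r) == 20

# Point 2: the count is unchanged when values are duplicated, because
# itertools.permutations() distinguishes elements by position, just as
# numbering the balls would.
assert len(list(permutations('RRRBB', 2))) == 20
```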

On 10/12/2013 10:35 AM, Steven D'Aprano wrote:
I agree that backwards compatibility should be kept, but the current behaviour of itertools.permutations is (IMHO) surprising. So here are my 2c: until I tried it myself, I was sure that it would be like the corresponding permutations functions in Sage:

    sage: list(Permutations("aba"))
    [['a', 'a', 'b'], ['a', 'b', 'a'], ['b', 'a', 'a']]

or Mathematica:

http://www.wolframalpha.com/input/?i=permutations+of+{a%2C+b%2C+a}

Currently the docstring of itertools.permutations just says "Return successive r-length permutations of elements in the iterable", without telling what happens with input of repeated elements. The full doc in the reference manual is better in that regard, but I think at least one example with repeated elements would be nice.

Regards, TB
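For example, an "example with repeated elements" of the kind TB asks for might look like this:

```python
from itertools import permutations

# itertools treats the two 'a's as distinct (by position): 3! = 6 tuples.
assert len(list(permutations('aba'))) == 6

# Sage's Permutations("aba") reports only the 3 value-distinct
# arrangements, which here corresponds to deduplicating the tuples.
assert len(set(permutations('aba'))) == 3
```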

On 10/12/2013 3:35 AM, Steven D'Aprano wrote:
The items of a set are, by definition of a set, distinct, so the question of different but equal permutations does not arise.
The items of a sequence may be duplicates. But in the treatments of permutations I have seen (admittedly not all of them), they are considered to be distinguished by position, so that one may replace the item by counts 1 to n and vice versa.
Back to a set of distinct items again. You are correct that itertools.permutations does the right thing by standard definition.
The question is whether this particular variation is important enough to put in itertools. It is not a combinatorics module and did not start with permutations. -- Terry Jan Reedy


On Fri, Oct 11, 2013 at 11:38:33AM -0700, Neil Girdhar wrote:
Should we consider fixing itertools.permutations and to output only unique permutations (if possible, although I realize that would break code).
Absolutely not. Even if you were right that it should return unique permutations, and I strongly disagree that you were, the fact that it would break code is a deal-breaker. -- Steven

I think it's pretty indisputable that permutations are formally defined this way (and I challenge you to find a source that doesn't agree with that). I'm sure you know that your idea of using permutations to evaluate a multinomial distribution is not efficient. A nicer way to evaluate probabilities is to pass your set through a collections.Counter, and then use the resulting dictionary with scipy.stats.multinomial (if it exists yet). I believe most people will be surprised that len(list(permutations(iterable))) does not count unique permutations.

Best,

Neil

On Fri, Oct 11, 2013 at 10:06 PM, Steven D'Aprano <steve@pearwood.info> wrote:
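For what it's worth, the counts-based computation Neil alludes to can be sketched without scipy at all. Here, draw_probability is a hypothetical helper, not an existing function, and it computes ordered draw-without-replacement probabilities straight from the counts, without enumerating any permutations:

```python
from collections import Counter
from fractions import Fraction

def draw_probability(population, sequence):
    # Probability of drawing `sequence`, in order, without replacement,
    # from the multiset `population`.
    counts = Counter(population)
    remaining = sum(counts.values())
    p = Fraction(1)
    for item in sequence:
        p *= Fraction(counts[item], remaining)
        counts[item] -= 1
        remaining -= 1
    return p

# Steven's urn: 6 of the 20 orderings give RR, 2 of 20 give BB.
assert draw_probability('RRRBB', 'RR') == Fraction(3, 10)
assert draw_probability('RRRBB', 'BB') == Fraction(1, 10)
```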

Related to, but not quite the same as, Steven D'Aprano's point: I would find it very strange for itertools.permutations() to return a list that was narrowed to equal-but-not-identical items. This is why I've raised the example of 'items=[Fraction(3,1), Decimal(3.0), 3.0]' several times. I've created the Fraction, Decimal, and float for distinct reasons to get different behaviors and available methods. When I want to look for the permutations of those I don't want "any old random choice of equal values", since presumably I've given them a type for a reason.

On the other hand, I can see a little bit of sense that 'itertools.permutations([3,3,3,3,3,3,3])' doesn't *really* need to tell me a list of 7! == 5040 things that are exactly the same as each other. Then again, I don't know how to generalize that, since my feeling is far less clear for 'itertools.permutations([1,2,3,4,5,6,6])' ... there's redundancy, but there's also important information in the probability and count of specific sequences.

My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.

On Fri, Oct 11, 2013 at 7:37 PM, Neil Girdhar <mistersheik@gmail.com> wrote:

I honestly think that Python should stick to the mathematical definition of permutations rather than some kind of consensus of the tiny minority of people here. When next_permutation was added to C++, I believe the whole standards committee discussed it and they came up with the thing that makes the most sense. The fact that dict and set use equality is I think the reason that permutations should use equality. Neil On Fri, Oct 11, 2013 at 10:48 PM, David Mertz <mertz@gnosis.cx> wrote:

On 12 Oct 2013 12:56, "Neil Girdhar" <mistersheik@gmail.com> wrote:
> I honestly think that Python should stick to the mathematical definition
> of permutations rather than some kind of consensus of the tiny minority
> of people here. When next_permutation was added to C++, I believe the
> whole standards committee discussed it and they came up with the thing
> that makes the most sense. The fact that dict and set use equality is I
> think the reason that permutations should use equality.

Why should the behaviour of hash based containers limit the behaviour of itertools? Python required a permutation solution that is memory efficient and works with arbitrary objects, so that's what itertools provides.

However, you'd also like a memory efficient iterator for *mathematical* permutations that pays attention to object values and filters out equivalent results. I *believe* the request is equivalent to giving a name to the following genexp:

    (k for k, grp in groupby(permutations(sorted(input))))

That's a reasonable enough request (although perhaps more suited to the recipes section in the itertools docs), but conflating it with complaints about the way the existing iterator works is a good way to get people to ignore you (especially now that the language specific reasons for the current behaviour have been pointed out, along with confirmation of the fact that backwards compatibility requirements would prohibit changing it even if we wanted to).

Cheers, Nick.

What you propose, Nick, is definitely different from the several functions that have been bandied about here. I.e.
    def nick_permutations(items, r=None):
        return (k for k, grp in groupby(permutations(sorted(items), r)))
["".join(p) for p in nonredundant_permutations('aaabb', 3)] ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba']
If I'm thinking of this right, what you give is equivalent to the initial flawed version of 'nonredundant_permutations()' that I suggested, which used '!=' rather than the correct '>' in comparing to the 'last' tuple. FWIW, I deliberately chose the name 'nonredundant_permutations' rather than MRAB's choice of 'unique_permutations' because I think what the filtering does is precisely NOT to give unique ones. Or rather, not to give ALL unique ones, but only those defined by equivalence (i.e. rather than identity). My name is ugly, and if there were to be a function like it in itertools, a better name should be found. But such a name should emphasize that it is "filter by equivalence classes" ... actually, maybe this suggests another function which is instead "filter by identity of tuples", potentially also added to itertools. On Fri, Oct 11, 2013 at 9:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
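David's suspicion can be checked directly. groupby() merges only *adjacent* equal tuples, and with repeated input elements the stream from permutations() is not globally sorted by value, so duplicates slip through:

```python
from itertools import groupby, permutations

def nick_permutations(items, r=None):
    return (k for k, grp in groupby(permutations(sorted(items), r)))

# permutations('aab') emits: aab, aba, aab, aba, baa, baa.
# groupby() collapses only the final adjacent pair, leaving duplicates,
# which is exactly the behaviour of the flawed '!=' filter.
result = ["".join(p) for p in nick_permutations('aab')]
assert result == ['aab', 'aba', 'aab', 'aba', 'baa']
```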

Hi Nick,

Rereading my messages, I feel like I haven't been as diplomatic as I wanted. Like everyone here, I care a lot about Python and I want to see it become as perfect as it can be made. If my wording has been too strong, it's only out of passion for Python. I acknowledged in my initial request that it would be impossible to change the default behaviour of itertools.permutations. I understand that that ship has sailed.

I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather than one of the recipes proposed, to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.) I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.

Best,

Neil

On Sat, Oct 12, 2013 at 12:35 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Oct 11, 2013, at 23:55, Neil Girdhar <mistersheik@gmail.com> wrote:
I think my best proposal is to have an efficient distinct_permutations function in itertools. It should be in itertools so that it is discoverable. It should be a function rather one of the recipes proposed to make it as efficient as possible. (Correct me if I'm wrong, but like the set solution, groupby is also not so efficient.)
I welcome the discussion and hope that the most efficient implementation someone here comes up with will be added one day to itertools.
I think getting something onto PyPI (whether as part of more-itertools or elsewhere) and/or the ActiveState recipes (and maybe StackOverflow and CodeReview) is the best way to get from here to there. Continuing to discuss it here, you've only got the half dozen or so people who are on this list and haven't tuned out this thread to come up with the most efficient implementation. Put it out in the world and people will begin giving you comments/bug reports/rants calling you an idiot for missing the obvious more efficient way to do it, and then you can use their code. And then, when you're satisfied with it, you have a concrete proposal for something to add to itertools in python X.Y+1 instead of some implementation to be named later to add one day. I was also going to suggest that you drop the argument about whether this is the one true definition of sequence permutation and just focus on whether it's a useful thing to have, but it looks like you're way ahead of me there, so never mind.

Neil Girdhar writes:
Is there an agreed mathematical definition of permutations of *sequences*? Every definition I can find refers to permutations of *sets*. I think any categorist would agree that there are a large number of maps of _Sequence_ to _Set_, in particular the two obviously useful ones[1]: the one that takes each element of the sequence to a *different* element of the corresponding set, and the one that takes equal elements of the sequence to the *same* element of the corresponding set. The corresponding set need not be the underlying set of the sequence, and which one is appropriate presumably depends on applications.
To the negligible (in several senses of the word) fraction of humanity that participates in C++ standardization. Python is not C++ (thanking all the Roman and Greek gods, and refusing to identify Zeus with Jupiter, nor Aphrodite with Venus<wink/>).
The fact that dict and set use equality is I think the reason that permutations should use equality.
Sequences are not sets, and dict is precisely the wrong example for you to use, since it makes exactly the point that values that are identical may be bound to several different keys. We don't unify keys in a dict just because the values are identical (or equal). Similarly, in representing a sequence as a set, we use a set of ordered pairs, with the first component a unique integer indicating position, and the second the sequence element. Since there are several useful mathematical ways to convert sequences to sets, and in particular one very similar, if not identical, to the one you like is enshrined in the very convenient constructor set(), I think it's useful to leave it as it is.
It is universally agreed that a list of n distinct symbols has n! permutations.
But that's because there's really no sensible definition of "underlying set" for such a list except the set containing exactly the same elements as the list.[2] But there is no universal agreement that "permutations of a list" is a sensible phrase. For example, although the Wikipedia article Permutation refers to lists of permutations, linked list representations of data, to the "list of objects" for use in Cauchy's notation, and to the cycle representation as a list of sequences, it doesn't once refer to permutation of a list. They're obviously not averse to discussing lists, but the word used for the entity being permuted is invariably "set".

Footnotes:
[1] And some maps not terribly useful for our purposes, such as one that maps all sequences to a singleton.
[2] A categorist would disagree, but that's not interesting.

I'm sorry, but I can't find a reference supporting the statement that the current permutations function is consistent with the mathematical definition. Perhaps you would like to find a reference? A quick search yielded the book "the Combinatorics of Permutations": http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1 Please look in the chapter "Permutation of multisets". Best, Neil On Sat, Oct 12, 2013 at 2:34 AM, Steven D'Aprano <steve@pearwood.info>wrote:

On 12 Oct 2013 17:18, "Neil Girdhar" <mistersheik@gmail.com> wrote:
> I'm sorry, but I can't find a reference supporting the statement that the
> current permutations function is consistent with the mathematical
> definition. Perhaps you would like to find a reference? A quick search
> yielded the book "The Combinatorics of Permutations":
> http://books.google.ca/books?id=Op-nF-mBR7YC&lpg=PP1
> Please look in the chapter "Permutation of multisets".

Itertools effectively produces the permutations of (index, value) pairs. Hence Steven's point about the permutations of a list not being mathematically defined: you have to decide what set to map it to in order to decide what counts as a unique value. The mapping itertools uses considers position in the iterable relevant, so exchanging two values that are themselves equivalent is still considered a distinct permutation, since their original position is taken into account. Like a lot of mathematics, it's a matter of paying close attention to which entities are actually being manipulated and how the equivalence classes are being defined :)

Hence the current proposal amounts to adding another variant that provides the permutations of an unordered multiset instead of those of a set of (index, value) 2-tuples (with the indices stripped from the results).

One interesting point is that combining collections.Counter.elements() with itertools.permutations() currently does the wrong thing, since itertools.permutations() *always* considers iterable order significant, while for collections.Counter.elements() it's explicitly arbitrary.

Cheers, Nick.
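Nick's "(index, value) pairs" description can be checked directly: tagging each element with its position, permuting the pairs, and stripping the indices reproduces permutations() on the bare values exactly, duplicates and all.

```python
from itertools import permutations

word = 'aab'
# Position, not value, is the identity that itertools uses, so the
# two constructions below agree element for element.
tagged = [tuple(v for _, v in p) for p in permutations(list(enumerate(word)))]
assert tagged == list(permutations(word))
```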

On Sat, Oct 12, 2013 at 8:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I hadn't thought about it, but as I read the docs for 3.4 (and it's the same back through 2.7), not only would both of these be permissible in a Python implementation:
    >>> list(collections.Counter({'a':2,'b':1}).elements())
    ['a', 'a', 'b']
Or:
    >>> list(collections.Counter({'a':2,'b':1}).elements())
    ['b', 'a', 'a']
But even this would be per documentation (although really unlikely as an implementation):
    >>> list(collections.Counter({'a':2,'b':1}).elements())
    ['a', 'b', 'a']

Yes, you're right and I understand what's been done although like the 30 upvoters to the linked stackoverflow question, I find the current behaviour surprising and would like to see a distinct_permutations function. How do I start to submit a patch? Neil On Sat, Oct 12, 2013 at 11:07 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Oct 12, 2013, at 11:56 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
, I find the current behaviour surprising and would like to see a distinct_permutations function. How do I start to submit a patch?
You can submit your patch at http://bugs.python.org and assign it to me (the module designer and maintainer). That said, the odds of it being accepted are slim. There are many ways to write combinatoric functions (Knuth has a whole book on the subject) and I don't aspire to include multiple variants unless there are strong motivating use cases. In general, if someone wants to eliminate duplicates from the population, they can do so easily with:

permutations(set(population), n)

The current design solves the most common use cases and it has some nice properties such as:

* permutations is a subsequence of product
* no assumptions are made about the comparability or orderability of members of the population
* len(list(permutations(range(n), r))) == n! / (n-r)! just like you were taught in school
* it is fast

For more exotic needs, I think it is appropriate to look outside the standard library to more full-featured combinatoric libraries (there are several listed at pypi.python.org). Raymond
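Raymond's listed properties are easy to verify; this check is mine, not part of his message:

```python
from itertools import permutations, product
from math import factorial

n, r = 4, 2
perms = list(permutations(range(n), r))
prods = list(product(range(n), repeat=r))

# permutations is a subsequence of product: a consuming-iterator
# membership test confirms the tuples appear in the same relative order.
it_prods = iter(prods)
assert all(p in it_prods for p in perms)

# the count matches n! / (n-r)!
assert len(perms) == factorial(n) // factorial(n - r)
```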

On 10/12/2013 05:44 PM, Raymond Hettinger wrote:
+1 About the only improvement I can see would be a footnote in the itertools doc table that lists the different combinatorics. Being a naive permutations user myself I would have made the mistake of thinking that "r-length tuples, all possible orderings, no repeated elements" meant no repeated values. The longer text for permutations makes it clear how it works. My rst-foo is not good enough to link from the table down into the permutation text where the distinction is made clear. If no one beats me to a proposed patch I'll see if I can figure it out. -- ~Ethan~

Hi Raymond, I agree with you on the consistency point with itertools.product. That's a great point. However, permutations(set(population)) is not the correct way to take the permutations of a multiset. Please take a look at how permutations are taken from a multiset in any of the papers I linked or any paper that you can find on the internet. The number of permutations of a multiset is n! / \prod a_i!, where the a_i are the element counts — just like I was taught in school. There is currently no fast way to find these permutations of a multiset and it is a common operation for solving problems. What is needed, I think, is a function multiset_permutations that accepts an iterable. Best, Neil On Sat, Oct 12, 2013 at 8:44 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
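The count Neil cites can be computed without enumerating anything. A sketch; the name multiset_permutation_count is mine, not from the thread:

```python
from collections import Counter
from itertools import permutations
from math import factorial

def multiset_permutation_count(iterable):
    """n! / (a_1! * a_2! * ...) where the a_i are the element counts."""
    counts = Counter(iterable)
    n = sum(counts.values())
    total = factorial(n)
    for a in counts.values():
        total //= factorial(a)   # exact integer division: each a_i! divides n!
    return total

# 'RRRBB': 5! / (3! * 2!) == 10 distinct full-length orderings
assert multiset_permutation_count('RRRBB') == 10
assert len(set(permutations('RRRBB'))) == 10
```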

On Sat, Oct 12, 2013 at 05:44:38PM -0700, Raymond Hettinger wrote:
In fairness Raymond, the proposal is not to eliminate duplicates from the population, but from the permutations themselves. Consider the example I gave earlier, where you're permuting "RRRBB" two items at a time. There are 20 permutations including duplicates, but sixteen of them are repeated: py> list(''.join(t) for t in permutations("RRRBB", 2)) ['RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'RR', 'RR', 'RB', 'RB', 'BR', 'BR', 'BR', 'BB', 'BR', 'BR', 'BR', 'BB'] py> set(''.join(t) for t in permutations("RRRBB", 2)) {'BR', 'RB', 'RR', 'BB'} But if you eliminate duplicates from the population first, you get only two permutations: py> list(''.join(t) for t in permutations(set("RRRBB"), 2)) ['BR', 'RB'] If it were just a matter of calling set() on the output of permutations, that would be trivial enough. But, you might care about order, or elements might not be hashable, or you might have a LOT of permutations to generate before discarding: population = "R"*1000 + "B"*500 set(''.join(t) for t in permutations(population, 2)) # takes a while... In my opinion, if unique_permutations is no more efficient than calling set on the output of permutations, it's not worth it. But if somebody can come up with an implementation which is significantly more efficient, without making unreasonable assumptions about orderability, hashability or even comparability, then in my opinion that might be worthwhile. -- Steven

On Oct 12, 2013, at 6:47 PM, Steven D'Aprano <steve@pearwood.info> wrote:
the proposal is not to eliminate duplicates from the population, but from the permutations themselves.
I'm curious about the use cases for this. Other than red/blue marble examples and some puzzle problems, does this come up in any real problems? Do we actually need this? Raymond

My intuition is that we want Python to be "complete". Many other languages can find the permutations of a multiset. Python has a permutations function. Many people on stackoverflow expected that function to be able to find those permutations. One suggestion: Why not make it so that itertools.permutations checks if its argument is an instance of collections.Mapping? If it is, we could interpret the items as a mapping from elements to positive integers, which is a compact representation of a multiset. Then, it could do the right thing for that case. Best, Neil On Sat, Oct 12, 2013 at 11:03 PM, Raymond Hettinger < raymond.hettinger@gmail.com> wrote:

On 13 October 2013 17:38, Neil Girdhar <mistersheik@gmail.com> wrote:
Nope, we expressly *don't* want the standard library to be "complete", because that would mean growing to the size of PyPI (or larger). There's always going to be scope for applications to adopt new domain specific dependencies with more in-depth support than that provided by the standard library. Many standard library modules are in fact deliberately designed as "stepping stone" modules that will meet the needs of code which has an incidental relationship to that task, but will need to be replaced with something more sophisticated for code directly related to that domain. Many times, that means they will ignore as irrelevant distinctions that are critical in certain contexts, simply because they don't come up all that often outside those specific domains, and addressing them involves making the core module more complicated to use for more typical cases. In this case, the proposed alternate permutations mechanism only makes a difference when:

1. The data set contains equivalent values
2. Input order is not considered significant, so exchanging equivalent values should *not* create a new permutation (i.e. multiset permutations rather than sequence permutations).

If users aren't likely to encounter situations where that makes a difference, then providing both in the standard library isn't being helpful, it's being actively user hostile by asking them to make a decision they're not yet qualified to make for the sake of the few experts that specifically need it. Hence Raymond's request for data modelling problems outside the "learning or studying combinatorics" context to make the case for standard library inclusion. Interestingly, I just found another language which has the equivalent of the current behaviour of itertools.permutations: Haskell has it as Data.List.permutations.
As far as I can tell, Haskell doesn't offer support for multiset permutations in the core, you need an additional package like Math.Combinatorics (see: http://hackage.haskell.org/package/multiset-comb-0.2.3/docs/Math-Combinatori...). Since iterator based programming in Python is heavily inspired by Haskell, this suggests that the current behaviour of itertools.permutations is appropriate and that Raymond is right to be dubious about including multiset permutations support directly in the standard library. Those interested in improving the experience of writing combinatorics code in Python may wish to look into helping out with the combinatorics package on PyPI: http://phillipmfeldman.org/Python/for_developers.html (For example, politely approach Phillip to see if he is interested in hosting it on GitHub or BitBucket, providing Sphinx docs on ReadTheDocs, improving the PyPI metadata, etc - note I have no experience with this package, it's just the first hit for "python combinatorics")
If you want to go down the path of only caring about hashable values, you may want to argue for a permutations method on collections.Counter (it's conceivable that approach has the potential to be even faster than an approach based on accepting and processing an arbitrary iterable, since it can avoid generating repeated values in the first place). A Counter based multiset permutation algorithm was actually posted to python-list back in 2009, just after collections.Counter was introduced: https://mail.python.org/pipermail/python-list/2009-January/521685.html I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
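The Counter-based idea Nick mentions can be sketched as a recursive generator that avoids producing duplicates in the first place. This is my own sketch of the approach, not the linked recipe:

```python
from collections import Counter

def multiset_permutations(iterable, r=None):
    """Yield the distinct r-length permutations of a multiset.

    Duplicates never arise because we branch on distinct *values*
    (tracked in a Counter) rather than on positions.
    """
    counts = Counter(iterable)
    if r is None:
        r = sum(counts.values())

    def gen(prefix, remaining):
        if remaining == 0:
            yield tuple(prefix)
            return
        for elem in counts:          # one branch per distinct value
            if counts[elem] > 0:
                counts[elem] -= 1
                prefix.append(elem)
                yield from gen(prefix, remaining - 1)
                prefix.pop()
                counts[elem] += 1

    return gen([], r)

assert sorted(multiset_permutations('RRB', 2)) == [
    ('B', 'R'), ('R', 'B'), ('R', 'R')]
```

Because the branching is per distinct value, unhashable-element inputs are not supported, which is exactly why a Counter (or dict) based API is a natural fit.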

Executive summary: Thanks for discussion everyone. I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools, or else there should be a standard combinatorics library. On Sun, Oct 13, 2013 at 5:27 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath. When I saw the permutation function in itertools, I expected that it would work on both sets and multisets, or else there would be another function that did.
Good point.
You've convinced me that itertools permutations is doing the right thing :) I'm not sure if multiset permutations should be in the standard library or not. It is very useful.
Nice find!
I just created an updated version of that recipe and posted it as https://bitbucket.org/ncoghlan/misc/src/default/multiset_permutations.py
Why not just define multiset_permutations to accept a dict (dict is a base class of Counter)? Since you're going to convert from an iterable (with duplicates) to a dict (via Counter) anyway, why not accept it as such. Users who want an interface similar to itertools.permutations can pass their iterable through Counter first. Cheers,

On Sun, Oct 13, 2013 at 9:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
By complete I meant that just as if you were to add the "error function, erf" to math, you would want to add an equivalent version to cmath.
An interesting choice of example. *Why* would you want to do so? Since you bring this up, I assume you're already aware that math.erf exists but cmath.erf does not. I believe there are good, practical reasons *not* to add cmath.erf, in spite of the existence of math.erf. Not least of these is that cmath.erf would be significantly more complicated to implement and of significantly less interest to users. And perhaps there's a parallel with itertools.permutations and the proposed itertools.multiset_permutations here... -- Mark

Actually I didn't notice that. It seems weird to find erf in math, but erf for complex numbers in scipy.special. It's just about organization and user discovery. I realize that from the developer's point of view, erf for complex numbers is complicated, but why does the user care? On Mon, Oct 14, 2013 at 8:29 AM, Mark Dickinson <dickinsm@gmail.com> wrote:

On 14 October 2013 13:37, Neil Girdhar <mistersheik@gmail.com> wrote:
This is the first time I've seen a suggestion that there should be cmath.erf. So I would say that most users don't care about having a complex error function. Whoever would take the time to implement the complex error function might instead spend that time implementing and maintaining something that users do care about. Oscar

On Mon, Oct 14, 2013 at 08:37:59AM -0400, Neil Girdhar wrote:
99% of users don't care about math.erf at all. Of those who do, 99% don't care about cmath.erf. I'd like to see cmath.erf because I'm a maths junkie, but if I were responsible for *actually doing the work* I'd make the same decision to leave cmath.erf out and leave it for a larger, more complete library like scipy. There are an infinitely large number of potential programs which could in principle be added to Python's std lib, and only a finite number of person-hours to do the work. And there are costs to adding software to the std lib, not just benefits. -- Steven

On 15 October 2013 11:29, Neil Girdhar <mistersheik@gmail.com> wrote:
You make a good point. It was just a random example to illustrate that desire for completeness.
The desire for conceptual purity and consistency is a good one, it just needs to be balanced against the practical constraints of writing, maintaining, documenting, teaching and learning the standard library. "It isn't worth the hassle" is the answer to a whole lot of "Why not X?" questions in software development :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Oct 14, 2013 at 6:39 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
"It isn't worth the hassle" is the answer to a whole lot of "Why not X?" questions in software development :)
Sometimes it's not worth the hassle on either side. If this is added to the standard library and I write code that uses it, my code won't be backwards compatible with older versions of Python. So I'll either have to not support older Python versions or use an alternative implementation. If this is on pypi then that's not an issue. Not everything useful should be in the standard library. If I had come across a need for this, I'd have just used unique_everseen(permuations(...)) until performance became an issue. --- Bruce I'm hiring: http://www.cadencemd.com/info/jobs Latest blog post: Alice's Puzzle Page http://www.vroospeak.com Learn how hackers think: http://j.mp/gruyere-security

On Oct 13, 2013, at 1:56 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
I'm now convinced that itertools.permutations is fine as it is. I am not totally convinced that multiset_permutations doesn't belong in itertools,
Now that we have a good algorithm, I'm open to adding this to itertools, but it would need to have a name that didn't create any confusion with respect to the existing tools, perhaps something like:

anagrams(population, r)

Return an iterator over all distinct r-length permutations of the population. Unlike permutations(), element uniqueness is determined by value rather than by position. Also, anagrams() makes no guarantees about the order in which the tuples are generated. Raymond

Excellent! My top two names are:

1. multiset_permutations (reflects the mathematical name)
2. anagrams

Note that we may also want to add multiset_combinations. It hasn't been part of this discussion, but it may be part of another discussion and I wanted to point this out as I know many of you are future-conscious. We seem to be all agreed that we want to accept "r", the length of the permutation desired. With permutations, the *set* is passed in as an iterable representing distinct elements. With multiset_permutations, there are three ways to pass in the *multiset*:

1. an iterable whose elements (or an optional key function applied to which) are compared using __eq__
2. a dict (of which collections.Counter is a subclass)
3. an iterable whose elements are key-value pairs and whose values are counts

Example uses:

1. multiset_permutations(word)
2. multiset_permutations(Counter(word))
3. multiset_permutations(Counter(word).items())

From a dictionary:

1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in d.items()))
2. multiset_permutations(d)
3. multiset_permutations(d.items())

From an iterable of key-value pairs:

1. multiset_permutations(itertools.chain.from_iterable(itertools.repeat(k, v) for k, v in it))
2. multiset_permutations({k: v for k, v in it})
3. multiset_permutations(it)

The advantage of 2 is that no elements are compared by multiset_permutations (so it is simpler and faster). The advantage of 3 is that no elements are compared, and they need not be comparable or hashable. This version is truly a generalization of the "permutations" function. This way, for any input "it" you could pass to permutations, you could equivalently pass zip(it, itertools.repeat(1)) to multiset_permutations. Comments?
Neil On Mon, Oct 14, 2013 at 1:56 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:

[Raymond Hettinger]
Now that we have a good algorithm, I'm open to adding this to itertools,
I remain reluctant, because I still haven't seen a compelling use case. Yes, it generates all distinct r-letter anagrams - but so what? LOL ;-) Seriously, I've written anagram programs several times in my life, and generating "all possible" never occurred to me because it's so crushingly inefficient.
"anagrams" is great! Inspired :-) What about an optional argument to define what the _user_ means by "equality"? The algorithm I posted had an optional `equal=operator.__eq__` argument. Else you're going to be pushed to add a clumsy `TransformAnagrams` later <0.4 wink>.
Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the elements support a total order, and appear in the iterable in non-decreasing order, then the anagrams are generated in non-decreasing lexicographic order. And that may be a useful guarantee (hard to tell without a real use case, though!). There's another ambiguity I haven't seen addressed explicitly. Consider this:
(3, 3.0, Fraction(3, 1))

All the algorithms posted here work to show all 3 elements in this case. But why? If the elements are all equal, then other outputs "should be" acceptable too. Like (3, 3, 3) or (3.0, Fraction(3, 1), 3.0) etc. All those outputs compare equal! This isn't visible if, e.g., the iterable's elements are letters (where a == b if and only if str(a) == str(b), so the output looks the same no matter what). At least "my" algorithm could be simplified materially if it only saved (and iterated over) a (single) canonical representative for each equivalence class, instead of saving entire equivalence classes and then jumping through hoops to cycle through each equivalence class's elements. But, for some reason, output (3, 3, 3) just "looks wrong" above. I'm not sure why.

On 15/10/2013 01:48, Tim Peters wrote:
[snip] I can see that one disadvantage of my algorithm is that the worst-case storage requirement is O(n^2) (I think). This is because the set of first items could have N members, the set of second items could have N-1 members, etc. On the other hand, IMHO, the sheer number of permutations will become a problem long before the memory requirement does! :-)

On 13/10/2013 08:38, Neil Girdhar wrote:
My intuition is that we want Python to be "complete".
No thank you. I much prefer "Python in a Nutshell" the size it is now, I'm not interested in competing with (say) "Java in a Nutshell". -- Roses are red, Violets are blue, Most poems rhyme, But this one doesn't. Mark Lawrence

One example of prior art: Maxima, which I use in its wxMaxima incarnation. """ Function: permutations(a) Returns a set of all distinct permutations of the members of the list or set a. Each permutation is a list, not a set. When a is a list, duplicate members of a are included in the permutations """ Examples from a Maxima shell:
permutations([1, 2, 3]);
{[1,2,3],[1,3,2],[2,1,3],[2,3,1],[3,1,2],[3,2,1]}
permutations({1, 1.0, 1, 1.0});
{[1,1.0],[1.0,1]}
That last one may be surprising at first, but note that it's the first example where I passed a _set_ (instead of a list). And:
{1, 1.0, 1, 1.0};
{1,1.0}
Best I can tell, Maxima has no builtin function akin to our permutations(it, r) when r < len(it). But Maxima has a huge number of builtin functions, and I often struggle to find ones I _want_ in its docs ;-)

On Oct 11, 2013, at 19:48, David Mertz <mertz@gnosis.cx> wrote:
My feeling, however, is that if one were to trim down the results from a permutations-related function, it is more interesting to me to only eliminate IDENTICAL items, not to eliminate merely EQUAL ones.
I agree with the rest of your message, but I still think you're wrong here. Anyone who is surprised by distinct_permutations((3.0, 3)) treating the two values the same would be equally surprised by {3.0, 3} having only one member. Or by groupby([3.0, 3]) only having one group. And so on. In Python, sets, dict keys, groups, etc. work by ==. That was a choice that could have been made differently, but Python made that choice long ago, and has applied it completely consistently, and it would be very strange to choose differently in this case.
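Andrew's point about == semantics is easy to demonstrate; a minimal illustration:

```python
# Sets and dict keys compare by ==, so 3 and 3.0 collapse to one entry:
assert {3.0, 3} == {3}
assert len({3.0, 3}) == 1

d = {3.0: 'a'}
d[3] = 'b'          # 3 == 3.0, so this overwrites rather than adds a key
assert d == {3.0: 'b'} and len(d) == 1
```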

Hi Andrew, I've sort of said as much in my last reply to Nick. But maybe I can clarify further. I can imagine *someone* wanting a filtering of permutations by either identity or equality. Maybe, in fact, by other comparisons also for generality. This might suggest an API like the following:

equal_perms = distinct_permutations(items, r, filter_by=operator.eq)
ident_perms = distinct_permutations(items, r, filter_by=operator.is_)

Or even perhaps, in some use-case that isn't clear to me, e.g.:

start_same_perms = distinct_permutations(items, r, filter_by=lambda a, b: a[0] == b[0])

Or perhaps more plausibly, some predicate that, e.g., tests if two returned tuples are the same under case normalization of the strings within them. I guess the argument then would be what the default value of 'filter_by' might be... but that seems less important to me if there were an option to pass a predicate as you liked. On Fri, Oct 11, 2013 at 7:57 PM, Andrew Barnert <abarnert@yahoo.com> wrote:

Btw. My implementation of nonredundant_permutations *IS* guaranteed to work by the docs for Python 3.4. Actually for Python 2.7+. That is, it's not just an implementation accident (as I thought before I checked), but a promised API of itertools.permutations that: Permutations are emitted in lexicographic sort order. So, if the input iterable is sorted, the permutation tuples will be produced in sorted order. As long as that holds, my function will indeed behave correctly (but of course, with the limitation that it blows up if different items in the argument iterable cannot be compared using operator.lt()). On Fri, Oct 11, 2013 at 10:38 PM, David Mertz <mertz@gnosis.cx> wrote:
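David's function itself isn't shown in this chunk of the thread, but the technique he describes can be sketched like this. This is my reconstruction, assuming a keep-the-running-maximum filter; it relies on the observation that with sorted input, the *first* occurrence of each distinct tuple appears in increasing lexicographic order:

```python
from itertools import permutations

def nonredundant_permutations(iterable, r=None):
    """Yield distinct permutations by filtering the ordinary stream.

    Requires elements comparable with <; duplicates never exceed the
    running maximum, so a single comparison per tuple discards them.
    """
    last = None
    for perm in permutations(sorted(iterable), r):
        if last is None or perm > last:
            yield perm
            last = perm

assert list(nonredundant_permutations('aab', 2)) == [
    ('a', 'a'), ('a', 'b'), ('b', 'a')]
```

Unlike a set-based approach, this uses O(1) extra memory and needs no hashability, at the cost of requiring a total order on the elements.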

On Sat, Oct 12, 2013 at 12:02 AM, Neil Girdhar <mistersheik@gmail.com>wrote:
Why not just use the standard python way to generalize this: "key" rather than the nonstandard "filter_by".
Yes, 'key' is a much better name than what I suggested. I'm not quite sure how best to implement this still. I guess MRAB's recursive approach should work, even though I like the simplicity of my style that takes full advantage of the existing itertools.permutations() (and uses 1/3 as many lines of--I think clearer--code). His has the advantage, however, that it doesn't require operator.lt() to work... however, without benchmarking, I have a pretty strong feeling that my suggestion will be faster since it avoids all that recursive call overhead. Maybe I'm wrong about that though.

On 10/12/2013 02:09 AM, David Mertz wrote:
I'd like to see some nice examples of how it's used. It seems to me there is some mixing of combination/permutation concepts.

import itertools as it

def filter_repeats(itr):
    seen = set()
    for i in itr:
        if i in seen:
            continue
        seen.add(i)
        yield i

def unique_combinations(itr, length=None):
    if length is None:
        length = len(itr)
    return it.product(filter_repeats(itr), repeat=length)

This one isn't the same as yours, but it's an example of filter_repeats. While that isn't the most efficient way in some cases, filtering repeat items out of things is a fairly common problem. Cheers, Ron

On Fri, Oct 11, 2013 at 10:37:02PM -0400, Neil Girdhar wrote:
If by "this way" you mean "unique permutations only", then yes, it is *completely* disputable, and I am doing so right now. I'm not arguing one way or the other for a separate "unique_permutations" generator, just that the existing permutations generator does the right thing. If you're satisfied with that answer, you can stop reading now, because the rest of my post is going to be rather long.

TL;DR: If you want a unique_permutations generator, that's a reasonable request. If you insist on changing permutations, that's unreasonable, firstly because the current behaviour is correct, and secondly because backwards compatibility would constrain it to keep the existing behaviour even if it were wrong.

. . .

Still here? Okay then, let me justify why I say the current behaviour is correct. Speaking as a math tutor who has taught High School level combinatorics for 20+ years, I've never come across any text book or source that defines permutations in terms of unique permutations only. In every case that I can remember, or that I still have access to, unique permutations is considered a different kind of operation ("permutations ignoring duplicates", if you like) rather than the default. E.g. "Modern Mathematics 6" by Fitzpatrick and Galbraith has a separate section for permutations with repetition, gives the example of taking permutations from the word "MAMMAL", and explicitly contrasts situations where you consider the three letters M as "different" from when you consider them "the same". But in all such cases, such a situation is discussed as a restriction on permutations, not an expansion, that is:

* there are permutations;
* sometimes you want to only consider unique permutations;

rather than:

* there are permutations, which are always unique;
* sometimes you want to consider things which are like permutations except they're not necessarily unique.

I'd even turn this around and challenge you to find a source that *does* define them as always unique.
Here's a typical example, from the Collins Dictionary of Mathematics:

[quote]
**permutation** or **ordered arrangement** n.

1. an ordered arrangement of a specified number of objects selected from a set. The number of distinct permutations of r objects from n is n!/(n-r)!, usually written nPr (with the n and r set as subscripts or superscripts). For example there are six distinct permutations of two objects selected out of three: <1,2>, <1,3>, <2,1>, <2,3>, <3,1>, <3,2>. Compare COMBINATION.

2. any rearrangement of all the elements of a finite sequence, such as (1,3,2) and (3,1,2). It is *odd* or *even* according as the number of exchanges of position yielding it from the original order is odd or even. It is a *cyclic permutation* if it merely advances all the elements a fixed number of places; that is, if it is a CYCLE of maximal LENGTH. A *transposition* is a cycle of degree two, and all permutations factor as products of transpositions. See also SIGNATURE.

3. any BIJECTION of a set to itself, where the set may be finite or infinite.
[end quote]

The definition makes no comment about how to handle duplicate elements, but we can derive an answer for that:

1) We're told how many permutations there are. Picking r elements out of n gives us n!/(n-r)!. If you throw away duplicate permutations, you will fall short.

2) The number of permutations shouldn't depend on the specific entities being permuted. Permutations of (1, 2, 3, 4) and (A, B, C, D) should be identical. If your set of elements contains duplicates, such as (Red ball, Red ball, Red ball, Black ball, Black ball), we can put the balls into 1:1 correspondence with integers (1, 2, 3, 4, 5), permute the integers, then reverse the mapping to get balls again. If we do this, we ought to get the same result as just permuting the balls directly. (That's not to say that there are never cases where we don't care to distinguish between one red ball and another. But in general we do distinguish between them.)
I think this argument may hinge on what you consider *distinct*. In this context, if I permute the string "RRRBB", I consider all three characters to be distinct. Object identity is an implementation detail (not all programming languages have "objects"); even equality is an irrelevant detail. If I'm choosing to permute "RRRBB" rather than "RB", then clearly *to me* there must be some distinguishing factor between the three Rs and two Bs. Another source is Wolfram Mathworld: http://mathworld.wolfram.com/Permutation.html which likewise says nothing about discarding repeated permutations when there are repeated elements. See also their page on "Ball Picking": http://mathworld.wolfram.com/BallPicking.html Last but not least, here's a source which clearly distinguishes permutations from "permutations with duplicates": http://mathcentral.uregina.ca/QQ/database/QQ.09.07/h/beth3.html and even gives a distinct formula for calculating the number of permutations. Neither Wolfram Mathworld nor the Collins Dictionary of Maths consider this formula important enough to mention, which suggests strongly that it should be considered separate from the default permutations. (A little like cyclic permutations, which are different again.) -- Steven

On 10/12/2013 10:35 AM, Steven D'Aprano wrote:
I agree that backwards compatibility should be kept, but the current behaviour of itertools.permutations is (IMHO) surprising. So here are my 2c: Until I tried it myself, I was sure that it would be like the corresponding permutations functions in Sage:

sage: list(Permutations("aba"))
[['a', 'a', 'b'], ['a', 'b', 'a'], ['b', 'a', 'a']]

or Mathematica: http://www.wolframalpha.com/input/?i=permutations+of+{a%2C+b%2C+a} Currently the docstring of itertools.permutations just says "Return successive r-length permutations of elements in the iterable", without telling what happens with input of repeated elements. The full doc in the reference manual is better in that regard, but I think at least one example with repeated elements would be nice. Regards, TB

On 10/12/2013 3:35 AM, Steven D'Aprano wrote:
The items of a set are, by definition of a set, distinct, so the question of different but equal permutations does not arise.
The items of a sequence may be duplicates. But in the treatments of permutations I have seen (admittedly not all of them), they are considered to be distinguished by position, so that one may replace the item by counts 1 to n and vice versa.
Back to a set of distinct items again. You are correct that itertools.permutations does the right thing by standard definition.
The question is whether this particular variation is important enough to put in itertools. It is not a combinatorics module and did not start with permutations. -- Terry Jan Reedy
participants (18)
- Andrew Barnert
- Bruce Leban
- David Mertz
- Ethan Furman
- Mark Dickinson
- Mark Lawrence
- MRAB
- Neil Girdhar
- Nick Coghlan
- Oscar Benjamin
- Raymond Hettinger
- Ron Adam
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
- TB
- Terry Reedy
- Tim Peters