[Python-ideas] Extremely weird itertools.permutations

Tue Oct 15 02:48:17 CEST 2013

[Raymond Hettinger]
> Now that we have a good algorithm,  I'm open to adding this to itertools,

I remain reluctant, because I still haven't seen a compelling use
case.  Yes, it generates all distinct r-letter anagrams - but so what?
 LOL ;-)  Seriously, I've written anagram programs several times in my
life, and generating "all possible" never occurred to me because it's
so crushingly inefficient.

> but it would need to have a name that didn't create any confusion
> with respect to the existing tools, perhaps something like:
>
>     anagrams(population, r)

"anagrams" is great!  Inspired :-)

What about an optional argument to define what the _user_ means by
"equality"?  The algorithm I posted had an optional
`equal=operator.__eq__` argument.  Else you're going to be pushed to
add a clumsy `TransformAnagrams` later <0.4 wink>.

>     Return an iterator over a all distinct r-length permutations
>     of the population.
>
>     Unlike permutations(), element uniqueness is determined
>     by value rather than by position.  Also, anagrams() makes
>     no guarantees about the order the tuples are generated.

Well, MRAB's algorithm (and my rewrite) guarantees that _if_ the
elements support a total order, and appear in the iterable in
non-decreasing order, then the anagrams are generated in
non-decreasing lexicographic order.  And that may be a useful
guarantee (hard to tell without a real use case, though!).

There's another ambiguity I haven't seen addressed explicitly.  Consider this:

>>> from fractions import Fraction
>>> for a in anagrams([3, 3.0, Fraction(3)], 3):
...        print(a)

(3, 3.0, Fraction(3, 1))

All the algorithms posted here work to show all 3 elements in this
case.  But why?  If the elements all equal, then other outputs "should
be" acceptable too.  Like

(3, 3, 3)

or

(3.0, Fraction(3, 1), 3.0)

etc.  All those outputs compare equal!

This isn't visible if, e.g., the iterable's elements are letters
(where a == b if and only if str(a) == str(b), so the output looks the
same no matter what).

At least "my" algorithm could be simplified materially if it only
saved (and iterated over) a (single) canonical representative for each
equivalence class, instead of saving entire equivalence classes and
then jumping through hoops to cycle through each equivalence class's
elements.

But, for some reason, output (3, 3, 3) just "looks wrong" above.  I'm
not sure why.