Implement `itertools.permutations.__getitem__` and `itertools.permutations.index`

I suggest implementing: - `itertools.permutations.__getitem__`, for getting a permutation by its index number, and possibly also slicing, and - `itertools.permutations.index` for getting the index number of a given permutation. What do you think? Thanks, Ram.

On Mon, May 05, 2014 at 09:07:27AM -0700, Ram Rachum wrote:
And now that I think about it, I'd also like to give it a `__len__`, and to give `itertools.product` the same treatment. What do you think?
Consider: p = itertools.permutations('CAT') assert len(p) == 6 So far, that's obvious. But: next(p) => returns a permutation Now what will len(p) return? If it still returns 6, that will lead to bugs when people check the len, but fail to realise that some of those permutations have already been consumed. In the most extreme case, you could have: assert len(p) == 6 list(p) == [] which is terribly surprising. On the other hand, if len(p) returns the number of permutations remaining, apart from increasing the complexity of the iterator, it will also be surprising to those who expect the length to be the total number of permutations. I would rather have a separate API, perhaps something like this: p.number() # returns the total number of permutations -- Steven

On Mon, May 05, 2014 at 06:17:16AM -0700, Ram Rachum wrote:
An intriguing idea. range() objects also implement indexing, and len. But range() objects have an obvious, unambiguous order: range(2, 6) *must* give 2, 3, 4, 5, in that order, by the definition of range. Permutations aren't like that. The order of the permutations is an implementation detail, not part of the definition. If permutations provides indexing operations, then the order becomes part of the interface. I'm not sure that's such a good idea. I think, rather that adding __getitem__ to permutations, I would rather see a separate function (not iterator) which returns the nth permutation. -- Steven

I don't think the order of permutation is implementation detail. Python implementations should follow CPython's documented order. https://docs.python.org/3.4/library/itertools.html#itertools.permutations
Permutations are emitted in lexicographic sort order. So, if the input iterable is sorted, the permutation tuples will be produced in sorted order.
On Tue, May 6, 2014 at 2:15 AM, Steven D'Aprano <steve@pearwood.info> wrote:
-- INADA Naoki <songofacandy@gmail.com>

On 05/05/2014 03:22 PM, INADA Naoki wrote:
What does that mean? If permutations are emitted in an order, why does the input iterable have to be ordered? What happens if it's not? --> list(''.join(p) for p in permutations('abc')) ['abc', 'acb', 'bac', 'bca', 'cab', 'cba'] --> list(''.join(p) for p in permutations('cab')) ['cab', 'cba', 'acb', 'abc', 'bca', 'bac'] Okay, read http://en.wikipedia.org/wiki/Lexicographical_order -- I think 'lexicographic' is not the best choice of word... maybe positional? -- ~Ethan~

On Tue, May 06, 2014 at 07:22:56AM +0900, INADA Naoki wrote:
Hmmm. Well, since the order of permutations is documented, I suppose my objection is answered. In that case, it becomes a question of whether or not there is an easy way to generate the Nth permutation without having to iterate through the previous N-1 permutations.
I think I know what the docs are trying to say, but I'm not sure if they are quite saying it correctly. If the permutations are emitted in "lexicographic sort order", that implies that they are sortable, but that's not necessarily the case: py> 4j > 2j Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: no ordering relation is defined for complex numbers py> list(itertools.permutations([4j, 2j])) [(4j, 2j), (2j, 4j)] I think that just removing the word "sort" is sufficient: "Permutations are emitted in lexicographic order" is meaningful, and correct, even when the elements are not sortable. -- Steven

On Tue, May 6, 2014 at 5:39 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Yes, it is possible using factorial decomposition of N. See, for an example: http://stackoverflow.com/a/7919887/40076 - Tal Einat

On 6 May 2014 10:35, Tal Einat <taleinat@gmail.com> wrote:
For large N, this is much slower than itertools.permutations when you only want the first few entries. p = itertools.permutations(range(10000)) for i in range(5): print(next(p)) vs for i in range(5): print(ithperm(10000, i)) The first is substantially faster. That's not to say that ithperm isn't useful, just that its computational complexity may be surprising if it's spelled as an indexing operation. Paul

On Tue, May 6, 2014 at 3:45 PM, Paul Moore <p.f.moore@gmail.com> wrote:
If someone just wants the first few entries, they probably aren't worried about it being super fast. And if they were, they could just iterate to get the first permutations. As for getting anything past the first few permutations (e.g. an arbitrary one), factorial decomposition would be faster by several orders of magnitude than iterating from the beginning. For relatively large permutations, iterating from the beginning could be unfeasible, while factorial decomposition would still take far less than a second. The real question IMO is if this is useful enough to bother including in the stdlib. For example, I don't think it would pass the "potential uses in the stdlib" test. Perhaps Ram (the OP) has some actual use-cases for this? - Tal

On 6 May 2014 16:40, Tal Einat <taleinat@gmail.com> wrote:
Agreed, I suspect this is more appropriate as a utility on PyPI. But I stand by my statement that wherever it's implemented, it should *not* be spelled permutations(x)[N], as having indexing with a small index being significantly slower than a few calls to next() is a nasty performance trap for the unwary (no matter how rare it will be in practice). Paul

Hi Tal, I'm using it for a project of my own (optimizing keyboard layout) but I can't make the case that it's useful for the stdlib. I'd understand if it would be omitted for not being enough of a common need. On Tue, May 6, 2014 at 6:40 PM, Tal Einat <taleinat@gmail.com> wrote:

On Wed, May 7, 2014 at 8:21 PM, Ram Rachum <ram.rachum@gmail.com> wrote:
At the least, this (a function for getting a specific permutation by lexicographical-order index) could make a nice cookbook recipe. - Tal

I'm probably going to implement it in my python_toolbox package. I already implemented 30% and it's really cool. It's at the point where I doubt that I want it in the stdlib because I've gotten so much awesome functionality into it and I'd hate to (a) have 80% of it stripped and (b) have the class names changed to be non-Pythonic :) On Wed, May 7, 2014 at 8:40 PM, Tal Einat <taleinat@gmail.com> wrote:

I really like this and hope that it eventually makes it into the stdlib. It's also a good argument for your other suggestion whereby some of the itertools to return Iterables rather than Iterators like range does. Best, Neil On Wednesday, May 7, 2014 1:43:20 PM UTC-4, Ram Rachum wrote:

On Mon, May 05, 2014 at 09:07:27AM -0700, Ram Rachum wrote:
And now that I think about it, I'd also like to give it a `__len__`, and to give `itertools.product` the same treatment. What do you think?
Consider: p = itertools.permutations('CAT') assert len(p) == 6 So far, that's obvious. But: next(p) => returns a permutation Now what will len(p) return? If it still returns 6, that will lead to bugs when people check the len, but fail to realise that some of those permutations have already been consumed. In the most extreme case, you could have: assert len(p) == 6 list(p) == [] which is terribly surprising. On the other hand, if len(p) returns the number of permutations remaining, apart from increasing the complexity of the iterator, it will also be surprising to those who expect the length to be the total number of permutations. I would rather have a separate API, perhaps something like this: p.number() # returns the total number of permutations -- Steven

On Mon, May 05, 2014 at 06:17:16AM -0700, Ram Rachum wrote:
An intriguing idea. range() objects also implement indexing, and len. But range() objects have an obvious, unambiguous order: range(2, 6) *must* give 2, 3, 4, 5, in that order, by the definition of range. Permutations aren't like that. The order of the permutations is an implementation detail, not part of the definition. If permutations provides indexing operations, then the order becomes part of the interface. I'm not sure that's such a good idea. I think, rather that adding __getitem__ to permutations, I would rather see a separate function (not iterator) which returns the nth permutation. -- Steven

I don't think the order of permutation is implementation detail. Python implementations should follow CPython's documented order. https://docs.python.org/3.4/library/itertools.html#itertools.permutations
Permutations are emitted in lexicographic sort order. So, if the input iterable is sorted, the permutation tuples will be produced in sorted order.
On Tue, May 6, 2014 at 2:15 AM, Steven D'Aprano <steve@pearwood.info> wrote:
-- INADA Naoki <songofacandy@gmail.com>

On 05/05/2014 03:22 PM, INADA Naoki wrote:
What does that mean? If permutations are emitted in an order, why does the input iterable have to be ordered? What happens if it's not? --> list(''.join(p) for p in permutations('abc')) ['abc', 'acb', 'bac', 'bca', 'cab', 'cba'] --> list(''.join(p) for p in permutations('cab')) ['cab', 'cba', 'acb', 'abc', 'bca', 'bac'] Okay, read http://en.wikipedia.org/wiki/Lexicographical_order -- I think 'lexicographic' is not the best choice of word... maybe positional? -- ~Ethan~

On Tue, May 06, 2014 at 07:22:56AM +0900, INADA Naoki wrote:
Hmmm. Well, since the order of permutations is documented, I suppose my objection is answered. In that case, it becomes a question of whether or not there is an easy way to generate the Nth permutation without having to iterate through the previous N-1 permutations.
I think I know what the docs are trying to say, but I'm not sure if they are quite saying it correctly. If the permutations are emitted in "lexicographic sort order", that implies that they are sortable, but that's not necessarily the case: py> 4j > 2j Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: no ordering relation is defined for complex numbers py> list(itertools.permutations([4j, 2j])) [(4j, 2j), (2j, 4j)] I think that just removing the word "sort" is sufficient: "Permutations are emitted in lexicographic order" is meaningful, and correct, even when the elements are not sortable. -- Steven

On Tue, May 6, 2014 at 5:39 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Yes, it is possible using factorial decomposition of N. See, for an example: http://stackoverflow.com/a/7919887/40076 - Tal Einat

On 6 May 2014 10:35, Tal Einat <taleinat@gmail.com> wrote:
For large N, this is much slower than itertools.permutations when you only want the first few entries. p = itertools.permutations(range(10000)) for i in range(5): print(next(p)) vs for i in range(5): print(ithperm(10000, i)) The first is substantially faster. That's not to say that ithperm isn't useful, just that its computational complexity may be surprising if it's spelled as an indexing operation. Paul

On Tue, May 6, 2014 at 3:45 PM, Paul Moore <p.f.moore@gmail.com> wrote:
If someone just wants the first few entries, they probably aren't worried about it being super fast. And if they were, they could just iterate to get the first permutations. As for getting anything past the first few permutations (e.g. an arbitrary one), factorial decomposition would be faster by several orders of magnitude than iterating from the beginning. For relatively large permutations, iterating from the beginning could be unfeasible, while factorial decomposition would still take far less than a second. The real question IMO is if this is useful enough to bother including in the stdlib. For example, I don't think it would pass the "potential uses in the stdlib" test. Perhaps Ram (the OP) has some actual use-cases for this? - Tal

On 6 May 2014 16:40, Tal Einat <taleinat@gmail.com> wrote:
Agreed, I suspect this is more appropriate as a utility on PyPI. But I stand by my statement that wherever it's implemented, it should *not* be spelled permutations(x)[N], as having indexing with a small index being significantly slower than a few calls to next() is a nasty performance trap for the unwary (no matter how rare it will be in practice). Paul

Hi Tal, I'm using it for a project of my own (optimizing keyboard layout) but I can't make the case that it's useful for the stdlib. I'd understand if it would be omitted for not being enough of a common need. On Tue, May 6, 2014 at 6:40 PM, Tal Einat <taleinat@gmail.com> wrote:

On Wed, May 7, 2014 at 8:21 PM, Ram Rachum <ram.rachum@gmail.com> wrote:
At the least, this (a function for getting a specific permutation by lexicographical-order index) could make a nice cookbook recipe. - Tal

I'm probably going to implement it in my python_toolbox package. I already implemented 30% and it's really cool. It's at the point where I doubt that I want it in the stdlib because I've gotten so much awesome functionality into it and I'd hate to (a) have 80% of it stripped and (b) have the class names changed to be non-Pythonic :) On Wed, May 7, 2014 at 8:40 PM, Tal Einat <taleinat@gmail.com> wrote:

I really like this and hope that it eventually makes it into the stdlib. It's also a good argument for your other suggestion whereby some of the itertools to return Iterables rather than Iterators like range does. Best, Neil On Wednesday, May 7, 2014 1:43:20 PM UTC-4, Ram Rachum wrote:
participants (7)
-
Ethan Furman
-
INADA Naoki
-
Neil Girdhar
-
Paul Moore
-
Ram Rachum
-
Steven D'Aprano
-
Tal Einat