Include partitioning in itertools

Hi all, I really hope this isn't proposed before - I couldn't find anything in the archives. I want to propose to include a 'partition' (mathematically more correct: a set partition) function in itertools. To my knowledge, partitioning of a set (iterable) is quite a common thing to do and a logic extension of the combinatoric generators in itertools. It is implemented as a Python recipe (http://code.activestate.com/recipes/576795-partitioning-a-sequence/). I found that implementing a partition generator of an iterable isn't very straightforward, which, in my opinion, strengthens the case for implementing it as a separate function. Definitions of partitions: http://mathworld.wolfram.com/SetPartition.html or https://en.wikipedia.org/wiki/Partition_of_a_set Humble regards, Albert.

Hi Albert, sounds useful. I can remember needing it myself. On 01.11.2015 12:08, Albert ten Oever wrote:
Hi all,
I really hope this isn't proposed before - I couldn't find anything in the archives.
I want to propose to include a 'partition' (mathematically more correct: a set partition) function in itertools.
This operates over iterables, right? So, it's not quite about sets alone.
To my knowledge, partitioning of a set (iterable) is quite a common thing to do and a logic extension of the combinatoric generators in itertools.
It is implemented as a Python recipe (http://code.activestate.com/recipes/576795-partitioning-a-sequence/). I found that implementing a partition generator of an iterable isn't very straightforward, which, in my opinion, strengthens the case for implementing it as a separate function.
Not quite sure if I understand the implementation. Reading this http://mathworld.wolfram.com/BellNumber.html for the given example of 6, it should have 203 possible partitions. What am I missing? The implementation over there reads as if one gets a list of all possible partitions. Picking a specific subset ("all 4-part-partitions") could be useful as well.
Definitions of partitions: http://mathworld.wolfram.com/SetPartition.html or https://en.wikipedia.org/wiki/Partition_of_a_set
Humble regards,
Albert.
Best, Sven

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hi Sven, The issue here is that such an implementation does not allow “gaps” and implicitly enforces that the elements of such a partition only contain elements directly following each other, thus making it not a complete implementation of producing a set partition. To give a specific counterexample, `['acd', 'bef']` will never be produced. The question to ask is whether you want to implement exact mathematical set partition, or if you implement something with the above constraint, and call it something different like “iterable partition”. If choosing the latter, however, it should be made clear by the function name and in the documentation that this is *not* comparable to set partition. Perhaps it will be useful to implement both concepts, since such an “iterable partition” seems to be easier to implement and will perhaps be faster (which, however, is a blind guess and depends on the implementation of producing a set partition). - -- Sincerely, Lukas -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJWNlotAAoJENBP4KTXGFXeSJ8P/33IfUqVe+LqKjdjghOVrb2A 2wcj85GJHV5hu8UIYNfchVwvBML7w0LpgvHsopD7NkXLYepOy1AuSDTlePYZGu2g TvVFLd861HZVAQFtFwuuY1bAkNlCDDtNOUZYrortkc7F05TFBaGiTu7Z7zrHG9PN 5eLZ7dpr5y9Ok8nCaerUVnxAn81+A+mr0O2KPAAOfTLA0C0bo4uWJsp1WJMcLF1N o0RpzrZ68Qy1vzfnRaf5T/MpcxIFaPxTYee8wxg/pGse5gPJidvK83G5MV/ylHbK INqG7iPTcD8DJ0OpvR/462QaxBf225fP3Dg+xUtKvpBw2SMrSfc8X9VXOE7EFMJ+ m9Wajx5d1+wVvlTzdEa5Nu5A7I5DGDO+zW7jeuZ2Ri6q6aeKHRt1JLsi/5VIJ+/B 0xb7/a5JKcjskrZuQrlCzVKlvOUq2Wf1SZKlPLq6jsB3xkT1qJhF2QssoR3s2aNn 998gwRl3sJzrdVh9+QxCXjoQ92aqQCLRsvHll+uZ9bjJEhD3SXlyFhqmhLf396fq BMWpxFrSuf4ltqke2WQrzrvpJnxd1MH7yH+2tVQej8/kUHJyCWdBLZeQJeFljiJa 3zYh6BQ8KjwtjNcSHzmhANO/0FSwzVDPSvP2ocEbjYjv5WMXXk0/sNdRgsb3gkQ1 5zuBS6j2n5n9SeUZ0IcM =Ki11 -----END PGP SIGNATURE-----

Hey Lukas, I agree both version (+ the n-part-partition-version of it) might be very useful to have. But where do they go into? itertools might be the right place for the iter_partition; but where to with the set_partition? Best, Sven On 01.11.2015 19:30, Lukas Juhrich wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
Hi Sven,
The issue here is that such an implementation does not allow “gaps” and implicitly enforces that the elements of such a partition only contain elements directly following each other, thus making it not a complete implementation of producing a set partition. To give a specific counterexample, `['acd', 'bef']` will never be produced.
The question to ask is whether you want to implement exact mathematical set partition, or if you implement something with the above constraint, and call it something different like “iterable partition”. If choosing the latter, however, it should be made clear by the function name and in the documentation that this is *not* comparable to set partition.
Perhaps it will be useful to implement both concepts, since such an “iterable partition” seems to be easier to implement and will perhaps be faster (which, however, is a blind guess and depends on the implementation of producing a set partition).
- -- Sincerely, Lukas -----BEGIN PGP SIGNATURE----- Version: GnuPG v2
iQIcBAEBCAAGBQJWNlotAAoJENBP4KTXGFXeSJ8P/33IfUqVe+LqKjdjghOVrb2A 2wcj85GJHV5hu8UIYNfchVwvBML7w0LpgvHsopD7NkXLYepOy1AuSDTlePYZGu2g TvVFLd861HZVAQFtFwuuY1bAkNlCDDtNOUZYrortkc7F05TFBaGiTu7Z7zrHG9PN 5eLZ7dpr5y9Ok8nCaerUVnxAn81+A+mr0O2KPAAOfTLA0C0bo4uWJsp1WJMcLF1N o0RpzrZ68Qy1vzfnRaf5T/MpcxIFaPxTYee8wxg/pGse5gPJidvK83G5MV/ylHbK INqG7iPTcD8DJ0OpvR/462QaxBf225fP3Dg+xUtKvpBw2SMrSfc8X9VXOE7EFMJ+ m9Wajx5d1+wVvlTzdEa5Nu5A7I5DGDO+zW7jeuZ2Ri6q6aeKHRt1JLsi/5VIJ+/B 0xb7/a5JKcjskrZuQrlCzVKlvOUq2Wf1SZKlPLq6jsB3xkT1qJhF2QssoR3s2aNn 998gwRl3sJzrdVh9+QxCXjoQ92aqQCLRsvHll+uZ9bjJEhD3SXlyFhqmhLf396fq BMWpxFrSuf4ltqke2WQrzrvpJnxd1MH7yH+2tVQej8/kUHJyCWdBLZeQJeFljiJa 3zYh6BQ8KjwtjNcSHzmhANO/0FSwzVDPSvP2ocEbjYjv5WMXXk0/sNdRgsb3gkQ1 5zuBS6j2n5n9SeUZ0IcM =Ki11 -----END PGP SIGNATURE----- _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

On Tue, Nov 03, 2015 at 05:58:31PM +0100, Sven R. Kunze wrote:
Hey Lukas,
I agree both version (+ the n-part-partition-version of it) might be very useful to have.
But where do they go into?
itertools might be the right place for the iter_partition; but where to with the set_partition?
I think the right place for both of them is your own personal Python toolbox. -- Steve

On Nov 3, 2015, at 08:58, Sven R. Kunze <srkunze@mail.de> wrote:
Hey Lukas,
I agree both version (+ the n-part-partition-version of it) might be very useful to have.
But where do they go into?
itertools might be the right place for the iter_partition; but where to with the set_partition?
One more consideration: if you look at what's in itertools today, it isn't really everything you might want to do with and/or to produce iterators. Instead, it's (mostly) a collection of inner-loop building blocks that are accelerated with C code, and can be assembled together in pure Python to do almost everything you might want. The recipes give extensive examples of such things that are useful but don't need to be in the module itself. So, the question isn't just whether iter_partitions is like permutations in what it does, but whether it's like permutations in not having any more minimal building blocks—if there are such building blocks, it might be better to add those to itertools, and iter_partitions to the recipes. Meanwhile, if you think there's a wider need for these functions than can be served by a blog post or ActiveState recipe, surely they should be on PyPI (whether or not they're in the stdlib—after all, if they're that useful, people will want them in 3.5 and 2.7). That also gives you a way to prove their usefulness—look at all these open source projects that use the library, and these comments from users saying "this really should be in the stdlib", and so on. At least for iter_partition, you might want to consider submitting it as an addition to more-itertools instead of posting your own library. He's already got a bunch of other stuff that's itertools-ish that may not belong in the stdlib (including all the recipes from the stdlib docs) but may still be useful enough for people to install the package.

[Albert ten Oever]
... I want to propose to include a 'partition' (mathematically more correct: a set partition) function in itertools.
To my knowledge, partitioning of a set (iterable) is quite a common thing to do and a logic extension of the combinatoric generators in itertools.
It is implemented as a Python recipe (http://code.activestate.com/recipes/576795-partitioning-a-sequence/).
[Sven R. Kunze]
... Not quite sure if I understand the implementation. Reading this http://mathworld.wolfram.com/BellNumber.html for the given example of 6, it should have 203 possible partitions. What am I missing?
The ActiveState recipe has nothing to do with set partitions. Instead it returns all ways of breaking a sequence into a sequence of subsequences whose sum (catenation) is the original sequence. If there are N elements in the original sequence, there are 2**(N-1) ways to do that. That's why there are 32 results in the recipe's example output (2**(len("abcdef")-1) == 32). There would be 203 "set partitions" of a 6-element set.

On 01.11.15 13:08, Albert ten Oever wrote:
I really hope this isn't proposed before - I couldn't find anything in the archives.
I want to propose to include a 'partition' (mathematically more correct: a set partition) function in itertools. To my knowledge, partitioning of a set (iterable) is quite a common thing to do and a logic extension of the combinatoric generators in itertools.
It is implemented as a Python recipe (http://code.activestate.com/recipes/576795-partitioning-a-sequence/). I found that implementing a partition generator of an iterable isn't very straightforward, which, in my opinion, strengthens the case for implementing it as a separate function.
The implementation can be simpler and don't require itertools: def partition(seq): if seq: yield [seq] for i in range(1, len(seq)): s = seq[i:] for p in partition(seq[:i]): p.append(s) yield p
from pprint import pprint pprint(sorted(partition('abcde'), key=len)) [['abcde'], ['a', 'bcde'], ['ab', 'cde'], ['abc', 'de'], ['abcd', 'e'], ['a', 'b', 'cde'], ['a', 'bc', 'de'], ['ab', 'c', 'de'], ['a', 'bcd', 'e'], ['ab', 'cd', 'e'], ['abc', 'd', 'e'], ['a', 'b', 'c', 'de'], ['a', 'b', 'cd', 'e'], ['a', 'bc', 'd', 'e'], ['ab', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']]

Albert ten Oever writes:
I want to propose to include a 'partition' (mathematically more correct: a set partition) function in itertools. To my knowledge, partitioning of a set (iterable) is quite a common thing to do and a logic extension of the combinatoric generators in itertools.
Even after other discussion, I don't understand. Sets and iterables are rather different things. The recipe from ActiveState would be perfectly happy with partitioning "aaaaaa", but that sequence is not a reasonable representation of a set (I would normally say it has only one element despite being repeated 6 times), and the results makes no sense as a set, either. For example, as sets ["aa", "aaaa"] and ["aaaa", "aa"] are the same. And I suspect that you might get anomolies using a sequence or iterable-based algorithm (for example, (["a"] is not ["a"]) and ("a" is "a") both evaluate to true because of the way the CPython implementation handles different objects). I'm also not clear on what you would hope to get from an infinite iterator. I assume passing an infinite iterator would be a programmer error, but you could also restrict the function to working on concrete sequences. All that doesn't mean it's a bad idea, but at this point it's definitely underspecified. I'd also prefer a more explicit name, such as enumerate_partitions. Steve

This doesn't make sense to me for `itertools`. There is no way to "partition" an infinite sequence in the style of the recipe you show (nor any sensible one I can think of). I think a better name would be `powerset()` anyway; but in either case, it's not an `itertools` type of operation. I think what might be more useful is to make a mix-in class that gave concrete collections a `.powerset()` method that preserved the types of the original collection. I think you can even do this as an iterable without having to concretize the whole set of subsets (but not from an iterable, from a concrete collection, as input). On Sun, Nov 1, 2015 at 3:08 AM, Albert ten Oever <agtoever@hotmail.com> wrote:
Hi all,
I really hope this isn't proposed before - I couldn't find anything in the archives.
I want to propose to include a 'partition' (mathematically more correct: a set partition) function in itertools. To my knowledge, partitioning of a set (iterable) is quite a common thing to do and a logic extension of the combinatoric generators in itertools.
It is implemented as a Python recipe ( http://code.activestate.com/recipes/576795-partitioning-a-sequence/). I found that implementing a partition generator of an iterable isn't very straightforward, which, in my opinion, strengthens the case for implementing it as a separate function.
Definitions of partitions: http://mathworld.wolfram.com/SetPartition.html or https://en.wikipedia.org/wiki/Partition_of_a_set
Humble regards,
Albert.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

Oh, yeah... I see what you are asking for isn't quite the same thing as power set. But still, it's not something you can do in finite time with infinite iterators (nor in tractable time with *large* iterators). So `itertools` isn't where it belongs. On Sun, Nov 1, 2015 at 5:49 PM, David Mertz <mertz@gnosis.cx> wrote:
This doesn't make sense to me for `itertools`. There is no way to "partition" an infinite sequence in the style of the recipe you show (nor any sensible one I can think of). I think a better name would be `powerset()` anyway; but in either case, it's not an `itertools` type of operation.
I think what might be more useful is to make a mix-in class that gave concrete collections a `.powerset()` method that preserved the types of the original collection. I think you can even do this as an iterable without having to concretize the whole set of subsets (but not from an iterable, from a concrete collection, as input).
On Sun, Nov 1, 2015 at 3:08 AM, Albert ten Oever <agtoever@hotmail.com> wrote:
Hi all,
I really hope this isn't proposed before - I couldn't find anything in the archives.
I want to propose to include a 'partition' (mathematically more correct: a set partition) function in itertools. To my knowledge, partitioning of a set (iterable) is quite a common thing to do and a logic extension of the combinatoric generators in itertools.
It is implemented as a Python recipe ( http://code.activestate.com/recipes/576795-partitioning-a-sequence/). I found that implementing a partition generator of an iterable isn't very straightforward, which, in my opinion, strengthens the case for implementing it as a separate function.
Definitions of partitions: http://mathworld.wolfram.com/SetPartition.html or https://en.wikipedia.org/wiki/Partition_of_a_set
Humble regards,
Albert.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On 2015-11-01 17:58, David Mertz wrote:
Oh, yeah... I see what you are asking for isn't quite the same thing as power set. But still, it's not something you can do in finite time with infinite iterators (nor in tractable time with *large* iterators). So `itertools` isn't where it belongs.
I don't think that reasoning is sound. Itertools already includes several combinatoric generators like permutations and combinations, which aren't computable for infinite iterators and which take intractable time for large finite iterators. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

I'm surprised to have just noticed that `itertools` is oddly broken this way (under Python 3.4.3): This is perfectly happy:
def fibs():
a, b = 1, 1 while True: yield b b, a = a+b, b ...
f = ((a,b) for a in fibs() for b in fibs()) next(f) Out[3]: (1, 1) next(f) Out[4]: (1, 2) next(f) Out[5]: (1, 3)
And yet this freezes be greedily trying to go through the infinite iterator:
from itertools import product product(fibs(), fibs())
On Sun, Nov 1, 2015 at 6:37 PM, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2015-11-01 17:58, David Mertz wrote:
Oh, yeah... I see what you are asking for isn't quite the same thing as power set. But still, it's not something you can do in finite time with infinite iterators (nor in tractable time with *large* iterators). So `itertools` isn't where it belongs.
I don't think that reasoning is sound. Itertools already includes several combinatoric generators like permutations and combinations, which aren't computable for infinite iterators and which take intractable time for large finite iterators.
-- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On Mon, Nov 2, 2015 at 2:42 PM, David Mertz <mertz@gnosis.cx> wrote:
I'm surprised to have just noticed that `itertools` is oddly broken this way (under Python 3.4.3):
This is perfectly happy:
def fibs():
a, b = 1, 1 while True: yield b b, a = a+b, b ...
f = ((a,b) for a in fibs() for b in fibs()) next(f) Out[3]: (1, 1) next(f) Out[4]: (1, 2) next(f) Out[5]: (1, 3)
And yet this freezes be greedily trying to go through the infinite iterator:
from itertools import product product(fibs(), fibs())
Yeah. The problem is that you're looking at a simplified version of the reference implementation. Further down it says that it's equivalent to a rather longer function, which (critically) includes a call to tuple(), to coalesce the iterables into their values. When you use the naive genexp, what you're actually doing is constructing a new instance of b's fibs() for every iteration of a's one (you won't see that on an infinite iterator, but try a finite iterator with some print() calls and you'll see); itertools.product obviously can't do that, as it's given the iterable and not the means of constructing more. The tiny example up the top is inaccurate in its handling of edge cases, but for stable, repeatable iterables, it will return the same results. ChrisA
participants (11)
-
Albert ten Oever
-
Andrew Barnert
-
Brendan Barnwell
-
Chris Angelico
-
David Mertz
-
Lukas Juhrich
-
Serhiy Storchaka
-
Stephen J. Turnbull
-
Steven D'Aprano
-
Sven R. Kunze
-
Tim Peters