
I think it's necessary to add a segment() method to str type or string module. This method is used to split a string into m parts and return all cases. With segment(), you can avoid tedious calculation and indexing if you want to segment a string. For example: segment('1234', m=3) -> [('1', '2', '34'), ('1', '23', '4'), ('12', '3', '4')] segment('12345', m=3) -> [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')] I hope this proposal can be adopted.

On 12/14/19 4:06 AM, smfiles wrote:
My first impression is that if you aren't careful to use a string just slightly bigger then the number of segments, this could easily create a very large number of results. My guess is that the domain for using this is fairly restricted and an algorithm to do this isn't that hard to write and could easily be part of a module to handle that sort of thing. I also suspect different application domains may have differing 'rules' of what they want, for example, why do you assume no empty strings? Also, why just strings, maybe you want to do this to other sequences? -- Richard Damon

This feels much too special purpose for a string method, and probably for anything in the standard library. I'm not sure when someone would want this. But it's an only very sightly special case of integer composition ( https://en.wikipedia.org/wiki/Composition_(combinatorics)). And related to that, of course is integer partition. If we had a composition() function (maybe in itertools... but more likely in more-itertools given the minimalist philosophy of itertools itself), this problem is simply taking slices whose lengths are drawn from the composition. On Sat, Dec 14, 2019, 9:44 AM smfiles <1668151593@qq.com> wrote:

Here's a discussion of both a conceptually simple and a convoluted but fast version (the latter, naturally, by Tim Peters). This is for integer partitions, but compositions are simply the permutations on the full length of the list of each partition. http://code.activestate.com/recipes/218332-generator-for-integer-partitions/ Maybe Timmy will provide something more efficient than his partition code combined with itertools.permutation. I'm not sure if such shortcuts exist myself. On Sat, Dec 14, 2019, 10:41 AM David Mertz <mertz@gnosis.cx> wrote:

On Sat, Dec 14, 2019 at 3:56 PM David Mertz <mertz@gnosis.cx> wrote:
[...] compositions are simply the permutations on the full length of the list of each partition.
Using permutations of partitions would be overkill. For compositions of a given fixed length, it's much easier to compute them directly using a stars-and-bars approach. The OP is simply asking for all possible placements of `m-1` "dividers" in the input string `s`, and those divider positions can be computed directly using `itertools.combinations(range(1, len(s)), m-1)`. That gives the following one-liner: segment = lambda s, m: (tuple(s[i:j] for i, j in zip((0,)+c, c+(len(s),))) for c in itertools.combinations(range(1, len(s)), m-1)) Example output is exactly as the OP requested: >>> list(segment("12345", m=3)) [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')] -- Mark

Apologies for the bad formatting. Here are the relevant bits, better formatted (I hope): >>> segment = lambda s, m: (tuple(s[i:j] for i, j in zip((0,)+c, c+(len(s),))) ... for c in itertools.combinations(range(1, len(s)), m-1)) >>> >>> list(segment("12345", m=3)) [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')]

[David Mertz <mertz@gnosis.cx>]
Here's a discussion of both a conceptually simple and a convoluted but fast version (the latter, naturally, by Tim Peters).
Not really convoluted - it's the natural thing you'd do "by hand" to move from one partition to the next: look at the current partition, throw out the 1s, also throw out an instance of the smallest component `i` of what remains, then break the sum of what was thrown away into a (sub)partition "greedily" with largest component i-1 and the rest (if any) 1s. Although, no, that may not be obvious at first ;-) The point of the recipe is really to show all that can - with appropriate parallel data structures - be done in O(1) time.
As Mark said, solving this particular problem is a straightforward application of itertools.combinations.
For the ambitious, generating combinatorial objects is a huge topic of its own, but really doesn't belong in itertools. It's unfortunate that `combinations` and `permutations` and `product` live there already, because they really have little to do with "iterator algebra". They're just ordinary generators that can't exploit lazy arguments even when iterators are passed.(first thing they do is convert their iterable arguments to tuples). On the other hand, there is no standard module they _do_ belong in. I'm _starting_ to get concerned that the `math` module may become an incoherent mess over time. For example, we just added `math.comb()`, which didn't really fit anywhere else either, but in an ideal world _would_ live in the same module as `itertools.combinations()`. On the third hand, we haven't yet sprayed enough combinatorial functions all over creation to make much of a case for adding a module dedicated to that area of discrete math. But we certainly _could_ ;-)

I had not known about math.comb() and math.perm() being added in 3.8. Those kinda feel to me like "not every one line function needs to be in the standard library." But I guess wiser people than me saw a reason they are needed. On Sat, Dec 14, 2019, 3:00 PM Tim Peters <tim.peters@gmail.com> wrote:
Not really convoluted - it's the natural thing you'd do "by hand"
I wouldn't do recursion by hand? :-) On the third hand, we haven't yet sprayed enough combinatorial
functions all over creation to make much of a case for adding a module dedicated to that area of discrete math. But we certainly _could_ ;-)
This feels like something that could happily start as a 3rd party package. Maybe it exists and I don't know about it. It would be convenient to have it all bundled in a namespace.

smfiles writes:
In addition to David's pointers to code using combinations, here's a one-liner based on more_itertools. Given combinatoric explosion, it might be better to use a genexp instead of a list as return value, and an "efficient but convoluted" algorithm might be justified if len(s) is at all large. # more_itertools is not in stdlib, but can be pip'ed from more_itertools import partitions as p def segment(s, m): return [tuple("".join(y) for y in x) for x in p(s) if len(x) == m]

Hi I've looked at the example that smfiles (aka 1668151593 at tencent) provided. It looks to me like smfiles is asking for help with a homework problem. If smfiles tells me I'm wrong, I'll apologise. By the way, because the number of parts is fixed, and because the order matters, the OP's problem does not involve integer partitions. Off topic. By chance last week I was looking for Python code to generate integer partitions. I found the following particularly helpful. https://stackoverflow.com/questions/10035752/elegant-python-code-for-integer... http://jeromekelleher.net/generating-integer-partitions.html https://arxiv.org/abs/0909.2331 # Generating All Partitions: A Comparison Of Two Encodings The last item shows "the ascending composition generation algorithm is substantially more efficient than its descending composition counterpart". It so happens that for my problem it's ascending generation that I want. with best regards Jonathan

import itertools as itools def segment(it, n=1): try: len_it = len(it) it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true) size, rest = divmod(len_it, n) sizes = [size] * n for i in range(rest): sizes[-i] += 1 all_sizes = frozenset(itools.permutations(sizes)) res = [] for sizes in all_sizes: elem = [] i = 0 for size in sizes: elem.append(it[i:i+size]) i += size res.append(elem) return res

Excuse me, ignore my previous post, this is the correct implementation. It works for every iterable: import itertools as itools def segment(it, n=1): if n < 1: raise ValueError(f"Number of segment must be > 0") try: len_it = len(it) it[0:0] it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true) if len_it < n: raise ValueError(f"Iterable length {len_it} must be greater than number of segments {n}") size, rest = divmod(len_it, n) sizes = [size] * n orig_sizes = sizes.copy() all_sizes = [] for i in range(1, rest+1): for j in range(1, rest-i+2): sizes[-j] += i all_sizes.append(frozenset(itools.permutations(sizes))) sizes = orig_sizes.copy() if not all_sizes: all_sizes.append((sizes, )) res = [] for perm_sizes in all_sizes: for sizes in perm_sizes: elem = [] i = 0 for size in sizes: elem.append(it_true[i:i+size]) i += size res.append(elem) return res

Why not create a new custom class which has this function there? You can use that object whenever you need the segments. This is a very rare use case and doesn't make sense to implement that in str. On Wed, 25 Dec, 2019, 09:25 python-ideas--- via Python-ideas, < python-ideas@python.org> wrote:

See my implementation, is generic and not only for strings. It could be added to more-itertools, I suppose: https://mail.python.org/archives/list/python-ideas@python.org/message/E452LQ...

Excuse me again, I just relized that my algorithm was flawed. I just inserted in my function the brilliant algorithm of Mark Dickinson and now it works: import itertools as itools def segment(it, n=1): if n < 1: raise ValueError(f"Number of segment must be > 0, {n} found") try: len_it = len(it) it[0:0] it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true) if len_it < n: err = f"Iterable length {len_it} must be greater than number of segments {n}" raise ValueError(err) return ( [it_true[i:j] for i, j in zip((0, ) + startends, startends + (len_it, ))] for startends in itools.combinations(range(1, len_it), n-1) )

On 12/14/19 4:06 AM, smfiles wrote:
My first impression is that if you aren't careful to use a string just slightly bigger then the number of segments, this could easily create a very large number of results. My guess is that the domain for using this is fairly restricted and an algorithm to do this isn't that hard to write and could easily be part of a module to handle that sort of thing. I also suspect different application domains may have differing 'rules' of what they want, for example, why do you assume no empty strings? Also, why just strings, maybe you want to do this to other sequences? -- Richard Damon

This feels much too special purpose for a string method, and probably for anything in the standard library. I'm not sure when someone would want this. But it's an only very sightly special case of integer composition ( https://en.wikipedia.org/wiki/Composition_(combinatorics)). And related to that, of course is integer partition. If we had a composition() function (maybe in itertools... but more likely in more-itertools given the minimalist philosophy of itertools itself), this problem is simply taking slices whose lengths are drawn from the composition. On Sat, Dec 14, 2019, 9:44 AM smfiles <1668151593@qq.com> wrote:

Here's a discussion of both a conceptually simple and a convoluted but fast version (the latter, naturally, by Tim Peters). This is for integer partitions, but compositions are simply the permutations on the full length of the list of each partition. http://code.activestate.com/recipes/218332-generator-for-integer-partitions/ Maybe Timmy will provide something more efficient than his partition code combined with itertools.permutation. I'm not sure if such shortcuts exist myself. On Sat, Dec 14, 2019, 10:41 AM David Mertz <mertz@gnosis.cx> wrote:

On Sat, Dec 14, 2019 at 3:56 PM David Mertz <mertz@gnosis.cx> wrote:
[...] compositions are simply the permutations on the full length of the list of each partition.
Using permutations of partitions would be overkill. For compositions of a given fixed length, it's much easier to compute them directly using a stars-and-bars approach. The OP is simply asking for all possible placements of `m-1` "dividers" in the input string `s`, and those divider positions can be computed directly using `itertools.combinations(range(1, len(s)), m-1)`. That gives the following one-liner: segment = lambda s, m: (tuple(s[i:j] for i, j in zip((0,)+c, c+(len(s),))) for c in itertools.combinations(range(1, len(s)), m-1)) Example output is exactly as the OP requested: >>> list(segment("12345", m=3)) [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')] -- Mark

Apologies for the bad formatting. Here are the relevant bits, better formatted (I hope): >>> segment = lambda s, m: (tuple(s[i:j] for i, j in zip((0,)+c, c+(len(s),))) ... for c in itertools.combinations(range(1, len(s)), m-1)) >>> >>> list(segment("12345", m=3)) [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')]

[David Mertz <mertz@gnosis.cx>]
Here's a discussion of both a conceptually simple and a convoluted but fast version (the latter, naturally, by Tim Peters).
Not really convoluted - it's the natural thing you'd do "by hand" to move from one partition to the next: look at the current partition, throw out the 1s, also throw out an instance of the smallest component `i` of what remains, then break the sum of what was thrown away into a (sub)partition "greedily" with largest component i-1 and the rest (if any) 1s. Although, no, that may not be obvious at first ;-) The point of the recipe is really to show all that can - with appropriate parallel data structures - be done in O(1) time.
As Mark said, solving this particular problem is a straightforward application of itertools.combinations.
For the ambitious, generating combinatorial objects is a huge topic of its own, but really doesn't belong in itertools. It's unfortunate that `combinations` and `permutations` and `product` live there already, because they really have little to do with "iterator algebra". They're just ordinary generators that can't exploit lazy arguments even when iterators are passed.(first thing they do is convert their iterable arguments to tuples). On the other hand, there is no standard module they _do_ belong in. I'm _starting_ to get concerned that the `math` module may become an incoherent mess over time. For example, we just added `math.comb()`, which didn't really fit anywhere else either, but in an ideal world _would_ live in the same module as `itertools.combinations()`. On the third hand, we haven't yet sprayed enough combinatorial functions all over creation to make much of a case for adding a module dedicated to that area of discrete math. But we certainly _could_ ;-)

I had not known about math.comb() and math.perm() being added in 3.8. Those kinda feel to me like "not every one line function needs to be in the standard library." But I guess wiser people than me saw a reason they are needed. On Sat, Dec 14, 2019, 3:00 PM Tim Peters <tim.peters@gmail.com> wrote:
Not really convoluted - it's the natural thing you'd do "by hand"
I wouldn't do recursion by hand? :-) On the third hand, we haven't yet sprayed enough combinatorial
functions all over creation to make much of a case for adding a module dedicated to that area of discrete math. But we certainly _could_ ;-)
This feels like something that could happily start as a 3rd party package. Maybe it exists and I don't know about it. It would be convenient to have it all bundled in a namespace.

smfiles writes:
In addition to David's pointers to code using combinations, here's a one-liner based on more_itertools. Given combinatoric explosion, it might be better to use a genexp instead of a list as return value, and an "efficient but convoluted" algorithm might be justified if len(s) is at all large. # more_itertools is not in stdlib, but can be pip'ed from more_itertools import partitions as p def segment(s, m): return [tuple("".join(y) for y in x) for x in p(s) if len(x) == m]

Hi I've looked at the example that smfiles (aka 1668151593 at tencent) provided. It looks to me like smfiles is asking for help with a homework problem. If smfiles tells me I'm wrong, I'll apologise. By the way, because the number of parts is fixed, and because the order matters, the OP's problem does not involve integer partitions. Off topic. By chance last week I was looking for Python code to generate integer partitions. I found the following particularly helpful. https://stackoverflow.com/questions/10035752/elegant-python-code-for-integer... http://jeromekelleher.net/generating-integer-partitions.html https://arxiv.org/abs/0909.2331 # Generating All Partitions: A Comparison Of Two Encodings The last item shows "the ascending composition generation algorithm is substantially more efficient than its descending composition counterpart". It so happens that for my problem it's ascending generation that I want. with best regards Jonathan

import itertools as itools def segment(it, n=1): try: len_it = len(it) it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true) size, rest = divmod(len_it, n) sizes = [size] * n for i in range(rest): sizes[-i] += 1 all_sizes = frozenset(itools.permutations(sizes)) res = [] for sizes in all_sizes: elem = [] i = 0 for size in sizes: elem.append(it[i:i+size]) i += size res.append(elem) return res

Excuse me, ignore my previous post, this is the correct implementation. It works for every iterable: import itertools as itools def segment(it, n=1): if n < 1: raise ValueError(f"Number of segment must be > 0") try: len_it = len(it) it[0:0] it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true) if len_it < n: raise ValueError(f"Iterable length {len_it} must be greater than number of segments {n}") size, rest = divmod(len_it, n) sizes = [size] * n orig_sizes = sizes.copy() all_sizes = [] for i in range(1, rest+1): for j in range(1, rest-i+2): sizes[-j] += i all_sizes.append(frozenset(itools.permutations(sizes))) sizes = orig_sizes.copy() if not all_sizes: all_sizes.append((sizes, )) res = [] for perm_sizes in all_sizes: for sizes in perm_sizes: elem = [] i = 0 for size in sizes: elem.append(it_true[i:i+size]) i += size res.append(elem) return res

Why not create a new custom class which has this function there? You can use that object whenever you need the segments. This is a very rare use case and doesn't make sense to implement that in str. On Wed, 25 Dec, 2019, 09:25 python-ideas--- via Python-ideas, < python-ideas@python.org> wrote:

See my implementation, is generic and not only for strings. It could be added to more-itertools, I suppose: https://mail.python.org/archives/list/python-ideas@python.org/message/E452LQ...

Excuse me again, I just relized that my algorithm was flawed. I just inserted in my function the brilliant algorithm of Mark Dickinson and now it works: import itertools as itools def segment(it, n=1): if n < 1: raise ValueError(f"Number of segment must be > 0, {n} found") try: len_it = len(it) it[0:0] it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true) if len_it < n: err = f"Iterable length {len_it} must be greater than number of segments {n}" raise ValueError(err) return ( [it_true[i:j] for i, j in zip((0, ) + startends, startends + (len_it, ))] for startends in itools.combinations(range(1, len_it), n-1) )
participants (12)
-
Anders Hovmöller
-
David Mertz
-
Jonathan Fine
-
Mark Dickinson
-
Mark Dickinson
-
python-ideas@marco.sulla.e4ward.com
-
Random832
-
Richard Damon
-
Siddharth Prajosh
-
smfiles
-
Stephen J. Turnbull
-
Tim Peters