I think it's necessary to add a segment() method to str type or string module. This method is used to split a string into m parts and return all cases. With segment(), you can avoid tedious calculation and indexing if you want to segment a string. For example: segment('1234', m=3) -> [('1', '2', '34'), ('1', '23', '4'), ('12', '3', '4')] segment('12345', m=3) -> [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')] I hope this proposal can be adopted.
What are the practical uses for this? I don't recall having done this in my soon 20 years career.
On 14 Dec 2019, at 15:42, smfiles <1668151593@qq.com> wrote:
I think it's necessary to add a segment() method to str type or string module. This method is used to split a string into m parts and return all cases. With segment(), you can avoid tedious calculation and indexing if you want to segment a string.
For example:
segment('1234', m=3) -> [('1', '2', '34'), ('1', '23', '4'), ('12', '3', '4')] segment('12345', m=3) -> [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')]
I hope this proposal can be adopted. _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VYDU3J... Code of Conduct: http://python.org/psf/codeofconduct/
On 12/14/19 4:06 AM, smfiles wrote:
I think it's necessary to add a segment() method to str type or string module. This method is used to split a string into m parts and return all cases. With segment(), you can avoid tedious calculation and indexing if you want to segment a string.
For example:
segment('1234', m=3) -> [('1', '2', '34'), ('1', '23', '4'), ('12', '3', '4')] segment('12345', m=3) -> [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')]
I hope this proposal can be adopted.
My first impression is that if you aren't careful to use a string just slightly bigger then the number of segments, this could easily create a very large number of results. My guess is that the domain for using this is fairly restricted and an algorithm to do this isn't that hard to write and could easily be part of a module to handle that sort of thing. I also suspect different application domains may have differing 'rules' of what they want, for example, why do you assume no empty strings? Also, why just strings, maybe you want to do this to other sequences? -- Richard Damon
This feels much too special purpose for a string method, and probably for anything in the standard library. I'm not sure when someone would want this. But it's an only very sightly special case of integer composition ( https://en.wikipedia.org/wiki/Composition_(combinatorics)). And related to that, of course is integer partition. If we had a composition() function (maybe in itertools... but more likely in more-itertools given the minimalist philosophy of itertools itself), this problem is simply taking slices whose lengths are drawn from the composition. On Sat, Dec 14, 2019, 9:44 AM smfiles <1668151593@qq.com> wrote:
I think it's necessary to add a segment() method to str type or string module. This method is used to split a string into m parts and return all cases. With segment(), you can avoid tedious calculation and indexing if you want to segment a string.
For example:
segment('1234', m=3) -> [('1', '2', '34'), ('1', '23', '4'), ('12', '3', '4')] segment('12345', m=3) -> [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')]
I hope this proposal can be adopted. _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VYDU3J... Code of Conduct: http://python.org/psf/codeofconduct/
Here's a discussion of both a conceptually simple and a convoluted but fast version (the latter, naturally, by Tim Peters). This is for integer partitions, but compositions are simply the permutations on the full length of the list of each partition. http://code.activestate.com/recipes/218332-generator-for-integer-partitions/ Maybe Timmy will provide something more efficient than his partition code combined with itertools.permutation. I'm not sure if such shortcuts exist myself. On Sat, Dec 14, 2019, 10:41 AM David Mertz <mertz@gnosis.cx> wrote:
This feels much too special purpose for a string method, and probably for anything in the standard library. I'm not sure when someone would want this.
But it's an only very sightly special case of integer composition ( https://en.wikipedia.org/wiki/Composition_(combinatorics)). And related to that, of course is integer partition.
If we had a composition() function (maybe in itertools... but more likely in more-itertools given the minimalist philosophy of itertools itself), this problem is simply taking slices whose lengths are drawn from the composition.
On Sat, Dec 14, 2019, 9:44 AM smfiles <1668151593@qq.com> wrote:
I think it's necessary to add a segment() method to str type or string module. This method is used to split a string into m parts and return all cases. With segment(), you can avoid tedious calculation and indexing if you want to segment a string.
For example:
segment('1234', m=3) -> [('1', '2', '34'), ('1', '23', '4'), ('12', '3', '4')] segment('12345', m=3) -> [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')]
I hope this proposal can be adopted. _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VYDU3J... Code of Conduct: http://python.org/psf/codeofconduct/
On Sat, Dec 14, 2019 at 3:56 PM David Mertz <mertz@gnosis.cx> wrote:
[...] compositions are simply the permutations on the full length of the list of each partition.
Using permutations of partitions would be overkill. For compositions of a given fixed length, it's much easier to compute them directly using a stars-and-bars approach. The OP is simply asking for all possible placements of `m-1` "dividers" in the input string `s`, and those divider positions can be computed directly using `itertools.combinations(range(1, len(s)), m-1)`. That gives the following one-liner: segment = lambda s, m: (tuple(s[i:j] for i, j in zip((0,)+c, c+(len(s),))) for c in itertools.combinations(range(1, len(s)), m-1)) Example output is exactly as the OP requested: >>> list(segment("12345", m=3)) [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')] -- Mark
Apologies for the bad formatting. Here are the relevant bits, better formatted (I hope): >>> segment = lambda s, m: (tuple(s[i:j] for i, j in zip((0,)+c, c+(len(s),))) ... for c in itertools.combinations(range(1, len(s)), m-1)) >>> >>> list(segment("12345", m=3)) [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')]
[David Mertz <mertz@gnosis.cx>]
Here's a discussion of both a conceptually simple and a convoluted but fast version (the latter, naturally, by Tim Peters).
Not really convoluted - it's the natural thing you'd do "by hand" to move from one partition to the next: look at the current partition, throw out the 1s, also throw out an instance of the smallest component `i` of what remains, then break the sum of what was thrown away into a (sub)partition "greedily" with largest component i-1 and the rest (if any) 1s. Although, no, that may not be obvious at first ;-) The point of the recipe is really to show all that can - with appropriate parallel data structures - be done in O(1) time.
This is for integer partitions, but compositions are simply the permutations on the full length of the list of each partition.
http://code.activestate.com/recipes/218332-generator-for-integer-partitions/
Maybe Timmy will provide something more efficient than his partition code combined with itertools.permutation. I'm not sure if such shortcuts exist myself.
As Mark said, solving this particular problem is a straightforward application of itertools.combinations.
... If we had a composition() function (maybe in itertools... but more likely in more-itertools given the minimalist philosophy of itertools itself) ....
For the ambitious, generating combinatorial objects is a huge topic of its own, but really doesn't belong in itertools. It's unfortunate that `combinations` and `permutations` and `product` live there already, because they really have little to do with "iterator algebra". They're just ordinary generators that can't exploit lazy arguments even when iterators are passed.(first thing they do is convert their iterable arguments to tuples). On the other hand, there is no standard module they _do_ belong in. I'm _starting_ to get concerned that the `math` module may become an incoherent mess over time. For example, we just added `math.comb()`, which didn't really fit anywhere else either, but in an ideal world _would_ live in the same module as `itertools.combinations()`. On the third hand, we haven't yet sprayed enough combinatorial functions all over creation to make much of a case for adding a module dedicated to that area of discrete math. But we certainly _could_ ;-)
I had not known about math.comb() and math.perm() being added in 3.8. Those kinda feel to me like "not every one line function needs to be in the standard library." But I guess wiser people than me saw a reason they are needed. On Sat, Dec 14, 2019, 3:00 PM Tim Peters <tim.peters@gmail.com> wrote:
Not really convoluted - it's the natural thing you'd do "by hand"
I wouldn't do recursion by hand? :-) On the third hand, we haven't yet sprayed enough combinatorial
functions all over creation to make much of a case for adding a module dedicated to that area of discrete math. But we certainly _could_ ;-)
This feels like something that could happily start as a 3rd party package. Maybe it exists and I don't know about it. It would be convenient to have it all bundled in a namespace.
smfiles writes:
I think it's necessary to add a segment() method to str type or string module. This method is used to split a string into m parts and return all cases. With segment(), you can avoid tedious calculation and indexing if you want to segment a string.
In addition to David's pointers to code using combinations, here's a one-liner based on more_itertools. Given combinatoric explosion, it might be better to use a genexp instead of a list as return value, and an "efficient but convoluted" algorithm might be justified if len(s) is at all large. # more_itertools is not in stdlib, but can be pip'ed from more_itertools import partitions as p def segment(s, m): return [tuple("".join(y) for y in x) for x in p(s) if len(x) == m]
On Sat, Dec 14, 2019, at 04:06, smfiles wrote:
I think it's necessary to add a segment() method to str type or string module. This method is used to split a string into m parts and return all cases. With segment(), you can avoid tedious calculation and indexing if you want to segment a string.
For example:
segment('1234', m=3) -> [('1', '2', '34'), ('1', '23', '4'), ('12', '3', '4')] segment('12345', m=3) -> [('1', '2', '345'), ('1', '23', '45'), ('1', '234', '5'), ('12', '3', '45'), ('12', '34', '5'), ('123', '4', '5')]
This is trivial to do with itertools.combinations - just take all combinations of 2 [m-1] values between 1 and 4 [len(s)-1], and use those as places to cut the string.
Hi I've looked at the example that smfiles (aka 1668151593 at tencent) provided. It looks to me like smfiles is asking for help with a homework problem. If smfiles tells me I'm wrong, I'll apologise. By the way, because the number of parts is fixed, and because the order matters, the OP's problem does not involve integer partitions. Off topic. By chance last week I was looking for Python code to generate integer partitions. I found the following particularly helpful. https://stackoverflow.com/questions/10035752/elegant-python-code-for-integer... http://jeromekelleher.net/generating-integer-partitions.html https://arxiv.org/abs/0909.2331 # Generating All Partitions: A Comparison Of Two Encodings The last item shows "the ascending composition generation algorithm is substantially more efficient than its descending composition counterpart". It so happens that for my problem it's ascending generation that I want. with best regards Jonathan
import itertools as itools def segment(it, n=1): try: len_it = len(it) it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true) size, rest = divmod(len_it, n) sizes = [size] * n for i in range(rest): sizes[-i] += 1 all_sizes = frozenset(itools.permutations(sizes)) res = [] for sizes in all_sizes: elem = [] i = 0 for size in sizes: elem.append(it[i:i+size]) i += size res.append(elem) return res
Excuse me, ignore my previous post, this is the correct implementation. It works for every iterable: import itertools as itools def segment(it, n=1): if n < 1: raise ValueError(f"Number of segment must be > 0") try: len_it = len(it) it[0:0] it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true) if len_it < n: raise ValueError(f"Iterable length {len_it} must be greater than number of segments {n}") size, rest = divmod(len_it, n) sizes = [size] * n orig_sizes = sizes.copy() all_sizes = [] for i in range(1, rest+1): for j in range(1, rest-i+2): sizes[-j] += i all_sizes.append(frozenset(itools.permutations(sizes))) sizes = orig_sizes.copy() if not all_sizes: all_sizes.append((sizes, )) res = [] for perm_sizes in all_sizes: for sizes in perm_sizes: elem = [] i = 0 for size in sizes: elem.append(it_true[i:i+size]) i += size res.append(elem) return res
Why not create a new custom class which has this function there? You can use that object whenever you need the segments. This is a very rare use case and doesn't make sense to implement that in str. On Wed, 25 Dec, 2019, 09:25 python-ideas--- via Python-ideas, < python-ideas@python.org> wrote:
Excuse me, ignore my previous post, this is the correct implementation. It works for every iterable:
import itertools as itools
def segment(it, n=1): if n < 1: raise ValueError(f"Number of segment must be > 0")
try: len_it = len(it) it[0:0] it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true)
if len_it < n: raise ValueError(f"Iterable length {len_it} must be greater than number of segments {n}")
size, rest = divmod(len_it, n) sizes = [size] * n orig_sizes = sizes.copy()
all_sizes = []
for i in range(1, rest+1): for j in range(1, rest-i+2): sizes[-j] += i
all_sizes.append(frozenset(itools.permutations(sizes))) sizes = orig_sizes.copy()
if not all_sizes: all_sizes.append((sizes, ))
res = []
for perm_sizes in all_sizes: for sizes in perm_sizes: elem = []
i = 0
for size in sizes: elem.append(it_true[i:i+size]) i += size
res.append(elem)
return res _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/E452LQ... Code of Conduct: http://python.org/psf/codeofconduct/
See my implementation, is generic and not only for strings. It could be added to more-itertools, I suppose: https://mail.python.org/archives/list/python-ideas@python.org/message/E452LQ...
Excuse me again, I just relized that my algorithm was flawed. I just inserted in my function the brilliant algorithm of Mark Dickinson and now it works: import itertools as itools def segment(it, n=1): if n < 1: raise ValueError(f"Number of segment must be > 0, {n} found") try: len_it = len(it) it[0:0] it_true = it except TypeError: it_true = tuple(it) len_it = len(it_true) if len_it < n: err = f"Iterable length {len_it} must be greater than number of segments {n}" raise ValueError(err) return ( [it_true[i:j] for i, j in zip((0, ) + startends, startends + (len_it, ))] for startends in itools.combinations(range(1, len_it), n-1) )
participants (12)
-
Anders Hovmöller
-
David Mertz
-
Jonathan Fine
-
Mark Dickinson
-
Mark Dickinson
-
python-ideas@marco.sulla.e4ward.com
-
Random832
-
Richard Damon
-
Siddharth Prajosh
-
smfiles
-
Stephen J. Turnbull
-
Tim Peters