# Candidate for a new itertool

bearophileHUGS at lycos.com
Sun Mar 8 03:58:44 CET 2009

```
Raymond Hettinger, maybe it would be useful to add an optional flag
argument that tells split_on whether to keep the separators? This is
the xsplit I usually use:

def xsplit(seq, key=bool, keepkeys=True):
    """xsplit(seq, key=bool, keepkeys=True): given an iterable seq and a
    predicate key, splits the iterable where key(item) is true and yields
    the parts as lists.
    If keepkeys is True then the splitting items are kept at the beginning
    of the sublists (but the first sublist may miss the key item).

    >>> list(xsplit([]))
    []

    >>> key = lambda x: 0x80 & x
    >>> l = [1,2,3,0xF0,4,5,6,0xF1,7,8,0xF2,9,10,11,12,13]
    >>> list(xsplit(l, key=key))
    [[1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12, 13]]

    >>> l = [0xF0,1,2,3,0xF0,4,5,6,0xF1,7,8,0xF2,9,10,11,12,13,0xF0,14,0xF1]
    >>> list(xsplit(l, key=key, keepkeys=False))
    [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10, 11, 12, 13], [14]]

    >>> s1 = "100001000101100001000000010000"
    >>> ["".join(map(str, g)) for g in xsplit(s1, key=int)]
    ['10000', '1000', '10', '1', '10000', '10000000', '10000']

    >>> from itertools import groupby # To compare against groupby
    >>> s2 = "1111100011111100011100101011111"
    >>> ["".join(map(str, g)) for h, g in groupby(s2, key=int)]
    ['11111', '000', '111111', '000', '111', '00', '1', '0', '1', '0', '11111']
    """
    group = []
    for el in seq:
        if key(el):
            # A splitting item closes the current group, if there is one.
            if group:
                yield group
                group = []
            # Optionally start the next group with the splitting item.
            if keepkeys:
                group.append(el)
        else:
            group.append(el)
    # Yield whatever is left after the last splitting item.
    if group:
        yield group

Maybe it's better to separate or denote the separators in some way?

A possibility:
"X1X23X456X" => "X", "1", "X", "23", "X", "456", "X"

Another possibility:
"X1X23X456X" => ("", "X"), ("1", "X"), (["2", "3"], "X"), (["4", "5",
"6"], "X")

Another possibility (True == is a separator):
"X1X23X456X" => (True, "X"), (False, ["1"]), (True, "X"), (False,
["2", "3"]), (True, "X"), (False, ["4", "5", "6"]), (True, "X")

Is it useful to merge successive separators (notice the two consecutive X)?

"X1X23XX456X" => (True, ["X"]), (False, ["1"]), (True, ["X"]), (False,
["2", "3"]), (True, ["X", "X"]), (False, ["4", "5", "6"]), (True,
["X"])

Oops, this is groupby :-)

Is a name like isplitter or splitter better for this itertool?

Bye,
bearophile

```
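As a rough illustration of the last two output shapes discussed in the post, here is a minimal Python sketch built on `itertools.groupby`. The names `tagged_items`, `tagged_runs` and `is_x` are hypothetical: they appear neither in the post nor in `itertools`. The key is wrapped in `bool()` on the assumption that any truthy result marks a separator, so a key such as `lambda x: 0x80 & x` would still produce clean `True`/`False` tags.

```python
from itertools import groupby


def tagged_runs(seq, key=bool):
    """Yield (is_separator, items) pairs, merging successive separators.

    Hypothetical helper, a thin wrapper around groupby.
    """
    for is_sep, run in groupby(seq, key=lambda item: bool(key(item))):
        yield is_sep, list(run)


def tagged_items(seq, key=bool):
    """Yield (True, separator) for every single separator and
    (False, items) for every run of non-separators.

    Hypothetical helper: separators are not merged here.
    """
    for is_sep, run in tagged_runs(seq, key):
        if is_sep:
            for item in run:
                yield True, item
        else:
            yield False, run


def is_x(ch):
    """Example predicate (an assumption): "X" is the separator."""
    return ch == "X"


if __name__ == "__main__":
    # Separators reported one by one.
    print(list(tagged_items("X1X23X456X", key=is_x)))
    # Successive separators merged into one run, which is plain groupby.
    print(list(tagged_runs("X1X23XX456X", key=is_x)))
```

Unlike `xsplit`, both helpers always report the separators, so a caller that wants to drop them can filter on the first element of each pair.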