Candidate for a new itertool

bearophileHUGS at lycos.com
Sat Mar 7 21:58:44 EST 2009


Raymond Hettinger, maybe it would be useful to add an optional flag
argument that tells such a split_on whether to keep the separators?
This is the xsplit I usually use:


def xsplit(seq, key=bool, keepkeys=True):
    """xsplit(seq, key=bool, keepkeys=True): given an iterable seq and a
    predicate key, split the iterable wherever key(item) is true and yield
    the parts as lists.
    If keepkeys is True, each splitting item is kept at the beginning of
    the sublist it starts (the first sublist may lack a key item); if
    keepkeys is False, the splitting items are dropped.

    >>> list(xsplit([]))
    []

    >>> key = lambda x: 0x80 & x
    >>> l = [1,2,3,0xF0,4,5,6,0xF1,7,8,0xF2,9,10,11,12,13]
    >>> list(xsplit(l, key=key))
    [[1, 2, 3], [240, 4, 5, 6], [241, 7, 8], [242, 9, 10, 11, 12, 13]]

    >>> l = [0xF0,1,2,3,0xF0,4,5,6,0xF1,7,8,0xF2,9,10,11,12,13,0xF0,14,0xF1]
    >>> list(xsplit(l, key=key, keepkeys=False))
    [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10, 11, 12, 13], [14]]

    >>> s1 = "100001000101100001000000010000"
    >>> ["".join(map(str, g)) for g in xsplit(s1, key=int)]
    ['10000', '1000', '10', '1', '10000', '10000000', '10000']

    >>> from itertools import groupby # To compare against groupby
    >>> s2 = "1111100011111100011100101011111"
    >>> ["".join(map(str, g)) for h, g in groupby(s2, key=int)]
    ['11111', '000', '111111', '000', '111', '00', '1', '0', '1', '0', '11111']
    """
    group = []
    for el in seq:
        if key(el):
            # A splitting item closes the current group (if non-empty)...
            if group:
                yield group
            group = []
            # ...and, if requested, becomes the first item of the next one.
            if keepkeys:
                group.append(el)
        else:
            group.append(el)
    # Emit whatever is left after the last splitting item.
    if group:
        yield group


Maybe it's better to keep the separators apart from the other items, or
to mark them in some way?

A possibility:
"X1X23X456X" => "X", "1", "X", "23", "X", "456", "X"

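A minimal sketch of this first format, assuming a hypothetical name split_interleaved and a predicate that treats "X" as the separator. Separators come out as single items and the runs between them as lists, which the caller can join back into strings:

```python
def split_interleaved(seq, key):
    """Yield each separator item on its own and each run of
    non-separator items between separators as a list."""
    run = []
    for el in seq:
        if key(el):
            # Flush the pending run (if any) before emitting the separator.
            if run:
                yield run
                run = []
            yield el
        else:
            run.append(el)
    # A trailing run with no separator after it is still emitted.
    if run:
        yield run


parts = ["".join(p) if isinstance(p, list) else p
         for p in split_interleaved("X1X23X456X", key=lambda c: c == "X")]
print(parts)  # ['X', '1', 'X', '23', 'X', '456', 'X']
```

Because separators and runs share one stream, the caller has to tell them apart by type (the isinstance check), which gets ambiguous when the items are themselves lists.
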
Another possibility:
"X1X23X456X" => ("", "X"), ("1", "X"), (["2", "3"], "X"), (["4", "5", "6"], "X")

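The pair format can be sketched the same way. The name split_pairs is hypothetical, and the sketch uses a list for every run (the example above writes the first two runs as strings). It also assumes that a trailing run with no separator after it is paired with None, a case the example doesn't cover:

```python
def split_pairs(seq, key):
    """Yield (run, separator) pairs, where run is the list of
    non-separator items that precede each separator."""
    run = []
    for el in seq:
        if key(el):
            yield run, el
            run = []
        else:
            run.append(el)
    # Assumption: a final run with no closing separator is paired with None.
    if run:
        yield run, None


print(list(split_pairs("X1X23X456X", key=lambda c: c == "X")))
# [([], 'X'), (['1'], 'X'), (['2', '3'], 'X'), (['4', '5', '6'], 'X')]
```
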
Another possibility (True means the item is a separator):
"X1X23X456X" => (True, "X"), (False, ["1"]), (True, "X"), (False, ["2", "3"]),
                (True, "X"), (False, ["4", "5", "6"]), (True, "X")
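A sketch of the tagged format, using a hypothetical name split_tagged. Unlike the interleaved format, the boolean tag distinguishes separators from runs without inspecting their types. Each separator is still reported on its own, so two adjacent separators give two (True, ...) pairs:

```python
def split_tagged(seq, key):
    """Yield (True, item) for each separator and (False, run) for each
    non-empty run of non-separator items between separators."""
    run = []
    for el in seq:
        if key(el):
            # Close the pending run before reporting the separator.
            if run:
                yield False, run
                run = []
            yield True, el
        else:
            run.append(el)
    if run:
        yield False, run


print(list(split_tagged("X1X23X456X", key=lambda c: c == "X")))
# [(True, 'X'), (False, ['1']), (True, 'X'), (False, ['2', '3']),
#  (True, 'X'), (False, ['4', '5', '6']), (True, 'X')]
```
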

Would it be useful to merge consecutive separators (note the two X's)?

"X1X23XX456X" => (True, ["X"]), (False, ["1"]), (True, ["X"]),
                 (False, ["2", "3"]), (True, ["X", "X"]),
                 (False, ["4", "5", "6"]), (True, ["X"])

Oops, this is groupby :-)
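Indeed, itertools.groupby with a boolean key already produces the merged form. The predicate below, treating "X" as the separator, is only for illustration:

```python
from itertools import groupby


def is_sep(c):
    # Illustrative predicate: "X" is the separator.
    return c == "X"


s = "X1X23XX456X"
# groupby starts a new group only when the key value changes, so the
# two adjacent "X" separators land in the same (True, [...]) group.
parts = [(k, list(g)) for k, g in groupby(s, key=is_sep)]
print(parts)
# [(True, ['X']), (False, ['1']), (True, ['X']), (False, ['2', '3']),
#  (True, ['X', 'X']), (False, ['4', '5', '6']), (True, ['X'])]
```
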

Would a name like isplitter or splitter be better for this itertool?

Bye,
bearophile


