In need of a binge-and-purge idiom

Alex Martelli aleax at aleax.it
Mon Mar 24 06:57:44 EST 2003


Magnus Lie Hetland wrote:

Ah, I recognize the outline of our joint contribution to the
printed Cookbook (recipe 4.8...).

> I've noticed that I use the following in several contexts:
   [fixing as per followups]
>   chunk = []
>   for element in iterable:
>       if isSeparator(element) and chunk:
>           doSomething(chunk)
>           chunk = []
        else: chunk.append(element)
>   if chunk:
>       doSomething(chunk)
>       chunk = []

First refactoring that comes to mind is:

    def maydosomething(chunk):
       if chunk:
          doSomething(chunk)
          chunk[:] = []

    chunk = []
    for element in iterable:
        if isSeparator(element): maydosomething(chunk)
        else: chunk.append(element)
    maydosomething(chunk)

but this wouldn't work for the specific use case you require:

> If the iterable above is a file, isSeparator(element) is simply
> defined as not element.strip() and doSomething(chunk) is
> yield(''.join(chunk)) you have a paragraph splitter. I've been using

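i.e., factoring out a *yield* to maydosomething would NOT work: a
function containing yield is itself a generator, so calling it from
the loop just builds a generator object and yields nothing to the
loop's own caller.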
So I'll focus on the specific case of yield in the following,
assuming a "munge" function such as

    def munge(chunk): return ''.join(chunk)

is also passed as an argument.
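
To make the problem concrete, here is a minimal sketch in Python 3
syntax. The names maybe_yield and broken_splitter are made up for
illustration; the point is only that the helper's yield never reaches
whoever iterates over the outer function.

```python
def munge(chunk):
    return ''.join(chunk)

def maybe_yield(chunk):
    # Because of the yield, this is itself a generator function: calling
    # it only creates a generator object and runs none of the body.
    if chunk:
        yield munge(chunk)
        chunk[:] = []

def broken_splitter(iterable, isSeparator):
    chunk = []
    for element in iterable:
        if isSeparator(element):
            maybe_yield(chunk)   # generator object discarded, body never runs
        else:
            chunk.append(element)
    maybe_yield(chunk)
    return chunk                 # returned only to expose what piled up

lines = ['a\n', 'b\n', '\n', 'c\n']
leftover = broken_splitter(lines, lambda line: not line.strip())
```

Here leftover still holds all three non-blank lines: no chunk was ever
emitted or cleared, which is why the yield has to stay in the loop
itself.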


>   for element in iterable + separator:
>       ...
> 
> but that isn't possible, of course. (It could be possible with some
> fiddling with itertools etc., I guess.)

Indeed, there ain't much "fiddling" needed at all -- you just
DO need to know SOME acceptable separator, however:

import itertools

def chunkitup(iterable, isSeparator, aSeparator, munge=''.join):

    # a sanity check never hurts...
    assert isSeparator(aSeparator)

    chunk = []
    for element in itertools.chain(iterable, [aSeparator]):
        if isSeparator(element):
            yield munge(chunk)
            chunk = []
        else: chunk.append(element)
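
A usage sketch for the paragraph-splitter case, with the generator
repeated so the snippet runs on its own. The isBlank helper and the
sample lines are assumptions for illustration. Note that, unlike the
original idiom with its "and chunk" test, this version emits an empty
chunk for each run of consecutive separators, so the caller may want
to filter those out.

```python
import itertools

def chunkitup(iterable, isSeparator, aSeparator, munge=''.join):
    # Same generator as above, repeated so this snippet is self-contained.
    assert isSeparator(aSeparator)
    chunk = []
    for element in itertools.chain(iterable, [aSeparator]):
        if isSeparator(element):
            yield munge(chunk)
            chunk = []
        else:
            chunk.append(element)

def isBlank(line):
    # Hypothetical separator test: a line holding only whitespace.
    return not line.strip()

lines = ['one\n', 'two\n', '\n', '\n', 'three\n']
raw = list(chunkitup(lines, isBlank, '\n'))
# Two blank lines in a row produce one empty chunk; drop it if unwanted.
paragraphs = [p for p in raw if p]
```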

> If it were possible to check whether the iterator extracted from the
> iterable was at an end, that could help too -- but I see no elegant
> way of doing it.

Elegance is in the eye of the beholder, but...:

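class iter_with_lookahead:
    # wraps any iterable, always keeping one element fetched in advance
    def __init__(self, iterable):
        self.it = iter(iterable)
        self.done = False
        self.step()
    def __iter__(self):
        return self
    def step(self):
        # fetch the next element into self.lookahead, or set self.done
        # once the underlying iterator is exhausted
        try:
            self.lookahead = self.it.next()
        except StopIteration:
            self.done = True
    def next(self):
        if self.done: raise StopIteration
        result = self.lookahead
        self.step()
        return result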

...I've had occasion to use variants of this in order to be able
to peek ahead, check if an iterator was done, or in small further
variants to give an iterator one level of "pushback", etc, etc.
So, if you have a wrapper such as this one around somewhere, you
might choose to reuse it (though it probably wouldn't be worth
developing for the sole purpose of this use!-):

def chunkitup1(iterable, isSeparator, munge=''.join):
    chunk = []
    it = iter_with_lookahead(iterable)
    for element in it:
        issep = isSeparator(element)
        if not issep:
            chunk.append(element)
        if issep or it.done: 
            yield munge(chunk)
            chunk = []
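
For readers on Python 3, where the iterator protocol uses __next__ and
the next() builtin instead of a .next() method, the same idea might be
sketched as follows. The class name IterWithLookahead and the sample
input are assumptions for illustration; the logic is otherwise the
same as the wrapper and generator above.

```python
class IterWithLookahead:
    """One-element lookahead wrapper, Python 3 spelling."""
    def __init__(self, iterable):
        self.it = iter(iterable)
        self.done = False
        self._step()
    def __iter__(self):
        return self
    def _step(self):
        # Fetch one element ahead so that .done is known in advance.
        try:
            self.lookahead = next(self.it)
        except StopIteration:
            self.done = True
    def __next__(self):
        if self.done:
            raise StopIteration
        result = self.lookahead
        self._step()
        return result

def chunkitup1(iterable, isSeparator, munge=''.join):
    chunk = []
    it = IterWithLookahead(iterable)
    for element in it:
        issep = isSeparator(element)
        if not issep:
            chunk.append(element)
        # it.done is already True while handling the last element.
        if issep or it.done:
            yield munge(chunk)
            chunk = []

lines = ['one\n', 'two\n', '\n', 'three\n']
paragraphs = list(chunkitup1(lines, lambda line: not line.strip()))
```

No explicit separator value is needed here, since the final chunk is
flushed when it.done becomes true rather than by a sentinel.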
  
> I can't really see any good way of using the while/break idiom either,

Well, you COULD use a different wrapper class to obtain code such as:

def chunkitup2(iterable, isSeparator, munge=''.join):
    # wild_thing is a hypothetical wrapper class, not shown here
    wit = wild_thing(iterable, isSeparator)
    while wit:
        if wit.isSeparator() and wit.hasChunk():
            yield munge(wit.getChunk())

but the wrapper wouldn't be all that nice under the covers AND it
would in practice have to embody a bit too much of the control
logic and bury it in a non-obvious place -- so I wouldn't pursue
this tack, myself.


Alex