Identifying the start of good data in a list

George Sakkis george.sakkis at gmail.com
Wed Aug 27 23:09:58 CEST 2008


On Aug 27, 3:00 pm, Gerard Flanagan <grflana... at gmail.com> wrote:

> tkp... at hotmail.com wrote:
> > I have a list that starts with zeros, has sporadic data, and then has
> > good data. I define the point at which the data turns good to be the
> > first index with a non-zero entry that is followed by at least 4
> > consecutive non-zero data items (i.e. a week's worth of non-zero
> > data). For example, if my list is [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8,
> > 9], I would define the point at which data turns good to be 4 (1
> > followed by 2, 3, 4, 5).
>
> > I have a simple algorithm to identify this changepoint, but it looks
> > crude: is there a cleaner, more elegant way to do this?
>
> >     flag = True
> >     i=-1
> >     j=0
> >     while flag and i < len(retHist)-1:
> >         i += 1
> >         if retHist[i] == 0:
> >             j = 0
> >         else:
> >             j += 1
> >             if j == 5:
> >                 flag = False
>
> >     del retHist[:i-4]
>
> > Thanks in advance for your help
>
> > Thomas Philips
>
> data = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
> def itergood(indata):
>      indata = iter(indata)
>      buf = []
>      while len(buf) < 4:
>          buf.append(indata.next())
>          if buf[-1] == 0:
>              buf[:] = []
>      for x in buf:
>          yield x
>      for x in indata:
>          yield x
>
> for d in itergood(data):
>      print d

This seems the most efficient so far for arbitrary iterables. With a
few micro-optimizations it becomes:

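from itertools import chain

def itergood(indata, good_ones=4):
    # Bind the hot-loop methods to local names to avoid repeated lookups.
    indata = iter(indata); get_next = indata.next
    buf = []; append = buf.append
    while len(buf) < good_ones:
        item = get_next()
        if item: append(item)
        else: del buf[:]  # a zero breaks the run, so start over
    # Buffered run first, then whatever is left in the iterator.
    return chain(buf, indata)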
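
For readers on Python 3, where iterators no longer expose a .next() method, the same idea might be sketched roughly as below. This is a hedged port, not the code that was timed: the name itergood3 is made up for illustration, and returning an empty iterator when the input runs out before a long enough run is found is an assumption (the version above would instead let StopIteration escape from the call). Note also that the original question asks for a non-zero entry followed by at least four more, so good_ones=5 matches that wording more literally; on the example list both settings give the same result.

```python
from itertools import chain


def itergood3(indata, good_ones=4):
    """Skip leading items until `good_ones` consecutive non-zero items appear.

    Returns an iterator over that run and everything after it.
    `itergood3` is a hypothetical name used only for this sketch.
    """
    it = iter(indata)
    buf = []
    for item in it:
        if item:
            buf.append(item)
            if len(buf) >= good_ones:
                # Found the run: replay it, then continue with the rest.
                return chain(buf, it)
        else:
            # A zero breaks the run, so discard what was collected.
            buf.clear()
    # Assumption: input exhausted without a long enough run, so yield nothing.
    return iter(())


data = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(itergood3(data)))               # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(itergood3(data, good_ones=5)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(itergood3([0, 1, 2, 0, 3])))    # []
```

Because the loop and the returned chain share one iterator, this works on any iterable, including generators, and never reads an item twice.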

$ python -m timeit \
    -s "x = 1000*[0, 0, 0, 1, 2, 3] + [1,2,3,4]; from itergood import itergood" \
    "list(itergood(x))"
100 loops, best of 3: 3.09 msec per loop

And with Psyco enabled:
$ python -m timeit \
    -s "x = 1000*[0, 0, 0, 1, 2, 3] + [1,2,3,4]; from itergood import itergood" \
    "list(itergood(x))"
1000 loops, best of 3: 466 usec per loop

George


