# Identifying the start of good data in a list

George Sakkis george.sakkis at gmail.com
Wed Aug 27 22:42:36 CEST 2008

```
On Aug 26, 10:39 pm, tkp... at hotmail.com wrote:
> On Aug 26, 7:23 pm, Emile van Sebille <em... at fenx.com> wrote:
>
>
>
> > tkp... at hotmail.com wrote:
> > > I have a list that starts with zeros, has sporadic data, and then has
> > > good data. I define the point at which the data turns good to be the
> > > first index with a non-zero entry that is followed by at least 4
> > > consecutive non-zero data items (i.e. a week's worth of non-zero
> > > data). For example, if my list is [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8,
> > > 9], I would define the point at which data turns good to be 4 (1
> > > followed by 2, 3, 4, 5).
>
> > > I have a simple algorithm to identify this changepoint, but it looks
> > > crude: is there a cleaner, more elegant way to do this?
>
> >  >>> for ii,dummy in enumerate(retHist):
> > ...     if 0 not in retHist[ii:ii+5]:
> > ...         break
>
> >  >>> del retHist[:ii]
>
> > Well, to the extent short and sweet is elegant...
>
> > Emile
>
> This is just what the doctor ordered. Thank you, everyone, for the
> help.

Note that the version above (as well as most others posted) fails for
boundary cases; check out bearophile's doctest to see some of them.
Below are two more versions that pass all the doctests: the first
works only for lists and modifies them in place, and the second works
for arbitrary iterables:

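def clean_inplace(seq, good_ones=4):
    start = 0
    n = len(seq)
    while start < n:
        # the current run of non-zeros ends at the next zero,
        # or at the end of the list if no zero is left
        try:
            end = seq.index(0, start)
        except ValueError:
            end = n
        if end - start >= good_ones:
            break
        # run too short; resume the search just past the zero
        start = end + 1
    del seq[:start]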

def clean_iter(iterable, good_ones=4):
    from itertools import chain, islice, takewhile, dropwhile
    iterator = iter(iterable)
    is_zero = float(0).__eq__
    while True:
        # consume all zeros up to the next non-zero
        iterator = dropwhile(is_zero, iterator)
        # take up to `good_ones` non-zeros
        good = list(islice(takewhile(bool, iterator), good_ones))
        if not good:  # iterator exhausted
            return iterator
        if len(good) == good_ones:
            # found `good_ones` consecutive non-zeros;
            # chain them to the remaining items and return them
            return chain(good, iterator)

HTH,
George

```
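
The boundary cases mentioned in the reply can be made concrete with a small sketch. The function `trim_with_slice` below is a hypothetical wrapper around the quoted slice-based loop (the name, the `window` parameter, and the copy-and-return behaviour are assumptions for illustration, not part of the thread). It suggests three inputs where the slice test appears to misbehave: a short non-zero tail, an all-zero list, and an empty list.

```python
def trim_with_slice(values, window=5):
    """Hypothetical wrapper around the quoted loop; returns a trimmed copy."""
    values = list(values)
    for ii, _ in enumerate(values):
        # Near the end of the list the slice is shorter than `window`,
        # so this test can pass without `window` non-zero items present.
        if 0 not in values[ii:ii + window]:
            break
    # If the loop never ran, `ii` was never bound and this line raises.
    del values[:ii]
    return values


# Example from the original question: good data starts at index 4.
print(trim_with_slice([0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))

# Boundary case 1: a single trailing non-zero item is kept as "good".
print(trim_with_slice([0, 0, 1]))

# Boundary case 2: with all zeros the loop never breaks, and one zero survives.
print(trim_with_slice([0, 0, 0]))

# Boundary case 3: an empty list leaves `ii` unbound.
try:
    trim_with_slice([])
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    print("empty list fails with", type(exc).__name__)
```

Both `clean_inplace` and `clean_iter` sidestep these cases by measuring the length of each non-zero run explicitly rather than testing a fixed-size slice.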