Identifying the start of good data in a list
Gerard flanagan
grflanagan at gmail.com
Thu Aug 28 11:47:42 EDT 2008
George Sakkis wrote:
> On Aug 27, 3:00 pm, Gerard flanagan <grflana... at gmail.com> wrote:
>
>> tkp... at hotmail.com wrote:
>>> I have a list that starts with zeros, has sporadic data, and then has
>>> good data. I define the point at which the data turns good to be the
>>> first index with a non-zero entry that is followed by at least 4
>>> consecutive non-zero data items (i.e. a week's worth of non-zero
>>> data). For example, if my list is [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8,
>>> 9], I would define the point at which data turns good to be 4 (1
>>> followed by 2, 3, 4, 5).
>>> I have a simple algorithm to identify this changepoint, but it looks
>>> crude: is there a cleaner, more elegant way to do this?
>>> flag = True
>>> i = -1
>>> j = 0
>>> while flag and i < len(retHist)-1:
>>>     i += 1
>>>     if retHist[i] == 0:
>>>         j = 0
>>>     else:
>>>         j += 1
>>>         if j == 5:
>>>             flag = False
>>> del retHist[:i-4]
>>> Thanks in advance for your help
>>> Thomas Philips
>> data = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>
>> def itergood(indata):
>>     indata = iter(indata)
>>     buf = []
>>     while len(buf) < 4:
>>         buf.append(indata.next())
>>         if buf[-1] == 0:
>>             buf[:] = []
>>     for x in buf:
>>         yield x
>>     for x in indata:
>>         yield x
>>
>> for d in itergood(data):
>>     print d
>
> This seems the most efficient so far for arbitrary iterables. With a
> few micro-optimizations it becomes:
>
> from itertools import chain
>
> def itergood(indata, good_ones=4):
>     indata = iter(indata); get_next = indata.next
>     buf = []; append = buf.append
>     while len(buf) < good_ones:
>         next = get_next()
>         if next: append(next)
>         else: del buf[:]
>     return chain(buf, indata)
>
> $ python -m timeit -s "x = 1000*[0, 0, 0, 1, 2, 3] + [1,2,3,4]; from itergood import itergood" "list(itergood(x))"
> 100 loops, best of 3: 3.09 msec per loop
>
> And with Psyco enabled:
> $ python -m timeit -s "x = 1000*[0, 0, 0, 1, 2, 3] + [1,2,3,4]; from itergood import itergood" "list(itergood(x))"
> 1000 loops, best of 3: 466 usec per loop
>
> George
> --
I always forget the 'del slice' method for clearing a list, thanks.
I think that returning a `chain` means that the function is not itself a
generator. So if indata runs out before good_ones consecutive non-zero
items have been seen (for instance, if it is shorter than the threshold),
an unhandled StopIteration is raised before the return statement is
reached.
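
One possible way around this, as a rough sketch only: catch the
StopIteration inside the loop and hand back an empty iterator. Returning
nothing in that case is my assumption (the thread does not say what
should happen when no good run exists), though it matches what the
generator version does under Python 2, where the exception simply ends
the generator. The sketch uses the Python 3 spelling next(indata) in
place of indata.next().

```python
from itertools import chain


def itergood(indata, good_ones=4):
    """Skip items until `good_ones` consecutive non-zero items are seen."""
    indata = iter(indata)
    buf = []
    while len(buf) < good_ones:
        try:
            item = next(indata)  # Python 3 spelling of indata.next()
        except StopIteration:
            # Input ran out before a long enough non-zero run was found.
            # Assumption: an empty result is the desired behaviour here.
            return iter(())
        if item:
            buf.append(item)
        else:
            del buf[:]  # a zero breaks the run, so start over
    return chain(buf, indata)


print(list(itergood([0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])))
print(list(itergood([0, 1, 2])))  # too short: no exception, just empty
```

The function is still an ordinary function returning an iterator, so the
`chain` trick is kept; only the exhausted-input case changes.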
G.