Peek inside iterator (is there a PEP about this?)
Peter Otten
__peter__ at web.de
Thu Oct 2 04:40:53 EDT 2008
Luis Zarrabeitia wrote:
> On Wednesday 01 October 2008 01:14:14 pm Peter Otten wrote:
>> Luis Zarrabeitia wrote:
>> > a = iter([1,2,3,4,5]) # assume you got the iterator from a function and
>> > b = iter([1,2,3]) # these two are just examples.
>>
>> Can you provide a concrete use case?
>
> I'd like to... but I've refactored away all the examples I had, as soon as
> I realized that I didn't know which one was the shortest sequence to put
> it first.
>
> But, it went something like this:
>
> ===
> def do_stuff(tasks, params):
>     params = iter(params)
>     for task in tasks:
>         for partial_task, param in zip(task, params):
>             pass #blah blah, do stuff here.
>         print "task completed"
> ===
>
> Unfortunately that's not the real example (as it is, it shows very bad
> programming), but imagine if params and/or tasks were streams beyond your
> control (a data stream and a control stream). Note that I wouldn't like a
> task or param to be wasted.
This remains a bit foggy to me. Maybe you are better off with deques than
iterators?
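
A minimal sketch of the deque idea, written in Python 3 syntax. The function name pair_up and the sample data are illustrative assumptions, not the original poster's code. The point is only that a deque can be tested for emptiness without consuming anything, so no task or param is wasted:

```python
from collections import deque

def pair_up(tasks, params):
    """Pair items from two deques until either one runs out.

    Both deques are modified in place; whatever is left over
    stays available to the caller.
    """
    pairs = []
    # An empty deque is falsy, so this test consumes nothing.
    while tasks and params:
        pairs.append((tasks.popleft(), params.popleft()))
    return pairs

tasks = deque(["a", "b", "c"])
params = deque([1, 2])
pairs = pair_up(tasks, params)
# "c" is still sitting in tasks and can be paired later.
```
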
> I didn't like the idea of changing both the 'iter' and the 'zip' (changing
> only one of them wouldn't have worked).
>
>> > Will this iterator yield any value? Like with most iterables, a
>> > construct
>> >
>> > if iterator:
>> >     # do something
>>
>> I don't think this has a chance. By adding a __len__ to some iterators R.
>> Hettinger once managed to break GvR's code. The BDFL was not amused.
>
> Ouch :D
> But, no no no. Adding a __len__ to iterators makes little sense (especially
> in my example), and adding an optional __len__ that some iterators have
> and some don't (the ones that can't know their own lengths) would break too
> many things, and still, wouldn't solve the problem of knowing if there is
> a next element. A __nonzero__() that would move the iterator forward and
> cache the result, with a next() that would check the cache before
> advancing, would be closer to what I'd like.
The problem was that __len__() acts as a fallback for __nonzero__(), see
http://mail.python.org/pipermail/python-dev/2005-September/056649.html
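
To illustrate the fallback, here is a small sketch in Python 3, where __nonzero__() is spelled __bool__() but the __len__() fallback works the same way. The classes Sized and Plain are hypothetical and exist only for this demonstration:

```python
class Sized:
    """Hypothetical class that defines __len__ but no __bool__."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


class Plain:
    """Hypothetical class with neither __len__ nor __bool__."""


# With no __bool__ defined, truth testing falls back to __len__() != 0.
empty_sized_is_true = bool(Sized([]))
nonempty_sized_is_true = bool(Sized([1]))
# With neither method defined, an instance is always true.
plain_is_true = bool(Plain())
```

So code that tests "if obj:" to distinguish an object from None can change behaviour as soon as obj grows a __len__().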
>> > if any(iterator):
>> >     # do something ... but the first true value was already consumed and
>> >     # cannot be reused. "Any" cannot peek inside the iterator without
>> >     # consuming the value.
>>
>> for item in ifilter(bool, iterator):
>>     # do something
>>     break
>
> It is not, but (feel free to consider this silly) I don't like breaks. In
> this case, you would have to read until the end of the block to know that
> what you wanted was an if (if you are lucky you may figure out that you
> wanted to simulate an if test).
Ok, make it
    for item in islice(ifilter(bool, iterator), 1):
        # do something
then ;)
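
For reference, a runnable sketch of the same idiom in Python 3, where the built-in filter() is lazy and takes the place of itertools.ifilter(). The sample data and the names seen and rest are assumptions made up for the demonstration:

```python
from itertools import islice

iterator = iter([0, "", 3, 4, 5])

seen = []
# The loop body runs at most once, for the first true item.
for item in islice(filter(bool, iterator), 1):
    seen.append(item)

# The false items before the first true one were consumed by filter();
# everything after it is still available.
rest = list(iterator)
```
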
> (Well, I use breaks sometimes, but most of them are because I need to test
> if an iterator is empty or not)
>
>> Personally I think that Python's choice of EAFP over LBYL is a good one,
>> but one that cannot easily be reconciled with having peekable iterators.
>> If I were in charge I'd rather simplify the iterator protocol (scrap
>> send() and yield expressions) than making it more complex.
>
> Oh, I defend EAFP strongly. At my university LBYL is preferred, so
> whenever I teach Python, I have to give strong examples of why I like
> EAFP.
>
> When an empty iterator means that there is something wrong, I wouldn't
> think of using "if iterator:". That would be masking what should be
> an exception. However, if "iterator is empty" is meaningful, that case
> should go in an "else" clause, rather than "except". Consider if you need
> to find the first non-empty iterator from a list (and then sending it to
> another function - can't test for emptiness with a "for" there, or one
> could lose the first element)
You can do it
    from itertools import chain

    def non_empty(iterators):
        for iterator in iterators:
            it = iter(iterator)
            try:
                # put the first item back in front of the rest
                yield chain([it.next()], it)
            except StopIteration:
                pass

    for it in non_empty(iterators):
        return process(it)
but with iterators as they currently are in Python you had better rewrite
process() to handle empty iterators and then write
    for it in iterators:
        try:
            return process(it)
        except NothingToProcess: # made up
            pass
That's how I understand EAFP. Assume one normal program flow and deal with
problems as they occur.
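
A self-contained sketch of that second approach, in Python 3 syntax. NothingToProcess is the made-up exception from above; the body of process() (summing the items) and the wrapper function process_first_non_empty() are assumptions added only so the example can run:

```python
class NothingToProcess(Exception):
    """Made-up exception: the iterator turned out to be empty."""


def process(it):
    """Hypothetical consumer: sum the items, refusing an empty iterator."""
    it = iter(it)
    try:
        first = next(it)
    except StopIteration:
        raise NothingToProcess() from None
    return first + sum(it)


def process_first_non_empty(iterators):
    """Return process(it) for the first iterator that is not empty."""
    for it in iterators:
        try:
            return process(it)
        except NothingToProcess:
            pass  # empty, try the next one
    return None


result = process_first_non_empty(
    [iter([]), iter([]), iter([1, 2, 3]), iter([10])])
nothing = process_first_non_empty([iter([])])
```

No iterator is ever asked whether it is empty; process() simply finds out when it tries to use it.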
> But that's one of the cases where one should know what one is doing. Both C#
> and Java have iterators that let you know if they are finished before
> consuming the item. (I didn't mean to compare, and I like Java's more than
> C#'s, as Java's iterators also promote the 'use once' design).
I think that may be the core of your problem. Good code built on Python's
iterators will not resemble the typical Java approach.
Peter