[Python-ideas] Introduce collections.Reiterable
Terry Reedy
tjreedy at udel.edu
Fri Sep 20 03:28:04 CEST 2013
I am going to answer three people in one response.
In no particular order...
On 9/19/2013 9:02 AM, Nick Coghlan wrote:
> So, my question is a genuine one. While, *in theory*, an object can
> define a stateful __iter__ method that (e.g.) only works the first
> time it is called, or returns a separate object that still stores it's
> "current position" information on the original container, I simply
> can't think of a non-pathological case where "isinstance(obj,
> Iterable) and not isinstance(obj, Iterator)" would give the wrong
> answer.
> In theory, yes, an object could obviously pass that test and still not
> be Reiterable, but I'm interested in what's true in *practice*.
On 9/19/2013 6:26 AM, Antoine Pitrou wrote:
>> A slight problem is that there is no guaranteed that a non-iterator
>> iterable is re-iterable.
> Any useful examples?
On 9/19/2013 7:37 AM, Joshua Landau wrote:
> On 19 September 2013 11:28, Terry Reedy <tjreedy at udel.edu> wrote:
>> Not everything in that category is necessarily re-iterable.
> I cannot think of a non-pathological case where it is not; if it is
> not re-iterable it should be changed to an iterator if it isn't
> already.
[I think 'pathological' is a bit 'heavy' as a synonym for 'poorly
written' ;=]
>> Or if it is serially reiterable, it may not be parallel iterable,
>> as needed for nested loops.
> What do you mean?
To back up a bit: When dev writes a function, dev is responsible for
specifying acceptable inputs. Neither the language nor custom requires
dev to test that inputs meet the specification. Looking before leaping
may not always work. I believe this to be true when inputs are iterables.
When user calls a function, user is responsible for providing arguments
that meet the specification and for accepting the consequences either way.
When dev specifies an 'iterable' argument, he is (should be) saying that
the argument will be iterated at most once and probably will be iterated
eventually. If user passes an iterator, user should (except possibly in
rare cases) not use it otherwise.
The first problem, which impinges on both specification and reiteration,
is that an iterable may be either finite, or not, or 'in between'
depending on the hardware and user needs. I think we should take 'iterable'
to mean 'finite iterable' unless dev explicitly relaxes that by saying
'possibly infinite iterable'. (To be clear, infinite iterables are
extremely useful.)
An additional complication, including for reiteration, is that
'practically' finite may be different for time and space. For instance,
'for i in range(10000000000): pass # 10 billion iterations' would take
about 5 minutes on my machine while list(range(10000000000)) would fail.
(The opposite situation is possible, but less relevant to this issue.)
Currently, if dev needs to iterate an input more than once, the
specification should say so. If the user wants to pass an iterator, the
user can instead pass list(iter). The reason to have user rather than
dev make this call is that user is in a better position than dev to know
whether iter is effectively finite.
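A minimal sketch of that division of responsibility. The function name
`spread` is hypothetical, invented here for illustration: it needs two
passes over its input, says so in its docstring, and leaves it to the
caller to materialize an iterator first.

```python
def spread(values):
    """Return max(values) - min(values).

    `values` must be a non-empty, re-iterable iterable: it is iterated twice.
    """
    return max(values) - min(values)


data = [3, 1, 4, 1, 5]
print(spread(data))            # a list can be iterated twice -> 4

it = iter(data)
print(spread(list(it)))        # caller materializes the iterator -> 4

try:
    spread(iter(data))         # violates the spec: the second pass is empty
except ValueError as exc:
    print("iterator input failed:", exc)
```

The function itself does no checking; passing a bare iterator simply
fails on the second pass, which is the consequence the caller accepted.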
Now to the varieties of reiteration:
A. Serial: iterate the input (typically to exhaustion) and then
reiterate (typically to exhaustion). In the typical case, the iterable
must be finite. Given finite iterator iter, list(iter) is probably more
efficient than tee(iter). But let user decide if either is sensible.
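For case A, the two options can be compared side by side. This is only
a sketch; the generator function `squares` is a made-up stand-in for
whatever finite iterator the user holds.

```python
from itertools import tee


def squares(n):
    """Hypothetical finite generator used as the one-shot input."""
    for i in range(n):
        yield i * i


# Option 1: materialize once, then iterate the list as often as needed.
items = list(squares(5))
first_pass = sum(items)
second_pass = max(items)

# Option 2: tee. Exhausting `a` before touching `b` makes tee buffer
# every item anyway, so for serial use it saves nothing over a list.
a, b = tee(squares(5))
total = sum(a)
largest = max(b)

print(first_pass, second_pass, total, largest)   # 30 16 30 16
```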
B. Parallel: iterate the input with two iterators that march along more
or less in parallel. The degenerate extreme 'for a,b in zip(iter,iter):'
would be better written 'for a in iter: b = a'. If the two iterators are
mostly in sync, then the second iterator is only really needed when they
diverge. In any case, parallel iteration is best handled internally,
invisible to the caller, with tee or two or more indexes. (Indexes into
a concrete collection are nice because it is so easy to sync one to the
other -- 'i = j' or 'j = i'.) While re does this with finite strings,
the underlying iterable for such functions does not, in general, need to
be finite.
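For case B, a sketch of parallel iteration kept internal to the
function. The name `adjacent_pairs` is hypothetical; the body follows
the familiar tee-based recipe, and the caller only ever passes a plain
iterable, finite or not.

```python
from itertools import count, islice, tee


def adjacent_pairs(iterable):
    """Yield (s0, s1), (s1, s2), ... using two internal iterators."""
    a, b = tee(iterable)
    next(b, None)          # advance the second iterator by one item
    return zip(a, b)


# Works with a one-shot iterator: the caller needs nothing re-iterable.
print(list(adjacent_pairs(iter("abcd"))))
# [('a', 'b'), ('b', 'c'), ('c', 'd')]

# Works with an infinite iterable, as long as the consumer stops.
print(list(islice(adjacent_pairs(count()), 3)))
# [(0, 1), (1, 2), (2, 3)]
```

Because the two iterators stay one step apart, tee only ever buffers a
single item here.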
C. Crossed: iterate different dimensions in 'crossed' fashion. "for i in
row: for j in column". For this to involve reiteration, case one is
square arrays iterated by index. But then it is not an issue, as that
will be done with a reiterable range. Case two is with multiple iterator
inputs, with cross products as one example:
    def cross(itera, iterb):
        for a in itera:
            for b in iterb:
                yield a, b
The doc should specify that itera and iterb must be independent
iterables. Note that the outermost iterator does not have to be finite.
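A short demonstration of why that documentation matters. The function
is repeated so the snippet runs on its own; the inputs are arbitrary
examples.

```python
def cross(itera, iterb):
    for a in itera:
        for b in iterb:
            yield a, b


# Independent, re-iterable inputs: the full cross product.
print(list(cross([1, 2], "xy")))
# [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]

# Inner input is a one-shot iterator: exhausted after the first outer item.
print(list(cross([1, 2], iter("xy"))))
# [(1, 'x'), (1, 'y')]

# The same iterator passed twice: the inner loop steals the outer's items.
it = iter(range(4))
print(list(cross(it, it)))
# [(0, 1), (0, 2), (0, 3)]
```

Neither of the last two calls raises; they just silently produce less
than the cross product.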
Useful example and determinism: generator functions are callable but not
iterable. For the simple iterate once situation, one calls and passes
the resulting generator. For reiteration, the following may work:
    class GenfIt:
        def __init__(self, genf, *args):
            self.genf = genf
            self.args = args
        def __iter__(self):
            # Call the generator function afresh for each iteration.
            return self.genf(*self.args)
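A usage sketch, with the class repeated so the snippet runs on its own
(it reads the stored arguments through self.args). The generator
function `countdown` is a hypothetical example.

```python
class GenfIt:
    def __init__(self, genf, *args):
        self.genf = genf
        self.args = args

    def __iter__(self):
        # A fresh generator per call, so each loop starts from the beginning.
        return self.genf(*self.args)


def countdown(n):
    """Hypothetical generator function: n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


reit = GenfIt(countdown, 3)
print(list(reit))                                   # [3, 2, 1]
print(list(reit))                                   # [3, 2, 1] again
print([(a, b) for a in reit for b in reit][:4])
# [(3, 3), (3, 2), (3, 1), (2, 3)]
```

The nested comprehension works because the inner and outer loops each
get their own generator.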
However, another hidden assumption in this thread has been that
non-iterator iterables are deterministic, in the sense that re-calling
iter(it) returns an iterator that yields the same sequence of items
before raising StopIteration. Some very useful iterator-producing
functions do not do that (ones returning iterators based on
pseudo-random or external inputs). So we need to add 'deterministic' to
the notion of 'reiterable'. And that cannot be mechanically determined.
(Other possible complications: a resource can only be accessed by one
connection at a time. Or it limits the frequency of connections.)
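A sketch of a non-deterministic case. The class `Rolls` is hypothetical,
and the snippet uses the modern `collections.abc` location of the ABCs.
The object passes the proposed isinstance test, yet two passes over it
generally disagree.

```python
import random
from collections.abc import Iterable, Iterator


class Rolls:
    """Hypothetical non-iterator iterable: each pass yields fresh dice rolls."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return (random.randint(1, 6) for _ in range(self.n))


rolls = Rolls(20)

# Passes the mechanical check for "re-iterable" ...
print(isinstance(rolls, Iterable) and not isinstance(rolls, Iterator))

# ... but the two passes are independent random sequences, so this is
# almost certainly False.
print(list(rolls) == list(rolls))
```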
In summary: A. There are multiple iterable and iteration use cases. B.
We cannot really get away from documenting the requirements for iterable
inputs and keeping some responsibility for meeting them in the hands of
callers.
--
Terry Jan Reedy