iterable: next() and __iter__() -- and __reset__()

Hello,

(1) I do not see why an iterable type's __iter__() method should be compulsory. Actually, each time I have defined one, I had to write:

    def __iter__(self):
        return self

So, I guess that if python does not find __iter__(), but the object defines next(), then by default the said object could be used as its own iterator. This is what I understand by "iterable" and next() is the required method for it. Or even better: only if the object does not define next(), then python falls back to looking for __iter__(). Is there any obstacle for this I cannot see?

Side-question: In which cases is it necessary to define the iterator as a separate object?

(2) But: for any reason next() is not spelled as a "magic" method. If this method becomes the distinctive method of iterables, then it should be called __next__() for consistency.

Side-question: Why is it called next(), as it is a magic method for iterators already?

(3) What I miss actually for iterables (which are their own iterator) is a kind of __reset__(). In some cases, it is only needed to allow a new iteration from start. But it may even be needed to set some startup data the first time. __reset__() would thus be called once before the first call to next(). (Sure, "reset" may not be the best term. Maybe "begin" or "startup"? The sense is: "Prepare to yield the first item!") In absence of such a startup mechanism, I end up using __call__ instead:

Example:

    class Powers(object):
        def __init__(self, exponent):
            self.exponent = exponent
        def next(self):
            n = self.n + 1
            if (self.max is not None) and (n > self.max):
                raise StopIteration
            self.n = n
            return n*n
        def __call__(self, min=1,max=None):
            self.n = min-1
            self.max = max
            return self     #.__iter__()
        def __iter__(self):
            return self

    tripleCubes = []
    cubes = Powers(3)
    for sq in cubes(7,17):
        if sq%3 == 0:
            tripleCubes.append(sq)
    print tripleCubes
    # ==> [81, 144, 225]

__iter__() could be used directly if it would allow "free" args in addition to self (in this case: def __iter__(self, min=0,max=None)). This being impossible, an additional method seems to be needed.

To sum up, I would enjoy being able to write Powers using a scheme like:

    class Powers(object):
        def __init__(self, exponent):
            self.exponent = exponent
        def __reset__(self, min=1,max=None):
            self.n = min-1
            self.max = max
        def __next__(self):
            n = self.n + 1
            if (self.max is not None) and (n > self.max):
                raise StopIteration
            self.n = n
            return n*n

This matches the overall idea of iterable for me.

Denis
--
________________________________
la vita e estrany
spir.wikidot.com

On 4 Mar 2010, at 11:35 , spir wrote:
> (3) What I miss actually for iterables (which are their own iterator) is a kind of __reset__(). In some cases, it is only needed to allow a new iteration from start. But it may even be needed to set some startup data the first time. __reset__() would thus be called once before the first call to next(). (Sure, "reset" may not be the best term. Maybe "begin" or "startup"? The sense is: "Prepare to yield the first item!")

The use case you propose (and demonstrate in your example) shows that you're creating an iterating view over your container, which doesn't seem to be the role of __iter__ at all as far as I understand, so you should indeed use either __call__ or a separate method call. Slicing, for instance. In fact, in your case I think supporting slicing/islicing would make far more sense than the solution you've elected:

    tripleCubes = []
    cubes = Powers(3)
    for sq in cubes[6:17]:
        if sq%3 == 0:
            tripleCubes.append(sq)

You could also compose your iterable with existing tools such as those in ``itertools``, or create your own composable iterable transformers/manipulators (I strongly recommend David M. Beazley's presentations on generators for that [1][2]). With those, removing __call__ from your Powers [3] class and setting `self.n = 0` (and `self.max = None`) in the constructor:

    from itertools import islice
    cubes = Powers(3)
    tripleCubes = filter(lambda sq: sq%3 == 0, islice(cubes, 6, 17))

Or (to avoid the pretty ugly lambda and use a listcomp):

    cubes = Powers(3)
    tripleCubes = [sq for sq in islice(cubes, 6, 16) if sq%3 == 0]

[1] Generator Tricks for Systems Programmers http://www.dabeaz.com/generators/
[2] A Curious Course on Coroutines and Concurrency http://www.dabeaz.com/coroutines/
[3] I hope and believe you wouldn't actually write such code outside of an example, as there are much better ways to achieve the same thing in Python.
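For readers who want to try the islice() suggestion, here is a minimal sketch in Python 3 spelling (the method is `__next__()` there, `next()` in the Python 2 code of this thread). It assumes, as Masklinn describes, that the counter is initialised in the constructor. Unlike the posted class, which always returns `n*n`, this sketch really uses the exponent, so it is instantiated with 2 to reproduce the numbers from the thread.

```python
from itertools import islice


class Powers:
    """Self-iterating, endless stream of n**exponent for n = 1, 2, 3, ..."""

    def __init__(self, exponent):
        self.exponent = exponent
        self.n = 0  # state is set up in the constructor; no __call__ needed

    def __iter__(self):
        return self

    def __next__(self):  # spelled next() in Python 2
        self.n += 1
        return self.n ** self.exponent


squares = Powers(2)
# islice(it, 6, 17) skips the first six items and stops after the 17th,
# so it yields 7**2 through 17**2.
triple_squares = [sq for sq in islice(squares, 6, 17) if sq % 3 == 0]
print(triple_squares)  # [81, 144, 225]
```

Note that `squares` is still its own iterator here, so a second islice() over the same object would continue from where the first one stopped rather than start over.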

On Thu, 4 Mar 2010 12:07:48 +0100 Masklinn <masklinn@masklinn.net> wrote:

Thanks for your reply: I find it helpful; will have a look at the pointed references in a short while.

> In fact, in your case I think supporting slicing/islicing would make far more sense than the solution you've elected

Actually my Powers type should be "sliceable".

> tripleCubes = [sq for sq in islice(cubes, 6, 16) if sq%3 == 0]

Right, but this pattern only replaces cubes(6, 16) by islice(cubes, 6, 16):

    tripleCubes = [sq for sq in cubes(6, 16) if sq%3 == 0]

Or do I overlook a relevant point?

I guess the proper (and pythonic?) solution would be to implement __getitem__ so as to be able to write:

    tripleCubes = [sq for sq in cubes[6:17] if sq%3 == 0]

Does the following match your idea?

    def __getitem__(self, ranj):
        self.n , self.max = ranj.start-1 , ranj.stop-1
        return self

Denis
--
________________________________
la vita e estrany
spir.wikidot.com
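One possible alternative to the `__getitem__` above, sketched in Python 3: instead of mutating the object and returning `self`, each slice hands back a fresh iterator built with `itertools.islice`. This is only an illustration, not something proposed verbatim in the thread. The class body and the zero-based index convention (same as islice, so `[6:17]` means the 7th through 17th items) are assumptions.

```python
from itertools import count, islice


class Powers:
    """Hypothetical sliceable variant: every slice is an independent iterator."""

    def __init__(self, exponent):
        self.exponent = exponent

    def __iter__(self):
        # Return a fresh generator on every call, so each iteration
        # starts again at n = 1 and no state is kept on the object.
        return (n ** self.exponent for n in count(1))

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slices")
        # islice does the start/stop/step bookkeeping
        # (zero-based, stop exclusive, None means "no limit").
        return islice(iter(self), index.start, index.stop, index.step)


squares = Powers(2)
first = [sq for sq in squares[6:17] if sq % 3 == 0]
second = [sq for sq in squares[6:17] if sq % 3 == 0]  # same slice, same result
print(first)   # [81, 144, 225]
print(second)  # [81, 144, 225]
```

Because nothing is stored on the object between calls, two slices can be consumed at the same time without interfering with each other.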

On 4 Mar 2010, at 12:53 , spir wrote:
> Right, but this pattern only replaces cubes(6, 16) by islice(cubes, 6, 16)

Yes, the point is that you don't *need* cubes to be callable to do what you want, because Python already provides the tools to do it in its stdlib. So you have no reason to concern yourself with that.

On Thu, Mar 4, 2010 at 2:35 AM, spir <denis.spir@gmail.com> wrote:
Not that I can think of; Python just happened to make a different design decision than you, one that simplifies (and likely speeds up) the interpreter at the minor cost of having to write a trivial "return self" __iter__() in some cases: the interpreter can just blindly call __iter__() as opposed to (as you suggest) doing more sophisticated checking for a next() method and only then falling back to __iter__().
> Side-question: In which cases is it necessary to define the iterator as a separate object?

Whenever you want to use multiple iterators over the same object simultaneously (you can usually equally use a generator instead of an object, but the iterator is separate from the iterate-ee in either case). For example, if lists were their own iterators, the following code:

    for item1 in some_list:
        for item2 in some_list:
            print item1, item2

rather than outputting the cross-product of the items in the list, would presumably instead output the first element paired with every other element in the list, which is not what was intended.
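The effect described here can be reproduced with a small sketch (Python 3 spelling). `SelfIterating` is a hypothetical class invented for the illustration; it behaves the way a list would if it were its own iterator.

```python
class SelfIterating:
    """Hypothetical list-like container that (unwisely) is its own iterator."""

    def __init__(self, items):
        self.items = list(items)
        self.pos = 0  # one cursor, shared by every loop over this object

    def __iter__(self):
        return self

    def __next__(self):  # next() in Python 2
        if self.pos >= len(self.items):
            raise StopIteration
        item = self.items[self.pos]
        self.pos += 1
        return item


shared = SelfIterating("abc")
shared_pairs = [(x, y) for x in shared for y in shared]

plain = list("abc")
list_pairs = [(x, y) for x in plain for y in plain]

print(shared_pairs)     # [('a', 'b'), ('a', 'c')]
print(len(list_pairs))  # 9
```

The inner loop drains the shared cursor, so the outer loop finds nothing left after its first item. A real list gives each loop its own iterator and produces all nine pairs.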
> (2) But: for any reason next() is not spelled as a "magic" method. If this method becomes the distinctive method of iterables, then it should be called __next__() for consistency.
Guido's time machine strikes again! This is fixed in Python 3.x: http://www.python.org/dev/peps/pep-3114/
> (3) What I miss actually for iterables (which are their own iterator) is a kind of __reset__(). In some cases, it is only needed to allow a new iteration from start. But it may even be needed to set some startup data the first time. __reset__() would thus be called once before the first call to next().

(a) __reset__() shouldn't be part of the iterator protocol since it's not applicable for all iterators, only some.
(b) You can just write a generator and put the setup code before the initial "yield" to much the same effect.

Cheers,
Chris
--
Maven & would-be designer of languages
http://blog.rebertia.com
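Point (b) might look roughly like this for the Powers example. The function name `powers` and its parameters are illustrative assumptions, and the sketch uses the exponent, which the posted class ignores.

```python
def powers(exponent, start=1, stop=None):
    """Yield n**exponent for n = start, start + 1, ... up to stop (inclusive)."""
    # Setup code: runs once, on the first next() call, before the first yield.
    n = start
    while stop is None or n <= stop:
        yield n ** exponent
        n += 1


triple_squares = [sq for sq in powers(2, 7, 17) if sq % 3 == 0]
print(triple_squares)  # [81, 144, 225]
```

Calling `powers(...)` again creates a new generator with its own `n`, which plays the part the proposed `__reset__()` would have played.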

spir wrote:
Almost all containers should use a separate object for their iterators. Note that this is already the case for all of Python's standard container types.

The reason relates to the __reset__ suggestion you describe later in your message: How do I reset an iterator over a list? Easy, just call iter() again - it will give me a fresh iterator that starts at the beginning without affecting the list or my original iterator. By producing a fresh object for each invocation of __iter__ the state of the iterators is decoupled from the state of the underlying object, which is generally a good thing from a program design point of view. (See Eric's suggestion regarding the use of generators as __iter__ methods to easily achieve this behaviour.)

Objects with significant state that are also their own iterators are actually quite rare. File objects certainly qualify (since they base their iteration off the file object's file pointer), but I can't think of any others off the top of my head.

Cheers,
Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
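The list behaviour described above is easy to check interactively. A short sketch in Python 3, with arbitrary variable names:

```python
numbers = [10, 20, 30]

first = iter(numbers)
print(next(first), next(first))  # 10 20

# "Resetting" is just asking for a new iterator; the old one is unaffected.
second = iter(numbers)
print(next(second))  # 10
print(next(first))   # 30

# The list is not its own iterator, but each iterator is.
print(iter(numbers) is numbers)  # False
print(iter(first) is first)      # True
```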

On Thu, Mar 4, 2010 at 05:23, Nick Coghlan <ncoghlan@gmail.com> wrote:
There is also the issue of backwards-compatibility when iterators were introduced. Just because someone decided to have a method named next() when iterators were introduced does not mean they intended for it to be viewed as a sequence. Requiring an iterable to define __iter__() took care of the ambiguity. -Brett

Am 04.03.2010 11:35, schrieb spir:
I would say that in most cases it makes sense to have the iterator be a separate object. However, when writing an __iter__() in Python, you almost always can make it a generator, which already does this for you -- each call to __iter__() will return a new generator.
Because it is supposed to be called directly. __iter__() isn't. (This changes with Python 3, where you have next() as a builtin.) As such, this is the same question as "why is it called readline(), not __readline__()." readline(), just like next(), is a method defined by a protocol (file vs iterator). Georg
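As a small illustration of the Python 3 spelling mentioned here: the class defines `__next__()`, and callers go through the builtin `next()`. `Countdown` is a made-up example class, not something from the thread.

```python
class Countdown:
    """Hypothetical iterator counting down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # the Python 2 spelling was next()
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


it = Countdown(2)
print(next(it))          # 2
print(next(it))          # 1
print(next(it, "done"))  # done (the default is returned instead of raising)
```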

Le Thu, 4 Mar 2010 11:35:53 +0100, spir <denis.spir@gmail.com> a écrit :
Explicit is better than implicit, though. In many non-trivial cases, the iterator will have to be a separate object anyway. Also, please note you can implement __iter__ as a generator, which makes things very easy for the simple cases:
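The example that followed this sentence in the original message is not preserved in this archive. As a stand-in, here is a minimal sketch of an `__iter__` written as a generator; the class name `Squares` and its attributes are assumptions made up for the illustration.

```python
class Squares:
    """Hypothetical container-like class; names are illustrative only."""

    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __iter__(self):
        # Because of the yield, calling __iter__() returns a new generator
        # object with its own local state each time.
        for n in range(self.first, self.last + 1):
            yield n * n


squares = Squares(7, 17)
print([sq for sq in squares if sq % 3 == 0])  # [81, 144, 225]
print([sq for sq in squares if sq % 3 == 0])  # [81, 144, 225] again
```

No `next()` method and no `return self` are needed, and the object can be iterated any number of times.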
> (3) What I miss actually for iterables (which are their own iterator) is a kind of __reset__().
No, really, you don't want this, unless you like PHP. As others said, calling iter() again is the well-defined generic way to "reset" your iterable. Then, particular cases can warrant specific APIs, such as file.seek(). Regards Antoine.
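The file case can be sketched as follows in Python 3, with `io.StringIO` standing in for a real file object so the snippet needs nothing on disk:

```python
import io

# io.StringIO stands in for a file opened with open().
f = io.StringIO("first\nsecond\n")

print(iter(f) is f)  # True: the file object is its own iterator
print(list(f))       # ['first\n', 'second\n']
print(list(f))       # []: calling iter() again cannot rewind it

f.seek(0)            # the file-specific "reset" API
print(next(f))       # first
```

Here iter() cannot give a fresh start because the iteration state is the file position itself, which is why a dedicated call such as seek() exists for this type.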

participants (11): Andrew Bennetts, Antoine Pitrou, Brett Cannon, Chris Rebert, Georg Brandl, Greg Ewing, Masklinn, Mathias Panzenböck, Nick Coghlan, spir, Éric Araujo