iterable: next() and __iter__() -- and __reset()
Hello,

(1) I do not understand an iterable type's __iter__() method to be compulsory. Actually, each time I have defined one, I had to write:

    def __iter__(self):
        return self

So, I guess that if python does not find __iter__(), but the object defines next(), then by default the said object could be used as its own iterator. This is what I understand by "iterable" and next() is the required method for it. Or even better: only if the object does not define next(), then python falls back to looking for __iter__(). Is there any obstacle for this I cannot see?

Side-question: In which cases is it necessary to define the iterator as a separate object?

(2) But: for any reason next() is not spelled as a "magic" method. If this method becomes the distinctive method of iterables, then it should be called __next__() for consistency.

Side-question: Why is it called next(), as it is a magic method for iterators already?

(3) What I miss actually for iterables (which are their own iterator) is a kind of __reset__(). In some cases, it is only needed to allow a new iteration from start. But it may even be needed to set some startup data the first time. __reset__() would thus be called once before the first call to next(). (Sure, "reset" may not be the best term. Maybe "begin" or "startup"? The sense is: "Prepare to yield the first item!")

In absence of such a startup mechanism, I end up using __call__ instead. Example:

    class Powers(object):
        def __init__(self, exponent):
            self.exponent = exponent
        def next(self):
            n = self.n + 1
            if (self.max is not None) and (n > self.max):
                raise StopIteration
            self.n = n
            return n*n
        def __call__(self, min=1,max=None):
            self.n = min-1
            self.max = max
            return self #.__iter__()
        def __iter__(self):
            return self

    tripleCubes = []
    cubes = Powers(3)
    for sq in cubes(7,17):
        if sq%3 == 0:
            tripleCubes.append(sq)
    print tripleCubes
    # ==> [81, 144, 225]

__iter__() could be used directly if it would allow "free" args in addition to self (in this case: def __iter__(self, min=0,max=None)). This being impossible, an additional method seems to be needed.

To sum up, I would enjoy being able to write Powers using a scheme like:

    class Powers(object):
        def __init__(self, exponent):
            self.exponent = exponent
        def __reset__(self, min=1,max=None):
            self.n = min-1
            self.max = max
        def __next__(self):
            n = self.n + 1
            if (self.max is not None) and (n > self.max):
                raise StopIteration
            self.n = n
            return n*n

This matches the overall idea of iterable for me.

Denis
--
________________________________
la vita e estrany
spir.wikidot.com
On 4 Mar 2010, at 11:35 , spir wrote:
(3) What I miss actually for iterables (which are their own iterator) is a kind of __reset__(). In some cases, it is only needed to allow a new iteration from start. But it may even be needed to set some startup data the first time. __reset__() would thus be called once before the first call to next(). (Sure, "reset" may not be the best term. Maybe "begin" or "startup"? The sense is: "Prepare to yield the first item!")
The use case you propose (and demonstrate in your example) shows that you're creating an iterating view over your container. That doesn't seem to be the role of __iter__ at all as far as I understand, so you should indeed use either __call__ or a separate method call. Slice, for instance. In fact, in your case I think supporting slicing/islicing would make far more sense than the solution you've elected:

    tripleCubes = []
    cubes = Powers(3)
    for sq in cubes[6:17]:
        if sq%3 == 0:
            tripleCubes.append(sq)

You could also compose your iterable with existing tools such as those in ``itertools``, or create your own composable iterable transformers/manipulators (I strongly recommend David M. Beazley's presentations on generators for that [1][2]). With those, removing __call__ from your Powers [3] class and setting `self.n = 0` (and `self.max = None`) in the constructor:

    from itertools import islice
    cubes = Powers(3)
    tripleCubes = filter(lambda sq: sq%3 == 0, islice(cubes, 6, 17))

Or (to avoid the pretty ugly lambda and use a listcomp):

    cubes = Powers(3)
    tripleCubes = [sq for sq in islice(cubes, 6, 16) if sq%3 == 0]

[1] Generator Tricks for Systems Programmers http://www.dabeaz.com/generators/
[2] A Curious Course on Coroutines and Concurrency http://www.dabeaz.com/coroutines/
[3] I hope and believe you wouldn't actually write such code outside of an example, as there are much better ways to achieve the same thing in Python
On Thu, 4 Mar 2010 12:07:48 +0100 Masklinn <masklinn@masklinn.net> wrote: Thanks for your reply: I find it helpful; will have a look at the pointed references in a short while.
In fact, in your case I think supporting slicing/islicing would make far more sense than the solution you've elected
Actually my Powers type should be "sliceable".
tripleCubes = [sq for sq in islice(cubes, 6, 16) if sq%3 == 0]
Right, but this pattern only replaces cubes(6, 16) by islice(cubes, 6, 16):

    tripleCubes = [sq for sq in cubes(6, 16) if sq%3 == 0]

Or do I overlook a relevant point?

I guess the proper (and pythonic?) solution would be to implement __getitem__ so as to be able to write:

    tripleCubes = [sq for sq in cubes[6:17] if sq%3 == 0]

Does the following match your idea?

    def __getitem__(self, ranj):
        self.n , self.max = ranj.start-1 , ranj.stop-1
        return self

Denis
--
________________________________
la vita e estrany
spir.wikidot.com
On 4 Mar 2010, at 12:53 , spir wrote:
Right, but this pattern only replaces cubes(6, 16) by islice(cubes, 6, 16)
Yes, the point is that you don't *need* cubes to be callable to do what you want. Because Python already provides the tools to do it in its stdlib. So you have no reason to concern yourself with that.
Does the following match your idea?

    def __getitem__(self, ranj):
        self.n , self.max = ranj.start-1 , ranj.stop-1
        return self

I'd return a new item with the relevant attributes set, not a modified self (generally, I'm not fond of mutable objects, but that might just be me).
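A minimal sketch of that suggestion, written in Python 3 syntax (an assumption; the thread itself uses Python 2). The `start` and `stop` constructor arguments and the half-open slice semantics are hypothetical choices for illustration, not code from the thread. `__getitem__` builds a new `Powers` object instead of mutating `self`, and `__iter__` is a generator, so every loop gets its own state:

```python
class Powers:
    """Hypothetical sliceable source of n**exponent values."""

    def __init__(self, exponent, start=1, stop=None):
        self.exponent = exponent
        self.start = start
        self.stop = stop  # exclusive upper bound; None means unbounded

    def __getitem__(self, ranj):
        if not isinstance(ranj, slice):
            raise TypeError("Powers only supports slices, e.g. powers[6:17]")
        start = self.start if ranj.start is None else ranj.start
        # Return a new object; the original is left untouched.
        return Powers(self.exponent, start, ranj.stop)

    def __iter__(self):
        # Generator method: each call returns a fresh, independent iterator.
        n = self.start
        while self.stop is None or n < self.stop:
            yield n ** self.exponent
            n += 1


squares = Powers(2)
view = squares[6:17]
triple_squares = [sq for sq in view if sq % 3 == 0]
```

Because the slice is a separate object, `view` can be iterated any number of times and `squares` remains unbounded.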
On Thu, Mar 4, 2010 at 2:35 AM, spir <denis.spir@gmail.com> wrote:
Hello,
(1) I do not understand an iterable type's __iter__() method to be compulsory. Actually, each time I have defined one, I had to write: def __iter__(self): return self So, I guess that if python does not find __iter__(), but the object defines next(), then by default the said object could be used as its own iterator. This is what I understand by "iterable" and next() is the required method for it. Or even better: only if the object does not define next(), then python falls back to looking for __iter__(). Is there any obstacle for this I cannot see?
Not that I can think of; Python just happened to make a different design decision than you, one that simplifies (and likely speeds up) the interpreter at the minor cost of having to write a trivial "return self" __iter__() in some cases: the interpreter can just blindly call __iter__() as opposed to (as you suggest) doing more sophisticated checking for a next() method and only then falling back to __iter__().
Side-question: In which cases is it necessary to define the iterator as a separate object?
Whenever you want to use multiple iterators over the same object simultaneously (you can usually equally use a generator instead of an object, but the iterator is separate from the iterate-ee in either case). For example, if lists were their own iterators, the following code:

    for item1 in some_list:
        for item2 in some_list:
            print item1, item2

rather than outputting the cross-product of the items in the list, would presumably instead output the first element paired with every other element in the list, which is not what was intended.
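The difference can be checked with a small sketch. It uses Python 3 syntax (an assumption; the thread uses Python 2), and `SelfIterating` and `pairs` are hypothetical names introduced only for this illustration:

```python
class SelfIterating:
    """Hypothetical list wrapper that is its own iterator (one shared position)."""

    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.items):
            raise StopIteration
        item = self.items[self.pos]
        self.pos += 1
        return item


def pairs(iterable):
    """Collect (item1, item2) from two nested loops over the same object."""
    result = []
    for item1 in iterable:
        for item2 in iterable:
            result.append((item1, item2))
    return result


list_pairs = pairs([1, 2, 3])                    # full cross-product
shared_pairs = pairs(SelfIterating([1, 2, 3]))   # inner loop exhausts the shared state
```

With a real list, each `for` statement gets a fresh iterator, so all nine pairs appear. With the self-iterating wrapper, the inner loop consumes the only position counter, and the outer loop stops after its first item.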
(2) But: for any reason next() is not spelled as a "magic" method. If this method becomes the distinctive method of iterables, then it should be called __next__() for consistency.
Guido's time machine strikes again! This is fixed in Python 3.x: http://www.python.org/dev/peps/pep-3114/
(3) What I miss actually for iterables (which are their own iterator) is a kind of __reset__(). In some cases, it is only needed to allow a new iteration from start. But it may even be needed to set some startup data the first time. __reset__() would thus be called once before the first call to next().
(a) __reset__() shouldn't be part of the iterator protocol since it's not applicable for all iterators, only some.
(b) You can just write a generator and put the setup code before the initial "yield" to much the same effect.

Cheers,
Chris
--
Maven & would-be designer of languages
http://blog.rebertia.com
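Point (b) can be sketched as follows, in Python 3 syntax (an assumption). The `powers` function, its inclusive `stop` bound, and the `log` parameter are hypothetical; `log` exists only to make visible when the setup code runs:

```python
def powers(exponent, start=1, stop=None, log=None):
    # Everything before the first yield is the "startup" step. It runs once,
    # when next() is first called on the generator, not when powers() is called.
    if log is not None:
        log.append("setup")
    n = start
    while stop is None or n <= stop:
        yield n ** exponent
        n += 1


log = []
gen = powers(2, 7, 17, log)
log_before_first_next = list(log)   # setup has not run yet
first = next(gen)
log_after_first_next = list(log)    # setup has run exactly once

# Starting over is just a matter of calling powers() again.
triple_squares = [sq for sq in powers(2, 7, 17) if sq % 3 == 0]
```

The arguments to `powers()` play the role of the proposed `__reset__(min, max)` parameters, and each call produces a new, independent iterator.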
Hello list Some complementary information to Chris Rebert’s message.
(1) I do not understand an iterable type's __iter__() method to be compulsory. [...]

The iterable and the iterator protocols are two different things. Every iterator is iterable, but not every iterable object is an iterator (see <http://docs.python.org/library/collections>). There are situations where it makes sense to use a different class as iterator (can’t find examples right now, hope other people will chime in), and a lot of situations where it’s fine to implement both protocols on the same class.
Actually, each time I have defined one, I had to write: def __iter__(self): return self

Instead of returning self in __iter__ and then writing a next method that returns values and raises StopIteration, you can define __iter__ as a generator (i.e. yield values instead of returning them) and not need to write next at all. Helpful documentation here: <http://docs.python.org/reference/datamodel#object.__iter__> and <http://docs.python.org/library/stdtypes#typeiter>
(3) What I miss actually for iterables (which are their own iterator) is a kind of __reset__(). In some cases, it is only needed to allow a new iteration from start.[...]

Didn’t really understand your use case here, but perhaps the extension of the yield statement done in version 2.5 can help you here: <http://docs.python.org/reference/expressions#yieldexpr> <http://www.python.org/dev/peps/pep-0342/>
Hope this helps. Regards
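As a rough illustration of how the PEP 342 yield expression might serve the "start over" use case, here is a sketch in Python 3 syntax (an assumption). The `restartable_squares` generator is hypothetical and not from the thread; it treats a value passed in with `send()` as the new starting point:

```python
def restartable_squares(start=1):
    n = start
    while True:
        # yield is an expression: it evaluates to the value given to send(),
        # or to None when the generator is advanced with next().
        new_start = yield n * n
        if new_start is not None:
            n = new_start
        else:
            n += 1


gen = restartable_squares()
first_three = [next(gen) for _ in range(3)]
after_send = gen.send(10)   # jump to n = 10 and get its square
following = next(gen)
```

Whether this is clearer than simply creating a new generator is debatable; it is shown only to make the reference to the yield expression concrete.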
spir wrote:
Hello,
(1) I do not understand an iterable type's __iter__() method to be compulsory. Actually, each time I have defined one, I had to write: def __iter__(self): return self So, I guess that if python does not find __iter__(), but the object defines next(), then by default the said object could be used as its own iterator. This is what I understand by "iterable" and next() is the required method for it. Or even better: only if the object does not define next(), then python falls back to looking for __iter__(). Is there any obstacle for this I cannot see? Side-question: In which cases is it necessary to define the iterator as a separate object?
Almost all containers should use a separate object for their iterators. Note that this is already the case for all of Python's standard container types. The reason relates to the __reset__ suggestion you describe later in your message: How do I reset an iterator over a list? Easy, just call iter() again - it will give me a fresh iterator that starts at the beginning without affecting the list or my original iterator. By producing a fresh object for each invocation of __iter__ the state of the iterators is decoupled from the state of the underlying object which is generally a good thing from a program design point of view. (See Eric's suggestion regarding the use of generators as __iter__ methods to easily achieve this behaviour) Objects with significant state that are also their own iterators are actually quite rare. File objects certainly qualify (since they base their iteration off the file object's file pointer), but I can't think of any others off the top of my head. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
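The decoupling described here can be sketched with a container and a separate iterator class. This uses Python 3 syntax (an assumption), and `Countdown` and `CountdownIterator` are hypothetical names for illustration:

```python
class Countdown:
    """Hypothetical container; it holds no iteration state itself."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # A fresh iterator object for every call to iter().
        return CountdownIterator(self.start)


class CountdownIterator:
    """Holds the position for one pass over a Countdown."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


c = Countdown(3)
it1 = iter(c)
first = next(it1)       # advance the first iterator by one step
it2 = iter(c)           # "reset": a new iterator starting from the beginning
fresh = list(it2)
rest_of_it1 = list(it1)
```

Calling `iter(c)` again neither disturbs `it1` nor changes `c`, which is the behaviour built-in containers such as lists already have.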
On Thu, Mar 4, 2010 at 05:23, Nick Coghlan <ncoghlan@gmail.com> wrote:
spir wrote:
Hello,
(1) I do not understand an iterable type's __iter__() method to be compulsory. Actually, each time I have defined one, I had to write: def __iter__(self): return self So, I guess that if python does not find __iter__(), but the object defines next(), then by default the said object could be used as its own iterator. This is what I understand by "iterable" and next() is the required method for it. Or even better: only if the object does not define next(), then python falls back to looking for __iter__(). Is there any obstacle for this I cannot see? Side-question: In which cases is it necessary to define the iterator as a separate object?
Almost all containers should use a separate object for their iterators. Note that this is already the case for all of Python's standard container types.
There is also the issue of backwards-compatibility when iterators were introduced. Just because someone decided to have a method named next() when iterators were introduced does not mean they intended for it to be viewed as a sequence. Requiring an iterable to define __iter__() took care of the ambiguity. -Brett
The reason relates to the __reset__ suggestion you describe later in your message: How do I reset an iterator over a list? Easy, just call iter() again - it will give me a fresh iterator that starts at the beginning without affecting the list or my original iterator. By producing a fresh object for each invocation of __iter__ the state of the iterators is decoupled from the state of the underlying object which is generally a good thing from a program design point of view.
(See Eric's suggestion regarding the use of generators as __iter__ methods to easily achieve this behaviour)
Objects with significant state that are also their own iterators are actually quite rare. File objects certainly qualify (since they base their iteration off the file object's file pointer), but I can't think of any others off the top of my head.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Am 04.03.2010 11:35, schrieb spir:
Hello,
(1) I do not understand an iterable type's __iter__() method to be compulsory. Actually, each time I have defined one, I had to write: def __iter__(self): return self So, I guess that if python does not find __iter__(), but the object defines next(), then by default the said object could be used as its own iterator. This is what I understand by "iterable" and next() is the required method for it. Or even better: only if the object does not define next(), then python falls back to looking for __iter__(). Is there any obstacle for this I cannot see? Side-question: In which cases is it necessary to define the iterator as a separate object?
I would say that in most cases it makes sense to have the iterator be a separate object. However, when writing an __iter__() in Python, you almost always can make it a generator, which already does this for you -- each call to __iter__() will return a new generator.
(2) But: for any reason next() is not spelled as a "magic" method. If this method becomes the distinctive method of iterables, then it should be called __next__() for consistency. Side-question: Why is it called next(), as it is a magic method for iterators already?
Because it is supposed to be called directly. __iter__() isn't. (This changes with Python 3, where you have next() as a builtin.) As such, this is the same question as "why is it called readline(), not __readline__()." readline(), just like next(), is a method defined by a protocol (file vs iterator). Georg
On 03/04/2010 10:43 PM, Georg Brandl wrote:
(2) But: for any reason next() is not spelled as a "magic" method. If this method becomes the distinctive method of iterables, then it should be called __next__() for consistency. Side-question: Why is it called next(), as it is a magic method for iterators already?

Because it is supposed to be called directly. __iter__() isn't. (This changes with Python 3, where you have next() as a builtin.)
And why is it made a builtin function? What was wrong with it being a normal method? -panzi
Mathias Panzenböck wrote:
On 03/04/2010 10:43 PM, Georg Brandl wrote:
(2) But: for any reason next() is not spelled as a "magic" method. If this method becomes the distinctive method of iterables, then it should be called __next__() for consistency. Side-question: Why is it called next(), as it is a magic method for iterators already?

Because it is supposed to be called directly. __iter__() isn't. (This changes with Python 3, where you have next() as a builtin.)
And why is it made a builtin function? What was wrong with it being a normal method?
<http://www.python.org/dev/peps/pep-3114/>. -Andrew.
spir wrote:
(2) But: for any reason next() is not spelled as a "magic" method. If this method becomes the distinctive method of iterables, then it should be called __next__() for consistency.
I gather that it is indeed called __next__() in Py3, and there is a new builtin function next() for invoking it. -- Greg
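For reference, a small sketch of the Python 3 spelling. The `UpTo` class is a hypothetical example; `__next__` and the `next()` builtin (including its optional default argument) are standard Python 3:

```python
class UpTo:
    """Hypothetical iterator yielding 1..limit, using the Python 3 method name."""

    def __init__(self, limit):
        self.limit = limit
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):  # spelled next() in Python 2
        if self.n >= self.limit:
            raise StopIteration
        self.n += 1
        return self.n


it = UpTo(2)
values = [next(it), next(it)]   # the builtin calls it.__next__()
exhausted = next(it, None)      # a default is returned instead of raising StopIteration
```

The default argument is one thing the builtin adds over calling the method directly.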
Le Thu, 4 Mar 2010 11:35:53 +0100, spir <denis.spir@gmail.com> a écrit :
(1) I do not understand an iterable type's __iter__() method to be compulsory. Actually, each time I have defined one, I had to write: def __iter__(self): return self So, I guess that if python does not find __iter__(), but the object defines next(), then by default the said object could be used as its own iterator.
Explicit is better than implicit, though. In many non-trivial cases, the iterator will have to be a separate object anyway. Also, please note you can implement __iter__ as a generator, which makes things very easy for the simple cases:
    >>> class C(object):
    ...     def __iter__(self):
    ...         yield 1
    ...         yield 2
    ...
    >>> c = C()
    >>> it = iter(c)
    >>> it
    <generator object __iter__ at 0xb746c0f4>
    >>> next(it)
    1
    >>> next(it)
    2
    >>> next(it)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>> list(c)
    [1, 2]
(3) What I miss actually for iterables (which are their own iterator) is a kind of __reset__().
No, really, you don't want this, unless you like PHP. As others said, calling iter() again is the well-defined generic way to "reset" your iterable. Then, particular cases can warrant specific APIs, such as file.seek(). Regards Antoine.
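A short sketch of the file-like case, in Python 3 syntax (an assumption). An in-memory `io.StringIO` stands in for a real file here; like a file object, it is its own iterator, so calling `iter()` again does not rewind it and `seek(0)` is the specific API that does:

```python
import io

f = io.StringIO("a\nb\n")     # in-memory stand-in for an open text file

is_own_iterator = iter(f) is f
first_pass = list(f)          # consumes the stream line by line
second_pass = list(f)         # nothing left: the position is at the end
f.seek(0)                     # rewind using the file-specific API
third_pass = list(f)
```

This is the exception rather than the rule: for ordinary containers, simply iterating again is enough.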
participants (11)
- Andrew Bennetts
- Antoine Pitrou
- Brett Cannon
- Chris Rebert
- Georg Brandl
- Greg Ewing
- Masklinn
- Mathias Panzenböck
- Nick Coghlan
- spir
- Éric Araujo