Single- vs. Multi-pass iterability
I keep running into the problem that there is no reliable way to introspect about whether a type supports multi-pass iterability (in the sense that an input stream might support only a single pass, but a list supports multiple passes). I suppose you could check for __getitem__, but that wouldn't cover linked lists, for example. Can anyone channel Guido's intent for me? Is this an oversight or a deliberate design decision? Is there an interface for checking multi-pass-ability that I've missed? TIA, Dave
On Mon, 8 Jul 2002, David Abrahams wrote:
I keep running into the problem that there is no reliable way to introspect about whether a type supports multi-pass iterability (in the sense that an input stream might support only a single pass, but a list supports multiple passes). I suppose you could check for __getitem__, but that wouldn't cover linked lists, for example.
Can anyone channel Guido's intent for me? Is this an oversight or a deliberate design decision? Is there an interface for checking multi-pass-ability that I've missed?
As far as I can tell, there is no published Python mechanism that distinguishes "input iterators" from "forward iterators" (using the C++ parlance).

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19    E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714         WWW:    http://www.theopalgroup.com
[David Abrahams]
I keep running into the problem that there is no reliable way to introspect about whether a type supports multi-pass iterability (in the sense that an input stream might support only a single pass, but a list supports multiple passes). I suppose you could check for __getitem__, but that wouldn't cover linked lists, for example.
Can anyone channel Guido's intent for me? Is this an oversight or a deliberate design decision? Is there an interface for checking multi-pass-ability that I've missed?
The language makes no such distinctions. If an app wants to make them, it's up to the app to implement them. Likewise for a way to tell a multipass iterator to "start over again". The Python iteration protocol has only two methods, .next() to get "the next" item, and .__iter__() to return self; given a random iterator, those are the only things you can rely on.
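To make that protocol concrete, here is a minimal sketch of an object obeying just those two methods (CountDown is a made-up example, not anything from the thread):

    class CountDown:
        def __init__(self, n):
            self.n = n
        def __iter__(self):
            # iterators return themselves, so they can be used
            # anywhere an iterable is expected
            return self
        def next(self):
            if self.n <= 0:
                raise StopIteration
            self.n = self.n - 1
            return self.n + 1

    for i in CountDown(3):
        print i,          # prints: 3 2 1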
On Mon, Jul 08, 2002 at 02:44:25PM -0400, Tim Peters wrote:
[David Abrahams]
I keep running into the problem that there is no reliable way to introspect about whether a type supports multi-pass iterability (in the sense that an input stream might support only a single pass, but a list supports multiple passes). I suppose you could check for __getitem__, but that wouldn't cover linked lists, for example.
Can anyone channel Guido's intent for me? Is this an oversight or a deliberate design decision? Is there an interface for checking multi-pass-ability that I've missed?
The language makes no such distinctions. If an app wants to make them, it's up to the app to implement them. Likewise for a way to tell a multipass iterator to "start over again". The Python iteration protocol has only two methods, .next() to get "the next" item, and .__iter__() to return self; given a random iterator, those are the only things you can rely on.
I believe that when David was talking about multi-pass iterability he wasn't referring to an iterator that can be told to "start over again" but to an iterable object that can produce multiple independent iterators of itself, each one good for a single iteration.

The language does make a distinction between an *iterable* object that may have only an __iter__ method and an *iterator* that has a next method. This distinction is blurred a bit by the fact that iterators also have an __iter__ method, which makes them appear as one-shot iterables.

Imagine an alternative universe where a South African programmer called Rossu van Guidom writes a wonderful language called Mamba, and in that language iterator semantics are defined like this:

* Objects that wish to be iterable define an __iter__() method returning an iterator.

* An iterator is an object with a next() method. That's all.

* The for statement checks if an object has an __iter__ method. If it does, it calls it and uses the returned iterator. If it doesn't, it tries to use the object itself. If it doesn't have .next either, it will fail and report that the object is not iterable.

A Mamba programmer called Nero Hsorit has speculated in a mamba-dev posting that in an alternative universe, in a language called 'Cobra', people kept getting confused between iterators and iterables :-)

Oren
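A sketch of Mamba's dispatch rule as a plain function (mamba_for and its body argument are made up for illustration; the real for statement is of course built into the interpreter):

    def mamba_for(obj, body):
        # use __iter__ if the object has one, otherwise try the
        # object itself as the iterator
        if hasattr(obj, '__iter__'):
            it = obj.__iter__()
        else:
            it = obj
        if not hasattr(it, 'next'):
            raise TypeError("object is not iterable")
        while 1:
            try:
                item = it.next()
            except StopIteration:
                break
            body(item)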
[Oren Tirosh]
I believe that when David was talking about multi-pass iterability he wasn't referring to an iterator that can be told to "start over again" but to an iterable object that can produce multiple independent iterators of itself, each one good for a single iteration.
To an excellent first approximation, it makes no difference. I read David the same as you, and that's where my "no such distinctions" came from. The "likewise" in "Likewise for a way to tell a multipass iterator to 'start over again'" means "and in addition to what you asked about, not that either" (which is something other people have asked about, and more often than what David asked about).
The language does make a distinction between an *iterable* object that may have only an __iter__ method and an *iterator* that has a next method.
Sure. At the wrapper-level David works at, all Python supplies here is PyObject_GetIter(x), which returns an iterator or a NULL, and in the former case the only useful thing he can do with it is call PyIter_Next() on it. There's simply no way for him to know whether calling PyObject_GetIter(x) again will yield an iterator that produces the same sequence of values, or even whether it will yield an iterator again at all. He could hardcode knowledge about a few types, like, e.g., the builtin list type, but that wouldn't even extend to subclasses of list; similarly a subclass of file may well fiddle its iterator to be multi-pass despite that the builtin file doesn't.
... A Mamba programmer called Nero Hsorit has speculated in a mamba-dev posting that in an alternative universe in a language called 'Cobra' people kept getting confused between iterators and iterables :-)
David can't get there from here with or without confusion <wink>.
Imagine an alternative universe where a South African programmer called Rossu van Guidom writes a wonderful language called Mamba, and in that language iterator semantics are defined like this:
* Objects that wish to be iterable define an __iter__() method returning an iterator.
* An iterator is an object with a next() method. That's all.
But that doesn't allow for things like file objects, which, although not iterators themselves, are capable of producing iterators of different sorts which iterate over them in different ways -- and yet they can only be iterated over once. In other words, there are such things as one-shot iterables, even if iterables and iterators are kept separate.

Maybe a one-shot iterable should raise an exception if you try to obtain a second iterator from it?

Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
On Tue, Jul 09, 2002, Greg Ewing wrote:
Maybe a one-shot iterable should raise an exception if you try to obtain a second iterator from it?
Then you couldn't do this:

    done = False
    for line in f:
        if not check(line):
            break
        process(line)
    else:
        done = True

    if not done:
        for line in f:
            another_process(line)

--
Aahz (aahz@pythoncraft.com)           <*>           http://www.pythoncraft.com/
Project Vote Smart: http://www.vote-smart.org/
Then you couldn't do this:
    done = False
    for line in f:
        if not check(line):
            break
        process(line)
    else:
        done = True

    if not done:
        for line in f:
            another_process(line)
That's already broken, see SF bug 524804. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, Jul 10, 2002 at 09:10:18PM -0400, Guido van Rossum wrote:
Then you couldn't do this:
    done = False
    for line in f:
        if not check(line):
            break
        process(line)
    else:
        done = True

    if not done:
        for line in f:
            another_process(line)
That's already broken, see SF bug 524804.
Xreadlines is buffered and therefore leaves the file position of the file in an unexpected state. If you use xreadlines explicitly you should expect that. The fact that file.__iter__ returns an xreadlines object implicitly is therefore a bit surprising.

What's the reason for using xreadlines as a file iterator? Was it performance or was it just the easiest way to implement it using an existing object?

"Files support the iterator protocol. Each iteration returns the same result as file.readline()"

This is not correct. Files support what I call the iterable protocol. Objects supporting the iterator protocol have a .next() method; files don't. While it's true that each iteration has the same result as readline, it doesn't have the same side effects.

Proposal: make files really support the iterator protocol. __iter__ would return self and next() would call readline and raise StopIteration if ''. If anyone wants the xreadlines performance improvement it should be explicit.

Definitions:

    iterable := hasattr(obj, '__iter__')
    iterator := hasattr(obj, '__iter__') and hasattr(obj, 'next')

If an object is iterable and not an iterator it would be reasonable to expect that it is also re-iterable. I don't know if this should be a requirement but I think it would be a good idea if all builtin objects conformed to it anyway. Currently files are the only builtin that is iterable, not an iterator, and not re-iterable.

explicit-is-better-than-implicit-ly yours,

Oren
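Oren's definitions, spelled as checkable helpers (the names is_iterable and is_iterator are illustrative only, not part of Python):

    def is_iterable(obj):
        # Oren's definition: anything answering __iter__
        return hasattr(obj, '__iter__')

    def is_iterator(obj):
        # an iterator additionally answers next()
        return is_iterable(obj) and hasattr(obj, 'next')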
Oren Tirosh wrote:
Xreadlines is buffered and therefore leaves the file position of the file in an unexpected state. If you use xreadlines explicitly you should expect that. The fact that file.__iter__ returns an xreadlines object implicitly is therefore a bit surprising.
What's the reason for using xreadlines as a file iterator? Was it performance or was it just the easiest way to implement it using an existing object?
The rationale was something like "the simplest way to iterate over the lines in a file should be the fastest". I'd agree with that, but not at the expense of the surprises mentioned in the bug. It would perhaps help if the file object cached the xreadlines iterator; that would limit the scope of the problem to the case where iteration and explicit .read() calls are mixed.
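A rough sketch of that caching idea (hypothetical -- this is not what 2.2 actually does, and it assumes file.__iter__ is available as the base iterator-getter):

    class cachingfile(file):
        # hand out a single cached line iterator, so a second for-loop
        # resumes where the first one stopped
        _lineiter = None

        def __iter__(self):
            if self._lineiter is None:
                self._lineiter = file.__iter__(self)
            return self._lineiter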
"Files support the iterator protocol. Each iteration returns the same result as file.readline()"
This is not correct. Files support what I call the iterable protocol. Objects supporting the iterator protocol have a .next() method, files don't. While it's true that each iteration has the same result as readline it doesn't have the same side effects.
Proposal: make files really support the iterator protocol. __iter__ would return self and next() would call readline and raise StopIteration if ''. If anyone wants the xreadline performance improvement it should be explicit.
+1 (But, since the bug is closed as "won't fix" I doubt this has a big chance of happening.) Just
Proposal: make files really support the iterator protocol. __iter__ would return self and next() would call readline and raise StopIteration if ''. If anyone wants the xreadline performance improvement it should be explicit.
No. I won't have "for line in file" be slower than attainable. The only solution I accept is a complete rewrite of the I/O system without using stdio, so xreadlines can be integrated. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Thursday 11 July 2002 12:47 pm, Guido van Rossum wrote:
Proposal: make files really support the iterator protocol. __iter__ would return self and next() would call readline and raise StopIteration if ''. If anyone wants the xreadline performance improvement it should be explicit.
No. I won't have "for line in file" be slower than attainable.
+1. I _intensely_ want to be able to teach beginners to use "for line in file" and have it be fast in the common case. "Nice" behavior for rarer cases of prematurely interrupted loops is OK, if feasible, but secondary. Having "for line in file" play nicely with other method calls on 'file' has no importance to me in this context -- no more than, e.g., having "for item in alist" play nicely with calls to mutating methods of object alist.
The only solution I accept is a complete rewrite of the I/O system without using stdio, so xreadlines can be integrated.
I thought Just's suggestion (about having the file object remember the xreadlines object in use, so that another for loop would continue right where the first one exited) seemed like a reasonable hack -- a compromise of reasonably little effort for some small secondary gain. Guess I must be missing something? Of course the "complete rewrite" is an alluring prospect -- for many other reasons, such as enabling user control of file buffering in cross-platform ways, *yum* -- but it's not going to happen in time for 2.3 anyway, is it? Alex
What's the reason for using xreadlines as a file iterator? Was it performance or was it just the easiest way to implement it using an existing object?
I thought this was answered adequately by my last entry in the SF bug report. The short answer is performance in the common case. --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Oren Tirosh" <oren-py-d@hishome.net>
I believe that when David was talking about multi-pass iterability he wasn't referring to an iterator that can be told to "start over again" but to an iterable object that can produce multiple independent iterators of itself, each one good for a single iteration.
That's right.
The language does make a distinction between an *iterable* object that may have only an __iter__ method and an *iterator* that has a next method. This distinction is blurred a bit by the fact that iterators also have an __iter__ method that also makes them appear as one-shot iterables.
Yep. [Part of the reason I want to know whether I've got a one-shot sequence is that inspecting that sequence then becomes an information-destroying operation -- only being able to touch it once changes how you have to handle it] I was thinking one potentially nice way to introspect about multi-pass-ability might be to get an iterator and to see whether it was copyable. Currently even most multi-pass iterators can't be copied with copy.copy(). -Dave
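For concreteness, a sketch of what a copyable, hence detectably multi-pass, iterator could look like (seqiter is a made-up name; nothing in the standard library provides this today):

    import copy

    class seqiter:
        def __init__(self, seq, pos=0):
            self.seq, self.pos = seq, pos

        def __iter__(self):
            return self

        def next(self):
            if self.pos >= len(self.seq):
                raise StopIteration
            item = self.seq[self.pos]
            self.pos = self.pos + 1
            return item

        def __copy__(self):
            # an independent iterator over the same sequence,
            # starting at the current position
            return seqiter(self.seq, self.pos)

    it = seqiter(range(3))
    it2 = copy.copy(it)    # works, because __copy__ is defined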
On Tue, Jul 09, 2002 at 04:43:25AM -0400, David Abrahams wrote:
Yep. [Part of the reason I want to know whether I've got a one-shot sequence is that inspecting that sequence then becomes an information-destroying operation -- only being able to touch it once changes how you have to handle it]
I was thinking one potentially nice way to introspect about multi-pass-ability might be to get an iterator and to see whether it was copyable. Currently even most multi-pass iterators can't be copied with copy.copy().
I wouldn't call it a one-shot sequence - it's just an iterator. The name iterator is enough to suggest that it is disposable and good for just one pass through the container.

If the object has an __iter__ method but no next it's not an iterator and therefore most likely re-iterable. One notable exception is a file object. File iterators affect the current position of the file.

If you think about it you'll see that file objects aren't really containers - they are already iterators. The real container is the file on the disk, and the file object represents a pointer to a position in this container used for scanning it, which is a pretty good definition of an iterator. The difference is cosmetic: the next method is called readline and it returns an empty string instead of raising StopIteration.

    class ifile(file):
        def __iter__(self):
            return self

        def next(self):
            s = self.readline()
            if s:
                return s
            raise StopIteration

    class xfile:
        def __init__(self, filename):
            self.filename = filename

        def __iter__(self):
            return ifile(self.filename)

This pair of objects has a proper container/iterator relationship. The xfile (stands for eXternal file, nothing to do with Mulder and Scully) represents the file on the disk, and each call to iter(xfileobject) returns a new and independent iterator of the same container.

Oren
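Usage of this pair would look something like the following (assuming some file 'data.txt' exists; note the sketch never closes the ifile objects it opens):

    xf = xfile('data.txt')
    for line in xf:        # first pass, over a fresh ifile iterator
        pass
    for line in xf:        # second, independent pass over the same file
        pass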
From: "Oren Tirosh" <oren-py-d@hishome.net>
On Tue, Jul 09, 2002 at 04:43:25AM -0400, David Abrahams wrote:
Yep. [Part of the reason I want to know whether I've got a one-shot sequence is that inspecting that sequence then becomes an information-destroying operation -- only being able to touch it once changes how you have to handle it]
I was thinking one potentially nice way to introspect about multi-pass-ability might be to get an iterator and to see whether it was copyable. Currently even most multi-pass iterators can't be copied with copy.copy().
I wouldn't call it a one-shot sequence - it's just an iterator. The name iterator is enough to suggest that it is disposable and good for just one pass through the container.
If the object has an __iter__ method but no next it's not an iterator and therefore most likely re-iterable. One notable exception is a file object. File iterators affect the current position of the file.
No kidding, that's the problem I'm talking about. It does me no good to have a criterion for determining re-iterability which fails for the case I'm most concerned with ;-)
If you think about it you'll see that file objects aren't really containers - they are already iterators. The real container is the file on the disk
There might not be a "real container" -- if it's an input pipe the data disappears as you iterate it. -Dave
[Oren Tirosh]
    class ifile(file):
        def __iter__(self):
            return self

        def next(self):
            s = self.readline()
            if s:
                return s
            raise StopIteration

    class xfile:
        def __init__(self, filename):
            self.filename = filename

        def __iter__(self):
            return ifile(self.filename)
This pair of objects has a proper container/iterator relationship.
This is all clear to me, except for one little thing. I wonder why class `ifile' has an `__iter__' method itself. I know it is said to be the "iterator protocol", and I wonder why it has to be.

My understanding is that `__iter__' returns an iterator all ready to be enquired a number of times through `.next()' calls, and I presume that if any re-initialisation has to take place, it is within `__iter__'. However, as the iterator maintains its own progressive state, I do not see the intent and purpose of the iterator having an `__iter__' method itself. Would it make sense using the iterator `__iter__' as the preferred place where it re-initialises itself?

-- François Pinard   http://www.iro.umontreal.ca/~pinard
On Tue, Jul 09, 2002 at 08:14:38AM -0400, François Pinard wrote:
This is all clear to me, except for one little thing. I wonder why class `ifile' has an `__iter__' method itself. I know it is said to be the "iterator protocol", and I wonder why it has to be.
I don't like it either. In my previous message about the language 'Mamba' in an alternative universe I have an example of an alternative: if object has a tp_iter it is called, otherwise the object must have a tp_next.
My understanding is that `__iter__' returns an iterator all ready to be enquired a number of times through `.next()' calls, and I presume that if any re-initialisation has to take place, it is within `__iter__'. However, as the iterator maintains its own progressive state, I do not see the intent and purpose of the iterator having an `__iter__' method itself. Would it make sense using the iterator `__iter__' as the preferred place where it re-initialises itself?
As far as I can tell this was done so that for could iterate over both iterables and iterators. I just don't see why it has to be done by all iterators instead of in just one place, adding much confusion between iterators and iterables in the process. Oren
[Oren Tirosh]
[François Pinard]
However, as the iterator maintains its own progressive state, I do not see the intent and purpose of the iterator having an `__iter__' method itself.
As far as I can tell this was done so that for could iterate over both iterables and iterators.
That is, that an iterator is always itself an iterable. I guess the real question is: could we have the guarantee that if an iterable returns an iterator through the iterable's __iter__, the iterator's __iter__ method will never be called from looping over the iterable? If we do not have that guarantee, then when (and why) will the iterator's __iter__ be called?

I did not find an answer to these questions in either the Reference Manual or the PEP, yet I confess that the exposition of the C API might hold an answer I could not understand. Could it be explained without referring to C?

-- François Pinard   http://www.iro.umontreal.ca/~pinard
pinard@iro.umontreal.ca:
could we have the guarantee that if an iterable returns an iterator through the iterable's __iter__, the iterator's __iter__ method will never be called from looping over the iterable?
[...pause while Greg's brain parses that sentence...]

Yes, I believe that's true.

Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
[Greg Ewing]
pinard@iro.umontreal.ca:
could we have the guarantee that if an iterable returns an iterator through the iterable's __iter__, the iterator's __iter__ method will never be called from looping over the iterable?
[...pause while Greg's brain parses that sentence...]
(Sorry for my bad English.)
Yes, I believe that's true.
If yes, then the Library Reference is misleading, at the page:

    http://www.python.org/dev/doc/devel/lib/typeiter.html

when it strongly says that any iterator's __iter__ method is "required". I guess this is a possible source of confusion. The context does not make it clear that the iterator's __iter__ method is *only* required whenever one *also* wants to use an iterator as an iterable. Better would be to describe __iter__ only once, the first time through, saying everything there that has to be said, and only retain for iterators the requirement of having a `next()' method. We should describe the truth.

P.S. - Also, I do not understand the tiny bit about the `in' statement in the above page. Has `in' ever been a statement? If it refers to the comparison operator `in', then has it any special properties when used with iterators? I'm unsuccessful at seeing any hint about this from the documentation:

    http://www.python.org/dev/doc/devel/ref/comparisons.html

-- François Pinard   http://www.iro.umontreal.ca/~pinard
[François Pinard]
If yes, then, the Library Reference is misleading, at the page:
http://www.python.org/dev/doc/devel/lib/typeiter.html
when it strongly says that any iterator's __iter__ method is "required". I guess this is a possible source of confusion.
This is how Python the language is defined. The current C Python implementation doesn't enforce it, and you may be able to get away with defining an iterator that doesn't supply an __iter__ method in some contexts under the current C implementation. If you do, though, you're breaking the rules, and there's no guarantee your code will continue to work.
The context does not make it clear that the iterator's __iter__ method is *only* required whenever one *also* wants to use an iterator as an iterable.
That's not how the iteration protocol is defined, and isn't how it should be defined either. Requiring *some* method with a reserved name is an aid to introspection, lest it become impossible to distinguish, say, an iterator from an instance of a doubly-linked list node class that just happens to supply methods named .prev() and .next() for an unrelated purpose.
Better would be to describe __iter__ only once, the first time through, saying everything there that has to be said, and only retain for iterators the requirement of having a `next()' method. We should describe the truth.
Except that iterators are required to have an __iter__ method: this is a matter of definition, not just of reverse-engineering the minimum you can get away with under the current implementation in assorted contexts. You'll discover this hard way the first time you try to pass an iterator without an __iter__ method to a routine you didn't write that says it accepts any iterable object as an argument. Such a routine is entitled-- by the documented requirements --to rely on its argument responding sensibly to an __iter__ message.
P.S. - Also, I do not understand the tiny bit about the `in' statement in the above page. Has `in' ever been a statement?
I figure you're talking about this:

    ... to be used with the for and in statements

The tail end of that is indeed worded poorly; 'in' isn't a statement.
If it refers to the comparison operator `in',
Yes, that's the intent.
then has it any special properties when used with iterators?
In

    x in y

y can be any iterable object. As an extreme example,

    if "error\n" in file('msgs'):
On Tue, 9 Jul 2002, Tim Peters wrote:
The context does not make it clear that the iterator's __iter__ method is *only* required whenever one *also* wants to use an iterator as an iterable.
That's not how the iteration protocol is defined, and isn't how it should be defined either. Requiring *some* method with a reserved name is an aid to introspection
This is a terrible reason for the existence of an __iter__ method, because (a) it's a bad way to do type-checking and (b) it doesn't even work.

(a) If we followed this logic, we'd insist on having a useless __dict__ method on dictionaries, a useless __list__ method on lists, etc. etc. just so we could check types by looking for these methods. As i understood it, the Python way is to let the protocol speak for itself. Something that wants to give out keys can implement the keys() method, something that wants to act like a container can implement __getitem__, and so on. There's no need to make an additional declaration of dict-ness by adding a dummy __dict__ method -- indeed, sometimes we don't *want* to make that kind of commitment, and Python allows that flexibility. It seems to me that dictionaries are to keys() as iterators are to next().

(b) Looking for __iter__ is not a valid test for iterator-ness. Files and other iterable objects supply __iter__, but they are not iterators. So it doesn't work as a type test.

I agree with Oren that it makes more sense for iterator-fetching to be a convenience handled by the implementations of "for" and "in", rather than foisting the extra hassle of "def __iter__(self): return self" on every individual iterator implementation.

-- ?!ng
David> I keep running into the problem that there is no reliable way
David> to introspect about whether a type supports multi-pass
David> iterability (in the sense that an input stream might support
David> only a single pass, but a list supports multiple passes). I
David> suppose you could check for __getitem__, but that wouldn't
David> cover linked lists, for example.

Here's a suggestion for a long-term strategy for solving this problem, should it be deemed desirable to do so:

Right now, every iterator, and every object that supports iteration, must have an __iter__() method. Suppose we augment that with the following:

A new kind of iterator, called a multiple iterator, that supports multiple iterations over the same sequence.

A requirement that every object that supports multiple iteration have a __multiter__() method that yields a multiple iterator over its sequence, in addition to an __iter__() method that yields a (multiple or single) iterator (so that every sequence that supports multiple iteration also supports single iteration).

A requirement that every multiple iterator support the following methods:

    __multiter__()  yields the iterator object itself
    __iter__()      also yields the iterator object itself (so that
                    every multiple iterator is also an iterator)
    __next__()      return the next item from the container or raise
                    StopIteration
    __copy__()      return a distinct, newly created multiple iterator
                    that iterates over the same sequence as the
                    original, starting from the current element.

Note that when the last multiple iterator has left an element, there is no possibility of going back to that element again unless the sequence itself provides a way of doing so. Therefore, for example, it might be possible for files to provide multiple iterators without undue space inefficiency.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
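As a sketch, a list iterator obeying the proposed protocol might look like this (remember that __multiter__, __next__ and __copy__ here are Andrew's proposed names -- nothing in current Python calls them, so a plain for loop would not drive this object):

    class multilistiter:
        def __init__(self, seq, pos=0):
            self.seq, self.pos = seq, pos

        def __iter__(self):
            return self

        def __multiter__(self):
            # every multiple iterator is also a multiple iterable
            return self

        def __next__(self):
            if self.pos >= len(self.seq):
                raise StopIteration
            item = self.seq[self.pos]
            self.pos = self.pos + 1
            return item

        def __copy__(self):
            # a distinct iterator over the same sequence, starting
            # from the current element
            return multilistiter(self.seq, self.pos)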
From: "Andrew Koenig" <ark@research.att.com>
David> I keep running into the problem that there is no reliable way
David> to introspect about whether a type supports multi-pass
David> iterability (in the sense that an input stream might support
David> only a single pass, but a list supports multiple passes). I
David> suppose you could check for __getitem__, but that wouldn't
David> cover linked lists, for example.
Here's a suggestion for a long-term strategy for solving this problem, should it be deemed desirable to do so:
Right now, every iterator, and every object that supports iteration, must have an __iter__() method. Suppose we augment that with the following:
A new kind of iterator, called a multiple iterator, that supports multiple iterations over the same sequence.
A requirement that every object that supports multiple iteration have a __multiter__() method that yields a multiple iterator over its sequence, in addition to an __iter__() method that yields a (multiple or single) iterator (so that every sequence that supports multiple iteration also supports single iteration).
A requirement that every multiple iterator support the following methods:
    __multiter__()  yields the iterator object itself
    __iter__()      also yields the iterator object itself (so that
                    every multiple iterator is also an iterator)
    __next__()      return the next item from the container or raise
                    StopIteration
    __copy__()      return a distinct, newly created multiple iterator
                    that iterates over the same sequence as the
                    original, starting from the current element.
Why bother with __multiter__? If you can distinguish a multiple iterator by the presence of __copy__, you can always do hasattr(x.__iter__(), "__copy__") to find out whether something is multi-iteratable.

-Dave
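Spelled out as a helper (supports_multipass is a made-up name; note that the builtin iterators of today don't define __copy__, which is exactly David's earlier complaint):

    def supports_multipass(x):
        # David's test: get an iterator and see whether it is copyable
        return hasattr(iter(x), '__copy__')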
David> Why bother with __multiter__? If you can distinguish a multiple
David> iterator by the presence of __copy__, you can always do
David> hasattr(x.__iter__(), "__copy__") to find out whether something
David> is multi-iteratable.

Because explicit is better than implicit :-)

More seriously, I can imagine distinguishing a multiple iterator by the presence of __copy__, but I can't imagine using the presence of __copy__ to determine whether a *container* supports multiple iteration. For example, there surely exist containers today that support __copy__ but whose __iter__ methods yield iterators that do not themselves support __copy__.

Another reason is that I can imagine this idea extended to encompass, say, ambidextrous iterators that support prev() as well as next(), and I would want to use __ambiter__ as a marker for those rather than having to create an iterator and see if it has prev().
On Thu, 11 Jul 2002, Andrew Koenig wrote:
More seriously, I can imagine distinguishing a multiple iterator by the presence of __copy__, but I can't imagine using the presence of __copy__ to determine whether a *container* supports multiple iteration. For example, there surely exist containers today that support __copy__ but whose __iter__ methods yield iterators that do not themselves support __copy__.
Just fetch the iterator from the container and look for __copy__ on that.

Or, what if there is no container to begin with, but the iterator is still copyable? You can't flag that by putting __multiter__ on anything; again it makes more sense to just provide __copy__ on the iterator.

All that's really necessary here is to document the convention about what __copy__ is supposed to mean if it's available on an iterator. If we all agree that __copy__ should preserve an independent copy of the current state of the iterator, we're all set.
Another reason is that I can imagine this idea extended to encompass, say, ambidextrous iterators that support prev() as well as next(), and I would want to use __ambiter__ as a marker for those rather than having to create an iterator and see if it has prev().
I think a proliferation of iterator-fetching methods would be a messy and unpleasant prospect. After __iter__, __multiter__, and __ambiter__, what next? __mutableiter__? __depthfirstiter__? __breadthfirstiter__?

-- ?!ng

"If I have seen farther than others, it is because I was standing
on a really big heap of midgets." -- K. Eric Drexler
Ping> Just fetch the iterator from the container and look for __copy__
Ping> on that.

Yes, that's an alternative. However the purpose of my suggestion of __multiter__ was not to use it to test for multiple iteration, but to enable a container to be able to yield either a single or a multiple iterator on request.

Ping> Or, what if there is no container to begin with, but the iterator
Ping> is still copyable? You can't flag that by putting __multiter__ on
Ping> anything; again it makes more sense to just provide __copy__ on
Ping> the iterator.

You could flag it by putting __multiter__ on the iterator, just as iterators presently have __iter__.

Ping> All that's really necessary here is to document the convention
Ping> about what __copy__ is supposed to mean if it's available on an
Ping> iterator. If we all agree that __copy__ should preserve an
Ping> independent copy of the current state of the iterator, we're
Ping> all set.

Not quite. We also need an agreement that calling __iter__ on a container is not a destructive operation unless you call next() on the iterator that you get back.
Another reason is that I can imagine this idea extended to encompass, say, ambidextrous iterators that support prev() as well as next(), and I would want to use __ambiter__ as a marker for those rather than having to create an iterator and see if it has prev().
Ping> I think a proliferation of iterator-fetching methods would be a
Ping> messy and unpleasant prospect. After __iter__, __multiter__,
Ping> and __ambiter__, what next? __mutableiter__?
Ping> __depthfirstiter__? __breadthfirstiter__?

A data structure that supports several different kinds of iteration has to provide that support somehow. What's your suggestion?
From: "Andrew Koenig" <ark@research.att.com>

Ping> Just fetch the iterator from the container and look for __copy__ on that.
Yes, that's an alternative.
However the purpose of my suggestion of __multiter__ was not to use it to test for multiple iteration, but to enable a container to be able to yield either a single or a multiple iterator on request.
Why would you want that? Seems like a corner case at best.
A data structure that supports several different kinds of iteration has to provide that support somehow. What's your suggestion?
    class DataStructure(object):
        def __init__(self):
            self._numbers = range(10)
            self._names = [str(x) for x in range(10)]

        names = property(lambda self: iter(self._names))
        numbers = property(lambda self: iter(self._numbers))

    x = DataStructure()
    for y in x.names:
        print repr(y),
    print
    for y in x.numbers:
        print repr(y),

[Y'know, Python is great. That worked the first time I ran it.]

-Dave
However the purpose of my suggestion of __multiter__ was not to use it to test for multiple iteration, but to enable a container to be able to yield either a single or a multiple iterator on request.
David> Why would you want that? Seems like a corner case at best.

You're right -- I wasn't thinking clearly. What I meant to say was that I would like a program that expects to be able to use a multiple iterator to be able to say so simply and efficiently in code. For example:

    for i in multiter(x):
        # whatever

I would like this to fail cleanly if x does not support multiple iterators.
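One guess at what such a multiter() could look like, using the __copy__ convention discussed earlier (the function name is Andrew's; the body is not from the thread):

    def multiter(x):
        it = iter(x)
        if not hasattr(it, '__copy__'):
            raise TypeError("object does not support multiple iteration")
        return it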
A data structure that supports several different kinds of iteration has to provide that support somehow. What's your suggestion?
David> class DataStructure(object):
David>     def __init__(self):
David>         self._numbers = range(10)
David>         self._names = [str(x) for x in range(10)]
David>
David>     names = property(lambda self: iter(self._names))
David>     numbers = property(lambda self: iter(self._numbers))
David>
David> x = DataStructure()
David> for y in x.names:
David>     print repr(y),
David> print
David> for y in x.numbers:
David>     print repr(y),
David>
David> [Y'know, Python is great. That worked the first time I ran it.]

I don't understand how this code answers my question. You've asked for iterators over two different data structures. What I was asking was, for example, how one might arrange for a single tree to yield either a depth-first or breadth-first iterator.
From: "Andrew Koenig" <ark@research.att.com>
A data structure that supports several different kinds of iteration has to provide that support somehow. What's your suggestion?
David> class DataStructure(object):
David>     def __init__(self):
David>         self._numbers = range(10)
David>         self._names = [str(x) for x in range(10)]
David>
David>     names = property(lambda self: iter(self._names))
David>     numbers = property(lambda self: iter(self._numbers))
David>
David> x = DataStructure()
David> for y in x.names:
David>     print repr(y),
David> print
David> for y in x.numbers:
David>     print repr(y),
David>
David> [Y'know, Python is great. That worked the first time I ran it.]
I don't understand how this code answers my question. You've asked for iterators over two different data structures. What I was asking was, for example, how one might arrange for a single tree to yield either a depth-first or breadth-first iterator.
Just replace 'names' by breadth_first and 'numbers' by depth_first. or-vice-versa-ly y'rs, dave
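Spelled out for the tree case, that suggestion might look like the following sketch (the Tree class is hypothetical, and on Python 2.2 the generators require the __future__ import):

    from __future__ import generators

    class Tree(object):
        def __init__(self, value, children=()):
            self.value = value
            self.children = list(children)

        def _depth(self):
            yield self.value
            for child in self.children:
                for v in child._depth():
                    yield v

        def _breadth(self):
            queue = [self]
            while queue:
                node = queue.pop(0)
                yield node.value
                queue.extend(node.children)

        depth_first = property(_depth)
        breadth_first = property(_breadth)

    t = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])
    print list(t.depth_first)      # prints [1, 2, 4, 3]
    print list(t.breadth_first)    # prints [1, 2, 3, 4]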
David> Just replace 'names' by breadth_first and 'numbers' by depth_first.
David> or-vice-versa-ly y'rs,

which doesn't address the question of a uniform convention.
From: "Andrew Koenig" <ark@research.att.com>
David> Just replace 'names' by breadth_first and 'numbers' by depth_first.
David> or-vice-versa-ly y'rs,
which doesn't address the question of a uniform convention.
I'm with you on the desire to have a way to get a multipass-iterator-or-error in one swell foop. That says "I want to iterate over this thing without changing it". I still think that hasattr(iter(x), '__copy__') is a pretty clean way to do that, despite the fact that it potentially creates an iterator (which some people apparently view as too heavyweight as an introspection step).

However, I don't see any point in trying to define a protocol for every different possible iteration view of a thing. Dicts have keys, values, and items. Trees have breadth-first, depth-first, inorder, postorder, blah, blah, blah. There are just too many of these, and they're all different.

-Dave
David> I'm with you on the desire to have a way to get a
David> multipass-iterator-or-error in one swell foop. That says "I
David> want to iterate over this thing without changing it". I still
David> think that hasattr(iter(x), '__copy__') is a pretty clean way
David> to do that, despite the fact that it potentially creates an
David> iterator (which some people apparently view as too heavyweight
David> as an introspection step).

In particular, creating an iterator had better not be a destructive operation.

David> However, I don't see any point in trying to define a protocol
David> for every different possible iteration view of a thing. Dicts
David> have keys, values, and items. Trees have breadth-first,
David> depth-first, inorder, postorder, blah, blah, blah. There are
David> just too many of these, and they're all different.

I wasn't suggesting defining a protocol for every possible iteration view. I was raising the question of whether multi-pass iteration was likely to be a common enough operation that it is worth defining a protocol for it, while leaving the door open to defining protocols for others should it turn out to be desirable to do so.
From: "Andrew Koenig" <ark@research.att.com>
I wasn't suggesting defining a protocol for every possible iteration view. I was raising the question of whether multi-pass iteration was likely to be a common enough operation that it is worth defining a protocol for it, while leaving the door open to defining protocols for others should it turn out to be desirable to do so.
I think your examples are confusing different beasts, then. Multipass (= copyable in these examples) should be a capability of iterators, just as bidirectional or random-access would be. Breadth-first/depth-first is not a capability of the iterator in that sense, but an implementation detail -- from the POV of the iterator's user, there's no way to tell what the traversal order is.

-Dave
On Mon, 15 Jul 2002, Andrew Koenig wrote:
However the purpose of my suggestion of __multiter__ was not to use it to test for multiple iteration, but to enable a container to be able to yield either a single or a multiple iterator on request.
I see what you want, though i have a hard time imagining a situation where it's really necessary to have both (as opposed to just the multiple iterator, which is strictly more capable). I can certainly see how you might want to be able to ask for a breadth-first or depth-first iterator on a tree, though.
Or, what if there is no container to begin with, but the iterator is still copyable? You can't flag that by putting __multiter__ on anything; again it makes more sense to just provide __copy__ on the iterator.
You could flag it by putting __multiter__ on the iterator, just as iterators presently have __iter__.
Ugh. I don't like this, for the reasons i outlined in another message: an iterator is not the same as a container. Iterators always mutate; containers usually do not (at least not as a result of looking at the elements).
All that's really necessary here is to document the convention about what __copy__ is supposed to mean if it's available on an iterator. If we all agree that __copy__ should preserve an independent copy of the current state of the iterator, we're all set.
Not quite. We also need an agreement that calling __iter__ on a container is not a destructive operation unless you call next() on the iterator that you get back.
What i'd like is an agreement that calling __iter__ on a container is not a destructive operation at all. If it's destructive, then what you started with is not really a container, and we should encourage people to call attention to this irregularity in their documentation.
I think a proliferation of iterator-fetching methods would be a messy and unpleasant prospect. After __iter__, __multiter__, and __ambiter__, what next? __mutableiter__? __depthfirstiter__? __breadthfirstiter__?
A data structure that supports several different kinds of iteration has to provide that support somehow.
Agreed. I was unclear: what makes me uncomfortable is the pollution of the double-underscore namespace.

When you do have a container-like object that supports various kinds of iteration, naturally you are going to need some methods for getting iterators. I just think it's not appropriate to establish special names for them.

To me, the presence of double-underscores around a method name means that the method is called automatically. My expectation is that when i write a method with a "normal" name, the name itself will appear after a dot wherever that method is used; and that when there's a method with a "__special__" name, the method is called implicitly. The implicit call can occur via an operator (e.g. __add__), or to implement a protocol defined in the language (e.g. __init__), etc. If you see the string ".__" it means that something unusual is going on.

If you follow this convention, then "__iter__" deserves a special name, because it is the specially blessed iterator-getter used by "for". There may be other iterator-getters, but they must be called explicitly, so they shouldn't get underscores.

* * *

An aside on "next" vs. "__next__":

Note that this convention would also suggest that "next" should be called "__next__", since "for" calls "next" implicitly. I forget why we ended up going with "next" instead of "__next__". I think "__next__" would have been better, especially in light of this:

Tim Peters wrote:
Requiring *some* method with a reserved name is an aid to introspection, lest it become impossible to distinguish, say, an iterator from an instance of a doubly-linked list node class that just happens to supply methods named .prev() and .next() for an unrelated purpose.
This is exactly why the iterator protocol should consist of one method named "__next__" rather than two methods named "__iter__" (which has nothing to do with the act of iterating!) and "next" (which is the one we really care about, but can collide with existing method names).

As far as i know, "next" is the only implicitly-called method of an internal protocol that has no underscores. It's a little late to fix the name of "next" in Python 2, though it might be worth considering for Python 3.

-- ?!ng
An aside on "next" vs. "__next__":
Note that this convention would also suggest that "next" should be called "__next__", since "for" calls "next" implicitly. I forget why we ended up going with "next" instead of "__next__". I think "__next__" would have been better, especially in light of this:
Tim Peters wrote:
Requiring *some* method with a reserved name is an aid to introspection, lest it become impossible to distinguish, say, an iterator from an instance of a doubly-linked list node class that just happens to supply methods named .prev() and .next() for an unrelated purpose.
This is exactly why the iterator protocol should consist of one method named "__next__" rather than two methods named "__iter__" (which has nothing to do with the act of iterating!) and "next" (which is the one we really care about, but can collide with existing method names).
As far as i know, "next" is the only implicitly-called method of an internal protocol that has no underscores. It's a little late to fix the name of "next" in Python 2, though it might be worth considering for Python 3.
Yup. I regret this too. We should have had a built-in next(x) which calls x.__next__(). I think that if it had been __next__() we wouldn't have the mistake that I just discovered -- that all the iterator types that define a next() method shouldn't have done so, because you get one automatically which is the tp_iternext slot wrapped. :-( But yes, it's too late to change now. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Tue, 16 Jul 2002, Guido van Rossum wrote:
Yup. I regret this too. We should have had a built-in next(x) which calls x.__next__().
Ah! I think one of the hang-ups i (we? the BOF?) got stuck on was that users of iterators would sometimes call next() directly, and so it wouldn't do to call it __next__. But it's clear to me now that a built-in next() is exactly the right answer, by analogy to the built-in repr() and method __repr__.
But yes, it's too late to change now.
Sigh. Well, in the hope that it makes the change a little easier to swallow, i'll say now that if the protocol is fixed in some future version of Python, i'll volunteer to update the standard library to the new protocol. I guess when Python 3 comes around, there's going to be some sort of migration helper tool, and that tool can check for classes that have __iter__ and next, and suggest changing the name to __next__.

-- ?!ng
On Tue, Jul 16, 2002 at 09:29:55PM -0400, Guido van Rossum wrote:
| > An aside on "next" vs. "__next__":
| >
| > As far as i know, "next" is the only implicitly-called method of
| > an internal protocol that has no underscores. It's a little late
| > to fix the name of "next" in Python 2, though it might be worth
| > considering for Python 3.
|
| Yup. I regret this too.
| But yes, it's too late to change now.

I don't think it is too late. 90%++ of the Python code base out there doesn't use iterators yet... people are still wrapping their minds around it to see how they can use it in their applications. If it was publicly stated that this could be "fixed" in the next version I don't think that it would hurt. These things happen, and sometimes it's best to "roll back". Programmers understand this.

Best,

Clark

--
Clark C. Evans                   Axista, Inc.
http://www.axista.com            800.926.5525
XCOLLA Collaborative Project Management Software
"CC" == Clark C <cce@clarkevans.com> writes:
CC> I don't think it is too late. 90%++ of the Python code base
CC> out there doesn't use iterators yet... people are still
CC> wrapping their minds around it to see how they can use it in
CC> their applications. If it was publicly stated that this could
CC> be "fixed" in the next version I don't think that it would
CC> hurt. These things happen, and sometimes it's best to "roll
CC> back". Programmers understand this.

And besides (to continue Clark's devil's advocacy), how much of the code out there that /does/ use iterators calls .next() explicitly?

-Barry
From: "Barry A. Warsaw" <barry@zope.com>
CC> I don't think it is too late. 90%++ of the Python code base
CC> out there doesn't use iterators yet... people are still
CC> wrapping their minds around it to see how they can use it in
CC> their applications. If it was publicly stated that this could
CC> be "fixed" in the next version I don't think that it would
CC> hurt. These things happen, and sometimes it's best to "roll
CC> back". Programmers understand this.
And besides (to continue Clark's devil's advocacy), how much of the code out there that /does/ use iterators calls .next() explicitly?
Hmm, I'm getting excited! We rarely get an opportunity to fix mistakes in language design. Probably someone will bring me back to reality shortly, though ;-)

Maybe I'll do it: the problem is really the iterators people have written. However, you could implicitly generate __next__() which calls next() when the result of __iter__() lacks a __next__() function... with a warning, of course.

-Dave
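A sketch of that compatibility shim (everything here -- the helper name, the warning text -- is made up; it only illustrates the fallback David describes):

    import warnings

    def _get_next(it):
        # prefer the new protocol, fall back to the old one with a warning
        try:
            return it.__next__
        except AttributeError:
            warnings.warn("iterator defines next() but not __next__()",
                          DeprecationWarning)
            return it.next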
I don't think it is too late. 90%++ of the Python code base out there doesn't use iterators yet... people are still wrapping their minds around it to see how they can use it in their applications. If it was publicly stated that this could be "fixed" in the next version I don't think that it would hurt. These things happen, and sometimes it's best to "roll back". Programmers understand this.
I find this really hard to believe, given that such a big deal has been made of iterators. Care to conduct a survey on c.l.py? Given that it's really only a very minor problem, I'd rather not expend the effort to 'fix" this. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, Jul 17, 2002 at 10:09:13AM -0400, Guido van Rossum wrote:
| > I don't think it is too late. 90%++ of the Python code base out
| > there doesn't use iterators yet... people are still wrapping their
| > minds around it to see how they can use it in their applications.
| > If it was publicly stated that this could be "fixed" in the next
| > version I don't think that it would hurt. These things happen,
| > and sometimes it's best to "roll back". Programmers understand this.
|
| I find this really hard to believe, given that such a big deal has
| been made of iterators.

None of my code uses iterators explicitly, and I was very aware of them. My new code that I'm building now does, but it wouldn't take much effort to fix it.

I myself personally would rather keep Python "clean" of blemish. For the most part, Python is really free of dragons and that's why I like it. I'm willing to put up with short-term pain for long term gain. Unlike Java or Visual Basic, I intend to be programming in Python 10+ years from now; so from my perspective, it is an investment. Plus, most features don't get used by the public for at least a year or so, as it takes a while for the code examples to start using them and books to be updated.

| Care to conduct a survey on c.l.py?

Sure. I'll run the survey and report back. What would be the options? It'll be a simple CGI form using radio buttons or check boxes and a button. I'll aggregate the results. To do this I need:

- A specific description of what would change
- An example of what would break, plus what it would be replaced with.
- An explanation of what problems occur when the blemish isn't fixed (what can't you do?)

| Given that it's really only a very minor problem, I'd rather not
| expend the effort to 'fix" this.

Well, if it is a minor problem, it shouldn't be that hard to fix. *evil grins*

Clark

--
Clark C. Evans                   Axista, Inc.
http://www.axista.com            800.926.5525
XCOLLA Collaborative Project Management Software
None of my code uses iterators explicitly, and I was very aware of them. My new code that I'm building now does, but it wouldn't take much effort to fix it. I myself personally would rather keep Python "clean" of blemish. For the most part, Python is really free of dragons and that's why I like it. I'm willing to put up with short-term pain for long term gain. Unlike Java or Visual Basic, I intend to be programming in Python 10+ years from now; so from my perspective, it is an investment.
Calling it a dragon sounds way overstated. Another issue is that we can't really fix this retroactively in Python 2.2. Python 2.2 has been elected to be the "Python-in-a-Tie" favored by the Python Business Forum, giving it a very long life expectancy -- 18 months from the first official release of Python-in-a-Tie (probably Python 2.2.2), plus however long it takes people to want to upgrade after that.
Plus, most features don't get used by the public for at least a year or so as it takes a while for the code-examples to start using them and books to be updated.
| Care to conduct a survey on c.l.py?
Sure. I'll run the survey and report back. What would be the options? It'll be a simple CGI form using a radio or check boxes and a button. I'll aggregate the results. To do this I need:
- A specific description of what would change
- An example of what would break, plus what it would be replaced with.
- An explanation of what problems occur when the blemish isn't fixed (what can't you do?)
- The mapping between the next() method and the tp_iternext slot in the type object would disappear, and instead the __next__() method would be mapped to this slot. This means that every iterator definition written in Python has to be changed from "def next(self): ..." to "def __next__(self): ...".

- There would be a new built-in function, next(), which calls the __next__() method on its argument.

- Calls to it.next() will have to be changed to call next(it) instead. (it.__next__() would also work but is not recommended.)

- There really isn't anything "broken" about the current situation; it's just that "next" is the only method name mapped to a slot in the type object that doesn't have leading and trailing double underscores.

--Guido van Rossum (home page: http://www.python.org/~guido/)
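In code, the proposed spelling would look something like this (a sketch of the proposal, not of current behavior; Counter is a made-up iterator):

    def next(it):
        # the proposed builtin: dispatch to the renamed slot method
        return it.__next__()

    class Counter:
        def __init__(self, n):
            self.i, self.n = 0, n

        def __iter__(self):
            return self

        def __next__(self):          # was: def next(self): ...
            if self.i >= self.n:
                raise StopIteration
            self.i = self.i + 1
            return self.i

    it = iter(Counter(3))
    print next(it), next(it), next(it)    # prints 1 2 3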
On Wed, Jul 17, 2002 at 11:03:48AM -0400, Guido van Rossum wrote:
|
| Calling it a dragon sounds way overstated.

Oh. I wasn't calling it a dragon... I was stating that Python is dragon free.

| > | Care to conduct a survey on c.l.py?
| >
| > Sure. I'll run the survey and report back.

Ok. Here is the survey form for comment before it is posted.

    http://yaml.org/wk/survey?id=pyiter

I'll summarize the results after the survey has run its course...

Best,

Clark
http://yaml.org
Ok. Here is the survey form for comment before it is posted.
http://yaml.org/wk/survey?id=pyiter
I'll summarize the results after the survey has run its course...
Fine with me. Maybe the 4th para ("There really isn't ...") should be moved up so it becomes the 2nd. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido:
- The mapping between the next() method and the tp_iternext slot in the type object would disappear, and instead the __next__() method would be mapped to this slot.
For what it's worth, I took it upon myself to "fix" this already in Pyrex extension types. So if you make this change, you'll be making Python more compatible with Pyrex. :-)

Greg Ewing, Computer Science Dept, University of Canterbury,
Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
Guido van Rossum wrote:
- There really isn't anything "broken" about the current situation; it's just that "next" is the only method name mapped to a slot in the type object that doesn't have leading and trailing double underscores.
Are you saying the _only_ reason to rename it is for consistency with the other type slot method names? That's really weak, IMHO, and not worth any kind of backwards incompatibility (which seems unavoidable). Neil
Guido van Rossum wrote:
- There really isn't anything "broken" about the current situation; it's just that "next" is the only method name mapped to a slot in the type object that doesn't have leading and trailing double underscores.
Are you saying the _only_ reason to rename it is for consistency with the other type slot method names? That's really weak, IMHO, and not worth any kind of backwards incompatibility (which seems unavoidable).
Neil
Almost. This means that we're retroactively saying that all objects with a next method are iterators, thereby slightly stomping on the user's namespace. But as long as you don't use such an object as an iterator, it's harmless. And if my position wasn't clear already, I agree it's not worth "fixing". :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
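A hypothetical illustration of the namespace overlap conceded here (the Playlist class is invented, not from the thread): a pre-existing "next" method with unrelated semantics now accidentally looks like half of the iterator protocol.

    class Playlist:
        def __init__(self, tracks):
            self.tracks = list(tracks)
        def next(self):              # means "skip to the next track" --
            self.tracks.pop(0)       # nothing to do with iteration

    # Harmless until someone treats a Playlist as an iterator: calling
    # p.next() expecting an iteration value silently returns None.
    p = Playlist(['a', 'b'])
    p.next()                         # skips a track; yields no value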
[Guido van Rossum]
- There really isn't anything "broken" about the current situation; it's just that "next" is the only method name mapped to a slot in the type object that doesn't have leading and trailing double underscores.
I'm way behind on the email for this list, but I wanted to chime in with an idea related to this old thread. I know we want to limit the rate of language/feature changes for the business community. At the same time, this situation with iterators is proof that even the best thought out new features can still have a few blemishes that get discovered after they've been incorporated into Python proper. It's just terribly difficult to get anything "right" the very first time, and it would be nice to fix these blemishes sooner rather than later. So perhaps we need some sort of concept of a "grace period" on brand-new features during which blemishes can be polished off, even if the polishing breaks backward compatibility. After the grace period, backward compatibility becomes a higher priority. Since we are talking about backward compatibility only as it relates to the brand-new features themselves, Python-In-A-Tie folks can avoid the issue altogether by not using the new features during the grace period. Would something like this be an acceptable compromise? -- Patrick K. O'Brien Orbtech ----------------------------------------------- "Your source for Python programming expertise." ----------------------------------------------- Web: http://www.orbtech.com/web/pobrien/ Blog: http://www.orbtech.com/blog/pobrien/ Wiki: http://www.orbtech.com/wiki/PatrickOBrien -----------------------------------------------
[Patrick K. O'Brien]
So perhaps we need some sort of concept of a "grace period" on brand-new features during which blemishes can be polished off, even if the polishing breaks backward compatibility. [...] Would something like this be an acceptable compromise?
I know it was not the original intent of importing from __future__, but maybe this could be linked with __future__ as well. People wanting guaranteed stability should just never import from __future__ at all. It's just an idea; I'm not pushing for it. I do not even like it... For one, I'm quite ready to adjust the things I'm responsible for, whenever the need arises, not being part of the bullet tied to Python development. On the other hand, I know administrators not far from me that get very upset when they learn about specification changes for any software they rely on for production, and I do understand their need for a peaceful life. Surely, it's not easy to please everybody. In French, there is this nice proverb, which is in fact the last verses of one of LaFontaine's fables: "On ne peut, dit le meunier, plaire à tout le monde et à son père: bien faire et laisser braire!". In a word, that means "do well and let them bray!". :-) -- François Pinard http://www.iro.umontreal.ca/~pinard
On Sun, Aug 04, 2002 at 04:07:08PM -0500, Patrick K. O'Brien wrote:
[Guido van Rossum]
- There really isn't anything "broken" about the current situation; it's just that "next" is the only method name mapped to a slot in the type object that doesn't have leading and trailing double underscores.
I'm way behind on the email for this list, but I wanted to chime in with an idea related to this old thread. I know we want to limit the rate of language/feature changes for the business community. At the same time, this situation with iterators is proof that even the best thought out new features can still have a few blemishes that get discovered after they've been incorporated into Python proper.
I think I have a reasonable solution for the re-iteration blemish in the iteration protocol without breaking backward compatibility: http://mail.python.org/pipermail/python-dev/2002-July/026960.html
So perhaps we need some sort of concept of a "grace period" on brand-new features during which blemishes can be polished off, even if the polishing breaks backward compatibility. After the grace period, backward compatibility becomes a higher priority.
Giving more people a chance to play with new features before they are finalized is a very good idea. When a significant new feature is checked into CVS, a preview version can be released in source and precompiled form to encourage more people to test it. Most CVS snapshots seem stable enough for a programmer's daily use. A good example of such a significant new feature is the source encoding just checked in. Oren
[Oren Tirosh]
On Sun, Aug 04, 2002 at 04:07:08PM -0500, Patrick K. O'Brien wrote:
[Guido van Rossum]
- There really isn't anything "broken" about the current situation; it's just that "next" is the only method name mapped to a slot in the type object that doesn't have leading and trailing double underscores.
But would you define it as __next__() if you had to do it again? A __next__()/next() relationship does seem to fit more neatly.
I'm way behind on the email for this list, but I wanted to chime in with an idea related to this old thread. I know we want to limit the rate of language/feature changes for the business community. At the same time, this situation with iterators is proof that even the best thought out new features can still have a few blemishes that get discovered after they've been incorporated into Python proper.
I think I have a reasonable solution for the re-iteration blemish in the iteration protcol without breaking backward compatibility:
http://mail.python.org/pipermail/python-dev/2002-July/026960.html
So perhaps we need some sort of concept of a "grace period" on brand-new features during which blemishes can be polished off, even if the polishing breaks backward compatibility. After the grace period, backward compatibility becomes a higher priority.

Giving more people a chance to play with new features before they are finalized is a very good idea. When a significant new feature is checked into CVS, a preview version can be released in source and precompiled form to encourage more people to test it. Most CVS snapshots seem stable enough for a programmer's daily use.
Given the general lack of alpha- and beta-testing, there'd be very little feedback. I seem to remember that the CVS snapshots went missing in action recently without anyone noticing, which shows that they aren't much used, and I guess the same would be true of preview versions. Tracking the CVS repository will test such features, but getting more testing than that would be difficult. I *do* agree that such feature testing would be inestimably useful.
A good example of such a significant new feature is the source encoding just checked in.
Indeed. regards ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ -----------------------------------------------------------------------
[me]
- There really isn't anything "broken" about the current situation; it's just that "next" is the only method name mapped to a slot in the type object that doesn't have leading and trailing double underscores.
[Patrick]
I'm way behind on the email for this list, but I wanted to chime in with an idea related to this old thread. I know we want to limit the rate of language/feature changes for the business community. At the same time, this situation with iterators is proof that even the best thought out new features can still have a few blemishes that get discovered after they've been incorporated into Python proper. It's just terribly difficult to get anything "right" the very first time, and it would be nice to fix these blemishes sooner, rather than later.
So perhaps we need some sort of concept of a "grace period" on brand-new features during which blemishes can be polished off, even if the polishing breaks backward compatibility. After the grace period, backward compatibility becomes a higher priority. Since we are talking about backward compatibility only as it relates to the brand-new features themselves, Python-In-A-Tie folks can avoid the issue altogether by not using the new features during the grace period.
Would something like this be an acceptable compromise?
I guess we could explicitly label certain features as experimental in 2.3. I don't think we can interpret 2.2 like this retroactively -- while the new type stuff was labeled experimental at some point, the iterators and generators were not, and the new type stuff was pretty much fixed by releasing 2.2.1 (and by declaring 2.2 the tie-wearing Python). --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, Jul 17, 2002, Clark C . Evans wrote:
Sure. I'll run the survey and report back. What would be the options? It'll be a simple CGI form using a radio or check boxes and a button. I'll aggregate the results.
Make sure you ask for e-mail addresses to prevent duplicates. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
On Wed, 17 Jul 2002, Guido van Rossum wrote:
Given that it's really only a very minor problem, I'd rather not expend the effort to "fix" this.
I do agree that there is a policy decision to be made about when it's appropriate to make a protocol change, and that this should be left to you, Guido. But i think this is more than a minor problem. This is a namespace collision problem, and that's significant. Naming the method "next" means that any object with a "next" method cannot be adapted to support the iterator protocol. Unfortunately "next" is a pretty common word and it's quite possible that such a method name is already in use. So it's worth thinking through. -- ?!ng
But i think this is more than a minor problem. This is a namespace collision problem, and that's significant. Naming the method "next" means that any object with a "next" method cannot be adapted to support the iterator protocol. Unfortunately "next" is a pretty common word and it's quite possible that such a method name is already in use.
Can you explain this? Last time I checked CVS, PEP 246 wasn't implemented yet, so I don't think you mean "adapted" in that sense. Generally speaking, iterator implementations aren't created by making changes to an existing class -- they're created by creating a new class. The only change to *existing* classes needed is the addition of an __iter__ method to the underlying container object. So I'm not sure what you mean. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, 17 Jul 2002, Guido van Rossum wrote:
the method "next" means that any object with a "next" method cannot be adapted to support the iterator protocol. Unfortunately "next" is a pretty common word and it's quite possible that such a method name is already in use.
Can you explain this? Last time I checked CVS, PEP 246 wasn't implemented yet, so I don't think you mean "adapted" in that sense.
No, i didn't -- i just meant in the more general sense. To make an object support one of the other internal protocols, like repr(), you can just add a special method (in Python) or fill in a slot (in C).
Generally speaking, iterator implementations aren't created by making changes to an existing class
Well, i guess that's part of the protocol philosophy. There exist cursor-like objects that would be natural candidates for being used like iterators (files are one example, database cursors are another). Choosing __next__ makes it possible to add support to an existing object when appropriate, instead of requiring an auxiliary object regardless of whether it's appropriate or inappropriate. To me, the former feels like the more typical Python thing to do, because it's consistent with the way all the other protocols work. So it's from this perspective that "next" without underscores is a wart to me. For example, when something is container-like you can implement __getitem__ on the object itself, and then you can use [] with the object. Some objects let you fetch containers and some objects implement __getitem__ on their own. But we don't force everybody to provide a convert-to-container operation in all cases before allowing them to provide __getitem__. -- ?!ng
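A sketch of the two styles being contrasted here, using invented names (RawScanner and friends are hypothetical; Python 2.2-era spelling with next() rather than __next__()):

    # A cursor-like object with no iterator methods at all.
    class RawScanner:
        def __init__(self, lines):
            self.lines = list(lines)
        def read_line(self):         # returns None when exhausted
            if not self.lines:
                return None
            return self.lines.pop(0)

    # Style 1: graft the protocol onto the cursor-like object itself.
    class IterableScanner(RawScanner):
        def __iter__(self):
            return self
        def next(self):
            line = self.read_line()
            if line is None:
                raise StopIteration
            return line

    # Style 2: a separate auxiliary iterator object that delegates.
    class ScannerIterator:
        def __init__(self, scanner):
            self.scanner = scanner
        def __iter__(self):
            return self
        def next(self):
            line = self.scanner.read_line()
            if line is None:
                raise StopIteration
            return line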
Well, i guess that's part of the protocol philosophy. There exist cursor-like objects that would be natural candidates for being used like iterators (files are one example, database cursors are another). Choosing __next__ makes it possible to add support to an existing object when appropriate, instead of requiring an auxiliary object regardless of whether it's appropriate or inappropriate.
OK, that's clear.
To me, the former feels like the more typical Python thing to do, because it's consistent with the way all the other protocols work. So it's from this perspective that "next" without underscores is a wart to me.
Yes.
For example, when something is container-like you can implement __getitem__ on the object itself, and then you can use [] with the object. Some objects let you fetch containers and some objects implement __getitem__ on their own. But we don't force everybody to provide a convert-to-container operation in all cases before allowing them to provide __getitem__.
Correct. Now, weren't you a co-author of the Iterator PEP? I wish you'd brought this up then. Or maybe you did, and I overruled you. Sorry then. But I don't think we can withdraw this so easily. It's not the end of the world. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, 17 Jul 2002, Guido van Rossum wrote: [...]
OK, that's clear. [...] Yes. [...] Correct.
Neat! So much understanding.
Now, weren't you a co-author of the Iterator PEP? I wish you'd brought this up then. Or maybe you did, and I overruled you. Sorry then.
Indeed, i wrote the first draft of the PEP, though it was very different from what we have today; it's been largely rewritten. The big design changes happened at the iterator BOF, so unfortunately there's no e-mail record of the debate. I recall that __iter__ made me uncomfortable, but i don't recall to what extent i expressed this. I don't remember whether there was any overruling. But it doesn't really matter; it's today now, and here we are. It is true that i failed to understand or express the issue well enough to have an effect on the design. I will cheerfully accept blame if it somehow means we'll end up with a nicer language. :)
But I don't think we can withdraw this so easily. It's not the end of the world.
I would be pleased to see a migration path (perhaps along the lines of Dave's suggestion, with warnings for a while), but i won't throw myself off a bridge if it doesn't happen. I do think there is some potential for errors caused by misunderstandings about whether or not "for x in y" is destructive. That's the thing that worries me the most. I think this is the main reason why the old practice of abusing __getitem__ was bad, and thus helped to motivate iterators in the first place. It seems serious enough that migrating to something that distinguishes destructive-for from non-destructive-for could indeed be worth the cost. The destructive-for issue may seem superficially unrelated to the __next__-naming issue. As i see it, the __next__-naming issue is related to the mandatory-__iter__ issue (because some people view __iter__ as a type flag), which is related to the destructive-for issue. -- ?!ng
I do think there is some potential for errors caused by misunderstandings about whether or not "for x in y" is destructive. That's the thing that worries me the most. I think this is the main reason why the old practice of abusing __getitem__ was bad, and thus helped to motivate iterators in the first place. It seems serious enough that migrating to something that distinguishes destructive-for from non-destructive-for could indeed be worth the cost.
I'm not sure I understand this (this seems to be my week for not understanding what people write :-( ). First of all, I'm not sure what exactly the issue is with destructive for-loops. If I have a function that contains a for-loop over its argument, and I pass iter(x) as the argument, then the iterator is destroyed in the process, but x may or may not be, depending on what it is. Maybe the for-loop is a red herring? Calling next() on an iterator may or may not be destructive on the underlying "sequence" -- if it is a generator, for example, I would call it destructive. Perhaps you're trying to assign properties to the iterator abstraction that aren't really there? Next, I'm not sure how renaming next() to __next__() would affect the situation w.r.t. the destructivity of for-loops. Or were you talking about some other migration? --Guido van Rossum (home page: http://www.python.org/~guido/)
On Thu, 18 Jul 2002, Guido van Rossum wrote:
First of all, I'm not sure what exactly the issue is with destructive for-loops.
It's just not the way i expect for-loops to work. Perhaps we would need to survey people for objective data, but i feel that most people would be surprised if

    for x in y: print x
    for x in y: print x

did not print the same thing twice, or if

    if x in y: print 'got it'
    if x in y: print 'got it'

did not do the same thing twice. I realize this is my own opinion, but it's a fairly strong impression i have. Even if it's okay for for-loops to destroy their arguments, i still think it sets up a bad situation: we may end up with functions manipulating sequence-like things all over, but it becomes unclear whether they destroy their arguments or not. It becomes possible to write a function which sometimes destroys its argument and sometimes doesn't. Bugs get deeper and harder to find. I believe this is where the biggest debate lies: whether "for" should be non-destructive. I realize we are currently on the other side of the fence, but i foresee enough potential pain that i would like you to consider the value of keeping "for" loops non-destructive.
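The surprise being described takes only a few lines to demonstrate (Python 2.2-era syntax; not from the thread):

    y = [1, 2, 3]
    for x in y: print x          # prints 1 2 3
    for x in y: print x          # prints 1 2 3 again: a list is multi-pass

    it = iter([1, 2, 3])
    for x in it: print x         # prints 1 2 3
    for x in it: print x         # prints nothing: the iterator is exhausted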
Maybe the for-loop is a red herring? Calling next() on an iterator may or may not be destructive on the underlying "sequence" -- if it is a generator, for example, I would call it destructive.
Well, for a generator, there is no underlying sequence.

    while 1: print next(gen)

makes it clear that there is no sequence, but

    for x in gen: print x

seems to give me the impression that there is.
Perhaps you're trying to assign properties to the iterator abstraction that aren't really there?
I'm assigning properties to "for" that you aren't. I think they are useful properties, though, and worth considering. I don't think i'm assigning properties to the iterator abstraction; i expect iterators to destroy themselves. But the introduction of iterators, in the way they are now, breaks this property of "for" loops that i think used to hold almost all the time in Python, and that i think holds all the time in almost all other languages.
Next, I'm not sure how renaming next() to __next__() would affect the situation w.r.t. the destructivity of for-loops. Or were you talking about some other migration?
The connection is indirect. The renaming is related to: (a) making __next__() a real, honest-to-goodness protocol independent of __iter__; and (b) getting rid of __iter__ on iterators. It's the presence of __iter__ on iterators that breaks the non-destructive-for property. I think the renaming of next() to __next__() is a good idea in any case. It is distant enough from the other issues that it can be done independently of any decisions about __iter__. -- ?!ng
On Fri, Jul 19, 2002, Ka-Ping Yee wrote:
I believe this is where the biggest debate lies: whether "for" should be non-destructive. I realize we are currently on the other side of the fence, but i foresee enough potential pain that i would like you to consider the value of keeping "for" loops non-destructive.
Consider

    for line in f.readlines():

in any version of Python. Adding iterators made this more convenient and efficient, but I just can't see your POV in the general case. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
On Friday 19 July 2002 04:23 pm, Aahz wrote:
On Fri, Jul 19, 2002, Ka-Ping Yee wrote:
I believe this is where the biggest debate lies: whether "for" should be non-destructive. I realize we are currently on the other side of the fence, but i foresee enough potential pain that i would like you to consider the value of keeping "for" loops non-destructive.
Consider
for line in f.readlines():
in any version of Python. Adding iterators made this more convenient and efficient, but I just can't see your POV in the general case.
The 'for', per se, is destroying nothing here -- the object returned by f.readlines() is destroyed by its reference count falling to 0 after the for, just as, say:

    for c in raw_input():

or

    x = raw_input() + raw_input()

and so forth. I.e., any object gets destroyed if there are no more references to it -- that's a completely different issue. In all of these cases, you can, if you want, just bind a name to the object as you call the function, then use that object over and over again at will. _Method calls_ mutating the object on which they're called is indeed quite common, of course. f.readlines() does mutate f's state. But the object it returns, as long as there are references to it, remains. Alex
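Alex's distinction, restated as a runnable sketch (StringIO stands in for a real file here, purely for illustration):

    from StringIO import StringIO   # Python 2 module
    f = StringIO('a\nb\n')          # behaves like an open file
    lines = f.readlines()           # mutates f's read position; returns a list
    for line in lines: pass         # first pass over the list
    for line in lines: pass         # second pass works: the list is unchanged
    print f.readlines()             # prints [] -- f, not the list, was moved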
aahz wrote:

Ping:

I believe this is where the biggest debate lies: whether "for" should be non-destructive. I realize we are currently on the other side of the fence, but i foresee enough potential pain that i would like you to consider the value of keeping "for" loops non-destructive.
Consider
for line in f.readlines():
in any version of Python.
and? for-in doesn't modify the object returned by f.readlines(), and never has. </F>
On Fri, Jul 19, 2002, Fredrik Lundh wrote:
aahz wrote:
Ping:
I believe this is where the biggest debate lies: whether "for" should be non-destructive. I realize we are currently on the other side of the fence, but i foresee enough potential pain that i would like you to consider the value of keeping "for" loops non-destructive.
Consider
for line in f.readlines():
in any version of Python.
and? for-in doesn't modify the object returned by f.readlines(), and never has.
While technically true, that seems to be sidestepping the point from my POV. I think that few people see for loops as inherently non-destructive due to the use case I presented above. Beyond that, the for loop is itself inherently mutating in Python older than 2.2, which I see as functionally equivalent to "destructive"; the primary intention of iterators (from my recollections of the tenor of the discussions) was to package that mutating state in a way that could capture the iterability of objects other than sequences. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
aahz wrote:
While technically true, that seems to be sidestepping the point from my POV.
really? are you arguing that when Ping says that for-in shouldn't destroy the target, he's really saying that python shouldn't allow methods to have side effects if they can be called from an expression used in a for-in statement? why would he say that?
I think that few people see for loops as inherently non-destructive due to the use case I presented above.
I think most people can tell the difference between an object and a method with side-effects. I doubt they would be able to get much done in Python if they couldn't.
Beyond that, the for loop is itself inherently mutating in Python older than 2.2
in what sense? it calls the object's __getitem__ method with an integer index value, until it gets an IndexError. in what way is that "inherently mutating"? </F>
On Fri, Jul 19, 2002, Fredrik Lundh wrote:
aahz wrote:
While technically true, that seems to be sidestepping the point from my POV.
really? are you arguing that when Ping says that for-in shouldn't destroy the target, he's really saying that python shouldn't allow methods to have side effects if they can be called from an expression used in a for-in statement? why would he say that?
I'm saying that I think Ping is overstating the case in terms of the way people look at things. Whatever the technicalities of an implicit method versus an explicit method, people have long used for loops in destructive ways.
I think that few people see for loops as inherently non-destructive due to the use case I presented above.
I think most people can tell the difference between an object and a method with side-effects. I doubt they would be able to get much done in Python if they couldn't.
To be sure. But I don't think there's much difference in the way for loops are actually used. Continuing my point above, I see the current usage of for loops as calling an implicit method with side-effects as opposed to an explicit method with side-effects. Lo and behold! That's actually the case.
Beyond that, the for loop is itself inherently mutating in Python older than 2.2
in what sense? it calls the object's __getitem__ method with an integer index value, until it gets an IndexError. in what way is that "inherently mutating"?
And how does that integer index change? The for loop in Python <2.2 has an internal state object. Iterators are the external manifestation of that state object, generalized to objects other than sequences. I'm surprised that anyone is surprised that the state object gets mutated/destroyed. I'm also surprised that people are surprised about what happens when that state object is coupled to an inherently mutating object such as file objects. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
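Roughly, the "internal state object" is the hidden index in this sketch of pre-2.2 for-loop semantics (the concrete list is invented for illustration):

    y = ['a', 'b', 'c']
    # What "for x in y: print x" did before 2.2, approximately:
    i = 0
    while 1:
        try:
            x = y[i]             # calls y.__getitem__(i)
        except IndexError:
            break
        print x
        i = i + 1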
"A" == Aahz <aahz@pythoncraft.com> writes:
A> The for loop in Python <2.2 has an internal state object.
A> Iterators are the external manifestation of that state object,
A> generalized to objects other than sequences. I'm surprised
A> that anyone is surprised that the state object gets
A> mutated/destroyed. I'm also surprised that people are
A> surprised about what happens when that state object is coupled
A> to an inherently mutating object such as file objects.

Well said.
-Barry
On Fri, 19 Jul 2002, Aahz wrote:
And how does that integer index change? The for loop in Python <2.2 has an internal state object. Iterators are the external manifestation of that state object, generalized to objects other than sequences. I'm surprised that anyone is surprised that the state object gets mutated/destroyed. I'm also surprised that people are surprised about what happens when that state object is coupled to an inherently mutating object such as file objects.
All the surprises I see stem from confusion between what is the object being iterated over, and what is the object holding the state of the iteration. Iterators returning self for __iter__() is the major cause of this confusion. I agree that in the general case, the boundary may not always be clear, but Ping's proposal cleans up what's seen 99.9% of the time. Pending the pain of the yet unseen migration plan, I'm

+1 on removing __iter__ from all core iterators
+1 on renaming next() to __next__()
+1 on presenting file objects as iterators rather than iterables
+0 on the new 'for x from y' syntax

/Paul
On Fri, Jul 19, 2002, Paul Svensson wrote:
Pending the pain of the yet unseen migration plan, I'm +1 on removing __iter__ from all core iterators +1 on renaming next() to __next__() +1 on presenting file objects as iterators rather than iterables +0 on the new 'for x from y' syntax
I'd vote this way:

-0 on removing __iter__
+1 on renaming next() to __next__()
+0 on presenting file objects as iterators
+1 on finishing up the patch that fixes the xreadlines() mess
-1 on for x from y

-- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
"KY" == Ka-Ping Yee <ping@zesty.ca> writes:
KY> It's just not the way i expect for-loops to work. Perhaps we
KY> would need to survey people for objective data, but i feel
KY> that most people would be surprised if
KY>
KY>     for x in y: print x
KY>     for x in y: print x
KY>
KY> did not print the same thing twice, or if

As with many things Pythonic, it all depends. Specifically, I think it depends on the type of y. Certainly in a pre-iterator world there was little preventing (or encouraging?) you to write y's __getitem__() non-destructively, so I don't see much difference if y is an iterator.

KY> Even if it's okay for for-loops to destroy their arguments, i
KY> still think it sets up a bad situation: we may end up with
KY> functions manipulating sequence-like things all over, but it
KY> becomes unclear whether they destroy their arguments or not.
KY> It becomes possible to write a function which sometimes
KY> destroys its argument and sometimes doesn't. Bugs get deeper
KY> and harder to find.

How is that different than pre-iterators with __getitem__()?

KY> I'm assigning properties to "for" that you aren't. I think
KY> they are useful properties, though, and worth considering.

These aren't properties of for-loops, they are properties of the things you're iterating (little-i) over.

-Barry
[Ping]
... I believe this is where the biggest debate lies: whether "for" should be non-destructive. I realize we are currently on the other side of the fence, but i foresee enough potential pain that i would like you to consider the value of keeping "for" loops non-destructive.
I'm having a hard time getting excited about this. If you had made this argument before the iterator protocol was implemented, it may have been more or less intriguing. But it was implemented and released some time ago, and I just haven't seen any evidence of such problems on c.l.py, the Help list, or the Tutor list (all of which I still pay significant attention to). "for" did and does work in accord with a simple protocol, and whether that's "destructive" depends on how the specific objects involved implement their pieces of the protocol, not on the protocol itself. The same is true of all of Python's hookable protocols. What's so special about "for" that it should pretend to deliver purely functional behavior in a highly non-functional language? State mutates. That's its purpose <wink>.
I'm having a hard time getting excited about this. If you had made this argument before the iterator protocol was implemented, it may have been more or less intriguing. But it was implemented and released some time ago, and I just haven't seen any evidence of such problems on c.l.py, the Help list, or the Tutor list (all of which I still pay significant attention to).
This is an important argument IMO that the theorists here seem to be missing somewhat. Releasing a feature and monitoring feedback is a good way of user testing, something that has been ignored too often by language designers. Elegant or minimal abstractions have their place; but in the end, users are more important. Quoting Steven Pemberton's home page (http://www.cwi.nl/~steven/):

    ABC: A Simple but Powerful Interactive Programming Language and
    Environment. We did requirements and task analysis, iterative
    design, and user testing. You'd almost think programming languages
    were an interface between people and computers. Now famous because
    Python was strongly influenced by it.

I still favor this approach to language design. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Fri, 19 Jul 2002, Tim Peters wrote:
"for" did and does work in accord with a simple protocol, and whether that's "destructive" depends on how the specific objects involved implement their pieces of the protocol, not on the protocol itself. The same is true of all of Python's hookable protocols.
Name any protocol for which the question "does this mutate?" has no answer. (I ask you to accept that __call__ is a special case.)
What's so special about "for" that it should pretend to deliver purely functional behavior in a highly non-functional language?
Who said anything about functional behaviour? I'm not requiring that looping *never* mutate. I just want to be able to tell *whether* it will. -- ?!ng
[Ping]
Name any protocol for which the question "does this mutate?" has no answer.
Heh -- you must not use Zope much <0.6 wink>. I'm hard pressed to think of a protocol where that does have a reliable answer. Here:

    x1 = y.z
    x2 = y.z

Are x1 and x2 the same object after that? At least equal? Did either line mutate y? You simply can't know without knowing how y's type implements __getattr__, and with the introduction of computed attributes (properties) it's just going to get muddier.
(I ask you to accept that __call__ is a special case.)
It's not to me -- if a protocol invokes user-defined Python code, there's nothing you can say about mutability "in general", and people do both use and abuse that.
What's so special about "for" that it should pretend to deliver purely functional behavior in a highly non-functional language?
Who said anything about functional behaviour? I'm not requiring that looping *never* mutate. I just want to be able to tell *whether* it will.
I don't blame you, and sometimes I'd like to know whether y.z (or "y += z", etc) mutates y too. It cuts deeper than loops, so a loop-focused gimmick seems inadequate to me (provided "something needs to be done about it" at all -- I'm not sure, but doubt it).
On Sun, 21 Jul 2002, Tim Peters wrote:
x1 = y.z
x2 = y.z
Are x1 and x2 the same object after that? At least equal? Did either line mutate y? You simply can't know without knowing how y's type implements __getattr__, and with the introduction of computed attributes (properties) it's just going to get muddier.
That's not the point. You could claim that *any* polymorphism in Python is useless by the same argument. But Python is not useless; Python code really is reusable; and that's because there are good conventions about what the behaviour *should* be. People who do really find this upsetting should go use a strongly-typed language.

In general, getting "y.z" should be idempotent, and should not mutate y. I think everyone would agree on the concept. If it does mutate y with visible effects, then the implementor is breaking the convention. Sure, Python won't prevent you from writing a file-like class where you write the string "blah" to the file by fetching f.blah and you close the file by mentioning f[42]. But when users of this class then come running after you with pointed sticks, i'm not going to fight them off. :)

This is a list of all the type slots accessible from Python, before iterators (i.e. pre-2.2). Beside each is the answer to the question: Suppose you look at the value of x, then do this operation to x, then look at the value of x. Should we expect the two observed values to be the same or different?

    nb_add                     same
    nb_subtract                same
    nb_multiply                same
    nb_divide                  same
    nb_remainder               same
    nb_divmod                  same
    nb_power                   same
    nb_negative                same
    nb_positive                same
    nb_absolute                same
    nb_nonzero                 same
    nb_invert                  same
    nb_lshift                  same
    nb_rshift                  same
    nb_and                     same
    nb_xor                     same
    nb_or                      same
    nb_coerce                  same
    nb_int                     same
    nb_long                    same
    nb_float                   same
    nb_oct                     same
    nb_hex                     same
    nb_inplace_add             different
    nb_inplace_subtract        different
    nb_inplace_multiply        different
    nb_inplace_divide          different
    nb_inplace_remainder       different
    nb_inplace_power           different
    nb_inplace_lshift          different
    nb_inplace_rshift          different
    nb_inplace_and             different
    nb_inplace_xor             different
    nb_inplace_or              different
    nb_floor_divide            same
    nb_true_divide             same
    nb_inplace_floor_divide    different
    nb_inplace_true_divide     different
    sq_length                  same
    sq_concat                  same
    sq_repeat                  same
    sq_item                    same
    sq_slice                   same
    sq_ass_item                different
    sq_ass_slice               different
    sq_contains                same
    sq_inplace_concat          different
    sq_inplace_repeat          different
    mp_length                  same
    mp_subscript               same
    mp_ass_subscript           different
    bf_getreadbuffer           same
    bf_getwritebuffer          same
    bf_getsegcount             same
    bf_getcharbuffer           same
    tp_print                   same
    tp_getattr                 same
    tp_setattr                 different
    tp_compare                 same
    tp_repr                    same
    tp_hash                    same
    tp_call                    ?
    tp_str                     same
    tp_getattro                same
    tp_setattro                different

In every case except for __call__, there exists a canonical answer. We all rely on these conventions every time we write a Python program. And learning these conventions is a necessary part of learning Python. You can argue, as Guido has, that in the particular case of for-loops distinguishing between mutating and non-mutating behaviour is not worth the trouble. But you can't say that we should give up on the whole concept *in general*.

-- ?!ng
[Ping]
Name any protocol for which the question "does this mutate?" has no answer.
[Tim]
Heh -- you must not use Zope much <0.6 wink>. I'm hard pressed to think of a protocol where that does have a reliable answer. Here:
x1 = y.z
x2 = y.z
Are x1 and x2 the same object after that? At least equal? Did either line mutate y? You simply can't know without knowing how y's type implements __getattr__, and with the introduction of computed attributes (properties) it's just going to get muddier.
[Ping]
That's not the point.
It answered the question you asked.
You could claim that *any* polymorphism in Python is useless by the same argument.
It's your position that "for" is semi-useless because of the possibility for mutation. That isn't my position, and that some people write mutating __getattr__ (etc) doesn't make y.z (etc) unattractive to me either.
But Python is not useless; Python code really is reusable;
Provided you play along with code's often-undocumented preconditions, absolutely.
and that's because there are good conventions about what the behaviour *should* be. People who do really find this upsetting should go use a strongly-typed language.
Sorry, I couldn't follow this part. It's a fact that mutating __getattr__ (etc) implementations exist, and it's a fact that I'm not much bothered by it. I don't suggest they move to a different language, either (assuming that by "strongly-typed" you meant "statically typed" -- Python is already strongly typed).
In general, getting "y.z" should be idempotent, and should not mutate y. I think everyone would agree on the concept. If it does mutate y with visible effects, then the implementor is breaking the convention.
No argument, although I have to emphasize that it's *just* "a convention", and repeat my prediction that the introduction of properties is going to make this particular convention less reliable in real life over time.
Sure, Python won't prevent you from writing a file-like class where you write the string "blah" to the file by fetching f.blah and you close the file by mentioning f[42]. But when users of this class then come running after you with pointed sticks, i'm not going to fight them off. :)
While properties aren't going to stop you from saying

    self.transactionid = self.session_manager.newid

and getting a new result each time you do it. Spelling no-argument method calls without parens is popular in some other languages, and it's "a feature" that properties make that easy to spell in Python 2.2 too.
This is a list of all the type slots accessible from Python, before iterators (i.e. pre-2.2). Beside each is the answer to the question:
Suppose you look at the value of x, then do this operation to x, then look at the value of x. Should we expect the two observed values to be the same or different? ...
I don't know why you're bothering with this, but it's got holes. For example, some people overly fond <wink> of C++ enjoy overloading "<<" in highly non-functional ways. For another, the section on the inplace operators seems confused; after

    x1 = x
    x += y

there's no single best answer to whether x is x1 is true, or to whether the value of x1 before is == to the value of x1 after. The most popular convention*s* for the inplace operators are:

- If x is of a mutable type, then x is x1 after, and the pre- and post- values of x1 are !=.
- If x is of an immutable type, then x is not x1 after, and the pre- and post- values of x1 are ==.

The second case is forced, but the first one isn't. In light of all that, the intended meaning of "different" in
nb_inplace_add different
is either incorrect, or so weak that it's not worth much. I suppose you mean that, in Python code

    x += y

the object bound to the name "x" before the operation most likely has a different (!=) value than the object bound to the name "x" after the operation. That's true, but relies on what the generated code does *with* the result of nb_inplace_add. If you just call the method

    x.__iadd__(y)

there's simply no guessing whether x is "different" as a result (it never is for x of an immutable type, it usually is for x of a mutable type, and there's no way to tell the difference just by staring at x).
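The two conventions spelled out above are easy to observe directly (a sketch, not from the thread):

    x = [1]
    x1 = x
    x += [2]                     # mutable type: modified in place
    print x is x1                # true: x1 also sees [1, 2]

    y = 1
    y1 = y
    y += 1                       # immutable type: y is rebound
    print y is y1                # false: y1 still holds the old value 1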
nb_hex same
I sure hope so <wink>.
... In every case except for __call__, there exists a canonical answer.
If by "canonical" you mean "most common", sure, with at least the exceptions noted above.
We all rely on these conventions every time we write a Python program. And learning these conventions is a necessary part of learning Python.
You can argue, as Guido has, that in the particular case of for-loops distinguishing between mutating and non-mutating behavior is not worth the trouble. But you can't say that we should give up on the whole concept *in general*.
To the contrary, in a language with state it's crucial for the programmer to know when they're mutating state. If you use a mutating __getattr__, you better be careful that the code you call doesn't rely on __getattr__ not mutating; if you use an iterator object, you better be careful that the code you call doesn't require something stronger than an iterator object.

It's all the same to me, and as Guido repeated until he got tired of it, the possibility for "for" and "x in y" (etc) to mutate has always been there, and has always been used. I didn't and still don't have any notable real-life problems dealing with this, although I too have gotten bit when passing a generator-iterator to code that required a sequence. I suppose the difference is that I said "oops! I screwed up!", fixed it, and moved on.

It would have helped most if Python had a scheme for declaring and enforcing interfaces, *and* I bothered to use it (doubtful); second-most if the docs for the callee had spelled out its preconditions better; I doubt it would have helped at all if a variant spelling of "for" had been used, because I didn't eyeball the body of the callee first. As is, I just stuffed the generator-iterator object inside tuple() at the call site, and everything was peachy. That took a lot less effort than reading this thread <0.9 wink>.
It's just not the way i expect for-loops to work. Perhaps we would need to survey people for objective data, but i feel that most people would be surprised if
for x in y: print x
for x in y: print x
did not print the same thing twice, or if
if x in y: print 'got it'
if x in y: print 'got it'
did not do the same thing twice. I realize this is my own opinion, but it's a fairly strong impression i have.
I think it's a naive persuasion that doesn't hold under scrutiny. For a long time people have faked iterators by providing pseudo-sequences that did unspeakable things. In general, I'm pretty sure that if I asked an uninitiated user what "for line in file" would do, if it did anything, they would understand that if you tried that a second time you'd hit EOF right away.
Even if it's okay for for-loops to destroy their arguments, i still think it sets up a bad situation: we may end up with functions manipulating sequence-like things all over, but it becomes unclear whether they destroy their arguments or not. It becomes possible to write a function which sometimes destroys its argument and sometimes doesn't. Bugs get deeper and harder to find.
This sounds awfully similar to the old argument "functions (as opposed to procedures) should never have side effects". ABC implemented that literally (the environment was saved and restored around function calls, with an exception for the seed for the built-in random generator), with the hope that it would provide fewer surprises. It did the opposite: it drove people crazy because the language was trying to be smarter than them.
I believe this is where the biggest debate lies: whether "for" should be non-destructive. I realize we are currently on the other side of the fence, but i foresee enough potential pain that i would like you to consider the value of keeping "for" loops non-destructive.
I don't see any real debate. I only see you chasing windmills. Sorry. For-loops have had the possibility to destroy their arguments since the day __getitem__ was introduced.
Maybe the for-loop is a red herring? Calling next() on an iterator may or may not be destructive on the underlying "sequence" -- if it is a generator, for example, I would call it destructive.
Well, for a generator, there is no underlying sequence.
while 1: print next(gen)
makes it clear that there is no sequence, but
for x in gen: print x
seems to give me the impression that there is.
This seems to be a misrepresentation. The idiom for using any iterator (not just generators) *without* using a for-loop would have to be something like:

    while 1:
        try:
            item = it.next()  # or it.__next__() or next(it)
        except StopIteration:
            break
        ...do something with item...

(Similar to the traditional idiom for looping over the lines of a file.) The for-loop over an iterator was invented so you could write this as:

    for item in it:
        ...do something with item...

I'm not giving that up so easily!
Perhaps you're trying to assign properties to the iterator abstraction that aren't really there?
I'm assigning properties to "for" that you aren't. I think they are useful properties, though, and worth considering.
I'm trying to be open-minded, but I just don't see it. The for loop is more flexible than you seem to want it to be. Alas, it's been like this for years, and I don't think the for-loop needs a face lift.
I don't think i'm assigning properties to the iterator abstraction; i expect iterators to destroy themselves. But the introduction of iterators, in the way they are now, breaks this property of "for" loops that i think used to hold almost all the time in Python, and that i think holds all the time in almost all other languages.
Again, the widespread faking of iterators using destructive __getitem__ methods that were designed to be only used in a for-loop defeats your assertion.
Next, I'm not sure how renaming next() to __next__() would affect the situation w.r.t. the destructivity of for-loops. Or were you talking about some other migration?
The connection is indirect. The renaming is related to: (a) making __next__() a real, honest-to-goodness protocol independent of __iter__;
next() is a real, honest-to-goodness protocol now, and it is independent of __iter__() now.
and (b) getting rid of __iter__ on iterators. It's the presence of __iter__ on iterators that breaks the non-destructive-for property.
So you prefer the while-loop version above over the for-loop version? Gotta be kidding.
I think the renaming of next() to __next__() is a good idea in any case. It is distant enough from the other issues that it can be done independently of any decisions about __iter__.
Yeah, it's just a pain that it's been deployed in Python 2.2 since last December, and by the time 2.3 is out it will probably have been at least a full year. Worse, 2.2 is voted to be Python-in-a-Tie, giving that particular idiom a very long lifetime. I simply don't think we can break compatibility that easily. Remember the endless threads we've had about the pace of change and stability. We have to live with warts, alas. And this is a pretty minor one if you ask me. (I realize that you're proposing another way out in a separate message. I'll reply to that next. Since you changed the subject, I can't very well reply to it here.) --Guido van Rossum (home page: http://www.python.org/~guido/)
On Fri, Jul 19, 2002, Guido van Rossum wrote:
Ping:
I think the renaming of next() to __next__() is a good idea in any case. It is distant enough from the other issues that it can be done independently of any decisions about __iter__.
Yeah, it's just a pain that it's been deployed in Python 2.2 since last December, and by the time 2.3 is out it will probably have been at least a full year. Worse, 2.2 is voted to be Python-in-a-Tie, giving that particular idiom a very long lifetime. I simply don't think we can break compatibility that easily. Remember the endless threads we've had about the pace of change and stability. We have to live with warts, alas. And this is a pretty minor one if you ask me.
Is this a Pronouncement, or are we still waiting on the results of the survey? Note that several people have suggested a multi-release strategy for fixing this problem; does that make any difference? -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
Aahz wrote:
On Fri, Jul 19, 2002, Guido van Rossum wrote:
Ping:
I think the renaming of next() to __next__() is a good idea in any case. It is distant enough from the other issues that it can be done independently of any decisions about __iter__.
Yeah, it's just a pain that it's been deployed in Python 2.2 since last December, and by the time 2.3 is out it will probably have been at least a full year. Worse, 2.2 is voted to be Python-in-a-Tie, giving that particular idiom a very long lifetime. I simply don't think we can break compatibility that easily. Remember the endless threads we've had about the pace of change and stability. We have to live with warts, alas. And this is a pretty minor one if you ask me.
Is this a Pronouncement, or are we still waiting on the results of the survey? Note that several people have suggested a multi-release strategy for fixing this problem; does that make any difference?
Would it be good to use __next__() if it exists, else try next()? This doesn't fix the current 'wart'; however, it could allow moving closer to the desired end. It could cause confusion. For compatibility, one would only need to do:

    next = __next__

or vice versa. Not sure this is worth it. But if there is a transition, it could ease the pain.

Neal
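A sketch of the aliasing trick mentioned here (the Countdown class is invented): define the new spelling and bind the old name to it in the class body, so both it.next() and a hypothetical __next__-based protocol would find the same method.

    class Countdown:
        def __init__(self, n):
            self.n = n
        def __iter__(self):
            return self
        def __next__(self):          # the proposed spelling
            if self.n <= 0:
                raise StopIteration
            self.n -= 1
            return self.n + 1
        next = __next__              # backward-compatible alias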
Would it be good to use __next__() if it exists, else try next()?
Then the code in typeobject.c (e.g. resolve_slotdups) would have to map tp_iternext to *both* __next__ and next.
This doesn't fix the current 'wart,' however, it could allow moving closer to the desired end. It could cause confusion. For compatability, one would only need to do:
next = __next__
or vica versa.
Not sure this is worth it. But if there is a transition, it could ease the pain.
I don't think it's worth it. --Guido van Rossum (home page: http://www.python.org/~guido/)
Yeah, it's just a pain that it's been deployed in Python 2.2 since last December, and by the time 2.3 is out it will probably have been at least a full year. Worse, 2.2 is voted to be Python-in-a-Tie, giving that particular idiom a very long lifetime. I simply don't think we can break compatibility that easily. Remember the endless threads we've had about the pace of change and stability. We have to live with warts, alas. And this is a pretty minor one if you ask me.
Is this a Pronouncement, or are we still waiting on the results of the survey?
That is my current opinion. I'm waiting for the results of the survey to see if I'll be swayed (but I don't think it's likely).
Note that several people have suggested a multi-release strategy for fixing this problem; does that make any difference?
Such a big gun for such a minor problem. --Guido van Rossum (home page: http://www.python.org/~guido/)
Ka-Ping Yee <ping@zesty.ca>:
I believe this is where the biggest debate lies: whether "for" should be non-destructive.
It's not the for-loop's fault if its argument is of such a nature that iterating over it destroys it. Given suitable values for x and y, it's possible for evaluating "x+y" to be a destructive operation. Does that mean we should revise the "+" protocol somehow to prevent this from happening? I don't think so. This sort of thing is all-pervasive in Python due to its dynamic nature. It's not something that can be easily "fixed", even if it were desirable to do so. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
Guido:
Generally speaking, iterator implementations aren't created by making changes to an existing class
Continuing with my scanner example, it's conceivable that I might want to give it an iterator interface as an alternative to the existing one -- it's already an iterator, really, it just doesn't have the new standard iterator interface. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
On Wed, Jul 17, 2002 at 02:58:55PM -0500, Ka-Ping Yee wrote:
| But i think this is more than a minor problem. This is a
| namespace collision problem, and that's significant. Naming
| the method "next" means that any object with a "next" method
| cannot be adapted to support the iterator protocol. Unfortunately
| "next" is a pretty common word and it's quite possible that such
| a method name is already in use.

Right, but such objects wouldn't be misleading because they'd be missing an __iter__ method, correct?

Best, Clark
On Wed, 17 Jul 2002, Clark C . Evans wrote:
On Wed, Jul 17, 2002 at 02:58:55PM -0500, Ka-Ping Yee wrote:
| Naming the method "next" means that any object with a "next" method
| cannot be adapted to support the iterator protocol.

Right, but such objects wouldn't be misleading because they'd be missing an __iter__ method, correct?
__iter__ is a red herring. It has nothing to do with the act of iterating. It exists only to support the use of "for" directly on the iterator. Iterators that currently implement "next" but not "__iter__" will work in some places and not others. For example, given this:

    class Counter:
        def __init__(self, last):
            self.i = 0
            self.last = last

        def next(self):
            self.i += 1
            if self.i > self.last:
                raise StopIteration
            return self.i

    class Container:
        def __init__(self, size):
            self.size = size

        def __iter__(self):
            return Counter(self.size)

This will work:

    >>> for x in Container(3): print x
    ...
    1
    2
    3

But this will fail:

    >>> for x in Counter(3): print x
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: iteration over non-sequence

It's more accurate to say that there are two distinct protocols here.

1. An object is "for-able" if it implements __iter__ or __getitem__. This is a subset of the sequence protocol.

2. An object can be iterated if it implements next.

The Container supports only protocol 1, and the Counter supports only protocol 2, with the above results. Iterators are currently asked to support both protocols. The semantics of iteration come only from protocol 2; protocol 1 is an effort to make iterators look sorta like sequences. But the analogy is very weak -- these are "sequences" that destroy themselves while you look at them -- not like any typical sequence i've ever seen!

The short of it is that whenever any Python programmer says "for x in y", he or she had better be darned sure of whether this is going to destroy y. Whatever we can do to make this clear would be a good idea.

-- ?!ng
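For what it's worth, bridging the two protocols in the example above takes one method; a sketch (ForableCounter is an invented name):

    class ForableCounter(Counter):   # Counter as defined above
        def __iter__(self):          # protocol 1: makes it "for-able"
            return self              # protocol 2 (next) is inherited

    # Now "for x in ForableCounter(3): print x" works directly -- at the
    # cost of making a self-destroying object look like a sequence, which
    # is exactly the blurring being objected to.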
__iter__ is a red herring. It has nothing to do with the act of iterating. It exists only to support the use of "for" directly on the iterator. Iterators that currently implement "next" but not "__iter__" will work in some places and not others. For example, given this:
class Counter:
    def __init__(self, last):
        self.i = 0
        self.last = last

    def next(self):
        self.i += 1
        if self.i > self.last:
            raise StopIteration
        return self.i

class Container:
    def __init__(self, size):
        self.size = size

    def __iter__(self):
        return Counter(self.size)
This will work:
>>> for x in Container(3): print x
...
1
2
3
But this will fail:
>>> for x in Counter(3): print x
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence
It's more accurate to say that there are two distinct protocols here.
1. An object is "for-able" if it implements __iter__ or __getitem__. This is a subset of the sequence protocol.
2. An object can be iterated if it implements next.
The Container supports only protocol 1, and the Counter supports only protocol 2, with the above results.
Iterators are currently asked to support both protocols. The semantics of iteration come only from protocol 2; protocol 1 is an effort to make iterators look sorta like sequences. But the analogy is very weak -- these are "sequences" that destroy themselves while you look at them -- not like any typical sequence i've ever seen!
The short of it is that whenever any Python programmer says "for x in y", he or she had better be darned sure of whether this is going to destroy y. Whatever we can do to make this clear would be a good idea.
This is a very good summary of the two iterator protocols. Ping, would you mind adding this to PEP 234? --Guido van Rossum (home page: http://www.python.org/~guido/)
I wrote:
__iter__ is a red herring. [...blah blah blah...] The short of it is that whenever any Python programmer says "for x in y", he or she had better be darned sure of whether this is going to destroy y. Whatever we can do to make this clear would be a good idea.
Guido wrote:
This is a very good summary of the two iterator protocols. Ping, would you mind adding this to PEP 234?
And i thought it was a critique. Fascinating, Captain. :)

I'm happy to add the text, but i want to be clear, then: is it acceptable to write an iterator that only provides <next> if you only care about the "iteration protocol" and not the "for-able protocol"?

I see that "ought to" is the most opinion the PEP is willing to give on the topic:

    A class is a valid iterator object when it defines a next() method
    that behaves as described above. A class that wants to be an
    iterator also ought to implement __iter__() returning itself.

-- ?!ng
Ka-Ping:
is it acceptable to write an iterator that only provides <next> if you only care about the "iteration protocol" and not the "for-able protocol"?
Probably just as acceptable as writing a file object that only provides some of the file methods. Seems quite Pythonic to me. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
This is a very good summary of the two iterator protocols. Ping, would you mind adding this to PEP 234?
And i thought it was a critique. Fascinating, Captain. :)
I'm happy to add the text, but i want to be clear, then: is it acceptable to write an iterator that only provides <next> if you only care about the "iteration protocol" and not the "for-able protocol"?
No, an iterator ought to provide both, but it's good to recognize that there *are* two protocols.
I see that "ought to" is the most opinion the PEP is willing to give on the topic:
A class is a valid iterator object when it defines a next() method that behaves as described above. A class that wants to be an iterator also ought to implement __iter__() returning itself.
I would like to see this strengthened. I envision "iterator algebra" code that really needs to be able to do a for loop over an iterator when it feels like it. --Guido van Rossum (home page: http://www.python.org/~guido/)
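A sketch of the kind of "iterator algebra" code Guido is alluding to; take() is a hypothetical helper, not anything proposed in the thread, and it only works because "for" calls __iter__ on its operand -- hence the strengthened requirement that iterators implement __iter__ returning self:

    def take(n, it):
        # collect the first n items by looping directly over the iterator
        result = []
        for x in it:
            result.append(x)
            if len(result) == n:
                break
        return result

    take(3, iter([1, 2, 3, 4, 5]))   # [1, 2, 3]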
I'm happy to add the text, but i want to be clear, then: is it acceptable to write an iterator that only provides <next> if you only care about the "iteration protocol" and not the "for-able protocol"?
No, an iterator ought to provide both, but it's good to recognize that there *are* two protocols.
A class is a valid iterator object when it defines a next() method that behaves as described above. A class that wants to be an iterator also ought to implement __iter__() returning itself.
I would like to see this strengthened. I envision "iterator algebra" code that really needs to be able to do a for loop over an iterator when it feels like it.
Maybe the reasons behind having __iter__() returning itself should be clearly expressed in the PEP, too. On this list, Tim gave one recently, Guido gives another here, but unless I missed it, the PEP gives none. Usually, PEPs explain the reasons behind the choices. -- François Pinard http://www.iro.umontreal.ca/~pinard
Maybe the reasons behind having __iter__() returning itself should be clearly expressed in the PEP, too. On this list, Tim gave one recently, Guido gives another here, but unless I missed it, the PEP gives none. Usually, PEPs explain the reasons behind the choices.
Ping added this to the PEP:

    The two methods correspond to two distinct protocols:

    1. An object can be iterated over with "for" if it implements
       __iter__() or __getitem__().

    2. An object can function as an iterator if it implements next().

    Container-like objects usually support protocol 1. Iterators are
    currently required to support both protocols. The semantics of
    iteration come only from protocol 2; protocol 1 is present to make
    iterators behave like sequences. But the analogy is weak -- unlike
    ordinary sequences, iterators are "sequences" that are destroyed by
    the act of looking at their elements.

(I could do without the last sentence, since this expresses a value judgement rather than fact -- not a good thing to have in a PEP's "specification" section.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum <guido@python.org> writes:
>> Container-like objects usually support protocol 1. Iterators are currently required to support both protocols. The semantics of iteration come only from protocol 2; protocol 1 is present to make iterators behave like sequences. But the analogy is weak -- unlike ordinary sequences, iterators are "sequences" that are destroyed by the act of looking at their elements.

GvR> (I could do without the last sentence, since this expresses a value judgement rather than fact -- not a good thing to have in a PEP's "specification" section.)

What about: "...sequences. Note that the act of looking at an iterator's elements mutates the iterator."

-Barry
On Friday 19 July 2002 12:26 am, Tim Peters wrote:
What about:
"...sequences. Note that the act of looking at an iterator's elements mutates the iterator."
That doesn't belong in the spec either -- nothing requires an iterator to have mutable state, let alone to mutate it when next() is called.
Right, for unbounded iterators returning constant values, such as:

    class Ones:
        def __iter__(self):
            return self
        def next(self):
            return 1

However, such "exceptions that prove the rule" are rare enough that I wouldn't consider their existence as forbidding us to say _anything_ about state mutation. I _would_ similarly say that x[y]=z normally mutates x, even though "def __setitem__(self, key, value): pass" is quite legal. Inserting an adverb such as "generally" or "usually" should suffice to make even the most grizzled sea lawyer happy while keeping the information in.

Alex
I wrote:
__iter__ is a red herring. [...blah blah blah...] Iterators are currently asked to support both protocols. The semantics of iteration come only from protocol 2; protocol 1 is an effort to make iterators look sorta like sequences. But the analogy is very weak -- these are "sequences" that destroy themselves while you look at them -- not like any typical sequence i've ever seen!
The short of it is that whenever any Python programmer says "for x in y", he or she had better be darned sure of whether this is going to destroy y. Whatever we can do to make this clear would be a good idea.
On Wed, 17 Jul 2002, Guido van Rossum wrote:
This is a very good summary of the two iterator protocols. Ping, would you mind adding this to PEP 234?
I have now done so. I didn't add the whole thing verbatim, because the tone doesn't fit: it was written with the intent of motivating a change to the protocol, rather than describing what the protocol is. Presumably we don't want the PEP to say "__iter__ is a red herring".

There's a bunch of issues flying around here, which i'll try to explain better in a separate posting. But i wanted to take care of Guido's request first. I have toned down and abridged my text somewhat, and strengthened the requirement for __iter__(). Here is what the "API specification" section now says:

    Classes can define how they are iterated over by defining an
    __iter__() method; this should take no additional arguments and
    return a valid iterator object. A class that wants to be an
    iterator should implement two methods: a next() method that behaves
    as described above, and an __iter__() method that returns self.

    The two methods correspond to two distinct protocols:

    1. An object can be iterated over with "for" if it implements
       __iter__() or __getitem__().

    2. An object can function as an iterator if it implements next().

    Container-like objects usually support protocol 1. Iterators are
    currently required to support both protocols. The semantics of
    iteration come only from protocol 2; protocol 1 is present to make
    iterators behave like sequences. But the analogy is weak -- unlike
    ordinary sequences, iterators are "sequences" that are destroyed by
    the act of looking at their elements. Consequently, whenever any
    Python programmer says "for x in y", he or she must be sure of
    whether this is going to destroy y.

-- ?!ng
I didn't add the whole thing verbatim, because the tone doesn't fit: it was written with the intent of motivating a change to the protocol, rather than describing what the protocol is. Presumably we don't want the PEP to say "__iter__ is a red herring".
There's a bunch of issues flying around here, which i'll try to explain better in a separate posting. But i wanted to take care of Guido's request first. I have toned down and abridged my text somewhat, and strengthened the requirement for __iter__(). Here is what the "API specification" section now says:
Classes can define how they are iterated over by defining an __iter__() method; this should take no additional arguments and return a valid iterator object. A class that wants to be an iterator should implement two methods: a next() method that behaves as described above, and an __iter__() method that returns self.
The two methods correspond to two distinct protocols:
1. An object can be iterated over with "for" if it implements __iter__() or __getitem__().
2. An object can function as an iterator if it implements next().
Container-like objects usually support protocol 1. Iterators are currently required to support both protocols. The semantics of iteration come only from protocol 2; protocol 1 is present to make iterators behave like sequences. But the analogy is weak -- unlike ordinary sequences, iterators are "sequences" that are destroyed by the act of looking at their elements.
Fine up to here.
Consequently, whenever any Python programmer says "for x in y", he or she must be sure of whether this is going to destroy y.
I don't understand why this is here. *Why* is it important to know whether this is going to destroy y? --Guido van Rossum (home page: http://www.python.org/~guido/)
On Wed, 17 Jul 2002, Clark C . Evans wrote:
Right, but such objects wouldn't be mis-leading beacuse they'd be missing a __iter__ method, correct?
Oh, i guess i didn't properly answer your question. Oops. :) My answer would be: you could say that, but wouldn't it suck to have to check for the existence of __iter__ every time you wanted to call next? You can legislate that everyone should implement __iter__ together with next; you can legislate that everyone should check for __iter__ before calling next. To some extent you have to do both or neither; one without the other is inconsistent and would lead to surprises. In practice no one's going to check. So in practice __iter__ isn't really part of the protocol. -- ?!ng
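To make Ping's point concrete, here is the defensive discipline that consistency would demand -- a hypothetical sketch, not anything anyone actually writes:

    def careful_next(it):
        # honor both halves of the protocol before advancing
        if not hasattr(it, '__iter__'):
            raise TypeError("iterator lacks __iter__()")
        return it.next()

In practice everyone just calls it.next(), which is why Ping argues that __iter__ isn't really part of the protocol as used.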
Unfortunately "next" is a pretty common word and it's quite possible that such a method name is already in use.
It is -- all my scanners have a "next" method that does something different from what an iterator's "next" is supposed to do. Fortunately I haven't had an urge to make any of them into an iterator yet. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
On Wed, Jul 17, 2002 at 02:58:55PM -0500, Ka-Ping Yee wrote:
| But i think this is more than a minor problem. This is a
| namespace collision problem, and that's significant. Naming
| the method "next" means that any object with a "next" method
| cannot be adapted to support the iterator protocol. Unfortunately
| "next" is a pretty common word and it's quite possible that such
| a method name is already in use.

Ping,

Do you have any suggestions for re-wording the Iterator questionnaire at http://yaml.org/wk/survey?id=pyiter to reflect this paragraph above?

Best, Clark -- Clark C. Evans Axista, Inc. http://www.axista.com 800.926.5525 XCOLLA Collaborative Project Management Software
On Thu, 18 Jul 2002, Clark C . Evans wrote:
On Wed, Jul 17, 2002 at 02:58:55PM -0500, Ka-Ping Yee wrote:
| But i think this is more than a minor problem. This is a
| namespace collision problem, and that's significant. Naming
| the method "next" means that any object with a "next" method
| cannot be adapted to support the iterator protocol. Unfortunately
| "next" is a pretty common word and it's quite possible that such
| a method name is already in use.
Ping,
Do you have any suggestions for re-wording the Iterator questionnaire at http://yaml.org/wk/survey?id=pyiter to reflect this paragraph above?
I might add something like:

    One motivation for this change is that the name "next()" might
    collide with the name of an existing "next()" method. This could
    cause a problem if someone wants to implement the iterator protocol
    for an object that already happens to have a method called
    "next()". So far no one has reported encountering this situation.
    It seems plausible that there will be some objects where it would
    be nice to support the iterator protocol, and we have heard of some
    objects with methods named "next()", but we don't know how likely
    or unlikely it is that there's an object where both are true.

Does that seem fair?

-- ?!ng
Ping> On Mon, 15 Jul 2002, Andrew Koenig wrote:
However the purpose of my suggestion of __multiter__ was not to use it to test for multiple iteration, but to enable a container to be able to yield either a single or a multiple iterator on request.
Ping> I see what you want, though i have a hard time imagining a situation where it's really necessary to have both (as opposed to just the multiple iterator, which is strictly more capable). I can certainly see how you might want to be able to ask for a breadth-first or depth-first iterator on a tree, though.

How about a class that represents a file? If you ask it for a single iterator, that's easy. If you ask it for a multiple iterator, it checks whether the file is really an interactive device such as a pipe or a keyboard. If so, it uses a buffering mechanism to simulate multiple iteration; otherwise, it lets the multiple iterators access the file directly.

Then when you ask to iterate over the file, you automatically get the least cumbersome mechanism needed to support the kind of iteration that you requested.
Or, what if there is no container to begin with, but the iterator is still copyable? You can't flag that by putting __multiter__ on anything; again it makes more sense to just provide __copy__ on the iterator.
You could flag it by putting __multiter__ on the iterator, just as iterators presently have __iter__.
Ping> Ugh. I don't like this, for the reasons i outlined in another message: an iterator is not the same as a container. Iterators always mutate; containers usually do not (at least not as a result of looking at the elements).

The scenario is this:

    def search(thing):
        iter = thing.__multiter__()
        # now either iter is an iterator that supports __copy__
        # or we will raise an exception (and raise it here, rather
        # than waiting for the first time we try to copy iter).
Not quite. We also need an agreement that calling __iter__ on a container is not a destructive operation unless you call next() on the iterator that you get back.
Ping> What i'd like is an agreement that calling __iter__ on a container is not a destructive operation at all. If it's destructive, then what you started with is not really a container, and we should encourage people to call attention to this irregularity in their documentation.

Is a file a container or not? Isn't making an iterator from a file and calling next() on it a destructive operation?
I think a proliferation of iterator-fetching methods would be a messy and unpleasant prospect. After __iter__, __multiter__, and __ambiter__, what next? __mutableiter__? __depthfirstiter__? __breadthfirstiter__?
A data structure that supports several different kinds of iteration has to provide that support somehow.
Ping> Agreed. I was unclear: what makes me uncomfortable is the pollution of the double-underscore namespace. When you do have a container-like object that supports various kinds of iteration, naturally you are going to need some methods for getting iterators. I just think it's not appropriate to establish special names for them.

Fair enough. But then why is __iter__ special?

Ping> To me, the presence of double-underscores around a method name means that the method is called automatically. My expectation is that when i write a method with a "normal" name, the name itself will appear after a dot wherever that method is used; and that when there's a method with a "__special__" name, the method is called implicitly. The implicit call can occur via an operator (e.g. __add__), or to implement a protocol defined in the language (e.g. __init__), etc. If you see the string ".__" it means that something unusual is going on.

Ping> If you follow this convention, then "__iter__" deserves a special name, because it is the specially blessed iterator-getter used by "for". There may be other iterator-getters, but they must be called explicitly, so they shouldn't get underscores.

Ah, is it only "for" that makes __iter__ special, and not iter() ?

Ping> An aside on "next" vs. "__next__": This is exactly why the iterator protocol should consist of one method named "__next__" rather than two methods named "__iter__" (which has nothing to do with the act of iterating!) and "next" (which is the one we really care about, but can collide with existing method names). As far as i know, "next" is the only implicitly-called method of an internal protocol that has no underscores. It's a little late to fix the name of "next" in Python 2, though it might be worth considering for Python 3.

One way to clarify a discussion of a protocol is to append an "s" and think of a plurality of protocols, so as to see which properties are truly intrinsic and which can vary between protocols. That's part of what I'm trying to do in this discussion. (and I don't presently have a strong opinion about what the right answer is. I don't even know for sure what the question is.)
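Ping's convention, sketched on a hypothetical Tree class: __iter__ keeps its underscores because "for" calls it implicitly, while the other iterator-getters are ordinary methods called by name (the future import is what generators required in Python 2.2):

    from __future__ import generators

    class Tree:
        def __init__(self, value, children=()):
            self.value = value
            self.children = list(children)

        def __iter__(self):
            # the blessed default, used implicitly by "for"
            return self.iterdepth()

        def iterdepth(self):
            # explicit iterator-getter: called by name, so no underscores
            yield self.value
            for child in self.children:
                for v in child.iterdepth():
                    yield v

        def iterbreadth(self):
            # explicit iterator-getter: called by name, so no underscores
            queue = [self]
            while queue:
                node = queue.pop(0)
                yield node.value
                queue.extend(node.children)

    t = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])
    list(t.iterdepth())      # [1, 2, 4, 3]
    list(t.iterbreadth())    # [1, 2, 3, 4]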
On 17 Jul 2002 at 11:27, Andrew Koenig wrote:
How about a class that represents a file? If you ask it for a single iterator, that's easy. If you ask it for a multiple iterator, it checks whether the file is really an interactive device such as a pipe or a keyboard. If so, it uses a buffering mechanism to simulate multiple iteration; otherwise, it lets the multiple iterators access the file directly.
Then when you ask to iterate over the file, you automatically get the least cumbersome mechanism needed to support the kind of iteration that you requested.
OK, it's a pipe, and one iterator wants to go past what's been received. Is that iterator at EOF? Not really, just "temporary EOF". So should it block? But I'm single threaded and receiving asynchronously. Oh, and it turns out to be a humongous download, and what happens if the buffering mechanism runs out of memory / disk space. Does my process die? Aargh. Too much magic. Too many corner cases. -- Gordon http://www.mcmillan-inc.com/
Gordon> OK, it's a pipe, and one iterator wants to go past what's been received. Is that iterator at EOF? Not really, just "temporary EOF". So should it block? But I'm single threaded and receiving asynchronously. Oh, and it turns out to be a humongous download, and what happens if the buffering mechanism runs out of memory / disk space. Does my process die? Aargh. Too much magic. Too many corner cases.

The implementation of the file should have a choice: either refuse to yield a multiple iterator (which seems to be your preference) or yield one that works (which might or might not be my preference, depending on circumstances). In the latter case, I don't think your questions are hard to answer, because most of the answers fall out of the single-iterator case. So if the iterator is at EOF, it should either block or not, depending on what a single iterator should do. The only real question is what happens if the buffering mechanism runs out of space, but that's always a question for such mechanisms; I don't see why it's any more irksome in this particular context.
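One shape Andrew's buffering mechanism might take -- a hedged sketch with a made-up name, making no attempt to trim the shared buffer once every copy has passed an item, which is exactly where Gordon's space objection bites:

    class BufferedMultiIter:
        def __init__(self, source, buffer=None):
            # all copies share one single-pass source iterator and one
            # buffer; each copy keeps its own position within the buffer
            if buffer is None:
                buffer = []
            self.source = source
            self.buffer = buffer
            self.pos = 0

        def __iter__(self):
            return self

        def __copy__(self):
            clone = BufferedMultiIter(self.source, self.buffer)
            clone.pos = self.pos
            return clone

        def next(self):
            if self.pos == len(self.buffer):
                # may raise StopIteration, ending this copy's pass
                self.buffer.append(self.source.next())
            item = self.buffer[self.pos]
            self.pos += 1
            return item

    import copy
    a = BufferedMultiIter(iter([1, 2, 3]))
    b = copy.copy(a)    # an independent second pass, via __copy__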
Andrew Koenig <ark@research.att.com>:
Is a file a container or not?
I would say no, a file object is not a container in Python terms. You can't index it with [] or use len() on it or any of the other things you expect to be able to do on containers. I think we just have to live with the idea that there are things other than containers that can supply iterators. Forcing everything that can supply an iterator to bend over backwards to try to be a random-access container as well would be too cumbersome. Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
On Thursday 18 July 2002 01:25 am, Greg Ewing wrote:
Andrew Koenig <ark@research.att.com>:
Is a file a container or not?
I would say no, a file object is not a container in Python terms. You can't index it with [] or use len() on it or any of the other things you expect to be able to do on containers.
I think we just have to live with the idea that there are things other than containers that can supply iterators.
Yes, there are such things, and there may be cases in which no other alternative makes sense. But I don't think files are necessarily in such a bind.
Forcing everything that can supply an iterator to bend over backwards to try to be a random-access container as well would be too cumbersome.
Absolutely. But what Oren's patch does, and my mods of it preserve, is definitely NOT "forcing" files "to be random-access containers": on the contrary, it accepts the fact that files aren't containers and conceptually simplifies things by making them iterators instead.

I'm not sure about "random access" being needed to be a container. Consider sets, e.g. as per Greg Wilson's soapbox implementation (as modified by my patch to allow immutable-sets, maybe, but that's secondary). They're doubtlessly containers, able to produce on request as many iterators as you wish, each iterator not affecting the set's state in any way -- the ideal. But what sense would it make to force sets to expose a __getitem__? Right now they inherit from dict and thus do happen to expose it, but that's really an implementation artefact showing through (and a good example of why one might like to inherit without needing to expose all of the superclass's interface, to tie this in to another recent thread -- inheritance for implementation).

Ideally, sets would expose __contains__, __iter__, __len__, ways to add and remove elements, and perhaps (it's so in Greg's implementation, and I didn't touch that) set ops such as union, intersection &c. someset[anindex] is really a weird thing to have... yet sets _are_ containers!

Alex
But what sense would it make to force sets to expose a __getitem__? Right now they inherit from dict and thus do happen to expose it, but that's really an implementation artefact showing through (and a good example of why one might like to inherit without needing to expose all of the superclass's interface, to tie this in to another recent thread -- inheritance for implementation).
Ideally, sets would expose __contains__, __iter__, __len__, ways to add and remove elements, and perhaps (it's so in Greg's implementation, and I didn't touch that) set ops such as union, intersection &c. someset[anindex] is really a weird thing to have... yet sets _are_ containers!
I believe I recommended to Greg to make sets "have" a dict instead of "being" dicts, and I think he agreed. But I guess he never got to implementing that change. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Thursday 18 July 2002 20:45, Guido van Rossum wrote: ...
I believe I recommended to Greg to make sets "have" a dict instead of "being" dicts, and I think he agreed. But I guess he never got to implementing that change.
Right. OK, guess I'll make a new patch using delegation instead of inheritance, then. Alex
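A minimal sketch of the has-a version Alex is promising -- not his actual patch, just the shape of delegation: the dict is an attribute, so no __getitem__ leaks through, and __iter__ hands out fresh, independent iterators:

    class Set:
        def __init__(self, iterable=()):
            self._data = {}
            for item in iterable:
                self._data[item] = 1

        def __contains__(self, item):
            return item in self._data

        def __iter__(self):
            # each call returns a new iterator; iterating one does not
            # disturb the set or any other iterator
            return iter(self._data.keys())

        def __len__(self):
            return len(self._data)

        def add(self, item):
            self._data[item] = 1

        def remove(self, item):
            del self._data[item]

        def __or__(self, other):
            result = Set(self._data.keys())
            for item in other:
                result.add(item)
            return result

With delegation, someset[0] now fails as it should, while __contains__, __iter__ and __len__ still behave like a proper container.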
I believe I recommended to Greg to make sets "have" a dict instead of "being" dicts, and I think he agreed. But I guess he never got to implementing that change.
Right. OK, guess I'll make a new patch using delegation instead of inheritance, then.
Maybe benchmark the performance too. If the "has" version is much slower, perhaps we could remove unwanted interfaces from the public API by overriding them with something that raises an exception (and rename the internal versions to some internal name if they are needed). --Guido van Rossum (home page: http://www.python.org/~guido/)
On Thursday 18 July 2002 09:50 pm, Guido van Rossum wrote:
I believe I recommended to Greg to make sets "have" a dict instead of "being" dicts, and I think he agreed. But I guess he never got to implementing that change.
Right. OK, guess I'll make a new patch using delegation instead of inheritance, then.
Maybe benchmark the performance too. If the "has" version is much slower, perhaps we could remove unwanted interfaces from the public API by overriding them with something that raises an exception (and rename the internal versions to some internal name if they are needed).
I've just updated patch 580995 with the has-A rather than is-A version. OK, I'll now run some simple benchmarks... Looks good, offhand. Here's the simple benchmark script:

    import time
    import set
    import sys

    clock = time.clock
    raw = range(10000)
    times = [None]*20

    print "Timing Set %s (Python %s)" % (set.__version__, sys.version)

    print "Make 20 10k-items sets (no reps)...",
    start = clock()
    for i in times:
        s10k = set.Set(raw)
    stend = clock()
    print stend-start

    witre = range(1000)*10
    print "Make 20 1k-items sets (x10 reps)...",
    for i in times:
        s1k1 = set.Set(witre)
    stend = clock()
    print stend-start

    raw1 = range(500, 1500)
    print "Make 20 more 1k-items sets (no reps)...",
    for i in times:
        s1k2 = set.Set(raw1)
    stend = clock()
    print stend-start

    print "20 unions of 1k-items sets 50% overlap...",
    for i in times:
        result = s1k1 | s1k2
    stend = clock()
    print stend-start

    print "20 inters of 1k-items sets 50% overlap...",
    for i in times:
        result = s1k1 & s1k2
    stend = clock()
    print stend-start

    print "20 diffes of 1k-items sets 50% overlap...",
    for i in times:
        result = s1k1 - s1k2
    stend = clock()
    print stend-start

    print "20 simdif of 1k-items sets 50% overlap...",
    for i in times:
        result = s1k1 ^ s1k2
    stend = clock()
    print stend-start

And here's a few runs (with -O of course) on my PC:

    [alex@lancelot has]$ python -O ../bench_set.py
    Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
    Make 20 10k-items sets (no reps)... 0.21
    Make 20 1k-items sets (x10 reps)... 0.36
    Make 20 more 1k-items sets (no reps)... 0.38
    20 unions of 1k-items sets 50% overlap... 0.43
    20 inters of 1k-items sets 50% overlap... 0.92
    20 diffes of 1k-items sets 50% overlap... 1.41
    20 simdif of 1k-items sets 50% overlap... 2.38

    [alex@lancelot has]$ python -O ../bench_set.py
    Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
    Make 20 10k-items sets (no reps)... 0.22
    Make 20 1k-items sets (x10 reps)... 0.37
    Make 20 more 1k-items sets (no reps)... 0.39
    20 unions of 1k-items sets 50% overlap... 0.44
    20 inters of 1k-items sets 50% overlap... 0.93
    20 diffes of 1k-items sets 50% overlap... 1.42
    20 simdif of 1k-items sets 50% overlap... 2.39

    [alex@lancelot has]$ cd ../is
    [alex@lancelot is]$ python -O ../bench_set.py
    Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
    Make 20 10k-items sets (no reps)... 0.21
    Make 20 1k-items sets (x10 reps)... 0.37
    Make 20 more 1k-items sets (no reps)... 0.39
    20 unions of 1k-items sets 50% overlap... 0.44
    20 inters of 1k-items sets 50% overlap... 0.93
    20 diffes of 1k-items sets 50% overlap... 1.42
    20 simdif of 1k-items sets 50% overlap... 2.38

    [alex@lancelot is]$ python -O ../bench_set.py
    Timing Set $Revision: 1.5 $ (Python 2.2.1 (#2, Jul 15 2002, 17:32:26) [GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)])
    Make 20 10k-items sets (no reps)... 0.22
    Make 20 1k-items sets (x10 reps)... 0.38
    Make 20 more 1k-items sets (no reps)... 0.4
    20 unions of 1k-items sets 50% overlap... 0.44
    20 inters of 1k-items sets 50% overlap... 0.93
    20 diffes of 1k-items sets 50% overlap... 1.42
    20 simdif of 1k-items sets 50% overlap... 2.41

They look much of a muchness to me. Sorry about the version stuck at 1.5 -- forgot to update that, but you can tell the difference by the directory name, 'is' and 'has' resp. :-)
Python 2.3 (built from CVS 22 hours ago) is substantially faster at some tasks (intersections and differences):

    [alex@lancelot has]$ python -O ../bench_set.py
    Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
    Make 20 10k-items sets (no reps)... 0.21
    Make 20 1k-items sets (x10 reps)... 0.36
    Make 20 more 1k-items sets (no reps)... 0.37
    20 unions of 1k-items sets 50% overlap... 0.42
    20 inters of 1k-items sets 50% overlap... 0.75
    20 diffes of 1k-items sets 50% overlap... 1.08
    20 simdif of 1k-items sets 50% overlap... 1.73

    [alex@lancelot has]$ python -O ../bench_set.py
    Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
    Make 20 10k-items sets (no reps)... 0.21
    Make 20 1k-items sets (x10 reps)... 0.36
    Make 20 more 1k-items sets (no reps)... 0.37
    20 unions of 1k-items sets 50% overlap... 0.42
    20 inters of 1k-items sets 50% overlap... 0.75
    20 diffes of 1k-items sets 50% overlap... 1.08
    20 simdif of 1k-items sets 50% overlap... 1.74

    [alex@lancelot has]$
    [alex@lancelot is]$ python -O ../bench_set.py
    Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
    Make 20 10k-items sets (no reps)... 0.21
    Make 20 1k-items sets (x10 reps)... 0.35
    Make 20 more 1k-items sets (no reps)... 0.37
    20 unions of 1k-items sets 50% overlap... 0.41
    20 inters of 1k-items sets 50% overlap... 0.74
    20 diffes of 1k-items sets 50% overlap... 1.07
    20 simdif of 1k-items sets 50% overlap... 1.72

    [alex@lancelot is]$ python -O ../bench_set.py
    Timing Set $Revision: 1.5 $ (Python 2.3a0 (#44, Jul 18 2002, 00:03:05) [GCC 2.96 20000731 (Mandrake Linux 8.2 2.96-0.76mdk)])
    Make 20 10k-items sets (no reps)... 0.21
    Make 20 1k-items sets (x10 reps)... 0.36
    Make 20 more 1k-items sets (no reps)... 0.38
    20 unions of 1k-items sets 50% overlap... 0.42
    20 inters of 1k-items sets 50% overlap... 0.75
    20 diffes of 1k-items sets 50% overlap... 1.08
    20 simdif of 1k-items sets 50% overlap... 1.73

but as you can see, again it's uniformly faster on both 'is' and 'has' versions of sets. The 'has' version thus seems preferable here.

Alex
On Thu, Jul 11, 2002 at 11:31:47AM -0400, Andrew Koenig wrote:
Right now, every iterator, and every object that supports iteration, must have an __iter__() method. Suppose we augment that with the following:
A new kind of iterator, called a multiple iterator, that supports multiple iterations over the same sequence. ... __copy__() return a distinct, newly created multiple iterator that iterates over the same sequence as the original, starting from the current element.
There is no need for a new type of iterator. It's ok that iterators are disposable. If I need multiple iterations I don't want to copy the iterator - I prefer to ask the original iterable object for a new iterator. All I need is some way to know whether the iterable object (container) can produce multiple iterators that generate the same sequence.

An object is re-iterable if its iterators do not modify its state.

The iterator of an iterator is itself. Calling the next method, by definition, modifies the internal state of an object. Therefore anything that has a next method is not re-iterable.

"hasattr(obj,'__iter__') and hasattr(obj, 'next')" is a good signature of a non re-iterable object. Unfortunately, the opposite is not true. One iterable object in Python produces iterators that modify its state when their .next() method is called - the file object.

I have just submitted a patch that makes a file into an iterator (i.e. adds a .next method to files). With this change all Python objects that have an __iter__ method and no next method produce iterators that do not modify the container. Another possibility would be to make file iterators that use seek or re-open the file to avoid modifying the file position of the parent file object. I don't think that would be a good idea because files can be devices, pipes or sockets which are not seekable.

I think it may be a good idea to add a note to the documentation pages about the iterator protocol that the iterators of a container should not modify the state of the container. If you think they must it's probably a good sign that your 'container' is not really a container and maybe it should be an iterator rather than produce iterators of itself.

Oren
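Oren's signature, written out as a predicate -- a sketch only, with a made-up name, and subject to the caveat he states himself: the converse does not hold, so a pre-patch file object would slip past this test despite mutating on iteration:

    def looks_single_pass(obj):
        # an object carrying both __iter__ and next is (almost
        # certainly) a one-shot iterator, not a re-iterable container
        return hasattr(obj, '__iter__') and hasattr(obj, 'next')

    looks_single_pass(iter([1, 2]))   # true: a list iterator
    looks_single_pass([1, 2])         # false: the list itself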
Oren,

I like the direction this is going in, but I have some reservations about any protocol which requires users to avoid using a simple method name like next() on their own multi-pass sequence types unless they intend their sequence to be treated as single-pass.

One other possibility: if x.__iter__() is x, it's a single-pass sequence. I realize this involves actually invoking the __iter__ method and conjuring up a new iterator, but that's generally a lightweight operation...

-Dave

From: "Oren Tirosh" <oren-py-d@hishome.net>
There is no need for a new type of iterator. It's ok that iterators are disposable. If I need multiple iterations I don't want to copy the iterator - I prefer to ask the original iterable object for a new iterator. All I need is some way to know whether the iterable object (container) can produce multiple iterators that generate the same sequence.
An object is re-iterable if its iterators do not modify its state.
The iterator of an iterator is itself. Calling the next method, by definition, modifies the internal state of an object. Therefore anything that has a next method is not re-iterable.
"hasattr(obj,'__iter__') and hasattr(obj, 'next')" is a good signature of a non re-iterable object. Unfortunately, the opposite is not true. One iterable object in Python produces iterators that modify its state when their .next() method is called - the file object.
I have just submitted a patch that makes a file into an iterator (i.e. adds a .next method to files). With this change all Python objects that have an __iter__ method and no next method produce iterators that do not modify the container. Another possibility would be to make file iterators that use seek or re-open the file to avoid modifying the file position of the parent file object. I don't think that would be a good idea because files can be devices, pipes or sockets which are not seekable.
I think it may be a good idea to add a note to the documentation pages about the iterator protocol that the iterators of a container should not modify the state of the container. If you think they must it's probably a good sign that your 'container' is not really a container and maybe it should be an iterator rather than produce iterators of itself.
Oren
On Fri, Jul 12, 2002 at 06:22:05AM -0400, David Abrahams wrote:
Oren,
I like the direction this is going in, but I have some reservations about any protocol which requires users to avoid using a simple method name like next() on their own multi-pass sequence types unless they intend their sequence to be treated as single-pass.
I'm not too thrilled about it, either, but I don't think it's too bad. If you implement an object with an __iter__ method you must be aware of the iteration protocol and the next method. If you put a next method on an iterable you are most probably confusing iterators and iterables and not just using the name 'next' for some other innocent purpose.
One other possibility: if x.__iter__() is x, it's a single-pass sequence. I realize this involves actually invoking the __iter__ method and conjuring up a new iterator, but that's generally a lightweight operation...
I think it is critical that all protocols should be defined by something passive like presence of attributes and attributes of attributes and not by active probing. I don't see how a future typing system could be retrofitted to Python otherwise (pssst, don't tell anyone, but I'm working on such a system...) Oren
On Fri, Jul 12, 2002 at 06:22:05AM -0400, David Abrahams wrote:
Oren,
I like the direction this is going in, but I have some reservations about any protocol which requires users to avoid using a simple method name like next() on their own multi-pass sequence types unless they intend their sequence to be treated as single-pass.

From: "Oren Tirosh" <oren-py-d@hishome.net>
I'm not too thrilled about it, either, but I don't think it's too bad. If you implement an object with an __iter__ method you must be aware of the iteration protocol and the next method. If you put a next method on an iterable you are most probably confusing iterators and iterables and not just using the name 'next' for some other innocent purpose.
People may have already written that innocent code, but I'm not sure the consequences of misinterpreting such sequences as single-pass are so terrible. Still, I would prefer if we were looking for "__next__" instead of next().
One other possibility: if x.__iter__() is x, it's a single-pass sequence. I realize this involves actually invoking the __iter__ method and conjuring up a new iterator, but that's generally a lightweight operation...
I think it is critical that all protocols should be defined by something passive like presence of attributes and attributes of attributes and not by active probing.
Isn't that passive/active distinction illusory though? What about __getattr__ methods?
I don't see how a future typing system could be retrofitted to Python otherwise (pssst, don't tell anyone, but I'm working on such a system...)
Nifty! I'd love to get a preview, if possible. Types come into play at the Python/C++ boundary, and I'm interested in how our systems will interact (c.f. http://aspn.activestate.com/ASPN/Mail/Message/types-sig/1222793) -Dave
On Fri, Jul 12, 2002 at 07:14:21AM -0400, David Abrahams wrote:
I'm not too thrilled about it, either, but I don't think it's too bad. If you implement an object with an __iter__ method you must be aware of the iteration protocol and the next method. If you put a next method on an iterable you are most probably confusing iterators and iterables and not just using the name 'next' for some other innocent purpose.
People may have already written that innocent code, but I'm not sure the consequences of misinterpreting such sequences as single-pass are so terrible. Still, I would prefer if we were looking for "__next__" instead of next().
I'm not actually suggesting this as a reliable way to detect re-iterable objects, it's more of an observation. If you want something that can be relied upon for optimizations that would probably require a new __magic__ attribute. Any suggestions?
Isn't that passive/active distinction illusory though? What about __getattr__ methods?
I can't believe that any static or semi-static typing system will be able to handle __getattr__ virtual attributes. An object simply won't match a type predicate if any of the attributes checked by the predicate are virtual.
I don't see how a future typing system could be retrofitted to Python otherwise (pssst, don't tell anyone, but I'm working on such a system...)
Nifty! I'd love to get a preview, if possible. Types come into play at the Python/C++ boundary, and I'm interested in how our systems will interact (c.f. http://aspn.activestate.com/ASPN/Mail/Message/types-sig/1222793)
I don't know what you're talking about. :-) Oren
If you put a next method on an iterable you are most probably confusing iterators and iterables and not just using the name 'next' for some other innocent purpose.
Quite to the contrary. You might have a multi-iterable class that was defined before the iterator protocol existed, and had a "built-in" iterator that keeps the iteration state in the object itself (a common design, e.g. BSD db files have this). This is OK for simple uses. But with iterators available the class might grow a proper iterator class that keeps state external from the object. But for backward compatibility reasons you cannot remove the next() method on the class itself. QED: you have a multi-iterable object that has both an __iter__ method and a next method. --Guido van Rossum (home page: http://www.python.org/~guido/)
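A sketch of the class Guido describes, with made-up names: the legacy next() keeps its cursor in the object itself for old callers, while the newer __iter__ hands out independent external iterators:

    class Records:
        def __init__(self, items):
            self.items = list(items)
            self.cursor = 0          # state for the legacy interface

        def next(self):
            # pre-iterator API, kept for backward compatibility;
            # signals exhaustion with None, not StopIteration
            if self.cursor >= len(self.items):
                return None
            item = self.items[self.cursor]
            self.cursor += 1
            return item

        def __iter__(self):
            # new API: each call returns a fresh, independent iterator
            return iter(self.items)

Such an object is genuinely multi-iterable even though hasattr(obj, 'next') is true -- which is Guido's point against treating a next method as a single-pass marker.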
Oren> There is no need for a new type of iterator. It's ok that iterators are disposable. If I need multiple iterations I don't want to copy the iterator - I prefer to ask the original iterable object for a new iterator. All I need is some way to know whether the iterable object (container) can produce multiple iterators that generate the same sequence.

You are assuming that you still have access to the original iterable object. But what if all you have is an iterator? Then you need to be able to ask the iterator for a new iterator.

Oren> An object is re-iterable if its iterators do not modify its state.

Oren> The iterator of an iterator is itself. Calling the next method, by definition, modifies the internal state of an object. Therefore anything that has a next method is not re-iterable.

That's not the only possible definition of an iterator. I'm thinking, in part, about how one might translate some of the C++ standard-library algorithms into Python. If that translation requires that the user always supply the original container, rather than using iterators only, then some algorithms become harder to express or less useful.
On Fri, Jul 12, 2002 at 08:27:56AM -0400, Andrew Koenig wrote:
Oren> There is no need for a new type of iterator. It's ok that iterators are disposable. If I need multiple iterations I don't want to copy the iterator - I prefer to ask the original iterable object for a new iterator. All I need is some way to know whether the iterable object (container) can produce multiple iterators that generate the same sequence.
You are assuming that you still have access to the original iterable object. But what if all you have is an iterator? Then you need to be able to ask the iterator for a new iterator.
Here are two cases I can think of where I don't have access to the iterable object:

1. There is no iterable object. An iterator object was created directly. For example, the result of a generator function is an iterator which isn't the result of some container's __iter__ method.

2. The iterator was received as an argument and the caller sent iter(x) instead of x. In that case I guess it means that the caller doesn't *want* to give me access to x.
Oren> An object is re-iterable if its iterators do not modify its state.
Oren> The iterator of an iterator is itself. Calling the next method, by definition, modifies the internal state of an object. Therefore anything that has a next method is not re-iterable.
That's not the only possible definition of an iterator.
It isn't a definition of an iterator. It isn't even a definition of a re-iterable object, it's a sufficient (but not required) condition for objects to be re-iterable.
I'm thinking, in part, about how one might translate some of the C++ standard-library algorithms into Python.
Why not translate *what* they do instead of *how* they do it? I'm pretty sure the Python way would be shorter and simpler anyway. Oren
Oren> 1. There is no iterable object. An iterator object was created directly. For example, the result of a generator function is an iterator which isn't the result of some container's __iter__ method.

Yes.

Oren> 2. The iterator was received as an argument and the caller sent iter(x) instead of x. In that case I guess it means that the caller doesn't *want* to give me access to x.

3. The caller sent an iterator that refers to an element of the container other than the initial one. For example:

    def findafter(it, x):
        it = iter(it)
        while it.next() != x:
            pass
        return it

This function locates the first element equal to x in the sequence denoted by it, and returns an iterator that refers to the element after the one equal to x. It raises StopIteration if no such element exists.

Now, suppose you want to use this function to find all of the elements in a sequence that are equal to x. On the second and subsequent calls, you're going to have to pass an iterator as the first argument, because passing the container isn't going to give you the right answer.

For another, more detailed example of how sensitive library design is to the details of iterator behavior, please look at http://www.research.att.com/~ark/design.pdf (I hope I have uttered the right incantations to make it available outside our firewall; if I haven't, please let me know)
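The "find all" usage Andrew describes, sketched with a hypothetical countall() built on his findafter() (repeated here so the snippet stands alone):

    def findafter(it, x):            # Andrew's function, from above
        it = iter(it)
        while it.next() != x:
            pass
        return it

    def countall(seq, x):
        # keep feeding the *same* iterator back in; passing seq itself
        # again would restart the search from the beginning each time
        it = iter(seq)
        count = 0
        while 1:
            try:
                it = findafter(it, x)
            except StopIteration:
                return count
            count = count + 1

    countall([1, 2, 1, 3, 1], 1)     # 3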
Oh yes -- one more use for being able to copy an iterator: If you can't copy an iterator, how can you determine the value to which the iterator refers without losing access to that value?

Oren> Why not translate *what* they do instead of *how* they do it? I'm pretty sure the Python way would be shorter and simpler anyway.

Maybe yes, maybe no. It would certainly be different, because the C++ algorithms generally assume that iterators support comparison operations. That assumption makes possible algorithms in C++ that are difficult to express at all using Python iterators as they stand. On the other hand, the availability of garbage collection in Python, combined with the dynamic nature of its type system, makes it possible to express algorithms in Python that cannot be expressed easily in C++ using C++ iterators as they now stand. Details about language design can have a profound effect on usage, which in turn has a profound effect on future design.

-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark
I'm thinking, in part, about how one might translate some of the C++ standard-library algorithms into Python. If that translation requires that the user always supply the original container, rather than using iterators only, then some algorithms become harder to express or less useful.
Indeed. There's a whole slew of interesting things you can do with iterators that means you won't have a container, only an iterator. For example, you can define "iterator algebra" functions that take iterators and return iterators. A simple example is this generator, which yields alternating elements of a given iterator.

    def alternating(it):
        while 1:
            yield it.next()
            it.next()

The nice thing is that you can combine these easily. For example alternating(alternating(it)) would yield every 4th element. It would be a pity if the results of iterator algebra operations would not be acceptable to Andrew's proposed algorithm library. --Guido van Rossum (home page: http://www.python.org/~guido/)
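Guido's generator, restated runnably (the future import is what Python 2.2 required for generators) together with the composition he mentions:

    from __future__ import generators

    def alternating(it):
        while 1:
            yield it.next()      # yield one element...
            it.next()            # ...and silently discard the next

    every4th = alternating(alternating(iter(range(16))))
    print list(every4th)         # [0, 4, 8, 12]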
"AK" == Andrew Koenig <ark@research.att.com> writes:
AK> You are assuming that you still have access to the original iterable object. But what if all you have is an iterator? Then you need to be able to ask the iterator for a new iterator.

Would it be useful to add to the iterator "interface" a method which would retrieve the original iterable object? I've no idea what that method should be called, but it seems like it would be trivial to add since most (all?) iterators have a pointer to their underlying object anyway, don't they?

-Barry
Would it be useful to add to the iterator "interface" a method which would retrieve the original iterable object? I've no idea what that method should be called, but it seems like it would be trivial to add since most (all?) iterators have a pointer to their underlying object anyway, don't they?
No. The (important!) class of generator-iterators does not have an underlying container object. --Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum <guido@python.org> writes:
>> Would it be useful to add to the iterator "interface" a method which would retrieve the original iterable object? I've no idea what that method should be called, but it seems like it would be trivial to add since most (all?) iterators have a pointer to their underlying object anyway, don't they?

GvR> No. The (important!) class of generator-iterators does not have an underlying container object.

Yup, but in that case I think it would be fine if it.gimme_the_underlying_iteratable_object() returned None. It still may be useless. ;)

-Barry
Barry> Would it be useful to add to the iterator "interface" a method which would retrieve the original iterable object?

What if there isn't one?
"AK" == Andrew Koenig <ark@research.att.com> writes:
Barry> Would it be useful to add to the iterator "interface" a method which would retrieve the original iterable object?

AK> What if there isn't one?

Barry> The method would return None.

But then you can't rely on it. That is, if you want to write code that depends on the ability to retrieve the original iterable, you have to give up the ability for that code to work on generators, for example.

I'm not saying it's not a useful thing to have; I'm just saying it might not be as useful as it appears at first.
"AK" == Andrew Koenig <ark@research.att.com> writes:
Barry> Would it be useful to add to the iterator "interface" a method which would retrieve the original iterable object?

AK> What if there isn't one?

Barry> The method would return None.

AK> But then you can't rely on it. That is, if you want to write code that depends on the ability to retrieve the original iterable, you have to give up the ability for that code to work on generators, for example.

AK> I'm not saying it's not a useful thing to have; I'm just saying it might not be as useful as it appears at first.

I'm not sure it's even useful at all, e.g. I've never had a use for it. But if you have code that depends on the ability to retrieve the original iterable, and you have iterators for which there /is no/ original iterable, it doesn't matter how you spell it, you're going to have to special case around that fact.

-Barry
I have just submitted a patch that makes a file into an iterator (i.e. adds a .next method to files). With this change all Python objects that have an __iter__ method and no next method produce iterators that do not modify the container. Another possibility would be to make file iterators that use seek or re-open the file to avoid modifying the file position of the parent file object. I don't think that would be a good idea because files can be devices, pipes or sockets which are not seekable.
Cute trick, but I think it's too fragile. You don't know about 3rd party iterables that have the same problem as file. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Fri, Jul 12, 2002 at 08:46:26AM -0400, Guido van Rossum wrote:
I have just submitted a patch that makes a file into an iterator (i.e. adds a .next method to files). With this change all Python objects that have an __iter__ method and no next method produce iterators that do not modify the container. Another possibility would be to make file iterators that use seek or re-open the file to avoid modifying the file position of the parent file object. I don't think that would be a good idea because files can be devices, pipes or sockets which are not seekable.
Cute trick, but I think it's too fragile. You don't know about 3rd party iterables that have the same problem as file.
I don't understand what you mean by fragile. I'm not suggesting anything that actually depends on this behavior so I don't see what could break. I think it's semantically cleaner for iterable objects to produce iterators that do not modify the state of the original iterable object. There's no way to force extension writers to adhere to this but Python should at least set a good example. Python file objects are not a good example. The xrange object that was its own iterator was not a good example. Oren
I have just submitted a patch that makes a file into an iterator (i.e. adds a .next method to files). With this change all Python objects that have an __iter__ method and no next method produce iterators that do not modify the container. Another possibility would be to make file iterators that use seek or re-open the file to avoid modifying the file position of the parent file object. I don't think that would be a good idea because files can be devices, pipes or sockets which are not seekable.
Cute trick, but I think it's too fragile. You don't know about 3rd party iterables that have the same problem as file.
I don't understand what you mean by fragile. I'm not suggesting anything that actually depends on this behavior so I don't see what could break.
If nothing depends on it, what's the point?
I think it's semantically cleaner for iterable objects to produce iterators that do not modify the state of the original iterable object.
Too bad. Files are the first but certainly not the only example, and saying it's cleaner doesn't make it so.
There's no way to force extension writers to adhere to this but Python should at least set a good example. Python file objects are not a good example. The xrange object that was its own iterator was not a good example.
That version of the xrange object was broken. I don't see what's wrong with the file object. Iterating over a file changes the file's state, that's just a fact of life. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Fri, Jul 12, 2002 at 09:36:22AM -0400, Guido van Rossum wrote:
I don't understand what you mean by fragile. I'm not suggesting anything that actually depends on this behavior so I don't see what could break.
If nothing depends on it, what's the point?
To satisfy my perverted obsession for semantic hygiene, of course!
That version of the xrange object was broken.
That's exactly my point. There will be more broken code like this as long as people keep confusing iterators and iterables. Making the language semantically cleaner should help prevent things like this in the long run. I remember it was pretty hard to actually convince anyone that xrange was broken. When I pointed out that the xrange 'iterator' modified the state of the xrange 'container' people responded that it's ok because this happens with file objects, too...
I don't see what's wrong with the file object. Iterating over a file changes the file's state, that's just a fact of life.
A file object is an iterator pretending to be a container. For historical reasons it uses 'readline' instead of 'next' and an empty string instead of StopIteration but it basically does the same job. A file object is not really a container that can produce iterators of itself. Oren
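Oren's equivalence can be spelled out side by side; the empty-string sentinel plays exactly the role StopIteration plays in the iterator protocol (f and process are placeholders, and each loop assumes a freshly opened file):

    # readline style: '' signals end-of-file
    while 1:
        line = f.readline()
        if line == '':
            break
        process(line)

    # iterator style: StopIteration from next() signals the end
    for line in f:
        process(line)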
That version of the xrange object was broken.
That's exactly my point. There will be more broken code like this as long as people keep confusing iterators and iterables. Making the language semantically cleaner should help prevent things like this in the long run.
I don't think that the language can help this. There's nothing you can do to remove the wart from file objects.
I remember it was pretty hard to actually convince anyone that xrange was broken.
Huh? IIRC I said it was broken right away and pushed Raymond to fix it.
When I pointed out that the xrange 'iterator' modified the state of the xrange 'container', people responded that it's ok because this happens with file objects, too...
A confusion that you don't stamp out by "fixing" files.
I don't see what's wrong with the file object. Iterating over a file changes the file's state, that's just a fact of life.
A file object is an iterator pretending to be a container.
In what sense does it pretend to be a container? File objects are what they are; they have rich semantics for a reason.
For historical reasons it uses 'readline' instead of 'next' and an empty string instead of StopIteration but it basically does the same job. A file object is not really a container that can produce iterators of itself.
I think this thread is ready to die. --Guido van Rossum (home page: http://www.python.org/~guido/)
Oren Tirosh <oren-py-d@hishome.net>:
A file object is an iterator pretending to be a container. For historical reasons it uses 'readline' instead of 'next'
I think it's more complicated than that. If the file object were to become an object obeying the iterator protocol, its next() method should really return the next *byte* of the file. Then you'd still want methods like read(), readline() etc. for reading in larger chunks. Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand (greg@cosc.canterbury.ac.nz) -- "A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."
I think it's more complicated than that. If the file object were to become an object obeying the iterator protocol, its next() method should really return the next *byte* of the file. Then you'd still want methods like read(), readline() etc. for reading in larger chunks.
I don't think so. We should pick the most convenient chunking for the default iterator, and provide explicit ways to ask for other iterators (like dict iterators). Also, since "for line in file" already works, there's a strong precedent for iterating by line by default. --Guido van Rossum (home page: http://www.python.org/~guido/)
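The dict precedent looks like this in 2.2-era code: one default chunking for plain iteration, with other traversals requested explicitly:

    d = {'a': 1, 'b': 2}
    for key in d:                  # default iterator: keys
        pass
    for value in d.itervalues():   # explicitly ask for a different iterator
        pass
    for key, value in d.iteritems():
        pass
    # A file iterating by line by default, with read(n), readline() and
    # readlines() as the explicit alternatives, follows the same pattern.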
Guido:
Me:
If the file object were to become an object obeying the iterator protocol, its next() method should really return the next *byte* of the file.
I don't think so. We should pick the most convenient chunking for the default iterator
But we're talking here about making the file object *be* an iterator itself, not just have a "default iterator". If that's to happen, all the other ways of iterating over a file ought to be implemented on top of the basic iteration facility provided by the file object -- lest we get unfortunate interactions between the different iteration methods a la xreadlines(). To me, this implies that the file object must iterate by bytes. I'm not necessarily advocating this, just following the idea to its logical conclusion. If the conclusion is distasteful, maybe that's a sign that the idea (i.e. making file objects into iterators) isn't so good in the first place. Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand (greg@cosc.canterbury.ac.nz) -- "A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."
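To see why that conclusion tastes bad, consider what readline() would look like if it had to be rebuilt on top of a byte-at-a-time next() (an editor's deliberately naive sketch):

    def readline_from_bytes(byte_iter):
        # One next() call per byte: correct, but a steep price to pay
        # for protocol purity on every line read.
        chars = []
        for ch in byte_iter:
            chars.append(ch)
            if ch == '\n':
                break
        return ''.join(chars)      # '' at end-of-file, like file.readline()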
On Tue, Jul 16, 2002, Greg Ewing wrote:
Guido:
Greg:
If the file object were to become an object obeying the iterator protocol, its next() method should really return the next *byte* of the file.
I don't think so. We should pick the most convenient chunking for the default iterator
But we're talking here about making the file object *be* an iterator itself, not just have a "default iterator". If that's to happen, all the other ways of iterating over a file ought to be implemented on top of the basic iteration facility provided by the file object -- lest we get unfortunate interactions between the different iteration methods a la xreadlines(). To me, this implies that the file object must iterate by bytes.
"Practicality beats purity" -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
I don't think so. We should pick the most convenient chunking for the default iterator
But we're talking here about making the file object *be* an iterator itself, not just have a "default iterator". If that's to happen, all the other ways of iterating over a file ought to be implemented on top of the basic iteration facility provided by the file object -- lest we get unfortunate interactions between the different iteration methods a la xreadlines(). To me, this implies that the file object must iterate by bytes.
Why should all iteration over a file be defined in terms of its basic iteration? I don't see that as dogma. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Fri, 12 Jul 2002, Guido van Rossum wrote:
I don't see what's wrong with the file object. Iterating over a file changes the file's state, that's just a fact of life.
That's exactly the point. Iterators and containers are different. Walking over a container shouldn't mutate it, whereas an iterator has mutable state independent of the container. The key problem is that the file's __iter__ method returns something whose state depends on the file, thus breaking this expectation. Either __iter__ should be implemented to fulfill its commitment, or there shouldn't be an __iter__ method on files at all. I'm not suggesting that the semantics of files themselves are "broken" or have a "wart" that needs to be fixed -- merely that we should decide on a place for files to live in our world of containers and iterators, so we can set and maintain consistent expectations. -- ?!ng
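The expectation Ping articulates is the one lists already satisfy: every call to iter() returns a fresh, independent iterator, and advancing one neither moves the others nor mutates the list:

    xs = [1, 2, 3]
    it1 = iter(xs)
    it2 = iter(xs)
    it1.next()    # 1 -- advances it1 only
    it1.next()    # 2
    it2.next()    # 1 -- it2 is unaffected, and xs is untouched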
On Fri, 12 Jul 2002, Guido van Rossum wrote:
I don't see what's wrong with the file object. Iterating over a file changes the file's state, that's just a fact of life.
[Ping]
That's exactly the point. Iterators and containers are different. Walking over a container shouldn't mutate it, whereas an iterator has mutable state independent of the container.
The key problem is that the file's __iter__ method returns something whose state depends on the file, thus breaking this expectation. Either __iter__ should be implemented to fulfill its commitment, or there shouldn't be an __iter__ method on files at all.
What commitment? Iterators don't have to have an underlying container! (E.g. generators.)
I'm not suggesting that the semantics of files themselves are "broken" or have a "wart" that needs to be fixed -- merely that we should decide on a place for files to live in our world of containers and iterators, so we can set and maintain consistent expectations.
What are your expectations? I think that both file.__iter__() returning file (as it does with Oren's patch) or file.__iter__() returning an xreadlines object (as it still does in CVS) are fine as far as reasonable expectations for iterators go. --Guido van Rossum (home page: http://www.python.org/~guido/)
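Guido's aside about generators is easy to demonstrate: a generator is a genuine iterator with no underlying container anywhere (the __future__ import was needed only in Python 2.2):

    from __future__ import generators   # Python 2.2 only

    def countdown(n):
        while n > 0:
            yield n
            n = n - 1

    for i in countdown(3):
        print i     # 3, 2, 1 -- values exist only as they are produced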
participants (23)
- Aahz
- Alex Martelli
- Andrew Koenig
- barry@zope.com
- Clark C. Evans
- David Abrahams
- Fredrik Lundh
- Gordon McMillan
- Greg Ewing
- Guido van Rossum
- Just van Rossum
- Ka-Ping Yee
- Kevin Jacobs
- Neal Norwitz
- Neil Schemenauer
- Oren Tirosh
- Patrick K. O'Brien
- Paul Svensson
- pinard@iro.umontreal.ca
- Steve Holden
- Tim Peters
- Tim Peters
- Tim Peters