Re: [Python-Dev] Single- vs. Multi-pass iterability

Can you remind me of your definition of "iterable"? Mine is "something for which iter() works", which clearly isn't yours. :-)
Right -- I mean something closer to what I've seen others call "a container". By your definition, iterators are indeed iterable. I would love for all iterables-by-your-definition to divide neatly into iterators and what-many-call-containers.
The file object, unless you make it into an iterator, is not "a container" like all others and just sits there -- a bit of a wart.
I must be misunderstanding. How does making the file object into an iterator make it a container???
f.seek does cooperate with f.next now, doesn't it? since it invalidates f's xreadlines object, if any?
Not yet. You may have seen Oren's patch for this. Unfortunately it
Right -- that's what I had in mind. I had also tweaked it so that readline sort of interoperated with it (delegating to next if the file object is holding an xreadlines object) and sent the modified patch to Oren but he disliked it (because it meant readline would not respect its numeric argument, if any, in that case).
Hm, you should've sent it to me. The numeric argument was a mistake I think. Who ever uses it? --Guido van Rossum (home page: http://www.python.org/~guido/)
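The two definitions discussed in this message can be told apart mechanically. The following is a minimal sketch in modern Python 3 syntax (the thread itself predates it); the helper names `is_iterable` and `is_iterator` are hypothetical and not part of any library.

```python
import io


def is_iterable(obj):
    """Guido's definition: something for which iter() works."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def is_iterator(obj):
    """An iterator hands back itself when asked for an iterator."""
    return is_iterable(obj) and iter(obj) is obj


container = [1, 2, 3]        # a "container": each iter() call starts afresh
iterator = iter(container)   # an iterator: single pass only

first_pass = list(iterator)
second_pass = list(iterator)             # exhausted, nothing left
container_passes = (list(container), list(container))

# In current Python a file-like object is its own iterator.
stream_is_iterator = is_iterator(io.StringIO("a\nb\n"))
```

Under this reading, a list is iterable but not an iterator, while `iter(container)` and the `io.StringIO` object are both.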

On Wednesday 17 July 2002 06:40 pm, Guido van Rossum wrote: ...
The file object, unless you make it into an iterator, is not "a container" like all others and just sits there -- a bit of a wart.
I must be misunderstanding. How does making the file object into an iterator make it a container???
My fault for unclear expression! I mean: if it's an iterator, it's an iterator. All OTHER iterables (iterables that aren't iterators) are (what some call) containers. It's not QUITE that way, but Python would be easier to teach if it were.
to Oren but he disliked it (because it meant readline would not respect its numeric argument, if any, in that case).
Hm, you should've sent it to me. The numeric argument was a mistake I think. Who ever uses it?
Not me, and I think it's advisory anyway according to the docs. Still, it doesn't solve the reference-loop-between-two-deuced-things-that-don't-cooperate-with-gc problem. And I can't see how either could be made into a WEAK reference given that xreadlines objects in other contexts need to hold a strong ref to the file they work on -- we'd have to refactor xreadlines objects too, a core part holding a weak ref and a shell around it (holding a strong ref to the file) to support ordinary calls to xreadlines.xreadlines. Messy:-(. Alex
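The core-plus-shell refactoring described above can be pictured with Python's `weakref` module. This is only a sketch of the idea, not the C code under discussion: `FakeFile`, `_LineCore` and `LineShell` are hypothetical names, and the immediate disappearance of the unreferenced file relies on CPython's reference counting.

```python
import gc
import weakref


class FakeFile:
    """Hypothetical stand-in for a file object."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        return self._lines.pop(0) if self._lines else ""


class _LineCore:
    """Core part: does the work but holds only a weak ref to the file."""

    def __init__(self, stream):
        self._stream_ref = weakref.ref(stream)

    def next_line(self):
        stream = self._stream_ref()
        if stream is None:
            raise ReferenceError("underlying file is gone")
        line = stream.readline()
        if not line:
            raise StopIteration
        return line


class LineShell:
    """Shell: keeps the file alive and delegates to the core."""

    def __init__(self, stream):
        self._stream = stream            # strong ref
        self._core = _LineCore(stream)

    def __iter__(self):
        return self

    def __next__(self):
        return self._core.next_line()


# The shell works with an anonymous file because it owns a strong ref.
shell_lines = list(LineShell(FakeFile(["a\n", "b\n"])))

# The bare core does not: nothing keeps the anonymous file alive.
core = _LineCore(FakeFile(["a\n"]))
gc.collect()
try:
    core.next_line()
    core_failed = False
except ReferenceError:
    core_failed = True
```

A file object could then hold the core for its own use without creating a loop, while ordinary callers would go through the shell.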

The file object, unless you make it into an iterator, is not "a container" like all others and just sits there -- a bit of a wart.
I must be misunderstanding. How does making the file object into an iterator make it a container???
My fault for unclear expression! I mean: if it's an iterator, it's an iterator. All OTHER iterables (iterables that aren't iterators) are (what some call) containers.
It's not QUITE that way, but Python would be easier to teach if it were.
But leaving the file object as an exception to the rule helps as a reminder that it's just a rule of thumb and cannot be taken as absolute law.
to Oren but he disliked it (because it meant readline would not respect its numeric argument, if any, in that case).
Hm, you should've sent it to me. The numeric argument was a mistake I think. Who ever uses it?
Not me, and I think it's advisory anyway according to the docs.
Still, it doesn't solve the reference-loop-between-two-deuced-things-that-don't-cooperate-with-gc problem. And I can't see how either could be made into a WEAK reference given that xreadlines objects in other contexts need to hold a strong ref to the file they work on -- we'd have to refactor xreadlines objects too, a core part holding a weak ref and a shell around it (holding a strong ref to the file) to support ordinary calls to xreadlines.xreadlines. Messy:-(.
I don't think that a weak ref to the file would be sufficient for xreadlines -- e.g.
    for line in open(filename):
        print line,
would close the file right away. Likewise, the file needs a strong ref to the xreadlines, otherwise the following would create a new iterator in the second for loop, and lose data buffered by the first iterator.
    f = open(filename)
    it = iter(f)
    for i in range(10):
        it.next()
    del it
    for line in f:
        print line,
I think I will have to reject Oren's patch because of this, and the situation with file iterators will remain as it is: once you've asked for the iterator, all operations on the file are unsafe, and the only way to get back to using the file is to abandon the file and do an absolute seek on the file. (This is sort of like switching between the raw integer file descriptor and the stream object in C -- or in Python if you care to use f.fileno() and os.read() etc.)
--Guido van Rossum (home page: http://www.python.org/~guido/)
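The data loss described in this message can be reproduced in miniature. The sketch below uses modern Python 3 and a hypothetical `ReadAheadLines` class as a stand-in for a buffering xreadlines-style iterator; the batch size of 4 is an arbitrary assumption chosen to make the effect visible.

```python
import io


class ReadAheadLines:
    """Hypothetical iterator that reads several lines ahead of its caller."""

    def __init__(self, stream, batch=4):
        self.stream = stream
        self.batch = batch
        self.pending = []   # lines read from the stream but not yet handed out

    def __iter__(self):
        return self

    def __next__(self):
        if not self.pending:
            # Read ahead: the stream position moves past unseen lines.
            for _ in range(self.batch):
                line = self.stream.readline()
                if not line:
                    break
                self.pending.append(line)
            if not self.pending:
                raise StopIteration
        return self.pending.pop(0)


stream = io.StringIO("".join(f"line {i}\n" for i in range(10)))

first = ReadAheadLines(stream)
seen = [next(first), next(first)]     # lines 0 and 1; lines 2 and 3 are buffered
del first                             # the buffered lines go away with the iterator

rest = list(ReadAheadLines(stream))   # a fresh iterator resumes at line 4
```

Lines 2 and 3 are never seen by either loop, which is the hazard of letting the file create a new iterator when the old one has been dropped.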

On Wednesday 17 July 2002 07:38 pm, Guido van Rossum wrote: ...
But leaving the file object as an exception to the rule helps as a reminder that it's just a rule of thumb and cannot be taken as absolute law.
The sublunar world has enough reminders of its imperfections that we need not strive to add more.
Still, it doesn't solve the reference-loop-between-two-deuced-things-that-don't-cooperate-with-gc problem. And I can't see how either could be made into a WEAK reference given that xreadlines objects in other contexts need to hold a strong ref to the file they work on -- we'd have to refactor xreadlines objects too, a core part holding a weak ref and a shell around it (holding a strong ref to the file) to support ordinary calls to xreadlines.xreadlines. Messy:-(.
I don't think that a weak ref to the file would be sufficient for xreadlines -- e.g.
    for line in open(filename):
        print line,
would close the file right away.
If the iterator were the file itself, no it wouldn't, whatever kind of ref the xreadlines object had to the file. What would break without refactoring would be: for line in xreadlines.xreadlines(open(filename)): ... The refactoring would be to have a, say _xreadlines, object, with the functionality of today's xreadlines object BUT a weak ref to the file, and an xreadlines object with strong refs to the file and the _xreadlines object and delegating functionality to the latter. A bit of a mess.
Likewise, the file needs a strong ref to the xreadlines, otherwise the
Definitely! Otherwise nothing keeps the xreadlines (or _xreadlines) object around _at all_ -- it's even worse than you indicate below, it seems to me:
following would create a new iterator in the second for loop, and lose data buffered by the first iterator.
    f = open(filename)
    it = iter(f)
...with the patch it would be "it is f", and so, I don't really get it...
    for i in range(10):
        it.next()
    del it
    for line in f:
        print line,
I think I will have to reject Oren's patch because of this, and the situation with file iterators will remain as it is: once you've asked for the iterator, all operations on the file are unsafe, and the only way to get back to using the file is to abandon the file and do an
Abandon the iterator, you mean? Or am I hopelessly confused?
absolute seek on the file. (This is sort of like switching between the raw integer file descriptor and the stream object in C -- or in Python if you care to use f.fileno() and os.read() etc.)
In these cases you do get some control on the buffering, though, if you care to exercise it. Alex

OK, I'll wait to see if someone submits a working patch. I still find it a non-issue myself. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Wednesday 17 July 2002 08:07 pm, Guido van Rossum wrote:
OK, I'll wait to see if someone submits a working patch. I still find it a non-issue myself.
OK, I'm gonna give it a try -- kludging up Oren's patch so that the xreadlines object is able to hold a non-addref'd pointer to the file object (when it's for internal use of the file object) and, as long as I'm at it, also including the little further kludge that makes f.readline delegate to f.next if f is holding an xreadlines object. Oh, and dropping the xreadlines object on a seek, too. It's just a few lines' changes to two files after all, Objects/fileobject.c and Modules/xreadlines.c. A bit kludgey and tricky, admittedly, which is perhaps not the nicest thing in the world given that fileobject.c isn't the shortest, simplest, or least crucial part of Python. But anyway, I think I'll have it ready by early tomorrow my time (it's past midnight and I'm past the age for all-nighters:-). Alex

On Thursday 18 July 2002 12:06 am, Alex Martelli wrote:
On Wednesday 17 July 2002 08:07 pm, Guido van Rossum wrote:
OK, I'll wait to see if someone submits a working patch. I still find it a non-issue myself.
OK, I'm gonna give it a try -- kludging up Oren's patch so that
Done, now submitted as patch 583235. Alex

Guido> Likewise, the file needs a strong ref to the xreadlines,
Guido> otherwise the following would create a new iterator in the
Guido> second for loop, and lose data buffered by the first iterator.
Guido>     f = open(filename)
Guido>     it = iter(f)
Guido>     for i in range(10):
Guido>         it.next()
Guido>     del it
Guido>     for line in f:
Guido>         print line,
Guido> I think I will have to reject Oren's patch because of this, and
Guido> the situation with file iterators will remain as it is: once
Guido> you've asked for the iterator, all operations on the file are
Guido> unsafe, and the only way to get back to using the file is to
Guido> abandon the file and do an absolute seek on the file.
This implies that you don't expect the code above to work correctly, right?
-- Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark

Guido> Likewise, the file needs a strong ref to the xreadlines,
Guido> otherwise the following would create a new iterator in the
Guido> second for loop, and lose data buffered by the first iterator.
Guido>     f = open(filename)
Guido>     it = iter(f)
Guido>     for i in range(10):
Guido>         it.next()
Guido>     del it
Guido>     for line in f:
Guido>         print line,
Guido> I think I will have to reject Oren's patch because of this, and
Guido> the situation with file iterators will remain as it is: once
Guido> you've asked for the iterator, all operations on the file are
Guido> unsafe, and the only way to get back to using the file is to
Guido> abandon the file and do an absolute seek on the file.
This implies that you don't expect the code above to work correctly, right?
I think that Oren's patch would make this work (the iterator requested by the second for loop would return the same iterator as the first one, since it's cached in the file object), but at the cost of an unbreakable cycle between the file and the xreadlines object. --Guido van Rossum (home page: http://www.python.org/~guido/)
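The trade-off described here, a cached iterator that keeps the second loop correct at the price of a reference loop, might be sketched as follows. `CachingFile` and `LineIter` are hypothetical Python 3 classes written for illustration; they are not Oren's patch, which changes the C file object.

```python
class LineIter:
    """Hypothetical iterator holding a strong reference to its file."""

    def __init__(self, owner):
        self.owner = owner
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.owner.lines):
            raise StopIteration
        line = self.owner.lines[self.pos]
        self.pos += 1
        return line


class CachingFile:
    """Hypothetical file-like object that caches the iterator it hands out."""

    def __init__(self, lines):
        self.lines = list(lines)
        self._cached = None

    def __iter__(self):
        if self._cached is None:
            # file -> iterator -> file: this is the reference loop.
            self._cached = LineIter(self)
        return self._cached


f = CachingFile(["a\n", "b\n", "c\n", "d\n"])
it = iter(f)
first_two = [next(it), next(it)]
del it                       # the file still holds the iterator

same_iterator = iter(f) is iter(f)
has_cycle = iter(f).owner is f
remaining = list(f)          # the second loop continues where the first stopped
```

Nothing is lost between the two loops, but the file and its iterator now keep each other alive.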

Guido:
Likewise, the file needs a strong ref to the xreadlines, otherwise the following would create a new iterator in the second for loop, and lose data buffered by the first iterator.
To me, these problems are screaming out that the buffer *shouldn't* be kept in the xreadlines object! Maybe the xreadlines object's buffer should be kept in the file object? Then it wouldn't matter if multiple xreadlines objects were created, as they'd all share the same buffer, and there would be no reference loops. Hmmm... then we're moving towards making the file object and the xreadlines object be the same object. What was the reason for not doing that again? Was it just to avoid changing a lot of code, or was there some reason it wouldn't work?
Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
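Greg's suggestion of keeping the buffer in the file object can be sketched like this. The classes `BufferedFile` and `LineView` are hypothetical, written in Python 3 for illustration, and the batch size of 4 is an arbitrary assumption.

```python
import io


class BufferedFile:
    """Hypothetical file wrapper that owns the read-ahead buffer itself."""

    def __init__(self, stream, batch=4):
        self.stream = stream
        self.batch = batch
        self.pending = []   # the buffer lives here, not in any iterator

    def next_line(self):
        if not self.pending:
            for _ in range(self.batch):
                line = self.stream.readline()
                if not line:
                    break
                self.pending.append(line)
        if not self.pending:
            return None
        return self.pending.pop(0)

    def __iter__(self):
        # The file keeps no reference to the view, so no loop is formed.
        return LineView(self)


class LineView:
    """Hypothetical stateless iterator drawing on the file's own buffer."""

    def __init__(self, owner):
        self.owner = owner

    def __iter__(self):
        return self

    def __next__(self):
        line = self.owner.next_line()
        if line is None:
            raise StopIteration
        return line


f = BufferedFile(io.StringIO("".join(f"line {i}\n" for i in range(10))))
it = iter(f)
seen = [next(it), next(it)]
del it             # nothing is lost: the buffered lines stay with the file
rest = list(f)     # a brand-new view picks up at line 2
```

Because every view shares the file's buffer, creating and discarding iterators is harmless, and the only reference runs from the view to the file.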

Note that it's easy to make objects cooperate with gc. We've historically only done so when the need was clear, because the gc header takes about a dozen extra bytes per gc-tracked object. There aren't enough files or xreadlines objects in existence to care about the extra memory burden here, though; we simply thought that objects of these types could never be in cycles. OTOH, if that means lazy code like
    for fname in os.listdir('.'):
        for line in file(fname):
            n += 1
would accumulate an ever-growing number of open file objects until gc happened to run and break cycles, I expect a lot of CPython programs would "suddenly break" (they rely on refcount semantics now closing the anonymous file object the instant it becomes unreachable).
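The difference between refcount-driven and gc-driven cleanup mentioned above can be observed directly in CPython. The sketch below is Python 3 and uses a hypothetical `Resource` class whose `__del__` stands in for closing a file; automatic collection is switched off for the duration so that the timing is deterministic. The behaviour is specific to CPython's reference counting.

```python
import gc

closed = []   # records the order in which resources are finalized


class Resource:
    """Hypothetical stand-in for an object that closes itself when freed."""

    def __init__(self, name, make_cycle):
        self.name = name
        if make_cycle:
            self.helper = [self]   # back-reference: the object is now in a cycle

    def __del__(self):
        closed.append(self.name)


gc.disable()                       # keep the collector from running on its own

Resource("plain", make_cycle=False)
after_plain = list(closed)         # freed at once by reference counting

Resource("cyclic", make_cycle=True)
after_cyclic = list(closed)        # unreachable, but not yet finalized

gc.collect()
after_gc = list(closed)            # finalized only when the collector runs

gc.enable()
```

A loop that opens many files this way would keep every one of them open until the collector happened to run.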

Alex Martelli <aleax@aleax.it>:
Still, it doesn't solve the reference-loop-between-two-deuced-things-that-don't-cooperate-with-gc problem.
Would making them cooperate with GC be a difficult thing to do? Seems to me we should be moving towards making everything cooperate with GC, and fixing things like this whenever they come to light.
Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.

On Thursday 18 July 2002 01:48 am, Greg Ewing wrote:
Alex Martelli <aleax@aleax.it>:
Still, it doesn't solve the reference-loop-between-two-deuced-things-that-don't-cooperate-with-gc problem.
Would making them cooperate with GC be a difficult thing to do? Seems to me we should be moving towards making everything cooperate with GC, and fixing things like this whenever they come to light.
Tim Peters says it wouldn't be, but I have not explored that. Alex

Would making them cooperate with GC be a difficult thing to do? Seems to me we should be moving towards making everything cooperate with GC, and fixing things like this whenever they come to light.
Tim Peters says it wouldn't be, but I have not explored that.
But he also warned that it introduces new surprises. --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (5): Alex Martelli, Andrew Koenig, Greg Ewing, Guido van Rossum, Tim Peters