Iterators, map, xreadlines and docs
A confused user on c.l.py reported that while
for x in file.xreadlines():
works fine,
map(whatever, file.xreadlines())
blows up with
TypeError: argument 2 to map() must be a sequence object
The docs say both contexts require "a sequence", so this is baffling to them. It's apparently because map() internally insists that the sq_length slot be non-null (but it's null in the xreadlines object), even though map() doesn't use it for anything other than *guessing* a result size (it keeps going until IndexError is raised regardless of what len() returns, growing or shrinking the preallocated result list as needed). I think that's a bug in map(). Anyone disagree? If so, fine, map() has to be changed to work with iterators anyway <wink>.

How are we going to identify all the places that need to become iterator-aware, get all the code changed, and update the docs to match? In effect, a bunch of docs for arguments need to say, in some words or other, that the args must implement the iterator interface or protocol. I think it's essential that we define the latter only once. But the docs don't really define any interfaces/protocols now, so it's unclear where to put that.

Fred, Pronounce. Better sooner than later, else I bet a bunch of code changes will get checked in without appropriate doc changes, and the 2.2 docs won't match the code.
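The map() behaviour argued for above (treat the length only as a preallocation guess, and let iteration alone decide when to stop) can be sketched in present-day Python 3 rather than the 2.1-era C code. This is a minimal sketch under stated assumptions: `map_list` is a hypothetical name, not the real map(), and `operator.length_hint` (a real standard-library function, added in Python 3.4) stands in for the sq_length guess. The loop ends when the iterator is exhausted instead of on IndexError.

```python
import operator
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_list(func: Callable[[T], R], iterable: Iterable[T]) -> List[R]:
    """Apply func to every item, treating the length only as a size guess."""
    # length_hint() uses __len__ or __length_hint__ if present, else the default.
    guess = operator.length_hint(iterable, 0)
    result: list = [None] * guess  # preallocate from the (possibly wrong) guess
    count = 0
    for item in iterable:  # exhaustion of the iterator, not len(), ends the loop
        value = func(item)
        if count < len(result):
            result[count] = value
        else:
            result.append(value)  # guess too small: grow
        count += 1
    del result[count:]  # guess too large: shrink
    return result
```

An object with no length at all simply gets a guess of 0 and grows the list as needed; an object whose length is wrong gets trimmed at the end, so a missing or misleading length never causes an error.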
On Sat, 28 Apr 2001 03:24:48 -0400, "Tim Peters" <tim.one@home.com> wrote:
A confused user on c.l.py reported that while
for x in file.xreadlines():
works fine,
map(whatever, file.xreadlines())
blows up with
TypeError: argument 2 to map() must be a sequence object ... I think that's a bug in map(). Anyone disagree?
I agree...but when we talked about it in c.l.py, I said that and you said map() should be deprecated. Why the sudden change of heart? why shouldn't it be changed to [whatever(x) for x in file.xreadlines()]?
...
I think that's a bug in map(). Anyone disagree?
[Moshe]
I agree...but when we talked about it in c.l.py, I said that and you said map() should be deprecated. Why the sudden change of heart?
This doesn't ring a bell. I don't even recall *seeing* a msg from you in the .xreadlines() vs map() thread ...
why shouldn't it be changed to
[whatever(x) for x in file.xreadlines()]?
I'm not keen to eradicate map() from the face of the Earth regardless, and until it *is* deprecated (if ever), it's supported. But this is all moot since I already checked in the map() fix. How to deal with the iterator docs is still an important issue.
Tim Peters writes:
In effect, a bunch of docs for arguments need to say, in some words or other, that the args must implement the iterator interface or protocol. I think it's essential that we define the latter only once. But the docs don't really define any interfaces/protocols now, so it's unclear where to put that.
Won't there be at least one standard iterator object defined for lists, etc.? That could be described in the built-in types section (as with files, lists, etc.) of the Library Reference. That will be used as the definition of the iterator protocol in the same way the file object description there is referred to from places that want file or file-like objects.

I think we need some re-organization of the built-in types section to separate abstract protocols from specific implementations, but that's an orthogonal aspect and can be handled at the same time as the rest of the built-in types.

Specific changes for places that accept iterators should be made as the code is changed, as usual. Please describe the changes clearly in checkin messages so iterator-related changes don't propagate to the maintenance branch.

-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Digital Creations
[Fred]
Won't there be at least one standard iterator object defined for lists, etc.?
>>> iter([])
<iterator object at 00792640>
>>> iter({})
<dictionary-iterator object at 00793A40>
>>> iter(())
<iterator object at 00792640>
>>> import sys
>>> iter(sys.stdin)
<callable-iterator object at 00793FE0>
>>> class C:
...     def __iter__(self): return self
...
>>> iter(C())
<__main__.C instance at 007938EC>
What do those have in common? Objects and types are the wrong way to approach this one: it's really of no interest that, e.g., iter(list) and iter(dict) return objects of different types; what *is* interesting is that iter(whatever) returns *an* object that conforms to the iterator protocol (or "implements the iterator interface" -- all the same to me). You can give *examples* of builtin iterator types that conform to the protocol, but the protocol needs to be defined for its own sake first. The protocol is fundamental, and is neither an object nor a type.
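To make "the protocol, not the type" concrete, here is a minimal sketch in present-day Python 3. The protocol is just two methods: `__iter__` returns the iterator itself, and `__next__` returns the next item or raises StopIteration. (Python 2.2 spelled the second method `next()`; Python 3 renamed it `__next__`.) `Countdown` is a hypothetical example class. It inherits from nothing, yet the standard `collections.abc.Iterator` check recognises it purely by the methods it defines.

```python
from collections.abc import Iterable, Iterator


class Countdown:
    """Yields start, start-1, ..., 1 using only the iterator protocol."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self) -> "Countdown":
        return self  # an iterator returns itself from __iter__

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration  # signals "no more items"
        value = self.current
        self.current -= 1
        return value


# Countdown inherits from nothing, yet the ABC recognises it by its methods.
print(isinstance(Countdown(3), Iterator))  # True
print(list(Countdown(3)))  # [3, 2, 1]
```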
That could be described in the built-in types section (as with files, lists, etc.) of the Library Reference. That will be used as the definition of the iterator protocol in the same way the file object description there is referred to from places that want file or file-like objects.
"file-like objects" is the bad doc experience I'm hoping we don't repeat. The phrase "file-like object" is indeed used freely in the docs, but it's not (AFAICT) *defined* anywhere, and doesn't even appear in the index. Besides, the notion that "file-like object" refers to section Built-in Types, Exceptions and Functions -> Other Built-in Types -> File Objects was news to me. I see the individual method descriptions there sometimes refer to "file-like objects", and other times "objects implementing a file-like interface". The latter phrase appears uniquely in the description of .readlines(), and may be the clearest explanation in the docs of wtf "file-like object" means. If so, it shouldn't be buried in the bowels of one method description.
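The underlying problem is that "file-like" never says which methods a consumer actually calls. A minimal sketch of the explicit alternative, in present-day Python 3: the hypothetical function `count_nonblank_lines` states that it uses nothing but `readline()`, so any object with that one method qualifies. Both the real standard-library `io.StringIO` and the hypothetical `ListFile` class satisfy it.

```python
import io


def count_nonblank_lines(f) -> int:
    """Count non-blank lines, calling nothing on f except readline()."""
    count = 0
    while True:
        line = f.readline()
        if not line:  # readline() returns '' at end of file
            break
        if line.strip():
            count += 1
    return count


class ListFile:
    """Hypothetical minimal 'file-like' object backed by a list of strings."""

    def __init__(self, lines) -> None:
        self._lines = list(lines)
        self._pos = 0

    def readline(self) -> str:
        if self._pos >= len(self._lines):
            return ""
        line = self._lines[self._pos]
        self._pos += 1
        return line


print(count_nonblank_lines(io.StringIO("a\n\nb\n")))  # 2
print(count_nonblank_lines(ListFile(["x\n", "\n", "y"])))  # 2
```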
I think we need some re-organization of the built-in types section to separate abstract protocols from specific implementations,
Yes.
but that's an orthogonal aspect and can be handled at the same time as the rest of the built-in types.
I assume you're thinking of creating a new "Iterator Objects" section under "Other Built-in Types"? That would work for me if it *started* with a description of the iterator interface/protocol. There's a twist, though: iterators need to be defined already in the Language Reference manual (we can't explain for-loops without them anymore).
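Explaining for-loops in terms of iterators can be done concretely. The following is a minimal sketch, in present-day Python 3, of how a `for` statement drives the protocol: it asks the object for an iterator, then calls `next()` until StopIteration. `run_for_loop` is a hypothetical helper, and it deliberately ignores `break`, `continue` and the loop's `else` clause.

```python
def run_for_loop(iterable, body) -> None:
    """Roughly what `for x in iterable: body(x)` does (ignoring break/else)."""
    iterator = iter(iterable)  # ask the object for an iterator
    while True:
        try:
            item = next(iterator)  # calls the iterator's __next__
        except StopIteration:
            break  # protocol says: exhausted, leave the loop
        body(item)  # outside the try, so the body's own errors propagate


seen = []
run_for_loop("abc", seen.append)
print(seen)  # ['a', 'b', 'c']
```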
Specific changes for places that accept iterators should be made as the code is changed, as usual. Please describe the changes clearly in checkin messages so iterator-related changes don't propagate to the maintenance branch.
We need an example to build on -- this is too abstract for me (which is saying something <wink>). For example, today we have:

list(sequence)
Return a list whose items are the same and in the same order as sequence's items. If sequence is already a list, a copy is made and returned, similar to sequence[:]. For instance, list('abc') returns ['a', 'b', 'c'] and list( (1, 2, 3) ) returns [1, 2, 3].

list() doesn't yet work with iterators, but surely will. What do we want the docs to say after it changes? Should it be implicit or explicit that "sequence" now means "sequence or sequence-like object"? Where is the connection between "sequence-like object" and "iterable" explained? Perhaps what's really needed is s/sequence/iterable/ in this description. But then where is "iterable" defined? Solve this once and the rest should follow easily. But solving it the first time doesn't look easy to me. That's why I'm bugging you now.
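One wrinkle in the "sequence" vs "iterable" question survives in present-day Python 3: list() accepts any iterable, and iter() still falls back to the old sequence protocol for objects that define only `__getitem__` and signal the end with IndexError. A minimal sketch, assuming the hypothetical names `Squares` and `is_iterable`. It shows that such an object works with list() but is not recognised by an `isinstance(..., Iterable)` check, which is why simply trying iter() is the more reliable test.

```python
from collections.abc import Iterable


class Squares:
    """Old-style 'sequence': only __getitem__, which raises IndexError at the end."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __getitem__(self, index: int) -> int:
        if index >= self.n:
            raise IndexError(index)
        return index * index


def is_iterable(obj) -> bool:
    """True if iter() accepts obj; this also covers the __getitem__ fallback."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(list(Squares(4)))  # [0, 1, 4, 9]
print(is_iterable(Squares(4)))  # True
print(isinstance(Squares(4), Iterable))  # False: no __iter__ method
print(is_iterable(42))  # False
```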
On Sat, 28 Apr 2001, Tim Peters wrote:
What do those have in common? Objects and types are the wrong way to approach this one: it's really of no interest that, e.g., iter(list) and iter(dict) return objects of different types; what *is* interesting is that iter(whatever) returns *an* object that conforms to the iterator protocol (or "implements the iterator interface" -- all the same to me).
I have added a notional iter interface to the PEP 245 prototype and will be making another release of it later tonight.
"file-like objects" is the bad doc experience I'm hoping we don't repeat. The phrase "file-like object" is indeed used freely in the docs, but it's not (AFAICT) *defined* anywhere, and doesn't even appear in the index. Besides, the notion that "file-like object" refers to section
Built-in Types, Exceptions and Functions -> Other Built-in Types -> File Objects
was news to me. I see the individual method descriptions there sometimes refer to "file-like objects", and other times "objects implementing a file-like interface". The latter phrase appears uniquely in the description of .readlines(), and may be the clearest explanation in the docs of wtf "file-like object" means. If so, it shouldn't be buried in the bowels of one method description.
245 takes a couple stabs at File interfaces, trying to satisfy the bare-bones kind of file, a Python File, StringI, StringO etc...
I think we need some re-organization of the built-in types section to separate abstract protocols from specific implementations,
Yes.
FYI, 245 defines the following interfaces. Some of them may be wrong; I took most of this straight from the Pydocs and stuff done by Jim:

Mutable
Comparable
Orderable(Comparable)
Hashable
Hashkey(Comparable, Hashable)
Membership
Mapping
Sized
MutableMapping(Mutable)
Sequence(Mapping)
Sequential(Sequential)
Type
Null
String(Sequence, Sized)
Tuple(Sequence, Sized)
List(Mapping, MutableMapping, Sequence, Sized)
Dictionary(Mapping, MutableMapping, Sized)
File - and the various specific file functionality
Module
Class
Function
InstanceMethod
Exception
Number - Real, Complex, Exact, others....

Here are some examples from the current prototype. The primary utility function from PEP 245 is 'does':

Python 2.1 (#1, Apr 22 2001, 06:33:07)
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from Interface import does

'does' can be used two ways. The first way is to pass it an object, and it will return the interfaces that the object implements:

>>> does({})
[<Interface Dictionary at 82e288c>]
>>> does([])
[<Interface List at 82c71f4>]
>>> does(sys.stdin)
[<Interface PyFile at 8261e54>]
>>> def f(): pass
...
>>> does(f)
[<Interface Function at 82ff134>]

Here, we see that a dictionary and a list do the 'Dictionary' and 'List' interfaces respectively, and that files and functions also implement interfaces. 'does' can also be used with another argument, to ask whether the object implements a certain interface:

>>> from Interface.Protocols import Mapping, Sequence, Mutable
>>> from Interface.File import File
>>> does({}, Mapping)
1
>>> does([], Sequence)
1
>>> does((), Mutable)
0
>>> does({}, Dictionary)
1
>>> does(sys.stdin, File)
1

Note that PEP 245 requires NO changes to Python. You can download it now and try this stuff out.

-Michel
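The PEP 245 prototype's Interface package is not part of the standard library, so the sketch below does not use it. As a rough present-day analogue (an assumption, not the prototype's API), the standard `collections.abc` abstract base classes can play the role of interfaces and `isinstance()` the role of 'does'. `abc_does` and `KNOWN` are hypothetical names. As with the prototype, calling it with one argument lists the matching interfaces, and calling it with two asks about a single interface.

```python
from collections.abc import (Callable, Hashable, Iterable, Mapping,
                             MutableMapping, MutableSequence, Sequence, Sized)

# Hypothetical stand-ins for PEP 245 interfaces, using the stdlib ABCs instead.
KNOWN = [Callable, Hashable, Iterable, Mapping, MutableMapping,
         MutableSequence, Sequence, Sized]


def abc_does(obj, interface=None):
    """Sketch of a PEP 245-style 'does' built on isinstance() and ABCs."""
    if interface is not None:
        return isinstance(obj, interface)  # two-argument form: one yes/no answer
    # One-argument form: names of every known ABC the object satisfies.
    return [abc.__name__ for abc in KNOWN if isinstance(obj, abc)]


print(abc_does({}, Mapping))  # True
print(abc_does((), MutableSequence))  # False
print(abc_does([]))
```

The tuple check mirrors the prototype's `does((), Mutable)` example: tuples are registered as `Sequence` but not `MutableSequence`.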
On Mon, 30 Apr 2001, Michel Pelletier wrote:
On Sat, 28 Apr 2001, Tim Peters wrote:
I have added a notional iter interface to the PEP 245 prototype and will be making another release of it later tonight.
I forgot to mention that you can get the previous release (without iter) here: http://www.zope.org/Members/michel/InterfacesPEP/Interface.tgz -Michel
participants (4)
- Fred L. Drake, Jr.
- Michel Pelletier
- Moshe Zadka
- Tim Peters