I think I've actually found a syntax for lockstep iteration that looks reasonable (or at least not completely unreasonable) and is backward compatible:

    for (x in a, y in b): ...

Not sure what the implications are for the parser yet.

Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
"A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc."
for (x in a, y in b): ...
Hmmm. Until someone smarter than me shoots it down for some obvious reason <wink>, it certainly appeals to me. My immediate reaction _is_ lockstep iteration, and that is the first time I can say that. Part of the reason is that it looks like a tuple unpack, which I think of as a "lockstep/parallel/atomic" operation... Mark.
On Wed, Aug 09, 2000 at 04:39:30PM +1000, Mark Hammond wrote:
for (x in a, y in b): ...
Hmmm. Until someone smarter than me shoots it down for some obvious reason <wink>, it certainly appeals to me.
The only objection I can bring up is that parentheses are almost always optional, in Python, and this kind of violates it. Suddenly the presence of parentheses changes the entire expression, not just the grouping of it. Oh, and there is the question of whether 'for (x in a):' is allowed, too (it isn't, currently.) I'm not entirely sure that the parser will swallow this, however, because 'for (x in a, y in b) in z:' *is* valid syntax... so it might be ambiguous. Then again, it can probably be worked around. It might not be too pretty, but it can be worked around ;) -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Thomas Wouters <thomas@xs4all.net>:
The only objection I can bring up is that parentheses are almost always optional, in Python, and this kind of violates it.
They're optional around tuple constructors, but this is not a tuple constructor. The parentheses around function arguments aren't optional either, and nobody complains about that.
'for (x in a, y in b) in z:' *is* valid syntax...
But it's not valid Python:
    >>> for (x in a, y in b) in z:
    ...     print x,y
    ...
    SyntaxError: can't assign to operator
It might not be too pretty, but it can be worked around ;)
It wouldn't be any uglier than what's currently done with the LHS of an assignment, which is parsed as a general expression and treated specially later on.

There's-more-to-the-Python-syntax-than-what-it-says-in-the-Grammar-file-ly,
Greg Ewing
On Wed, 9 Aug 2000, Greg Ewing wrote:
for (x in a, y in b): ...
It looks nice, but i'm pretty sure it won't fly. (x in a, y in b) is a perfectly valid expression. For compatibility the parser must also accept for (x, y) in list_of_pairs: and since the thing after the open-paren can be arbitrarily long, how is the parser to know whether the lockstep form has been invoked? Besides, i think Guido has Pronounced quite firmly on zip(). I would much rather petition now to get indices() and irange() into the built-ins... please pretty please? -- ?!ng "All models are wrong; some models are useful." -- George Box
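Ping's objection can be checked directly: `(x in a, y in b)` already means something, namely a tuple of two membership tests. The following is a minimal sketch in present-day Python, with made-up values for `a`, `b`, `x` and `y` (none of them come from the thread), shown next to the zip() form mentioned above:

```python
a = [1, 2, 3]
b = ["one", "two", "three"]

# An ordinary tuple display holding two membership tests; this existing
# meaning is why the parser cannot tell it apart from a loop header early on.
x, y = 2, "four"
membership = (x in a, y in b)

# Lockstep iteration with zip(): one element from each sequence per step.
lockstep = []
for x, y in zip(a, b):
    lockstep.append((x, y))
```

Here `membership` is a pair of booleans, while `lockstep` collects the paired elements that the proposed syntax was meant to produce.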
On Wed, 9 Aug 2000, Greg Ewing wrote:
for (x in a, y in b): ...
No, for exactly the reasons Ping explained. Let's give this a rest okay?
I would much rather petition now to get indices() and irange() into the built-ins... please pretty please?
I forget what indices() was -- is it the moral equivalent of keys()? That's range(len(s)), I don't see a need for a new function. In fact I think indices() would reduce readability because you have to guess what it means. Everybody knows range() and len(); not everybody will know indices() because it's not needed that often. If irange(s) is zip(range(len(s)), s), I see how that's a bit unwieldy. In the past there were syntax proposals, e.g. ``for i indexing s''. Maybe you and Just can draft a PEP? --Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
Guido van Rossum wrote:
On Wed, 9 Aug 2000, Greg Ewing wrote:
for (x in a, y in b): ...
No, for exactly the reasons Ping explained. Let's give this a rest okay?
I would much rather petition now to get indices() and irange() into the built-ins... please pretty please?
I forget what indices() was -- is it the moral equivalent of keys()?
indices() and irange() are both builtins which originated from mx.Tools. See: http://starship.python.net/crew/lemburg/mxTools.html * indices(object) is the same as tuple(range(len(object))) - only faster and using a more intuitive and less convoluted name. * irange(object[,indices]) (in its mx.Tools version) creates a tuple of tuples (index, object[index]). indices defaults to indices(object) if not given, otherwise, only the indexes found in indices are used to create the mentioned tuple -- and this even works with arbitrary keys, since the PyObject_GetItem() API is used. Typical use is: for i,value in irange(sequence): sequence[i] = value + 1 In practice I found that I could always use irange() where indices() would have been used, since I typically need the indexed sequence object anyway.
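As a rough illustration of the behaviour described above, here is a pure-Python sketch. It is not the mx.Tools implementation (which is written in C); the parameter name `index_list` is an assumption chosen for this example.

```python
def indices(obj):
    """Return the valid indexes of obj as a tuple, like tuple(range(len(obj)))."""
    return tuple(range(len(obj)))


def irange(obj, index_list=None):
    """Return a tuple of (index, obj[index]) pairs.

    index_list defaults to indices(obj). Because plain obj[index] lookup is
    used, arbitrary keys (for example dictionary keys) work as well.
    """
    if index_list is None:
        index_list = indices(obj)
    return tuple((i, obj[i]) for i in index_list)


# The "typical use" from the message: the tuple is built up front, so
# assigning to the sequence inside the loop is safe.
sequence = [10, 20, 30]
for i, value in irange(sequence):
    sequence[i] = value + 1
```

Building the whole tuple before the loop starts is what makes the in-place update safe, and it is also the storage cost that comes up later in the thread.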
That's range(len(s)), I don't see a need for a new function. In fact I think indices() would reduce readability because you have to guess what it means. Everybody knows range() and len(); not everybody will know indices() because it's not needed that often.
If irange(s) is zip(range(len(s)), s), I see how that's a bit unwieldy. In the past there were syntax proposals, e.g. ``for i indexing s''. Maybe you and Just can draft a PEP?
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
At 7:49 AM -0500 09-08-2000, Guido van Rossum wrote:
In the past there were syntax proposals, e.g. ``for i indexing s''. Maybe you and Just can draft a PEP?
PEP: 1716099-3
Title: Index-enhanced sequence iteration
Version: $Revision: 1.1 $
Owner: Someone-with-commit-rights
Python-Version: 2.0
Status: Incomplete

Introduction

    This PEP proposes a way to more conveniently iterate over a sequence and its indices.

Features

    It adds an optional clause to the 'for' statement:

        for <index> indexing <element> in <seq>:
            ...

    This is equivalent to (see the zip() PEP):

        for <index>, <element> in zip(range(len(seq)), seq):
            ...

    Except no new list is created.

Mechanism

    The index of the current element in a for-loop already exists in the implementation, however, it is not reachable from Python. The new 'indexing' keyword merely exposes the internal counter.

Implementation

    Implementation should be trivial for anyone named Guido, Tim or Thomas. Justs better not try.

Advantages:

    Less code needed for this common operation, which is currently most often written as:

        for index in range(len(seq)):
            element = seq[index]
            ...

Disadvantages:

    It will break that one person's code that uses "indexing" as a variable name.

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:
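For readers comparing this with present-day Python: the 'indexing' clause proposed here is not valid syntax, but the built-in enumerate() (added later, in Python 2.3, by PEP 279) yields the same (index, element) pairs without building a list first. A small sketch, with `seq` as an illustrative value only:

```python
seq = ["a", "b", "c"]

# The spelling the draft PEP calls equivalent (it builds the pairs via zip).
via_zip = list(zip(range(len(seq)), seq))

# The later built-in: pairs are produced lazily, one per loop step.
via_enumerate = list(enumerate(seq))

collected = []
for index, element in enumerate(seq):
    collected.append((index, element))
```

Both spellings yield the same pairs; the difference is only in how much is computed before the loop starts.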
On Wed, Aug 09, 2000 at 04:01:18PM +0100, Just van Rossum wrote:
PEP: 1716099-3 Title: Index-enhanced sequence iteration Version: $Revision: 1.1 $ Owner: Someone-with-commit-rights
I'd be willing to adopt this PEP, if the other two PEPs on my name don't need extensive rewrites anymore.
Features
It adds an optional clause to the 'for' statement:
for <index> indexing <element> in <seq>:
Ever since I saw the implementation of FOR_LOOP I've wanted this, but I never could think up a backwards compatible and readable syntax for it ;P
Disadvantages:
It will break that one person's code that uses "indexing" as a variable name.
This needn't be true, if it's done in the same way as Tim proposed the 'form from import as as as' syntax change ;)

    for_stmt: 'for' exprlist [NAME exprlist] 'in' testlist ':' suite ['else' ':' suite]

If the 5th subnode of the expression is 'in', the 3rd should be 'indexing' and the 4th would be the variable to assign the index number to. If it's ':', the loop is index-less. (this is just a quick and dirty example; 'exprlist' is probably not the right subnode for the indexing variable, because it can't be a tuple or anything like that.)

-- Thomas Wouters <thomas@xs4all.net>
>> Disadvantages: >> It will break that one person's code that uses "indexing" as a >> variable name. Thomas> This needn't be true, if it's done in the same way as Tim Thomas> proposed the 'form from import as as as' syntax change ;) Could this be extended to many/most/all current instances of keywords in Python? As Tim pointed out, Fortran has no keywords. It annoys me that I (for example) can't define a method named "print". Skip
On Wed, Aug 09, 2000 at 11:40:27AM -0500, Skip Montanaro wrote:
>> Disadvantages:
>> It will break that one person's code that uses "indexing" as a >> variable name.
Thomas> This needn't be true, if it's done in the same way as Tim Thomas> proposed the 'form from import as as as' syntax change ;)
Could this be extended to many/most/all current instances of keywords in Python? As Tim pointed out, Fortran has no keywords. It annoys me that I (for example) can't define a method named "print".
No. I just (in the trainride from work to home ;) wrote a patch that adds 'from x import y as z' and 'import foo as fee', and came to the conclusion that we can't make 'from' a non-reserved word, for instance. Because if we change

    'from' dotted_name 'import' NAME*

into

    NAME dotted_name 'import' NAME*

the parser won't know how to parse other expressions that start with NAME, like 'NAME = expr' or 'NAME is expr'. I know this because I tried it and it didn't work :-) So we can probably make most names that are *part* of a statement non-reserved words, but not those that uniquely identify a statement. That doesn't leave many words, except perhaps for the 'in' in 'for' -- but 'in' is already a reserved word for other purposes ;)

As for the patch that adds 'as' (as a non-reserved word) to both imports, I'll upload it to SF after I rewrite it a bit ;)

-- Thomas Wouters <thomas@xs4all.net>
[Skip Montanaro]
Could this be extended to many/most/all current instances of keywords in Python? As Tim pointed out, Fortran has no keywords. It annoys me that I (for example) can't define a method named "print".
[Thomas Wouters]
No. I just (in the trainride from work to home ;) wrote a patch that adds 'from x import y as z' and 'import foo as fee', and came to the conclusion that we can't make 'from' a non-reserved word, for instance. Because if we change
'from' dotted_name 'import' NAME*
into
NAME dotted_name 'import' NAME*
the parser won't know how to parse other expressions that start with NAME, like 'NAME = expr' or 'NAME is expr'. I know this because I tried it and it didn't work :-) So we can probably make most names that are *part* of a statement non-reserved words, but not those that uniquely identify a statement. That doesn't leave many words, except perhaps for the 'in' in 'for' -- but 'in' is already a reserved word for other purposes ;)
Just a datapoint. JPython goes a bit further in its attempt to unreserve reserved words in certain cases:

- after "def"
- after a dot "."
- after "import"
- after "from" (in an import stmt)
- and as argument names

This allows JPython to do:

    from from import x
    def def(): pass
    x.exec(from=1, to=2)

This feature was added to ease JPython's integration with existing Java libraries. IIRC it was remarked that CPython could also make use of such a feature when integrating with, e.g., Tk or COM.

regards, finn
[Skip laments...]
Could this be extended to many/most/all current instances of keywords in Python? As Tim pointed out, Fortran has no keywords. It annoys me that I (for example) can't define a method named "print".
Sometimes it is worse than annoying!

In the COM and CORBA worlds, it can be a showstopper - if an external object happens to expose a method or property named after a Python keyword, then you simply can not use it!

This has led to COM support having to check _every_ attribute name it sees externally, and mangle it if a keyword.

A bigger support exists for .NET. The .NET framework explicitly dictates that a compliant language _must_ have a way of overriding its own keywords when calling external methods (it was either that, or try and dictate a union of reserved words they can ban)

Eg, C# allows you to surround a keyword with brackets. ie, I believe something like:

    object.[if]

Would work in C# to provide access to an attribute named "if"

Unfortunately, Python COM is a layer on top of CPython, and Python .NET still uses the CPython parser - so in neither of these cases is there a simple hack I can use to work around it at the parser level.

Needless to say, as this affects the 2 major technologies I work with currently, I would like an official way to work around Python keywords!

Mark.
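To make the "check every attribute name and mangle it if a keyword" step concrete, here is a hedged sketch. The trailing-underscore scheme, the function name `to_python_name` and the class `ExternalObject` are all assumptions for illustration; the message does not say which mangling the COM support actually uses.

```python
import keyword


def to_python_name(external_name):
    """Return a name usable after a dot in Python source.

    Hypothetical scheme: append "_" when the external name is a keyword.
    """
    if keyword.iskeyword(external_name):
        return external_name + "_"
    return external_name


class ExternalObject:
    """Stand-in for a proxy to an external (e.g. COM) object; illustrative only."""


obj = ExternalObject()

# `obj.if = 42` is a SyntaxError, but the string-based API is not parsed,
# so it can reach an attribute that happens to be named like a keyword.
setattr(obj, "if", 42)
value = getattr(obj, "if")
```

The getattr()/setattr() route works without any parser change, at the cost of readability, which is presumably why an official way around keywords is being asked for.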
[Skip laments...]
Could this be extended to many/most/all current instances of keywords in Python? As Tim pointed out, Fortran has no keywords. It annoys me that I (for example) can't define a method named "print".
Sometimes it is worse than annoying!
In the COM and CORBA worlds, it can be a showstopper - if an external object happens to expose a method or property named after a Python keyword, then you simply can not use it!
This has led to COM support having to check _every_ attribute name it sees externally, and mangle it if a keyword.
A bigger support exists for .NET. The .NET framework explicitly dictates that a compliant language _must_ have a way of overriding its own keywords when calling external methods (it was either that, or try and dictate a union of reserved words they can ban)
Eg, C# allows you to surround a keyword with brackets. ie, I believe something like:
object.[if]
Would work in C# to provide access to an attribute named "if"
Unfortunately, Python COM is a layer on top of CPython, and Python .NET still uses the CPython parser - so in neither of these cases is there a simple hack I can use to work around it at the parser level.
Needless to say, as this affects the 2 major technologies I work with currently, I would like an official way to work around Python keywords!
The JPython approach should be added to CPython. This effectively turns off keywords directly after ".", "def" and in a few other places. --Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
[Guido]
The JPython approach should be added to CPython. This effectively turns off keywords directly after ".", "def" and in a few other places.
Excellent. I saw a reference to this after I sent my mail. I'd be happy to help, in a long, drawn out, when-I-find-time kind of way. What is the general strategy - is it simply to maintain a state in the parser? Where would I start to look into? Mark.
[Guido]
The JPython approach should be added to CPython. This effectively turns off keywords directly after ".", "def" and in a few other places.
Excellent. I saw a reference to this after I sent my mail.
I'd be happy to help, in a long, drawn out, when-I-find-time kind of way. What is the general strategy - is it simply to maintain a state in the parser? Where would I start to look into?
Mark.
Alas, I'm not sure how easy it will be. The parser generator will probably have to be changed to allow you to indicate not to do a resword lookup at certain points in the grammar. I don't know where to start. :-( --Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
BDFL:
The parser generator will probably have to be changed to allow you to indicate not to do a resword lookup at certain points in the grammar.
Isn't it the scanner which recognises reserved words? In that case, just make it not do that for the next token after certain tokens.

Greg Ewing
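The scanner-level idea can be sketched with a toy tokenizer. This is not CPython's tokenizer or parser generator; `classify`, `TOKEN_RE` and `UNRESERVE_AFTER` are names invented for this example, and only the "after a dot" and "after def" cases are modelled.

```python
import keyword
import re

# Very rough token pattern: identifiers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")

# Tokens after which the next name is NOT looked up as a reserved word.
UNRESERVE_AFTER = {".", "def"}


def classify(source):
    """Return (kind, text) pairs, where kind is KEYWORD, NAME or OP."""
    result = []
    previous = None
    for text in TOKEN_RE.findall(source):
        if text[0].isalpha() or text[0] == "_":
            if keyword.iskeyword(text) and previous not in UNRESERVE_AFTER:
                kind = "KEYWORD"
            else:
                kind = "NAME"
        else:
            kind = "OP"
        result.append((kind, text))
        previous = text
    return result
```

With this sketch, `classify("x.if")` reports `if` as a plain NAME, while `classify("if x")` still reports it as a KEYWORD. A real implementation would have to handle the remaining cases (such as argument names) and keep the grammar unambiguous.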
"GvR" == Guido van Rossum <guido@beopen.com> writes:
GvR> Alas, I'm not sure how easy it will be. The parser generator
GvR> will probably have to be changed to allow you to indicate not
GvR> to do a resword lookup at certain points in the grammar. I
GvR> don't know where to start. :-(

Yet another reason why it would be nice to (eventually) merge the parsing technology in CPython and JPython.

i-don't-wanna-work-i-jes-wanna-bang-on-my-drum-all-day-ly y'rs,
-Barry
On Thu, 10 Aug 2000, Mark Hammond wrote:
Sometimes it is worse than annoying!
In the COM and CORBA worlds, it can be a showstopper - if an external object happens to expose a method or property named after a Python keyword, then you simply can not use it!
How about this (simple, but relatively unannoying) convention: To COM name: - remove last "_", if any
[Skip Montanaro]
Could this be extended to many/most/all current instances of keywords in Python? As Tim pointed out, Fortran has no keywords. It annoys me that I (for example) can't define a method named "print".
This wasn't accidental in Fortran, though: X3J3 spent many tedious hours fiddling the grammar to guarantee it was always possible. Python wasn't designed with this in mind, and e.g. there's no meaningful way to figure out whether raise is an expression or a "raise stmt" in the absence of keywords. Fortran is very careful to make sure such ambiguities can't arise.

A *reasonable* thing is to restrict global keywords to special tokens that can begin a line. There's real human and machine parsing value in being able to figure out what *kind* of stmt a line represents from its first token. So letting "print" be a variable name too would, IMO, really suck.

But after that, I don't think users have any problem understanding that different stmt types can have different syntax. For example, if "@" has a special meaning in "print" statements, big deal. Nobody splits a spleen over seeing

    a b, c, d

when "a" happens to be "exec" or "print" today, despite that most stmts don't allow that syntax, and even between "exec" and "print" it has very different meanings. Toss in "global", "del" and "import" too for other twists on what the "b, c, d" part can look like and mean.

As far as I'm concerned, each stmt type can have any syntax it damn well likes! Identifiers with special meaning *after* a keyword-introduced stmt can usually be anything at all without making them global keywords (be it "as" after "import", or "indexing" after "for", or ...). The only thing Python is missing then is a lisp stmt <wink>:

    lisp (setq a (+ a 1))

Other than that, the JPython hack looks cool too.

Note that SSKs (stmt-specific keywords) present a new problem to colorizers (or moral equivalents like font-lock), and to other tools that do more than a trivial parse.

the-road-to-p3k-has-toll-booths-ly y'rs - tim
On Wed, 9 Aug 2000, Just van Rossum wrote:
PEP: 1716099-3 Title: Index-enhanced sequence iteration [...] It adds an optional clause to the 'for' statement:
for <index> indexing <element> in <seq>: ... [...] Disadvantages:
It will break that one person's code that uses "indexing" as a variable name.
It creates a new 'for' variant, increasing challenge for beginners (and the befuddled, like me) of tracking the correct syntax. I could see that disadvantage being justified by a more significant change - lockstep iteration would qualify, for me (though it's circumventing this drawback with zip()). List comprehensions have that weight, and analogize elegantly against the existing slice syntax. I don't think the 'indexing' benefits are of that order, not enough so to double the number of 'for' forms, even if there are some performance gains over the (syntactically equivalent) zip(), so, sorry, but i'm -1. Ken klm@digicool.com
On Wed, Aug 09, 2000 at 04:01:18PM +0100, Just van Rossum wrote:
Features
It adds an optional clause to the 'for' statement:
for <index> indexing <element> in <seq>: ...
Implementation
Implementation should be trivial for anyone named Guido, Tim or Thomas.
Well, to justify that vote of confidence <0.4 wink> I wrote a quick hack that adds this to Python for loops. It can be found on SF, patch #101138. It's small, but it works. I'll iron out any bugs if there are enough positive feelings towards this kind of syntax change.

-- Thomas Wouters <thomas@xs4all.net>
Just van Rossum <just@letterror.com>:
for <index> indexing <element> in <seq>:
The idea is good, but I don't like this particular syntax much. It seems to be trying to do too much at once, giving you both an index and an element. Also, the wording reminds me unpleasantly of COBOL for some reason.

Some time ago I suggested

    for <index> over <sequence>:

as a way of getting hold of the index, and as a direct replacement for 'for i in range(len(blarg))' constructs. It could also be used for lockstep iteration applications, e.g.

    for i over a:
        frobulate(a[i], b[i])

Greg Ewing
Just van Rossum wrote:
...
for <index> indexing <element> in <seq>: ...
Let me throw out another idea. What if sequences just had .items() methods? j=range(0,10) for index, element in j.items(): ... While we wait for the sequence "base class" we could provide helper functions that makes the implementation of both eager and lazy versions easier. -- Paul Prescod - Not encumbered by corporate consensus "I don't want you to describe to me -- not ever -- what you were doing to that poor boy to make him sound like that; but if you ever do it again, please cover his mouth with your hand," Grandmother said. -- John Irving, "A Prayer for Owen Meany"
Paul Prescod wrote:
Just van Rossum wrote:
for <index> indexing <element> in <seq>:
Let me throw out another idea. What if sequences just had .items() methods?
j=range(0,10)
for index, element in j.items():
I like the idea and so I've uploaded a patch for this to SF: https://sourceforge.net/patch/?func=detailpatch&patch_id=101178&group_id=5470

For ease of reading: This patch adds a .items() method to the list object. .items() returns a list of tuples. E.g.:

    for index, value in ["a", "b", "c"].items():
        print index, ":", value

will print:

    0 : a
    1 : b
    2 : c

I think this is an easy way to achieve looping over index AND elements in parallel. Semantically the following two expressions should be equivalent:

    for index, value in zip(range(len(mylist)), mylist):
    for index, value in mylist.items():

In opposition to patch #101138 I would call this: "Adding syntactic sugar without adding syntax (or sugar<wink>):"

this-doesn't-deserve-new-syntax-ly y'rs
Peter
--
Peter Schneider-Kamp ++47-7388-7331
Herman Krags veg 51-11 mailto:peter@schneider-kamp.de
N-7050 Trondheim http://schneider-kamp.de
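The patch itself changes the built-in list type in C, which cannot be reproduced from Python code. As an approximation only, the described behaviour can be sketched with a list subclass; `ItemList` is a hypothetical name used for this example and is not part of the patch.

```python
class ItemList(list):
    """A list with a dictionary-style items() method (illustrative only)."""

    def items(self):
        # Eagerly build the full list of (index, value) tuples, matching
        # the "returns a list of tuples" description in the message.
        return list(zip(range(len(self)), self))


mylist = ItemList(["a", "b", "c"])

lines = []
for index, value in mylist.items():
    lines.append("%d : %s" % (index, value))
```

Because items() builds every pair before the loop begins, it costs memory proportional to the list, which is the point raised against it later in the thread.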
"PP" == Paul Prescod <paul@prescod.net> writes:
PP> Let me throw out another idea. What if sequences just had
PP> .items() methods?

Funny, I remember talking with Guido about this on a lunch trip several years ago. Tim will probably chime in that /he/ proposed it in the Python 0.9.3 time frame. :)

-Barry
[Paul Prescod]
Let me throw out another idea. What if sequences just had .items() methods?
[Barry A. Warsaw]
Funny, I remember talking with Guido about this on a lunch trip several years ago. Tim will probably chime in that /he/ proposed it in the Python 0.9.3 time frame. :)
Not me, although *someone* proposed it at least that early, perhaps at 0.9.1 already. IIRC, that was the very first time Guido used the term "hypergeneralization" in a cluck-cluck kind of public way. That is, sequences and mappings are different concepts in Python, and intentionally so. Don't know how he feels now. But if you add seq.items(), you had better add seq.keys() too, and seq.values() as a synonym for seq[:]. I guess the perceived advantage of adding seq.items() is that it supplies yet another incredibly slow and convoluted way to get at the for-loop index? "Ah, that's the ticket! Let's allocate gazillabytes of storage and compute all the indexes into a massive data structure up front, and then we can use the loop index that's already sitting there for free anyway to index into that and get back a redundant copy of itself!" <wink>. not-a-good-sign-when-common-sense-is-offended-ly y'rs - tim
"TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> But if you add seq.items(), you had better add seq.keys() too,
TP> and seq.values() as a synonym for seq[:]. I guess the
TP> perceived advantage of adding seq.items() is that it supplies
TP> yet another incredibly slow and convoluted way to get at the
TP> for-loop index? "Ah, that's the ticket! Let's allocate
TP> gazillabytes of storage and compute all the indexes into a
TP> massive data structure up front, and then we can use the loop
TP> index that's already sitting there for free anyway to index
TP> into that and get back a redundant copy of itself!" <wink>.

Or create a generator. <oops, slap>

-Barry
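Generators did not exist in Python at the time of this exchange, so the following can only be a sketch in present-day Python of what the tease points at: producing (index, element) pairs one at a time instead of allocating them all up front. The name `lazy_items` is made up for this example.

```python
def lazy_items(seq):
    """Yield (index, element) pairs one at a time, without an up-front list."""
    index = 0
    for element in seq:
        yield index, element
        index += 1


pairs = lazy_items("abc")
first = next(pairs)      # only the first pair has been computed so far
rest = list(pairs)       # the remaining pairs are produced on demand
```

Nothing is stored beyond the current counter, which addresses the storage objection but still goes through an extra object to recover the loop index.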
Tim Peters wrote:
But if you add seq.items(), you had better add seq.keys() too, and seq.values() as a synonym for seq[:]. I guess the perceived advantage of adding seq.items() is that it supplies yet another incredibly slow and convoluted way to get at the for-loop index? "Ah, that's the ticket! Let's allocate gazillabytes of storage and compute all the indexes into a massive data structure up front, and then we can use the loop index that's already sitting there for free anyway to index into that and get back a redundant copy of itself!" <wink>.
That's a -1, right? <0.1 wink> Peter -- Peter Schneider-Kamp ++47-7388-7331 Herman Krags veg 51-11 mailto:peter@schneider-kamp.de N-7050 Trondheim http://schneider-kamp.de
[Tim]
But if you add seq.items(), you had better add seq.keys() too, and seq.values() as a synonym for seq[:]. I guess the perceived advantage of adding seq.items() is that it supplies yet another incredibly slow and convoluted way to get at the for-loop index? "Ah, that's the ticket! Let's allocate gazillabytes of storage and compute all the indexes into a massive data structure up front, and then we can use the loop index that's already sitting there for free anyway to index into that and get back a redundant copy of itself!" <wink>.
[Peter Schneider-Kamp]
That's a -1, right? <0.1 wink>
-0 if you also add .keys() and .values() (if you're going to hypergeneralize, don't go partway nuts -- then it's both more general than it should be yet still not as general as people will expect).

-1 if it's *just* seq.items().

+1 on an "indexing" clause (the BDFL liked that enough to implement it a few years ago, but it didn't go in then because he found some random putz who had used "indexing" as a vrbl name; but if it doesn't need to be a keyword, even that lame (ask Just <wink>) objection goes away).

sqrt(-1) on Barry's generator tease, because that's an imaginary proposal at this stage of the game.
Tim Peters wrote:
...
But if you add seq.items(), you had better add seq.keys() too, and seq.values() as a synonym for seq[:]. I guess the perceived advantage of adding seq.items() is that it supplies yet another incredibly slow and convoluted way to get at the for-loop index? "Ah, that's the ticket! Let's allocate gazillabytes of storage and compute all the indexes into a massive data structure up front, and then we can use the loop index that's already sitting there for free anyway to index into that and get back a redundant copy of itself!" <wink>.
not-a-good-sign-when-common-sense-is-offended-ly y'rs - tim
.items(), .keys(), .values() and range() all offended my common sense when I started using Python in the first place. I got over it. I really don't see this "indexing" issue to be common enough either for special syntax OR to worry a lot about efficiency. Nobody is forcing anyone to use .items(). If you want a more efficient way to do it, it's available (just not as syntactically beautiful -- same as range/xrange). That isn't the case for dictionary .items(), .keys() and .values(). Also, if .keys() returns a range object then theoretically the interpreter could recognize that it is looping over a range and optimize it at runtime. That's an alternate approach to optimizing range literals through new byte-codes. I don't have time to think about what that would entail right now.... :( -- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html
[Paul Prescod]
... I really don't see this "indexing" issue to be common enough
A simple grep (well, findstr under Windows) finds over 300 instances of for ... in range(len(... in the .py files on my laptop. I don't recall exactly what the percentages were when I looked over a very large base of Python code several years ago, but I believe it was about 1 in 7 for loops.
for special syntax OR to worry a lot about efficiency.
1 in 7 is plenty. range(len(seq)) is a puzzler to newbies, too -- it's *such* an indirect way of saying what they say directly in other languages.
Nobody is forcing anyone to use .items().
Agreed, but since seq.items() doesn't exist now <wink>.
If you want a more efficient way to do it, it's available (just not as syntactically beautiful -- same as range/xrange).
Which way would that be? I don't know of one, "efficient" either in the sense of runtime speed or of directness of expression. xrange is at least a storage-efficient way, and isn't it grand that we index the xrange object with the very integer we're (usually) trying to get it to return <wink>? The "loop index" isn't an accident of the way Python happens to implement "for" today, it's the very basis of Python's thing.__getitem__(i)/IndexError iteration protocol. Exposing it is natural, because *it* is natural.
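The __getitem__(i)/IndexError protocol mentioned here is still honoured by present-day Python as a fallback when a class defines no __iter__. A minimal sketch, with `Squares` as a made-up class for illustration:

```python
class Squares:
    """Sequence-like object that defines only __getitem__ (no __iter__, no __len__)."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, i):
        # The for-loop calls this with 0, 1, 2, ... and stops at IndexError.
        if not 0 <= i < self.count:
            raise IndexError(i)
        return i * i


seen = []
for value in Squares(4):
    seen.append(value)
```

The integer passed to __getitem__ is the "loop index" under discussion: the interpreter already maintains it, and the proposals in this thread differ only in how (or whether) to expose it.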
... Also, if .keys() returns a range object then theoretically the interpreter could recognize that it is looping over a range and optimize it at runtime.
Sorry, but seq.keys() just makes me squirm. It's a little step down the Lispish path of making everything look the same. I don't want to see float.write() either <wink>. although-that-would-surely-be-more-orthogonal-ly y'rs - tim
Tim Peters wrote:
...
If you want a more efficient way to do it, it's available (just not as syntactically beautiful -- same as range/xrange).
Which way would that be? I don't know of one, "efficient" either in the sense of runtime speed or of directness of expression.
One of the reasons for adding range literals was for efficiency. So for x in [:len(seq)]: ... should be efficient.
The "loop index" isn't an accident of the way Python happens to implement "for" today, it's the very basis of Python's thing.__getitem__(i)/IndexError iteration protocol. Exposing it is natural, because *it* is natural.
I don't think of iterators as indexing in terms of numbers. Otherwise I could do this:
    >>> a={0:"zero",1:"one",2:"two",3:"three"}
    >>> for i in a:
    ...     print i
    ...
So from a Python user's point of view, for-looping has nothing to do with integers. From a Python class/module creator's point of view it does have to do with integers. I would be neither surprised nor disappointed if that changed one day.
Sorry, but seq.keys() just makes me squirm. It's a little step down the Lispish path of making everything look the same. I don't want to see float.write() either <wink>.
You'll have to explain your squeamishness better if you expect us to channel you in the future. Why do I use the same syntax for indexing sequences and dictionaries and for deleting sequence and dictionary items? Is the rule: "syntax can work across types but method names should never be shared"? -- Paul Prescod - Not encumbered by corporate consensus Simplicity does not precede complexity, but follows it. - http://www.cs.yale.edu/homes/perlis-alan/quotes.html
Note that Guido rejected all the loop-gimmick proposals ("indexing", indices(), irange(), and list.items()) on Thursday, so let's stifle this debate until after 2.0 (or, even better, until after I'm dead <wink>). hope-springs-eternal-ly y'rs - tim
Tim Peters wrote:
Note that Guido rejected all the loop-gimmick proposals ("indexing", indices(), irange(), and list.items()) on Thursday, so let's stifle this debate until after 2.0 (or, even better, until after I'm dead <wink>).
That's sad. :-/ One of the reasons I implemented .items() is that I wanted to increase the probability that at least *something* is available instead of: for i in range(len(list)): e = list[i] ... or for i, e in zip(range(len(list)), list): ... I'm going to teach Python to a lot of newbies (ca. 30) in October. From my experience (I already tried my luck on two individuals from that group) 'range(len(list))' is one of the harder concepts to get across. Even indices(list) would help here. Peter -- Peter Schneider-Kamp ++47-7388-7331 Herman Krags veg 51-11 mailto:peter@schneider-kamp.de N-7050 Trondheim http://schneider-kamp.de
I'm stifling it, but, FWIW, I've been trying to sell "indexing" for most of my adult life <wink -- but yes, in my experience too range(len(seq)) is extraordinarily hard to get across to newbies at first; and I *expect* [:len(seq)] to be at least as hard>.
-----Original Message----- From: nowonder@stud.ntnu.no [mailto:nowonder@stud.ntnu.no]On Behalf Of Peter Schneider-Kamp Sent: Friday, August 18, 2000 4:06 AM To: Tim Peters Cc: python-dev@python.org Subject: Re: indexing, indices(), irange(), list.items() (was RE: [Python-Dev] Lockstep iteration - eureka!)
Tim Peters wrote:
Note that Guido rejected all the loop-gimmick proposals ("indexing", indices(), irange(), and list.items()) on Thursday, so let's stifle this debate until after 2.0 (or, even better, until after I'm dead <wink>).
That's sad. :-/
One of the reasons I implemented .items() is that I wanted to increase the probability that at least *something* is available instead of:
for i in range(len(list)): e = list[i] ...
or
for i, e in zip(range(len(list)), list): ...
I'm going to teach Python to a lot of newbies (ca. 30) in October. From my experience (I already tried my luck on two individuals from that group) 'range(len(list))' is one of the harder concepts to get across. Even indices(list) would help here.
Peter -- Peter Schneider-Kamp ++47-7388-7331 Herman Krags veg 51-11 mailto:peter@schneider-kamp.de N-7050 Trondheim http://schneider-kamp.de
What about 'indexing' xor 'in' ? Like that:
    for i indexing sequence:       # good
    for e in sequence:             # good
    for i indexing e in sequence:  # BAD!
This might help Guido to understand what it does in the 'indexing' case. I admit that the third one may be a bit harder to parse, so why not *leave it out*? But then I'm sure this has also been discussed before. Nevertheless I'll mail Barry and volunteer for a PEP on this.
[Tim Peters about his life]
I've been trying to sell "indexing" for most of my adult life
then-I'll-have-to-spend-another-life-on-it-ly y'rs Peter -- Peter Schneider-Kamp ++47-7388-7331 Herman Krags veg 51-11 mailto:peter@schneider-kamp.de N-7050 Trondheim http://schneider-kamp.de
Peter Schneider-Kamp writes:
What about 'indexing' xor 'in' ? Like that:
for i indexing sequence: # good for e in sequence: # good for i indexing e in sequence: # BAD!
This might help Guido to understand what it does in the 'indexing' case. I admit that the third one may be a bit harder to parse, so why not *leave it out*?
I hadn't considered *not* using an "in" clause, but that is actually pretty neat. I'd like to see all of these allowed; disallowing "for i indexing e in ...:" reduces the intended functionality substantially. -Fred -- Fred L. Drake, Jr. <fdrake at beopen.com> BeOpen PythonLabs Team Member
On Fri, 18 Aug 2000, Fred L. Drake, Jr. wrote:
I hadn't considered *not* using an "in" clause, but that is actually pretty neat. I'd like to see all of these allowed; disallowing "for i indexing e in ...:" reduces the intended functionality substantially.
I like them all as well (and had previously assumed that the "indexing" proposal included the "for i indexing sequence" case!). While we're sounding off on the issue, i'm quite happy (+1) on both of:
    for el in seq:
    for i indexing seq:
    for i indexing el in seq:
and
    for el in seq:
    for i in indices(seq):
    for i, el in irange(seq):
with a slight preference for the former. -- ?!ng
Tim Peters writes:
I'm stifling it, but, FWIW, I've been trying to sell "indexing" for most of my adult life <wink -- but yes, in my experience too range(len(seq)) is extraordinarily hard to get across to newbies at first; and I *expect* [:len(seq)] to be at least as hard>.
And "for i indexing o in ...:" is the best proposal I've seen to resolve the whole problem in what *I* would describe as a Pythonic way. And it's not a new proposal. I haven't read the specific patch, but bugs can be fixed. I guess a lot of us will just have to disagree with the Guido on this one. ;-( Linguistic coup, anyone? ;-) -Fred -- Fred L. Drake, Jr. <fdrake at beopen.com> BeOpen PythonLabs Team Member
Tim Peters wrote:
Note that Guido rejected all the loop-gimmick proposals ("indexing", indices(), irange(), and list.items()) on Thursday, so let's stifle this debate until after 2.0 (or, even better, until after I'm dead <wink>).
Hey, we still have mxTools which gives you most of those goodies and lots more ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
On Fri, Aug 18, 2000 at 10:30:51AM +0200, M.-A. Lemburg wrote:
Hey, we still have mxTools which gives you most of those goodies and lots more ;-)
Yes, I don't understand what's wrong with a function. It would be nice if it was a builtin. IMHO, all this new syntax is a bad idea. Neil
"NS" == Neil Schemenauer <nascheme@enme.ucalgary.ca> writes:
NS> On Fri, Aug 18, 2000 at 10:30:51AM +0200, M.-A. Lemburg wrote:
>> Hey, we still have mxTools which gives you most of those
>> goodies and lots more ;-)
NS> Yes, I don't understand what's wrong with a function. It
NS> would be nice if it was a builtin. IMHO, all this new syntax
NS> is a bad idea.
I agree, but Guido nixed even the builtin. Let's move on; there's always Python 2.1. -Barry
Paul Prescod wrote:
I don't think of iterators as indexing in terms of numbers. Otherwise I could do this:
a={0:"zero",1:"one",2:"two",3:"three"} for i in a: ... print i ...
So from a Python user's point of view, for-looping has nothing to do with integers. From a Python class/module creator's point of view it does have to do with integers. I wouldn't be either surprised nor disappointed if that changed one day.
Bingo!
I've long had an idea for generalizing 'for' loops using iterators. This is more a Python 3000 thing, but I'll explain it here anyway because I think it's relevant. Perhaps this should become a PEP? (Maybe we should have a series of PEPs with numbers in the 3000 range for Py3k ideas?)
The statement
    for <variable> in <object>: <block>
should translate into this kind of pseudo-code:
    # variant 1
    __temp = <object>.newiterator()
    while 1:
        try: <variable> = __temp.next()
        except ExhaustedIterator: break
        <block>
or perhaps (to avoid the relatively expensive exception handling):
    # variant 2
    __temp = <object>.newiterator()
    while 1:
        __flag, <variable> = __temp.next()
        if not __flag: break
        <block>
In variant 1, the next() method returns the next object or raises ExhaustedIterator. In variant 2, the next() method returns a tuple (<flag>, <variable>) where <flag> is 1 to indicate that <value> is valid or 0 to indicate that there are no more items available. I'm not crazy about the exception, but I'm even less crazy about the more complex next() return value (careful observers may have noticed that I'm rarely crazy about flag variables :-).
Another argument for variant 1 is that variant 2 changes what <variable> is after the loop is exhausted, compared to current practice: currently, it keeps the last valid value assigned to it. Most likely, the next() method returns None when the sequence is exhausted. It doesn't make a lot of sense to require it to return the last item of the sequence -- there may not *be* a last item, if the sequence is empty, and not all sequences make it convenient to keep hanging on to the last item in the iterator, so it's best to specify that next() returns (0, None) when the sequence is exhausted.
(It would be tempting to suggest a variant 1a where instead of raising an exception, next() returns None when the sequence is exhausted, but this won't fly: you couldn't iterate over a list containing some items that are None.)
Side note: I believe that the iterator approach could actually *speed up* iteration over lists compared to today's code. This is because currently the iteration index is a Python integer object that is stored on the stack. This means an integer add with overflow check, allocation, and deallocation on each iteration! But the iterator for lists (and other basic sequences) could easily store the index as a C int! (As long as the sequence's length is stored in an int, the index can't overflow.)
[Warning: thinking aloud ahead!]
Once we have the concept of iterators, we could support explicit use of them as well. E.g. we could use a variant of the for statement to iterate over an existing iterator:
    for <variable> over <iterator>: <block>
which would (assuming variant 1 above) translate to:
    while 1:
        try: <variable> = <iterator>.next()
        except ExhaustedIterator: break
        <block>
This could be used in situations where you have a loop iterating over the first half of a sequence and a second loop that iterates over the remaining items:
    it = something.newiterator()
    for x over it:
        if time_to_start_second_loop(): break
        do_something()
    for x over it:
        do_something_else()
Note that the second for loop doesn't reset the iterator -- it just picks up where the first one left off! (Detail: the x that caused the break in the first loop doesn't get dealt with in the second loop.)
I like the iterator concept because it allows us to do things lazily. There are lots of other possibilities for iterators. E.g. mappings could have several iterator variants to loop over keys, values, or both, in sorted or hash-table order. Sequences could have an iterator for traversing them backwards, and a few other ones for iterating over their index set (cf. indices()) and over (index, value) tuples (cf. irange()). Files could be their own iterator where the iterator is almost the same as readline() except it raises ExhaustedIterator at EOF instead of returning "". A tree data structure class could have an associated iterator class that maintains a "finger" into the tree.
Hm, perhaps iterators could be their own iterator? Then if 'it' were an iterator, it.newiterator() would return a reference to itself (not a copy). Then we wouldn't even need the 'over' alternative syntax. Maybe the method should be called iterator() then, not newiterator(), to avoid suggesting anything about the newness of the returned iterator.
Other ideas:
- Iterators could have a backup() method that moves the index back (or raises an exception if this feature is not supported, e.g. when reading data from a pipe).
- Iterators over certain sequences might support operations on the underlying sequence at the current position of the iterator, so that you could iterate over a sequence and occasionally insert or delete an item (or a slice).
Of course iterators also connect to generators. The basic list iterator doesn't need coroutines or anything, it can be done as follows:
    class Iterator:
        def __init__(self, seq):
            self.seq = seq
            self.ind = 0
        def next(self):
            if self.ind >= len(self.seq):
                raise ExhaustedIterator
            val = self.seq[self.ind]
            self.ind += 1
            return val
so that <list>.iterator() could just return Iterator(<list>) -- at least conceptually.
But for other data structures the amount of state needed might be cumbersome. E.g. a tree iterator needs to maintain a stack, and it's much easier to code that using a recursive Icon-style generator than by using an explicit stack. On the other hand, I remember reading an article a while ago (in Dr. Dobbs?) by someone who argued (in a C++ context) that such recursive solutions are very inefficient, and that an explicit stack (1) is really not that hard to code, and (2) gives much more control over the memory and time consumption of the code. On the third hand, some forms of iteration really *are* expressed much more clearly using recursion. On the fourth hand, I disagree with Matthias ("Dr. Scheme") Felleisen about recursion as the root of all iteration. Anyway, I believe that iterators (as explained above) can be useful even if we don't have generators (in the Icon sense, which I believe means coroutine-style).
--Guido
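As a rough illustration of variant 1 and of the "second loop picks up where the first one left off" behaviour described above, here is a minimal sketch in present-day Python. ExhaustedIterator, newiterator() and next() are the hypothetical names from the message, not an existing API; ListIterator and run_loop() are helper names made up for this sketch, and run_loop() stands in for the proposed "for x over it:" syntax, with the body returning a true value to mean "break".

```python
class ExhaustedIterator(Exception):
    """Hypothetical end-of-iteration signal named in the message."""


class ListIterator:
    """Same shape as the Iterator class in the message (assumed name)."""

    def __init__(self, seq):
        self.seq = seq
        self.ind = 0

    def next(self):
        if self.ind >= len(self.seq):
            raise ExhaustedIterator
        val = self.seq[self.ind]
        self.ind += 1
        return val


def run_loop(it, body):
    """Variant 1 expansion of 'for x over it'; body returns true to break."""
    while 1:
        try:
            x = it.next()
        except ExhaustedIterator:
            break
        if body(x):
            break


it = ListIterator([1, 2, 3, 4, 5])
first, second = [], []


def first_body(x):
    if x == 3:  # stands in for time_to_start_second_loop()
        return True
    first.append(x)


run_loop(it, first_body)                   # stops when it sees 3
run_loop(it, lambda x: second.append(x))   # resumes; 3 is not revisited
```

Under these assumptions the first loop collects 1 and 2, the item that triggered the break is consumed, and the second loop sees only 4 and 5, which matches the "Detail" noted in the message.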
Guido van Rossum wrote:
Paul Prescod wrote:
I don't think of iterators as indexing in terms of numbers. Otherwise I could do this:
a={0:"zero",1:"one",2:"two",3:"three"} for i in a: ... print i ...
So from a Python user's point of view, for-looping has nothing to do with integers. From a Python class/module creator's point of view it does have to do with integers. I wouldn't be either surprised nor disappointed if that changed one day.
Bingo!
I've long had an idea for generalizing 'for' loops using iterators. This is more a Python 3000 thing, but I'll explain it here anyway because I think it's relevant. Perhaps this should become a PEP? (Maybe we should have a series of PEPs with numbers in the 3000 range for Py3k ideas?)
The statement
for <variable> in <object>: <block>
should translate into this kind of pseudo-code:
# variant 1 __temp = <object>.newiterator() while 1: try: <variable> = __temp.next() except ExhaustedIterator: break <block>
or perhaps (to avoid the relatively expensive exception handling):
# variant 2 __temp = <object>.newiterator() while 1: __flag, <variable> = __temp.next() if not __flag: break <block>
In variant 1, the next() method returns the next object or raises ExhaustedIterator. In variant 2, the next() method returns a tuple (<flag>, <variable>) where <flag> is 1 to indicate that <value> is valid or 0 to indicate that there are no more items available. I'm not crazy about the exception, but I'm even less crazy about the more complex next() return value (careful observers may have noticed that I'm rarely crazy about flag variables :-).
Another argument for variant 1 is that variant 2 changes what <variable> is after the loop is exhausted, compared to current practice: currently, it keeps the last valid value assigned to it. Most likely, the next() method returns None when the sequence is exhausted. It doesn't make a lot of sense to require it to return the last item of the sequence -- there may not *be* a last item, if the sequence is empty, and not all sequences make it convenient to keep hanging on to the last item in the iterator, so it's best to specify that next() returns (0, None) when the sequence is exhausted.
(It would be tempting to suggest a variant 1a where instead of raising an exception, next() returns None when the sequence is exhausted, but this won't fly: you couldn't iterate over a list containing some items that are None.)
How about a third variant:
    #3:
    __iter = <object>.iterator()
    while __iter:
        <variable> = __iter.next()
        <block>
This adds a slot call, but removes the malloc overhead introduced by returning a tuple for every iteration (which is likely to be a performance problem).
Another possibility would be using an iterator attribute to get at the variable:
    #4:
    __iter = <object>.iterator()
    while 1:
        if not __iter.next(): break
        <variable> = __iter.value
        <block>
Side note: I believe that the iterator approach could actually *speed up* iteration over lists compared to today's code. This is because currently the iteration index is a Python integer object that is stored on the stack. This means an integer add with overflow check, allocation, and deallocation on each iteration! But the iterator for lists (and other basic sequences) could easily store the index as a C int! (As long as the sequence's length is stored in an int, the index can't overflow.)
You might want to check out the counterobject.c approach I used to speed up the current for-loop in Python 1.5's ceval.c: it's basically a mutable C integer which is used instead of the current Python integer index. The details can be found in my old patch: http://starship.python.net/crew/lemburg/mxPython-1.5.patch.gz
[Warning: thinking aloud ahead!]
Once we have the concept of iterators, we could support explicit use of them as well. E.g. we could use a variant of the for statement to iterate over an existing iterator:
for <variable> over <iterator>: <block>
which would (assuming variant 1 above) translate to:
while 1: try: <variable> = <iterator>.next() except ExhaustedIterator: break <block>
This could be used in situations where you have a loop iterating over the first half of a sequence and a second loop that iterates over the remaining items:
it = something.newiterator() for x over it: if time_to_start_second_loop(): break do_something() for x over it: do_something_else()
Note that the second for loop doesn't reset the iterator -- it just picks up where the first one left off! (Detail: the x that caused the break in the first loop doesn't get dealt with in the second loop.)
I like the iterator concept because it allows us to do things lazily. There are lots of other possibilities for iterators. E.g. mappings could have several iterator variants to loop over keys, values, or both, in sorted or hash-table order. Sequences could have an iterator for traversing them backwards, and a few other ones for iterating over their index set (cf. indices()) and over (index, value) tuples (cf. irange()). Files could be their own iterator where the iterator is almost the same as readline() except it raises ExhaustedIterator at EOF instead of returning "". A tree datastructure class could have an associated iterator class that maintains a "finger" into the tree.
Hm, perhaps iterators could be their own iterator? Then if 'it' were an iterator, it.newiterator() would return a reference to itself (not a copy). Then we wouldn't even need the 'over' alternative syntax. Maybe the method should be called iterator() then, not newiterator(), to avoid suggesting anything about the newness of the returned iterator.
Other ideas:
- Iterators could have a backup() method that moves the index back (or raises an exception if this feature is not supported, e.g. when reading data from a pipe).
- Iterators over certain sequences might support operations on the underlying sequence at the current position of the iterator, so that you could iterate over a sequence and occasionally insert or delete an item (or a slice).
FYI, I've attached a module which I've been using a while for iteration. The code is very simple and implements the #4 variant described above.
Of course iterators also connect to generators. The basic list iterator doesn't need coroutines or anything, it can be done as follows:
class Iterator: def __init__(self, seq): self.seq = seq self.ind = 0 def next(self): if self.ind >= len(self.seq): raise ExhaustedIterator val = self.seq[self.ind] self.ind += 1 return val
so that <list>.iterator() could just return Iterator(<list>) -- at least conceptually.
But for other data structures the amount of state needed might be cumbersome. E.g. a tree iterator needs to maintain a stack, and it's much easier to code that using a recursive Icon-style generator than by using an explicit stack. On the other hand, I remember reading an article a while ago (in Dr. Dobbs?) by someone who argued (in a C++ context) that such recursive solutions are very inefficient, and that an explicit stack (1) is really not that hard to code, and (2) gives much more control over the memory and time consumption of the code. On the third hand, some forms of iteration really *are* expressed much more clearly using recursion. On the fourth hand, I disagree with Matthias ("Dr. Scheme") Felleisen about recursion as the root of all iteration. Anyway, I believe that iterators (as explained above) can be useful even if we don't have generators (in the Icon sense, which I believe means coroutine-style).
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
mal wrote:
How about a third variant:
#3: __iter = <object>.iterator() while __iter: <variable> = __iter.next() <block>
how does that one terminate? maybe you meant something like:
    __iter = <object>.iterator()
    while __iter:
        <variable> = __iter.next()
        if <variable> is <sentinel>: break
        <block>
(where <sentinel> could be __iter itself...) </F>
On Mon, Aug 21, 2000 at 12:43:47PM +0200, Fredrik Lundh wrote:
mal wrote:
How about a third variant:
#3: __iter = <object>.iterator() while __iter: <variable> = __iter.next() <block>
how does that one terminate?
__iter should evaluate to false once it's "empty". -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
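To make the "evaluates to false once it's empty" reading of variant #3 concrete, here is a small sketch. TruthTestIterator is a made-up name, and the truth hook is spelled __bool__ in current Python (__nonzero__ in the Python of this thread); it only works this simply because a list iterator can tell in advance whether another item exists.

```python
class TruthTestIterator:
    """Hypothetical list iterator for variant #3: false once exhausted."""

    def __init__(self, seq):
        self.seq = seq
        self.ind = 0

    def __bool__(self):
        # True while at least one item is still left to hand out.
        return self.ind < len(self.seq)

    __nonzero__ = __bool__  # older spelling of the same truth hook

    def next(self):
        val = self.seq[self.ind]
        self.ind += 1
        return val


collected = []
it = TruthTestIterator([0, None, "x"])
while it:                       # truth test replaces the exception
    collected.append(it.next())
```

Note that false-valued items such as 0 and None pass through untouched, since the loop tests the iterator and not the item. An iterator reading from a pipe could not answer the truth test without fetching and holding the next value first.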
[BDFL]
The statement
for <variable> in <object>: <block>
should translate into this kind of pseudo-code:
# variant 1 __temp = <object>.newiterator() while 1: try: <variable> = __temp.next() except ExhaustedIterator: break <block>
or perhaps (to avoid the relatively expensive exception handling):
# variant 2 __temp = <object>.newiterator() while 1: __flag, <variable> = __temp.next() if not __flag: break <block>
[MAL]
How about a third variant:
#3: __iter = <object>.iterator() while __iter: <variable> = __iter.next() <block>
This adds a slot call, but removes the malloc overhead introduced by returning a tuple for every iteration (which is likely to be a performance problem).
Are you sure the slot call doesn't cause some malloc overhead as well? Anyway, the problem with this one is that it requires a dynamic iterator (one that generates values on the fly, e.g. something reading lines from a pipe) to hold on to the next value between "while __iter" and "__iter.next()".
Another possibility would be using an iterator attribute to get at the variable:
#4: __iter = <object>.iterator() while 1: if not __iter.next(): break <variable> = __iter.value <block>
Uglier than any of the others.
You might want to check out the counterobject.c approach I used to speed up the current for-loop in Python 1.5's ceval.c: it's basically a mutable C integer which is used instead of the current Python integer index.
The details can be found in my old patch:
http://starship.python.net/crew/lemburg/mxPython-1.5.patch.gz
Ah, yes, that's what I was thinking of.
""" Generic object iterators. [...]
Thanks -- yes, that's what I was thinking of. Did you just whip this up? --Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
Guido van Rossum wrote:
[BDFL]
The statement
for <variable> in <object>: <block>
should translate into this kind of pseudo-code:
# variant 1 __temp = <object>.newiterator() while 1: try: <variable> = __temp.next() except ExhaustedIterator: break <block>
or perhaps (to avoid the relatively expensive exception handling):
# variant 2 __temp = <object>.newiterator() while 1: __flag, <variable> = __temp.next() if not __flag: break <block>
[MAL]
How about a third variant:
#3: __iter = <object>.iterator() while __iter: <variable> = __iter.next() <block>
This adds a slot call, but removes the malloc overhead introduced by returning a tuple for every iteration (which is likely to be a performance problem).
Are you sure the slot call doesn't cause some malloc overhead as well?
Quite likely not, since the slot in question doesn't generate Python objects (nb_nonzero).
Anyway, the problem with this one is that it requires a dynamic iterator (one that generates values on the fly, e.g. something reading lines from a pipe) to hold on to the next value between "while __iter" and "__iter.next()".
Hmm, that depends on how you look at it: I was thinking in terms of reading from a file -- feof() is true as soon as the end of file is reached. The same could be done for iterators. We might also consider a mixed approach:
    #5:
    __iter = <object>.iterator()
    while __iter:
        try: <variable> = __iter.next()
        except ExhaustedIterator: break
        <block>
Some iterators may want to signal the end of iteration using an exception, others via the truth test prior to calling .next(), e.g. a list iterator can easily implement the truth test variant, while an iterator with complex .next() processing might want to use the exception variant.
Another possibility would be using exception class objects as singleton indicators of "end of iteration":
    #6:
    __iter = <object>.iterator()
    while 1:
        try: rc = __iter.next()
        except ExhaustedIterator: break
        else:
            if rc is ExhaustedIterator: break
        <variable> = rc
        <block>
Another possibility would be using an iterator attribute to get at the variable:
#4: __iter = <object>.iterator() while 1: if not __iter.next(): break <variable> = __iter.value <block>
Uglier than any of the others.
You might want to check out the counterobject.c approach I used to speed up the current for-loop in Python 1.5's ceval.c: it's basically a mutable C integer which is used instead of the current Python integer index.
The details can be found in my old patch:
http://starship.python.net/crew/lemburg/mxPython-1.5.patch.gz
Ah, yes, that's what I was thinking of.
""" Generic object iterators. [...]
Thanks -- yes, that's what I was thinking of. Did you just whip this up?
The file says: Feb 2000... I don't remember what I wrote it for; it's in my lib/ dir meaning that it qualified as general purpose utility :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
[MAL]
How about a third variant:
#3: __iter = <object>.iterator() while __iter: <variable> = __iter.next() <block>
This adds a slot call, but removes the malloc overhead introduced by returning a tuple for every iteration (which is likely to be a performance problem).
Are you sure the slot call doesn't cause some malloc overhead as well?
Quite likely not, since the slot in question doesn't generate Python objects (nb_nonzero).
Agreed only for built-in objects like lists. For class instances this would be way more expensive, because of the two calls vs. one!
Anyway, the problem with this one is that it requires a dynamic iterator (one that generates values on the fly, e.g. something reading lines from a pipe) to hold on to the next value between "while __iter" and "__iter.next()".
Hmm, that depends on how you look at it: I was thinking in terms of reading from a file -- feof() is true as soon as the end of file is reached. The same could be done for iterators.
But feof() needs to read an extra character into the buffer if the buffer is empty -- so it needs buffering! That's what I'm trying to avoid.
We might also consider a mixed approach:
#5: __iter = <object>.iterator() while __iter: try: <variable> = __iter.next() except ExhaustedIterator: break <block>
Some iterators may want to signal the end of iteration using an exception, others via the truth test prior to calling .next(), e.g. a list iterator can easily implement the truth test variant, while an iterator with complex .next() processing might want to use the exception variant.
Belt and suspenders. What does this buy you over "while 1"?
Another possibility would be using exception class objects as singleton indicators of "end of iteration":
#6: __iter = <object>.iterator() while 1: try: rc = __iter.next() except ExhaustedIterator: break else: if rc is ExhaustedIterator: break <variable> = rc <block>
Then I'd prefer to use a single protocol:
    #7:
    __iter = <object>.iterator()
    while 1:
        rc = __iter.next()
        if rc is ExhaustedIterator: break
        <variable> = rc
        <block>
This means there's a special value that you can't store in lists though, and that would bite some introspection code (e.g. code listing all internal objects).
--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
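The drawback of the sentinel protocol in #7 can be shown with a short sketch. SentinelIterator and collect() are names invented for this illustration, and ExhaustedIterator is the hypothetical class from the thread, returned here as a value instead of being raised.

```python
class ExhaustedIterator(Exception):
    """Hypothetical class object used as the end-of-iteration sentinel."""


class SentinelIterator:
    """Assumed list iterator for variant #7: returns the sentinel at the end."""

    def __init__(self, seq):
        self.seq = seq
        self.ind = 0

    def next(self):
        if self.ind >= len(self.seq):
            return ExhaustedIterator   # returned, not raised
        val = self.seq[self.ind]
        self.ind += 1
        return val


def collect(seq):
    """Drive the #7 loop and gather every item the loop body would see."""
    out = []
    it = SentinelIterator(seq)
    while 1:
        rc = it.next()
        if rc is ExhaustedIterator:
            break
        out.append(rc)
    return out


ok = collect([1, None, 2])                      # None is an ordinary item
truncated = collect([1, ExhaustedIterator, 2])  # sentinel stored in the list
```

Unlike the rejected variant 1a, None survives as a regular item, but a list that happens to contain the sentinel object itself is cut short at that point, which is the introspection problem the message mentions.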
Guido van Rossum wrote:
[MAL]
How about a third variant:
#3: __iter = <object>.iterator() while __iter: <variable> = __iter.next() <block>
This adds a slot call, but removes the malloc overhead introduced by returning a tuple for every iteration (which is likely to be a performance problem).
Are you sure the slot call doesn't cause some malloc overhead as well?
Quite likely not, since the slot in question doesn't generate Python objects (nb_nonzero).
Agreed only for built-in objects like lists. For class instances this would be way more expensive, because of the two calls vs. one!
True.
Anyway, the problem with this one is that it requires a dynamic iterator (one that generates values on the fly, e.g. something reading lines from a pipe) to hold on to the next value between "while __iter" and "__iter.next()".
Hmm, that depends on how you look at it: I was thinking in terms of reading from a file -- feof() is true as soon as the end of file is reached. The same could be done for iterators.
But feof() needs to read an extra character into the buffer if the buffer is empty -- so it needs buffering! That's what I'm trying to avoid.
Ok.
We might also consider a mixed approach:
#5: __iter = <object>.iterator() while __iter: try: <variable> = __iter.next() except ExhaustedIterator: break <block>
Some iterators may want to signal the end of iteration using an exception, others via the truth test prior to calling .next(), e.g. a list iterator can easily implement the truth test variant, while an iterator with complex .next() processing might want to use the exception variant.
Belt and suspenders. What does this buy you over "while 1"?
It gives you two possible ways to signal "end of iteration". But your argument about Python iterators (as opposed to builtin ones) applies here as well, so I withdraw this one :-)
Another possibility would be using exception class objects as singleton indicators of "end of iteration":
#6: __iter = <object>.iterator() while 1: try: rc = __iter.next() except ExhaustedIterator: break else: if rc is ExhaustedIterator: break <variable> = rc <block>
Then I'd prefer to use a single protocol:
#7: __iter = <object>.iterator() while 1: rc = __iter.next() if rc is ExhaustedIterator: break <variable> = rc <block>
This means there's a special value that you can't store in lists though, and that would bite some introspection code (e.g. code listing all internal objects).
Which brings us back to the good old "end of iteration" == raise an exception logic :-) Would this really hurt all that much in terms of performance? I mean, today's for-loop code uses IndexError for much the same thing...
    #8:
    __iter = <object>.iterator()
    while 1:
        try: <variable> = __iter.next()
        except ExhaustedIterator: break
        <block>
Since this will be written in C, we don't even have the costs of setting up an exception block. I would still suggest that the iterator provides the current position and iteration value as attributes. This avoids some caching of those values and also helps when debugging code using introspection tools. The positional attribute will probably have to be optional since not all iterators can supply this information, but the .value attribute is certainly within range (it would cache the value returned by the last .next() or .prev() call).
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
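A possible shape for the #8 protocol with the suggested attributes is sketched below. Only .value and .next() are named in the message; the class name ValueCachingIterator and the attribute name position are assumptions made for this example, and .prev() is left out.

```python
class ExhaustedIterator(Exception):
    """Hypothetical end-of-iteration exception from the thread."""


class ValueCachingIterator:
    """Assumed list iterator exposing the last value and its position."""

    def __init__(self, seq):
        self.seq = seq
        self.position = -1   # assumed name; -1 means nothing returned yet
        self.value = None    # caches what the last next() call returned

    def next(self):
        if self.position + 1 >= len(self.seq):
            raise ExhaustedIterator
        self.position += 1
        self.value = self.seq[self.position]
        return self.value


collected = []
it = ValueCachingIterator(["a", "b", "c"])
while 1:
    try:
        item = it.next()
    except ExhaustedIterator:
        break
    collected.append(item)
```

After the loop finishes, it.value and it.position still describe the last item handed out, which is the kind of state a debugger or introspection tool could read without re-running the iteration.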
On Wed, 9 Aug 2000, Guido van Rossum wrote:
I forget what indices() was -- is it the moral equivalent of keys()?
Yes, it's range(len(s)).
If irange(s) is zip(range(len(s)), s), I see how that's a bit unwieldy. In the past there were syntax proposals, e.g. ``for i indexing s''. Maybe you and Just can draft a PEP?
In the same vein as zip(), i think it's much easier to just toss in a couple of built-ins than try to settle on a new syntax. (I already uploaded a patch to add indices() and irange() to the built-ins, immediately after i posted my first message on this thread.) Surely a PEP isn't required for a couple of built-in functions that are simple and well understood? You can just call thumbs-up or thumbs-down and be done with it. -- ?!ng "All models are wrong; some models are useful." -- George Box
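For reference, the two definitions quoted in this exchange can be written out as a minimal sketch. It follows the equivalences stated above (indices(s) is range(len(s)), irange(s) is zip(range(len(s)), s)) and is not the uploaded patch, whose contents are not shown in this thread.

```python
# Sketch of the two proposed built-ins, written as plain functions.
# This follows the definitions quoted in the thread, not the actual patch.

def indices(s):
    """Return the valid indices of sequence s."""
    return range(len(s))


def irange(s):
    """Return (index, item) pairs for sequence s."""
    return zip(range(len(s)), s)


s = ["a", "b", "c"]

for i in indices(s):
    print(i)            # 0, then 1, then 2

for i, item in irange(s):
    print(i, item)      # 0 a, then 1 b, then 2 c
```

In the Python of this thread range() and zip() returned lists; in current Python they return lazy objects, so wrap the results in list() to inspect them. Later Python versions also provide a built-in enumerate() that yields the same pairs as irange() here.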
On Wed, 9 Aug 2000, Guido van Rossum wrote:
I forget what indices() was -- is it the moral equivalent of keys()?
[Ping]
Yes, it's range(len(s)).
If irange(s) is zip(range(len(s)), s), I see how that's a bit unwieldy. In the past there were syntax proposals, e.g. ``for i indexing s''. Maybe you and Just can draft a PEP?
In the same vein as zip(), i think it's much easier to just toss in a couple of built-ins than try to settle on a new syntax. (I already uploaded a patch to add indices() and irange() to the built-ins, immediately after i posted my first message on this thread.)
Surely a PEP isn't required for a couple of built-in functions that are simple and well understood? You can just call thumbs-up or thumbs-down and be done with it.
-1 for indices
-0 for irange

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
On Wed, Aug 09, 2000 at 06:14:19PM -0500, Guido van Rossum wrote:
-1 for indices
-0 for irange
The same for me, though I prefer 'for i indexing x in l' over 'irange()'. -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Guido commented yesterday that he doesn't tally votes (yay), but obviously he still issues them! It made me think of a Dutch Crocodile Dundee on a visit to New York, muttering to his harassers as he whips something out from under his clothing...
-1 for indices
"You call that a -1, _this_ is a -1" :-) [Apologies to anyone who hasn't seen the knife scene in the aforementioned movie ;-] Mark.
[Ka-Ping Yee]
... Surely a PEP isn't required for a couple of built-in functions that are simple and well understood? You can just call thumbs-up or thumbs-down and be done with it.
Only half of that is true, and even then only partially: if the verdict is thumbs-up, *almost* cool, except that newcomers delight in pestering "but how come it wasn't done *my* way instead?". You did a bit of that yourself in your day, you know <wink>. We're hoping the stream of newcomers never ends, but the group of old-timers willing and able to take an hour or two to explain the past in detail is actually dwindling (heck, you can count the Python-Dev members chipping in on Python-Help with a couple of fingers, and if anything fewer still active on c.l.py).

If it's thumbs-down, in the absence of a PEP it's much worse: it will just come back again, and again, and again, and again. The sheer repetition in these endlessly recycled arguments all but guarantees that most old-timers ignore these threads completely.

A prime purpose of the PEPs is to be the community's collective memory, pro or con, so I don't have to be <wink>. You surely can't believe this is the first time these particular functions have been pushed for core adoption!? If not, why do we need to have the same arguments all over again? It's not because we're assholes, and neither because there's anything truly new here, it's simply because a mailing list has no coherent memory.

Not so much as a comma gets changed in an ANSI or ISO std without an elaborate pile of proposal paperwork and formal reviews. PEPs are a very lightweight mechanism compared to that. And it would take you less time to write a PEP for this than I alone spent reading the 21 msgs waiting for me in this thread today. Multiply the savings by billions <wink>.

world-domination-has-some-scary-aspects-ly y'rs - tim
It looks nice, but i'm pretty sure it won't fly.
It will! Try it:
for (x in a, y in b):
  File "<stdin>", line 1
    for (x in a, y in b):
                        ^
SyntaxError: invalid syntax
how is the parser to know whether the lockstep form has been invoked?
The parser doesn't have to know as long as the compiler can tell, and clearly one of them can.
Besides, i think Guido has Pronounced quite firmly on zip().
That won't stop me from gently trying to change his mind one day. The existence of zip() doesn't preclude something more elegant being adopted in a future version.

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
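For comparison with the proposed "for (x in a, y in b):" form, this is a minimal sketch of how lockstep iteration is spelled with zip(). The function name lockstep and the sample data are made up for the example.

```python
# Lockstep iteration over two sequences using the zip() built-in.
# lockstep() and the sample lists are illustrative only.

def lockstep(a, b):
    """Collect the (x, y) pairs a lockstep loop over a and b would visit."""
    pairs = []
    for x, y in zip(a, b):
        pairs.append((x, y))
    return pairs


a = [1, 2, 3]
b = ["x", "y", "z"]
print(lockstep(a, b))       # [(1, 'x'), (2, 'y'), (3, 'z')]

# zip() stops at the end of the shorter sequence.
print(lockstep(a, "xy"))    # [(1, 'x'), (2, 'y')]
```

The tuple unpacking in the loop header is what gives zip() its lockstep reading; the proposed syntax would express the same pairing without building the intermediate tuples explicitly.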
participants (19)
- bckfnn@worldonline.dk
- bwarsaw@beopen.com
- Fred L. Drake, Jr.
- Fredrik Lundh
- Greg Ewing
- Guido van Rossum
- Guido van Rossum
- Just van Rossum
- Ka-Ping Yee
- Ken Manheimer
- M.-A. Lemburg
- Mark Hammond
- Moshe Zadka
- Neil Schemenauer
- Paul Prescod
- Peter Schneider-Kamp
- Skip Montanaro
- Thomas Wouters
- Tim Peters