Re: Sets: elt in dict, lst.include
[Ping]
    dict[key] = 1
    if key in dict: ...
    for key in dict: ...
[Guido]
No chance of a time-machine escape, but I *can* say that I agree that Ping's proposal makes a lot of sense. This is a reversal of my previous opinion on this matter. (Take note -- those don't happen very often! :-)
First to submit a working patch gets a free copy of 2.1a2 and subsequent releases,
Thomas since submitted a patch to do the "if key in dict" part (which I reviewed and accepted, pending resolution of doc issues). It does not do the "for key in dict" part. It's not entirely clear whether you intended to approve that part too (I've simplified away many layers of quoting in the above <wink>). In any case, nobody is working on that part. WRT that part, Ping produced some stats in: http://mail.python.org/pipermail/python-dev/2001-January/012106.html
How often do you write 'dict.has_key(x)'? (std lib says: 206)
How often do you write 'for x in dict.keys()'? (std lib says: 49)
How often do you write 'x in dict.values()'? (std lib says: 0)
How often do you write 'for x in dict.values()'? (std lib says: 3)
However, he did not report on occurrences of

    for k, v in dict.items()

I'm not clear exactly which files he examined in the above, or how the counts were obtained. So I don't know how this compares: I counted 188 instances of the string ".items(" in 122 .py files, under the dist/ portion of current CVS. A number of those were assignment and return stmts, others were dict.items() in an arglist, and at least one was in a comment. After weeding those out, I was left with 153 legit "for" loops iterating over x.items(). In all:

    153 iterating over x.items()
    118     "      over x.keys()
     17     "      over x.values()

So I conclude that iterating over .items() is significantly more common than iterating over .keys().

On c.l.py about an hour ago, Thomas complained that two (out of two) of his coworkers guessed wrong about what

    for x in dict:

would do, but didn't say what they *did* think it would do. Since Thomas doesn't work with idiots, I'm guessing they *didn't* guess it would iterate over either values or the lines of a freshly-opened file named "dict" <wink>.

So if you did intend to approve "for x in dict" iterating over dict.keys(), maybe you want to call me out on that "approval post" I forged under your name.

falls-on-swords-so-often-there's-nothing-left-to-puncture<wink>-ly y'rs - tim
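For readers following the tallies above, the three iteration idioms being counted can be written out concretely. This is an illustrative sketch in present-day Python; the example dict and variable names are assumptions, not from the thread:

```python
# The three dict-iteration idioms whose frequency is counted above.
d = {"red": 1, "green": 2, "blue": 3}

keys = []
for k in d.keys():        # the "for x in dict.keys()" idiom
    keys.append(k)

values = []
for v in d.values():      # the "for x in dict.values()" idiom
    values.append(v)

pairs = []
for k, v in d.items():    # the "for k, v in dict.items()" idiom
    pairs.append((k, v))

print(sorted(keys))       # ['blue', 'green', 'red']
```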
Tim Peters wrote:
[Ping]
    dict[key] = 1
    if key in dict: ...
    for key in dict: ...
[Guido]
No chance of a time-machine escape, but I *can* say that I agree that Ping's proposal makes a lot of sense. This is a reversal of my previous opinion on this matter. (Take note -- those don't happen very often! :-)
First to submit a working patch gets a free copy of 2.1a2 and subsequent releases,
Thomas since submitted a patch to do the "if key in dict" part (which I reviewed and accepted, pending resolution of doc issues).
It does not do the "for key in dict" part. It's not entirely clear whether you intended to approve that part too (I've simplified away many layers of quoting in the above <wink>). In any case, nobody is working on that part.
WRT that part, Ping produced some stats in:
http://mail.python.org/pipermail/python-dev/2001-January/012106.html
How often do you write 'dict.has_key(x)'? (std lib says: 206)
How often do you write 'for x in dict.keys()'? (std lib says: 49)
How often do you write 'x in dict.values()'? (std lib says: 0)
How often do you write 'for x in dict.values()'? (std lib says: 3)
However, he did not report on occurrences of
for k, v in dict.items()
I'm not clear exactly which files he examined in the above, or how the counts were obtained. So I don't know how this compares: I counted 188 instances of the string ".items(" in 122 .py files, under the dist/ portion of current CVS. A number of those were assignment and return stmts, others were dict.items() in an arglist, and at least one was in a comment. After weeding those out, I was left with 153 legit "for" loops iterating over x.items(). In all:
    153 iterating over x.items()
    118     "      over x.keys()
     17     "      over x.values()
So I conclude that iterating over .items() is significantly more common than iterating over .keys().
On c.l.py about an hour ago, Thomas complained that two (out of two) of his coworkers guessed wrong about what
for x in dict:
would do, but didn't say what they *did* think it would do. Since Thomas doesn't work with idiots, I'm guessing they *didn't* guess it would iterate over either values or the lines of a freshly-opened file named "dict" <wink>.
So if you did intend to approve "for x in dict" iterating over dict.keys(), maybe you want to call me out on that "approval post" I forged under your name.
Dictionaries are not sequences. I wonder what order a user of

    for k,v in dict:

(or whatever other form of this proposal you choose) will expect...

Please also take into account that dictionaries are *mutable* and their internal state is not defined to e.g. not change due to lookups (take the string optimization for example...), so exposing PyDict_Next() in any way to Python will cause trouble.

In the end, you will need to create a list or tuple to iterate over one way or another, so why bother overloading for-loops w/r to dictionaries ?

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Dictionaries are not sequences. I wonder what order a user of for k,v in dict: (or whatever other form of this proposal you choose) will expect...
The same order that for k,v in dict.items() will yield, of course.
Please also take into account that dictionaries are *mutable* and their internal state is not defined to e.g. not change due to lookups (take the string optimization for example...), so exposing PyDict_Next() in any way to Python will cause trouble. In the end, you will need to create a list or tuple to iterate over one way or another, so why bother overloading for-loops w/r to dictionaries ?
Actually, I was going to propose to play dangerously here: the

    for k:v in dict: ...

syntax I proposed in my previous message should indeed expose PyDict_Next(). It should be a big speed-up, and I'm expecting (though don't have much proof) that most loops over dicts don't mutate the dict.

Maybe we could add a flag to the dict that issues an error when a new key is inserted during such a for loop? (I don't think the key order can be affected when a key is *deleted*.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
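The safeguard Guido floats here (erroring out when a key is inserted during the loop) is, as it happens, how later CPython versions behave. A sketch of the observable behavior in modern Python, not of the 2.1-era patch under discussion:

```python
# Inserting a new key while iterating a dict raises RuntimeError in
# modern CPython -- essentially the flag Guido proposes above.
d = {"a": 1, "b": 2}
err = None
try:
    for key in d:
        d["c"] = 3        # grow the dict mid-iteration
except RuntimeError as exc:
    err = exc

print(err)  # dictionary changed size during iteration
```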
Guido van Rossum wrote:
Dictionaries are not sequences. I wonder what order a user of for k,v in dict: (or whatever other form of this proposal you choose) will expect...
The same order that for k,v in dict.items() will yield, of course.
And then people find out that the order has some sorting properties and start to use it... "how to sort a dictionary?" comes up again, every now and then.
Please also take into account that dictionaries are *mutable* and their internal state is not defined to e.g. not change due to lookups (take the string optimization for example...), so exposing PyDict_Next() in any way to Python will cause trouble. In the end, you will need to create a list or tuple to iterate over one way or another, so why bother overloading for-loops w/r to dictionaries ?
Actually, I was going to propose to play dangerously here: the
for k:v in dict: ...
syntax I proposed in my previous message should indeed expose PyDict_Next(). It should be a big speed-up, and I'm expecting (though don't have much proof) that most loops over dicts don't mutate the dict.
Maybe we could add a flag to the dict that issues an error when a new key is inserted during such a for loop? (I don't think the key order can be affected when a key is *deleted*.)
You mean: mark it read-only ? That would be a "nice to have" property for a lot of mutable types indeed -- sort of like low-level locks. This would be another candidate for an object flag (much like the one Fred wants to introduce for weak referenced objects).

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Dictionaries are not sequences. I wonder what order a user of for k,v in dict: (or whatever other form of this proposal you choose) will expect...
The same order that for k,v in dict.items() will yield, of course.
And then people find out that the order has some sorting properties and start to use it... "how to sort a dictionary?" comes up again, every now and then.
I don't understand why you bring this up. We're not revealing anything new here, the random order of dict items has always been part of the language. The answer to "how to sort a dict" should be "copy it into a list and sort that." Or am I missing something?
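Guido's "copy it into a list and sort that" answer, spelled out. A small sketch in present-day Python; in the idiom of the day it would have been `keys = d.keys(); keys.sort()`:

```python
# "How to sort a dict": copy the items into a list and sort the list.
d = {"banana": 3, "apple": 1, "cherry": 2}
items = list(d.items())
items.sort()              # tuples sort lexicographically, i.e. by key
print(items)              # [('apple', 1), ('banana', 3), ('cherry', 2)]
```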
Please also take into account that dictionaries are *mutable* and their internal state is not defined to e.g. not change due to lookups (take the string optimization for example...), so exposing PyDict_Next() in any way to Python will cause trouble. In the end, you will need to create a list or tuple to iterate over one way or another, so why bother overloading for-loops w/r to dictionaries ?
Actually, I was going to propose to play dangerously here: the
for k:v in dict: ...
syntax I proposed in my previous message should indeed expose PyDict_Next(). It should be a big speed-up, and I'm expecting (though don't have much proof) that most loops over dicts don't mutate the dict.
Maybe we could add a flag to the dict that issues an error when a new key is inserted during such a for loop? (I don't think the key order can be affected when a key is *deleted*.)
You mean: mark it read-only ? That would be a "nice to have" property for a lot of mutable types indeed -- sort of like low-level locks. This would be another candidate for an object flag (much like the one Fred wants to introduce for weak referenced objects).
Yes.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Dictionaries are not sequences. I wonder what order a user of for k,v in dict: (or whatever other form of this proposal you choose) will expect...
The same order that for k,v in dict.items() will yield, of course.
And then people find out that the order has some sorting properties and start to use it... "how to sort a dictionary?" comes up again, every now and then.
I don't understand why you bring this up. We're not revealing anything new here, the random order of dict items has always been part of the language. The answer to "how to sort a dict" should be "copy it into a list and sort that."
Or am I missing something?
I just wanted to hint at a problem which iterating over items in an unordered set can cause. Especially new Python users will find it confusing that the order of the items in an iteration can change from one run to the next. Not much of an argument, but I like explicit programming more than magic under the cover.

What we really want is iterators for dictionaries, so why not implement these instead of tweaking for-loops.

If you are looking for speedups w/r to for-loops, applying a different indexing technique in for-loops would go a lot further and provide better performance not only to dictionary loops, but also to other sequences. I have made some good experience with a special counter object (sort of like a mutable integer) which is used instead of the iteration index integer in the current implementation.

Using an iterator object instead of the integer + __getitem__ call machinery would allow more flexibility for all kinds of sequences or containers. There could be an iterator type for dictionaries, one for generic __getitem__ style sequences, one for lists and tuples, etc. All of these could include special logic to get the most out of the targeted datatype.

Well, just a thought...

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
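The per-type iterator objects MAL describes are roughly what Python later grew. A minimal sketch of one such iterator for dictionaries, written against today's iterator protocol; the class name and the snapshot strategy are illustrative assumptions, not MAL's design:

```python
# A hand-rolled dictionary iterator of the kind described above: the
# lookup and item-building logic lives in the iterator object, not in
# the for-loop machinery.  It snapshots the items up front, so
# mutating the dict during the loop cannot disturb the traversal.
class DictItemIterator:
    def __init__(self, d):
        self._items = list(d.items())   # snapshot of (key, value) pairs
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1
        return item

pairs = [kv for kv in DictItemIterator({"a": 1, "b": 2})]
print(pairs)  # [('a', 1), ('b', 2)]
```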
M.-A. Lemburg <mal@lemburg.com>:
If you are looking for speedups w/r to for-loops, applying a different indexing technique in for-loops would go a lot further and provide better performance not only to dictionary loops, but also to other sequences.
Which reminds me... There's not much I miss from C these days, but one thing I wish Python had is a more general for-loop. The C semantics that let you have any initialization, any termination test, and any iteration you like are rather cool.

Yes, I realize that

    for (<init>; <test>; <step>) {<body>}

can be simulated with:

    <init>
    while 1:
        if not <test>: break
        <body>
        <step>

Still, having them spatially grouped the way a C for does it is nice. Makes it easier to see invariants, I think.

--
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"Rightful liberty is unobstructed action, according to our will, within limits drawn around us by the equal rights of others." -- Thomas Jefferson
[ESR]
There's not much I miss from C these days, but one thing I wish Python had is a more general for-loop. The C semantics that let you have any initialization, any termination test, and any iteration you like are rather cool.
Yes, I realize that
for (<init>; <test>; <step>) {<body>}
can be simulated with:
<init>
while 1:
    if not <test>: break
    <body>
    <step>
Still, having them spatially grouped the way a C for does it is nice. Makes it easier to see invariants, I think.
Hm, I've seen too many ugly C for loops to have much appreciation for it. I can recognize and appreciate the few common forms that clearly iterate over an array; most other forms look rather contorted to me. Check out the Python C sources; if you find anything more complicated than ``for (i = n; i > 0; i--)'' I probably didn't write it. :-)

Common abominations include:

- writing a while loop as for(;<test>;)
- putting arbitrary initialization code in <init>
- having an empty condition, so the <step> becomes an arbitrary extension of the body that's written out-of-sequence

--Guido van Rossum (home page: http://www.python.org/~guido/)
Check out SETL's loop statement. I think Perl5 is a subset of it <0.9 wink>.
Guido van Rossum <guido@digicool.com>:
Common abominations include:
- writing a while loop as for(;<test>;)
Agreed. Bletch.
- putting arbitrary initialization code in <init>
Not sure what's "arbitrary", unless you mean unrelated to the iteration variable.
- having an empty condition, so the <step> becomes an arbitrary extension of the body that's written out-of-sequence
Again agreed. Double bletch.

I guess my archetype of the cute C for-loop is the idiom for pointer-list traversal:

    struct foo {int data; struct foo *next;} *ptr, *head;

    for (ptr = head; ptr != NULL; ptr = ptr->next)
        do_something_with(ptr->data);

This is elegant. It separates the logic for list traversal from the operation on the list element.

Not the highest on my list of wants -- I'd sooner have ?: back. I submitted a patch for that once, and the discussion sort of died. Were you dead set against it, or should I revive this proposal?

--
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

"The bearing of arms is the essential medium through which the individual asserts both his social power and his participation in politics as a responsible moral being..." -- J.G.A. Pocock, describing the beliefs of the founders of the U.S.
- putting arbitrary initialization code in <init>
Not sure what's "arbitrary", unless you mean unrelated to the iteration variable.
Yes, that.
I guess my archetype of the cute C for-loop is the idiom for pointer-list traversal:
struct foo {int data; struct foo *next;} *ptr, *head;
for (ptr = head; ptr != NULL; ptr = ptr->next)
    do_something_with(ptr->data);
This is elegant. It separates the logic for list traversal from the operation on the list element.
And it rarely happens in Python, because sequences are rarely represented as linked lists.
Not the highest on my list of wants -- I'd sooner have ?: back. I submitted a patch for that once, and the discussion sort of died. Were you dead set against it, or should I revive this proposal?
Not dead set against something like it, but dead set against the ?: syntax because then : becomes too overloaded for the human reader, e.g.:

    if foo ? bar : bletch : spam = eggs

If you want to revive this, I strongly suggest writing a PEP first before posting here.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum <guido@digicool.com>:
Not dead set against something like it, but dead set against the ?: syntax because then : becomes too overloaded for the human reader, e.g.:
if foo ? bar : bletch : spam = eggs
If you want to revive this, I strongly suggest writing a PEP first before posting here.
Noted. Will do.

--
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>

Such are a well regulated militia, composed of the freeholders, citizen and husbandman, who take up arms to preserve their property, as individuals, and their rights as freemen. -- "M.T. Cicero", in a newspaper letter of 1788 touching the "militia" referred to in the Second Amendment to the Constitution.
On Mon, Jan 29, 2001 at 09:34:01PM -0500, Eric S. Raymond wrote:
I guess my archetype of the cute C for-loop is the idiom for pointer-list traversal:
struct foo {int data; struct foo *next;} *ptr, *head;
for (ptr = head; ptr != NULL; ptr = ptr->next)
    do_something_with(ptr->data);
Note two things: in Python, you would use a list, so 'for x in list' does exactly what you want here ;) And if you really need it, you could use iterators for exactly this (once we have them, of course): you are inventing a new storage type. Quite common in C, since the only one it has is useless for anything other than strings<wink>, but not so common in Python.
Not the highest on my list of wants -- I'd sooner have ?: back. I submitted a patch for that once, and the discussion sort of died. Were you dead det against it, or should I revive this proposal?
Triple blech. Guido will never go for it! (There, increased your chance of getting it approved! :) Seriously though, I wouldn't like it much, it's too cryptic a syntax. I notice I use it less and less in C, too.

--
Thomas Wouters <thomas@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
[MAL]
I just wanted to hint at a problem which iterating over items in an unordered set can cause. Especially new Python users will find it confusing that the order of the items in an iteration can change from one run to the next.
Do they find "for k, v in dict.items()" confusing now? Would be the same.
... What we really want is iterators for dictionaries, so why not implement these instead of tweaking for-loops.
Seems an unrelated topic: would "iterators for dictionaries" solve the supposed problem with iteration order?
If you are looking for speedups w/r to for-loops, applying a different indexing technique in for-loops would go a lot further and provide better performance not only to dictionary loops, but also to other sequences.
I have made some good experience with a special counter object (sort of like a mutable integer) which is used instead of the iteration index integer in the current implementation.
Please quantify, if possible. My belief (based on past experiments) is that in loops fancier than

    for i in range(n): pass

the loop overhead quickly falls into the noise even now.
Using an iterator object instead of the integer + __getitem__ call machinery would allow more flexibility for all kinds of sequences or containers. ...
This is yet another abrupt change of topic, yes <0.9 wink>? I agree a new iteration *protocol* could have major attractions.
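One way to quantify the loop overhead Tim asks about is a microbenchmark. A sketch using the stdlib `timeit` module (which postdates this thread); the loop bodies and repetition counts are arbitrary choices for illustration:

```python
# Compare a bare for-loop against one that does a little work; the
# difference between the two timings approximates the cost of the body,
# and the bare timing approximates the pure loop overhead.
import timeit

bare = timeit.timeit("for i in range(1000): pass", number=200)
work = timeit.timeit("for i in range(1000): x = i * i", number=200)

print(f"bare loop: {bare:.4f}s, with body: {work:.4f}s")
```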
Tim Peters wrote:
[MAL]
... What we really want is iterators for dictionaries, so why not implement these instead of tweaking for-loops.
Seems an unrelated topic: would "iterators for dictionaries" solve the supposed problem with iteration order?
No, but it would solve the problem in a more elegant and generalized way. Besides, it also allows writing code which is thread safe, since the iterator can take special actions to assure that the dictionary doesn't change during the iteration phase (see the other thread about "making mutable objects readonly").
If you are looking for speedups w/r to for-loops, applying a different indexing technique in for-loops would go a lot further and provide better performance not only to dictionary loops, but also to other sequences.
I have made some good experience with a special counter object (sort of like a mutable integer) which is used instead of the iteration index integer in the current implementation.
Please quantify, if possible. My belief (based on past experiments) is that in loops fancier than
for i in range(n): pass
the loop overhead quickly falls into the noise even now.
I don't remember the figures, but these micro optimizations do speed up loops by a noticeable amount. Just compare the performance of stock Python 1.5 against my patched version.
Using an iterator object instead of the integer + __getitem__ call machinery would allow more flexibility for all kinds of sequences or containers. ...
This is yet another abrupt change of topic, yes <0.9 wink>? I agree a new iteration *protocol* could have major attractions.
Not really... the counter object is just a special case of an iterator -- in this case iteration is over IN, the natural numbers.

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
[Tim]
Seems an unrelated topic: would "iterators for dictionaries" solve the supposed problem with iteration order?
[MAL]
No, but it would solve the problem in a more elegant and generalized way.
I'm lost. "Would [it] solve the ... problem?" "No [it wouldn't solve the problem], but it would solve the problem ...". Can only assume we're switching topics within single sentences now <wink>.
Besides, it also allows writing code which is thread safe, since the iterator can take special actions to assure that the dictionary doesn't change during the iteration phase (see the other thread about "making mutable objects readonly").
Sorry, but immutability has nothing to do with thread safety (the latter has to do with "doing a right thing" in the presence of multiple threads, to keep data structures internally consistent; raising an exception is never "a right thing" unless the user is violating the advertised semantics, and if mutation during iteration is such a violation, the presence or absence of multiple threads has nothing to do with that). IOW, perhaps, a critical section is an area of non-exceptional serialization, not a landmine that makes other threads *blow up* if they touch it.
... I don't remember the figures, but these micro optimizations
That's plural, but I thought you were talking specifically about the mutable counter object. I don't know which, but the two statements don't jibe.
do speed up loops by a noticeable amount. Just compare the performance of stock Python 1.5 against my patched version.
No time now, but after 2.1 is out, sure, wrt it (not 1.5).
Tim Peters wrote:
[Tim]
Seems an unrelated topic: would "iterators for dictionaries" solve the supposed problem with iteration order?
[MAL]
No, but it would solve the problem in a more elegant and generalized way.
I'm lost. "Would [it] solve the ... problem?" "No [it wouldn't solve the problem], but it would solve the problem ...". Can only assume we're switching topics within single sentences now <wink>.
Sorry, not my brightest day today... what I wanted to say is that iterators would solve the problem of defining "something" in "for something in dict" nicely. Since iterators can define the order in which a data structure is traversed, this would also do away with the second (supposed) problem.
Besides, it also allows writing code which is thread safe, since the iterator can take special actions to assure that the dictionary doesn't change during the iteration phase (see the other thread about "making mutable objects readonly").
Sorry, but immutability has nothing to do with thread safety (the latter has to do with "doing a right thing" in the presence of multiple threads, to keep data structures internally consistent; raising an exception is never "a right thing" unless the user is violating the advertised semantics, and if mutation during iteration is such a violation, the presence or absence of multiple threads has nothing to do with that). IOW, perhaps, a critical section is an area of non-exceptional serialization, not a landmine that makes other threads *blow up* if they touch it.
Who said that an exception is raised ? The method I posted on the mutability thread allows querying the current state just like you would query the availability of a resource.
... I don't remember the figures, but these micro optimizations
That's plural, but I thought you were talking specifically about the mutable counter object. I don't know which, but the two statements don't jibe.
The counter object patch is a micro-optimization and as such will only give you a gain of a few percent. What makes the difference is the sum of these micro optimizations. Here's the patch for Python 1.5 which includes the optimizations: http://www.lemburg.com/python/mxPython-1.5.patch.gz
do speed up loops by a noticeable amount. Just compare the performance of stock Python 1.5 against my patched version.
No time now, but after 2.1 is out, sure, wrt it (not 1.5).
--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
[MAL]
... Since iterators can define the order in which a data structure is traversed, this would also do away with the second (supposed) problem.
Except we don't need iterators to do that. If anyone thought it was important, they could change the existing PyDict_Next to force an ordering, and then everything building on that would inherit it. So while I'm in favor of better iteration schemes, I'm not in favor of overselling them (on grounds that aren't unique to them).
Sorry, but immutability has nothing to do with thread safety ...
Who said that an exception is raised ?
I did <wink>.
The method I posted on the mutability thread allows querying the current state just like you would query the availability of a resource.
This?

    .mutable([flag]) -> integer

    If called without argument, returns 1/0 depending on whether the
    object is mutable or not. When called with a flag argument, sets
    the mutable state of the object to the value indicated by flag and
    returns the previous flag state.

If I do:

    if object.mutable():
        object.mutate()

in a threaded world, the certain (but erratic) outcome is that sometimes it blows up: there's no guarantee that another thread doesn't sneak in and *change* the mutability between the time object.mutable() returns 1 and object.mutate() acts on a bad assumption. Same thing for:

    if resources.num_printers_available() > 0:
        action_that_blows_up_if_no_printers_are_available

in a threaded world. It's possible to build a thread-safe resource acquisition protocol in either case, but that's really got nothing to do with immutability or iterators (marking a thing immutable doesn't do any good there unless you *also* build a protocol on top of it for communicating state changes, blocking until one occurs, notifications with optional timeouts, etc -- just doing object.mutable(1) is a threaded disaster in the absence of a higher-level protocol guaranteeing that this won't go changing the mutability state in the middle of some other thread's belief that it's got the thing frozen; likewise for object.mutable(0) not stepping on some other thread's belief that it's got permission to mutate).

.mutable(flag) is *fine* for what it does, it's simply got nothing to do with threads. Thread safety could *build* on it via coordinated use of a threading.Semaphore (or moral equivalent), though.
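The higher-level protocol Tim alludes to is what `threading`'s primitives already provide. A sketch of the check-then-act hazard solved by holding a real lock across the whole critical section; the counter and thread counts are assumed example values:

```python
# Instead of "query the state, then act on it" (racy), hold a lock
# across the whole critical section so no other thread can interleave
# between the check and the action.
import threading

counter = {"n": 0}
lock = threading.Lock()

def worker():
    for _ in range(10000):
        with lock:                 # acquire blocks; no busy-waiting
            counter["n"] += 1      # read-modify-write done atomically

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter["n"])  # 40000
```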
Tim Peters wrote:
[MAL]
... Since iterators can define the order in which a data structure is traversed, this would also do away with the second (supposed) problem.
Except we don't need iterators to do that. If anyone thought it was important, they could change the existing PyDict_Next to force an ordering, and then everything building on that would inherit it. So while I'm in favor of better iteration schemes, I'm not in favor of overselling them (on grounds that aren't unique to them).
I'm just trying to sell iterators to spare us the pain of overloading the for-loop syntax just to get faster iteration over dictionaries.

The idea is simple: put all the lookup, order and item building code into the iterator, have many of them, one for each flavour of values, keys, items and honeyloops, and then optimize the for-loop/iterator interaction to get the best performance out of them.

There's really not much use in adding *one* special case to for-loops when there are a gazillion different needs to iterate over data structures, files, socket, ports, coffee cups, etc.
Sorry, but immutability has nothing to do with thread safety ...
Who said that an exception is raised ?
I did <wink>.
The method I posted on the mutability thread allows querying the current state just like you would query the availability of a resource.
This?
.mutable([flag]) -> integer
If called without argument, returns 1/0 depending on whether the object is mutable or not. When called with a flag argument, sets the mutable state of the object to the value indicated by flag and returns the previous flag state.
If I do:
if object.mutable():
    object.mutate()
in a threaded world, the certain (but erratic) outcome is that sometimes it blows up: there's no guarantee that another thread doesn't sneak in and *change* the mutability between the time object.mutable() returns 1 and object.mutate() acts on a bad assumption.
I know. That's why you would do this:

    lock = []
    # we use the mutable state as lock indicator; initial state is mutable

    # try to acquire lock:
    while 1:
        prevstate = lock.mutable(0)
        if prevstate == 0:
            # was already locked
            continue
        elif prevstate == 1:
            # we acquired the lock
            break

    # release lock
    lock.mutable(1)
Same thing for:
if resources.num_printers_available() > 0:
    action_that_blows_up_if_no_printers_are_available
in a threaded world. It's possible to build a thread-safe resource acquisition protocol in either case, but that's really got nothing to do with immutability or iterators (marking a thing immutable doesn't do any good there unless you *also* build a protocol on top of it for communicating state changes, blocking until one occurs, notifications with optional timeouts, etc -- just doing object.mutable(1) is a threaded disaster in the absence of a higher-level protocol guaranteeing that this won't go changing the mutability state in the middle of some other thread's belief that it's got the thing frozen; likewise for object.mutable(0) not stepping on some other thread's belief that it's got permission to mutate).
.mutable(flag) is *fine* for what it does, it's simply got nothing to do with threads. Thread safety could *build* on it via coordinated use of a threading.Semaphore (or moral equivalent), though.
Ok... :)

--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
[MAL]
I'm just trying to sell iterators to spare us the pain of overloading the for-loop syntax just to get faster iteration over dictionaries.
The idea is simple: put all the lookup, order and item building code into the iterator, have many of them, one for each flavour of values, keys, items and honeyloops, and then optimize the for-loop/iterator interaction to get the best performance out of them.
There's really not much use in adding *one* special case to for-loops when there are a gazillion different needs to iterate over data structures, files, sockets, ports, coffee cups, etc.
They're simply distinct issues to me. Whether people want special syntax for iterating over dicts is (to me) independent of how the iteration protocol works. Dislike of the former should probably be stabbed into Ping's heart <wink>.
I know. That's why you would do this:
lock = [] # we use the mutable state as lock indicator; initial state is mutable
# try to acquire lock:
while 1:
    prevstate = lock.mutable(0)
    if prevstate == 0:
        # was already locked
        continue
    elif prevstate == 1:
        # we acquired the lock
        break

# release lock
lock.mutable(1)
OK, in the lingo of the field, you're using .mutable(0) as a test-and-clear operation, and building a spin lock on top of it in "the usual" way. In that case the acquire code can be much simpler:

    while not lock.mutable(0):
        pass

Same thing. I agree then that it has basic lock semantics (relying indirectly on the global interpreter lock to make .mutable() calls atomic). But Python-level spin locks are thoroughly impractical: a waiting thread T will use 100% of its timeslice at 100% CPU utilization waiting for the lock, with no chance of succeeding (the global interpreter lock blocks all other threads while T is spinning, so no other thread *can* release the lock for the duration -- the spinning is futile). The performance characteristics would be horrid. So while "a lock", it's not a *useful* lock for threading. You got something against Python's locks <wink>? every-proposal-gets-hijacked-to-some-other-end-ly y'rs - tim
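[For contrast, a minimal sketch of the practical alternative Tim alludes to, written in modern Python (`with` didn't exist in 2.1): threading.Lock blocks the waiting thread down in C, so the waiter sleeps instead of burning its timeslice spinning.]

```python
import threading

lock = threading.Lock()
results = []

def worker():
    # acquire() blocks in C, releasing the GIL, until the lock is free;
    # there is no busy-waiting at the Python level
    with lock:
        results.append("got the lock")

t = threading.Thread(target=worker)
lock.acquire()      # main thread holds the lock first
t.start()           # worker now blocks trying to acquire it
lock.release()      # hand it over; worker proceeds
t.join()
print(results)      # ['got the lock']
```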
On Sat, 3 Feb 2001, Tim Peters wrote:
They're simply distinct issues to me. Whether people want special syntax for iterating over dicts is (to me) independent of how the iteration protocol works. Dislike of the former should probably be stabbed into Ping's heart <wink>.
Ow! Hey. :) We have shorthand like x[k] for spelling x.__getitem__(k); why not shorthand like 'for k:v in x: ...' for spelling 'iter = x.__iteritems__(); while 1: k, v = iter() ...'? Hmm. What is the issue really with?

- the key:value syntax suggested by Guido (i like it quite a lot)
- the existence of special __iter*__ methods (seems natural to me; this is how we customize many operators on instances already)
- the fact that 'for k:v' checks __iteritems__, __iter__, items, and __getitem__ (it *has* to check all of these things if it's going to play nice with existing mappings and sequences)
- or something else?

I'm not actually that clear on what the general feeling is about this PEP. Moshe seems to be happy with the first part but not the rest; Tim, do you have a similar position? Eric and Greg both disagreed with Moshe's counter-proposal; does that mean you like the original, or that you would rather do something different altogether?

Moshe Zadka wrote:
dict.iteritems() could return not an iterator, but a magical object whose iterator is the requested iterator. Ditto itervalues(), iterkeys()
Seems like too much work to me. I'd rather just have the object produce a straight iterator. (By 'iterator' i mean an ordinary callable object, nothing too magical.) If there are unusual cases where you want to iterate over an object in several different ways i suppose they can create pseudo-sequences in the manner you described, but i think we want to make the most common case (iterating over the object itself) very easy. That is, just implement __iter__ and have it produce a callable. Marc A. Lemburg wrote:
The idea is simple: put all the lookup, order and item building code into the iterator, have many of them, one for each flavour of values, keys, items and honeyloops, and then optimize the for-loop/iterator interaction to get the best performance out of them.
There's really not much use in adding *one* special case to for-loops when there are a gazillion different needs to iterate over data structures, files, sockets, ports, coffee cups, etc.
I couldn't tell which way you were trying to argue here. Are you in favour of the general flavour of PEP 234 or did you have in mind something different? Your first paragraph above seems to describe what 234 does. -- ?!ng "There's no point in being grown up if you can't be childish sometimes." -- Dr. Who
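[Ping's sketched desugaring of 'for k:v in x' can be written out in modern Python; the IterableDict name and the __iteritems__ method here are hypothetical illustration, not an API that ever existed:]

```python
class IterableDict(dict):
    def __iteritems__(self):
        # return a plain callable that produces one (key, value) pair
        # per call and raises StopIteration when exhausted
        return iter(self.items()).__next__

d = IterableDict(spam=1, eggs=2)

# the proposed 'for k:v in d: ...' would expand to roughly:
it = d.__iteritems__()
pairs = []
while True:
    try:
        k, v = it()
    except StopIteration:
        break
    pairs.append((k, v))

print(sorted(pairs))    # [('eggs', 2), ('spam', 1)]
```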
Ka-Ping Yee <ping@lfw.org>:
I'm not actually that clear on what the general feeling is about this PEP. Moshe seems to be happy with the first part but not the rest; Tim, do you have a similar position? Eric and Greg both disagreed with Moshe's counter-proposal; does that mean you like the original, or that you would rather do something different altogether?
I haven't yet heard a proposal that I find compelling. And, I have to admit, I've grown somewhat confused about the alternatives on offer. -- <a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a> Of all tyrannies, a tyranny exercised for the good of its victims may be the most oppressive. It may be better to live under robber barons than under omnipotent moral busybodies. The robber baron's cruelty may sometimes sleep, his cupidity may at some point be satiated; but those who torment us for our own good will torment us without end, for they do so with the approval of their consciences. -- C. S. Lewis
Guido van Rossum <guido@digicool.com>:
Maybe we could add a flag to the dict that issues an error when a new key is inserted during such a for loop? (I don't think the key order can be affected when a key is *deleted*.)
You mean: mark it read-only ? That would be a "nice to have" property for a lot of mutable types indeed -- sort of like low-level locks. This would be another candidate for an object flag (much like the one Fred wants to introduce for weak referenced objects).
Yes.
For different reasons, I'd like to be able to set a constant flag on an object instance. Simple semantics: if you try to assign to a member or method, it throws an exception. Application? I have a large Python program that goes to a lot of effort to build elaborate context structures in core. It would be nice to know they can't be even inadvertently trashed without throwing an exception I can watch for. -- <a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a> No one is bound to obey an unconstitutional law and no courts are bound to enforce it. -- 16 Am. Jur. Sec. 177 late 2d, Sec 256
[ESR]
For different reasons, I'd like to be able to set a constant flag on an object instance. Simple semantics: if you try to assign to a member or method, it throws an exception.
Application? I have a large Python program that goes to a lot of effort to build elaborate context structures in core. It would be nice to know they can't be even inadvertently trashed without throwing an exception I can watch for.
Yes, this is a good thing. Easy to do on lists and dicts. Questions:

- How to spell it? x.freeze()? x.readonly()?
- Should this be reversible? I.e. should there be an x.unfreeze()?
- Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values...

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum <guido@digicool.com>:
Yes, this is a good thing. Easy to do on lists and dicts. Questions:
- How to spell it? x.freeze()? x.readonly()?
I like "freeze", it's a clear imperative, where "readonly()" sounds like a test (e.g. "is this readonly()?")
- Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values...
Moshe Zadka sent me a hack that handles instances:
class MarkableAsConstant:

    def __init__(self):
        self.mark_writable()

    def __setattr__(self, name, value):
        if self._writable:
            self.__dict__[name] = value
        else:
            raise ValueError, "object is read only"

    def mark_writable(self):
        self.__dict__['_writable'] = 1

    def mark_readonly(self):
        self.__dict__['_writable'] = 0
- Should this be reversible? I.e. should there be an x.unfreeze()?
I gave this some thought earlier today. There are advantages to either way. Making freeze a one-way operation would make it possible to use freezing to get certain kinds of security and integrity guarantees that you can't have if freezing is reversible. Fortunately, there's a semantics that captures both. If we allow freeze to take an optional key argument, and require that an unfreeze call must supply the same key or fail, we get both worlds. We can even one-way-hash the keys so they don't have to be stored in the bytecode. Want to lock a structure permanently? Pick a random long key. Freeze with it. Then throw that key away... -- <a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a> Strict gun laws are about as effective as strict drug laws...It pains me to say this, but the NRA seems to be right: The cities and states that have the toughest gun laws have the most murder and mayhem. -- Mike Royko, Chicago Tribune
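[ESR's keyed-freeze semantics can be sketched as follows. The Freezable class and its method names are hypothetical; only the key mechanism is illustrated -- store a one-way hash of the key, compare on unfreeze, and locking permanently is just freezing with a discarded random key:]

```python
import hashlib

class Freezable:
    def __init__(self):
        self._digest = None     # None means unfrozen

    def freeze(self, key):
        # keep only a one-way hash of the key, never the key itself
        self._digest = hashlib.sha256(key.encode()).hexdigest()

    def unfreeze(self, key):
        if self._digest is None:
            return              # already unfrozen
        if hashlib.sha256(key.encode()).hexdigest() != self._digest:
            raise ValueError("wrong key; object stays frozen")
        self._digest = None

    def frozen(self):
        return self._digest is not None

obj = Freezable()
obj.freeze("s3cret")
# to lock permanently: freeze with a long random key, then discard the key
```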
- How to spell it? x.freeze()? x.readonly()?
I like "freeze", it's a clear imperative, where "readonly()" sounds like a test (e.g. "is this readonly()?")
Agreed.
- Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values...
Moshe Zadka sent me a hack that handles instances: [...]
OK, so no special support needed there.
- Should this be reversible? I.e. should there be an x.unfreeze()?
I gave this some thought earlier today. There are advantages to either way. Making freeze a one-way operation would make it possible to use freezing to get certain kinds of security and integrity guarantees that you can't have if freezing is reversible.
Fortunately, there's a semantics that captures both. If we allow freeze to take an optional key argument, and require that an unfreeze call must supply the same key or fail, we get both worlds. We can even one-way-hash the keys so they don't have to be stored in the bytecode.
Want to lock a structure permanently? Pick a random long key. Freeze with it. Then throw that key away...
Way too cute. My suggestion freeze(0) freezes forever, freeze(1) can be unfrozen. --Guido van Rossum (home page: http://www.python.org/~guido/)
Note that even adding a "frozen" flag would add 4 bytes to every freezable object on most machines. That's why I'd rather .freeze() replace the type pointer and .unfreeze() restore it. No time or space overhead; no cluttering up the normal-case (i.e., unfrozen) type implementations with new tests.
Tim Peters wrote:
Note that even adding a "frozen" flag would add 4 bytes to every freezable object on most machines. That's why I'd rather .freeze() replace the type pointer and .unfreeze() restore it. No time or space overhead; no cluttering up the normal-case (i.e., unfrozen) type implementations with new tests.
Note that Fred's weak ref implementation also needs a flag on every weakly referenceable object (at least last time I looked at his patches). Why not add a flag byte or word to these objects -- then we'd have 8 or 16 choices of what to do with them ;-)
"ESR" == Eric S Raymond <esr@thyrsus.com> writes:
ESR> Fortunately, there's a semantics that captures both. If we ESR> allow freeze to take an optional key argument, and require ESR> that an unfreeze call must supply the same key or fail, we ESR> get both worlds. We can even one-way-hash the keys so they ESR> don't have to be stored in the bytecode. ESR> Want to lock a structure permanently? Pick a random long ESR> key. Freeze with it. Then throw that key away... Clever!
[Guido]
Yes, this is a good thing. Easy to do on lists and dicts. Questions:
- How to spell it? x.freeze()? x.readonly()?
See below.
- Should this be reversible?
Of course. Or x.freeze(solid=1) to default to permanent rigidity, but not require it.
I.e. should there be an x.unfreeze()?
That conveniently answers the first question, since x.unreadonly() reads horribly <wink>.
- Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values...
"Should be" supported for every mutable object. Next step: as in endless C++ debates, endless Python debates about "representation freeze" vs "logical freeze" ("well, yes, I'm changing this member, but it's just an invisible cache so I *should* be able to tag the object as const anyway ..."; etc etc etc). keep-it-simple-ly y'rs - tim
"GvR" == Guido van Rossum <guido@digicool.com> writes:
GvR> Yes, this is a good thing. Easy to do on lists and dicts. GvR> Questions: GvR> - How to spell it? x.freeze()? x.readonly()? GvR> - Should this reversible? I.e. should there be an GvR> x.unfreeze()? GvR> - Should we support something like this for instances too? GvR> Sometimes it might be cool to be able to freeze changing GvR> attribute values... lock(x) ...? :) -Barry
Barry A. Warsaw <barry@digicool.com>:
lock(x) ...? :)
I was thinking that myself, Barry. -- <a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a> "Boys who own legal firearms have much lower rates of delinquency and drug use and are even slightly less delinquent than nonowners of guns." -- U.S. Department of Justice, National Institute of Justice, Office of Juvenile Justice and Delinquency Prevention, NCJ-143454, "Urban Delinquency and Substance Abuse," August 1995.
Eric S. Raymond wrote:
For different reasons, I'd like to be able to set a constant flag on an object instance. Simple semantics: if you try to assign to a member or method, it throws an exception.
Guido van Rossum wrote:
Yes, this is a good thing. Easy to do on lists and dicts. Questions:
- How to spell it? x.freeze()? x.readonly()?
I'm not so sure. There seem to be many issues here. More questions: What's the difference between a frozen list and a tuple? Is a frozen list hashable?
- Should this be reversible? I.e. should there be an x.unfreeze()?
What if two threads lock and then unlock the same structure?
- Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values...
If you do this, i bet people will immediately want to freeze individual attributes. Some might be confused by

    a.x = [1, 2, 3]
    lock(a.x)    # intend to lock the attribute, not the list
    a.x = 3      # hey, why is this allowed?

What does locking an extension object do? What happens when you lock an object that implements list or dict semantics? Do we care that locking a UserList accomplishes nothing? Should unfreeze/unlock() be disallowed in restricted mode? -- ?!ng No software is totally secure, but using [Microsoft] Outlook is like hanging a sign on your back that reads "PLEASE MESS WITH MY COMPUTER." -- Scott Rosenberg, Salon Magazine
Guido van Rossum wrote:
Yes, this is a good thing. Easy to do on lists and dicts. Questions:
- How to spell it? x.freeze()? x.readonly()?
Ping:
I'm not so sure. There seem to be many issues here. More questions:
What's the difference between a frozen list and a tuple?
A frozen list can be unfrozen (maybe)?
Is a frozen list hashable?
Yes -- that's what started this thread (using dicts as dict keys, actually).
- Should this be reversible? I.e. should there be an x.unfreeze()?
What if two threads lock and then unlock the same structure?
That's up to the threads -- it's no different than other concurrent access.
- Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values...
If you do this, i bet people will immediately want to freeze individual attributes. Some might be confused by
a.x = [1, 2, 3]
lock(a.x)    # intend to lock the attribute, not the list
a.x = 3      # hey, why is this allowed?
That's a matter of API. I wouldn't make this a built-in, but rather a method on freezable objects (please don't call it lock()!).
What does locking an extension object do?
What does adding 1 to an extension object do?
What happens when you lock an object that implements list or dict semantics? Do we care that locking a UserList accomplishes nothing?
Who says it doesn't?
Should unfreeze/unlock() be disallowed in restricted mode?
I don't see why not. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Ping]
Is a frozen list hashable?
[Guido]
Yes -- that's what started this thread (using dicts as dict keys, actually).
Except this doesn't actually work unless list.freeze() recursively ensures that all elements in the list are frozen too:
>>> hash((1, 2))
219750523
>>> hash((1, [2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unhashable type
That bothered me in Eric's original suggestion: unless x.freeze() does a traversal of all objects reachable from x, it doesn't actually make x safe against modification (except at the very topmost level). But doing such a traversal isn't what *everyone* would want either (as with "const" in C, I expect the primary benefit would be the chance to spend countless hours worming around it in both directions <wink>). [Skip]
If you want immutable dicts or lists in order to use them as dictionary keys, just serialize them first:
survey_says = {"spam": 14, "eggs": 42}
sl = marshal.dumps(survey_says)
dict[sl] = "spam"
marshal.dumps(dict) isn't canonical, though. That is, it may well be that d1 == d2 but dumps(d1) != dumps(d2). Even materializing dict.values(), then sorting it, then marshaling *that* isn't enough; e.g., consider {1: 1} and {1: 1L}. The latter example applies to marshaling lists too.
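[Tim's {1: 1} vs {1: 1L} example is Python-2-specific, but the non-canonicality is easy to demonstrate in modern Python, where dicts preserve insertion order: two equal dicts built in different orders marshal to different byte strings, so the serialized form is useless as a canonical key.]

```python
import marshal

d1 = {"spam": 14, "eggs": 42}
d2 = {"eggs": 42, "spam": 14}

print(d1 == d2)                                  # True
print(marshal.dumps(d1) == marshal.dumps(d2))    # False -- insertion order leaks in
```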
Guido van Rossum wrote:
[ESR]
For different reasons, I'd like to be able to set a constant flag on an object instance. Simple semantics: if you try to assign to a member or method, it throws an exception.
Application? I have a large Python program that goes to a lot of effort to build elaborate context structures in core. It would be nice to know they can't be even inadvertently trashed without throwing an exception I can watch for.
Yes, this is a good thing. Easy to do on lists and dicts. Questions:
- How to spell it? x.freeze()? x.readonly()?
How about .lock() and .unlock() ?
- Should this be reversible? I.e. should there be an x.unfreeze()?
Yes. These low-level locks could be used in thread programming since the above calls are C level functions and thus thread safe w/r to the global interpreter lock.
- Should we support something like this for instances too? Sometimes it might be cool to be able to freeze changing attribute values...
Sure :) Eric, could you write a PEP for this ?
MAL writes:
- How to spell it? x.freeze()? x.readonly()?
How about .lock() and .unlock() ?
I'm with Greg here - lock() and unlock() imply an operation similar to threading.Lock() - ie, exclusivity rather than immutability. I don't have a strong opinion on the other names, but definitely prefer any of the others over lock() for this operation. Mark.
Mark Hammond wrote:
MAL writes:
- How to spell it? x.freeze()? x.readonly()?
How about .lock() and .unlock() ?
I'm with Greg here - lock() and unlock() imply an operation similar to threading.Lock() - ie, exclusivity rather than immutability.
I don't have a strong opinion on the other names, but definitely prefer any of the others over lock() for this operation.
Funny, I thought that .lock() and .unlock() could be used to implement exactly what threading.Lock() does... Anyway, names really don't matter much, so how about:

    .mutable([flag]) -> integer

If called without argument, returns 1/0 depending on whether the object is mutable or not. When called with a flag argument, sets the mutable state of the object to the value indicated by flag and returns the previous flag state. The semantics of this interface would be in sync with many other state APIs in Python and C (e.g. setlocale()). The advantage of making this a method should be clear: it allows writing polymorphic code.
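[MAL's .mutable([flag]) interface can be sketched on a list subclass. FreezableList and its internals are hypothetical illustration, and only __setitem__ is guarded, for brevity; a real version would guard every mutating slot:]

```python
class FreezableList(list):
    def __init__(self, *args):
        list.__init__(self, *args)
        self._mutable = 1

    def mutable(self, flag=None):
        # no argument: report the current state
        if flag is None:
            return self._mutable
        # with argument: set the new state, return the previous one
        prev = self._mutable
        self._mutable = 1 if flag else 0
        return prev

    def __setitem__(self, i, value):
        if not self._mutable:
            raise TypeError("object is read-only")
        list.__setitem__(self, i, value)

lst = FreezableList([1, 2, 3])
prev = lst.mutable(0)       # freeze; returns the previous state (1)
```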
M.-A. Lemburg <mal@lemburg.com>:
Anyway, names really don't matter much, so how about:
.mutable([flag]) -> integer
If called without argument, returns 1/0 depending on whether the object is mutable or not. When called with a flag argument, sets the mutable state of the object to the value indicated by flag and returns the previous flag state.
I'll bear this in mind if things progress to the point where a PEP is indicated. -- <a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>
"Eric S. Raymond" wrote:
M.-A. Lemburg <mal@lemburg.com>:
Anyway, names really don't matter much, so how about:
.mutable([flag]) -> integer
If called without argument, returns 1/0 depending on whether the object is mutable or not. When called with a flag argument, sets the mutable state of the object to the value indicated by flag and returns the previous flag state.
I'll bear this in mind if things progress to the point where a PEP is indicated.
Great :)
What stimulated this thread about making mutable objects (temporarily) immutable? Can someone give me an example where this is actually useful and can't be handled through some existing mechanism? I'm definitely with Fredrik on this one. Sounds like madness to me.

I'm just guessing here, but since the most common need for immutable objects is as dictionary keys, I can envision having to test the lock state of a list or dict that someone wants to use as a key everywhere you would normally call has_key:

    if l.islocked() and d.has_key(l):
        ...

If you want immutable dicts or lists in order to use them as dictionary keys, just serialize them first:

    survey_says = {"spam": 14, "eggs": 42}
    sl = marshal.dumps(survey_says)
    dict[sl] = "spam"

Here's another pitfall I can envision.

    survey_says = {"spam": 14, "eggs": 42}
    survey_says.lock()
    dict[survey_says] = "Richard Dawson"
    survey_says.unlock()

At this point can I safely iterate over the keys in the dictionary or not?

Skip
Skip Montanaro wrote:
What stimulated this thread about making mutable objects (temporarily) immutable? Can someone give me an example where this is actually useful and can't be handled through some existing mechanism? I'm definitely with Fredrik on this one. Sounds like madness to me.
This thread is an offspring of the "for something in dict:" thread. The problem we face when iterating over mutable objects is that the underlying objects can change. By marking them read-only we can safely iterate over their contents. Another advantage of being able to mark mutable objects as read-only is that they may become usable as dictionary keys. Optimizations such as self-reorganizing read-only dictionaries would also become possible (e.g. attribute dictionaries which are read-only could calculate a second hash value to make the hashing perfect).
I'm just guessing here, but since the most common need for immutable objects is a dictionary keys, I can envision having to test the lock state of a list or dict that someone wants to use as a key everywhere you would normally call has_key:
if l.islocked() and d.has_key(l): ...
If you want immutable dicts or lists in order to use them as dictionary keys, just serialize them first:
survey_says = {"spam": 14, "eggs": 42}
sl = marshal.dumps(survey_says)
dict[sl] = "spam"
Sure and that's what .items(), .keys() and .values() do. The idea was to avoid the extra step of creating lists or tuples first.
Here's another pitfall I can envision.
survey_says = {"spam": 14, "eggs": 42}
survey_says.lock()
dict[survey_says] = "Richard Dawson"
survey_says.unlock()
At this point can I safely iterate over the keys in the dictionary or not?
Tim already pointed out that we will need two different read-only states:

a) temporary
b) permanent

For dictionaries to become usable as keys in another dictionary, they'd have to be marked permanently read-only.
MAL> This thread is an offspring of the "for something in dict:" thread.
MAL> The problem we face when iterating over mutable objects is that the
MAL> underlying objects can change. By marking them read-only we can
MAL> safely iterate over their contents.

I suspect you'll find it difficult to mark dbm/bsddb/gdbm files read-only. (And what about Andy Dustman's cool sqldict stuff?) If you can't extend this concept in a reasonable fashion to cover (most of) the other objects that smell like dictionaries, I think you'll just be adding needless complications for a feature that can't be used where it's really needed.

I see no problem asking for the items() of an in-memory dictionary in order to get a predictable list to iterate over, but doing that for disk-based mappings would be next to impossible. So, I'm stuck iterating over something that can change out from under me. In the end, the programmer will still have to handle border cases specially. Besides, even if you *could* lock your disk-based mapping, are you really going to do that in situations where it's sharable (that's what databases are there for, after all)? I suspect you're going to keep the database mutable and work around any resulting problems.

If you want to implement "for key in dict:", why not just have the VM call keys() under the covers and use that list? It would be no worse than the situation today where you call "for key in dict.keys():", and with the same caveats. If you're dumb enough to do that for an on-disk mapping object, well, you get what you asked for.

Skip
Skip Montanaro wrote:
MAL> This thread is an offspring of the "for something in dict:" thread.
MAL> The problem we face when iterating over mutable objects is that the
MAL> underlying objects can change. By marking them read-only we can
MAL> safely iterate over their contents.
I suspect you'll find it difficult to mark dbm/bsddb/gdbm files read-only. (And what about Andy Dustman's cool sqldict stuff?) If you can't extend this concept in a reasonable fashion to cover (most of) the other objects that smell like dictionaries, I think you'll just be adding needless complications for a feature that can't be used where it's really needed.
We are currently only talking about Python dictionaries here, even though other objects could also benefit from this.
I see no problem asking for the items() of an in-memory dictionary in order to get a predictable list to iterate over, but doing that for disk-based mappings would be next to impossible. So, I'm stuck iterating over something that can change out from under me. In the end, the programmer will still have to handle border cases specially. Besides, even if you *could* lock your disk-based mapping, are you really going to do that in situations where it's sharable (that's what databases are there for, after all)? I suspect you're going to keep the database mutable and work around any resulting problems.
If you want to implement "for key in dict:", why not just have the VM call keys() under the covers and use that list? It would be no worse than the situation today where you call "for key in dict.keys():", and with the same caveats. If you're dumb enough to do that for an on-disk mapping object, well, you get what you asked for.
That's why iterators do a much better job here. In DB design these are usually called cursors, which allow moving around inside large result sets. But this really is a different topic...

Readonlyness could be put to some good use in optimizing data structures that you know won't change anymore. Temporary readonlyness has the nice side effect of allowing low-level lock implementations and makes writing thread-safe code easier to handle, because you can make assertions w/r to the immutability of an object during a certain period of time explicit in your code.
Skip Montanaro <skip@mojam.com>:
Can someone give me an example where this is actually useful and can't be handled through some existing mechanism?
I can envisage cases where you want to build a data structure incrementally, and then treat it as immutable so you can use it as a dict key, etc. There's currently no way to do that to a list without copying it. So, it could be handy to have a way of turning a list into a tuple in-place. It would have to be a one-way transformation, otherwise you could start using it as a dict key, make it mutable again, and cause havoc. Suggested implementation: When you allocate the space for the values of a list, leave enough room for the PyObject_HEAD of a tuple at the beginning. Then you can turn that memory block into a real tuple later, and flag the original list object as immutable so you can't change it later via that route. Hmmm, would waste a bit of space for each list object. Maybe this should be a special list-about-to-become-tuple type. (Tist? Luple?) Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
M.-A. Lemburg <mal@lemburg.com>:
Eric, could you write a PEP for this ?
Not yet. I'm about (at Guido's suggestion) to submit a revised ternary-select proposal. Let's process that first. -- <a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a> "Today, we need a nation of Minutemen, citizens who are not only prepared to take arms, but citizens who regard the preservation of freedom as the basic purpose of their daily life and who are willing to consciously work and sacrifice for that freedom." -- John F. Kennedy
[Guido]
The same order that for k,v in dict.items() will yield, of course.
[MAL]
And then people find out that the order has some sorting properties and start to use it...
Except that it has none. dict insertion has never used any comparison outcome beyond "equal"/"not equal", so any ordering you think you see is-- and always was --an illusion.
With all this confusion about how to actually write the iteration on dictionary items, wouldn't it make more sense to implement an extension module which then provides a __getitem__ style iterator for dictionaries by interfacing to PyDict_Next() ? The module could have three different iterators:

1. iterate over items
2. ... over keys
3. ... over values

The reasoning behind this is that the __getitem__ interface is well established and this doesn't introduce any new syntax while still providing speed and flexibility.
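[MAL's proposed module never existed, but the __getitem__-protocol idea can be sketched in pure Python (all names hypothetical): the classic for-loop calls __getitem__ with 0, 1, 2, ... until IndexError is raised.]

```python
class dictitems:
    """Iterate over a dict's (key, value) pairs via the __getitem__ protocol."""
    def __init__(self, d):
        # a real C version would walk the hash table with PyDict_Next
        # instead of materializing this list up front
        self._items = list(d.items())

    def __getitem__(self, i):
        return self._items[i]   # IndexError past the end stops the for-loop

d = {"a": 1, "b": 2}
pairs = [kv for kv in dictitems(d)]
print(sorted(pairs))    # [('a', 1), ('b', 2)]
```

Companion `dictkeys` and `dictvalues` classes would differ only in what `__getitem__` returns.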
[Guido]
... I'm expecting (though don't have much proof) that most loops over dicts don't mutate the dict.
Safe bet! I do recall writing one once: it del'ed keys for which the associated count was 1, because the rest of the algorithm was only interested in duplicates.
Maybe we could add a flag to the dict that issues an error when a new key is inserted during such a for loop? (I don't think the key order can be affected when a key is *deleted*.)
That latter is true but specific to this implementation. "Can't mutate the dict period" is easier to keep straight, and probably harmless in practice (if not, it could be relaxed later). Recall that a similar trick is played during list.sort(), replacing the list's type pointer for the duration (to point to an internal "immutable list" type, same as the list type except the "dangerous" slots point to a function that raises an "immutable list" TypeError). Then no runtime expense is incurred for regular lists to keep checking flags. I thought of this as an elegant use for switching types at runtime; you may still be appalled by it, though!
[Guido]
Maybe we could add a flag to the dict that issues an error when a new key is inserted during such a for loop?
FWIW, some of the java2 collections decided to throw a ConcurrentModificationException in the iterator if the collection was modified during the iteration. Generally none of the java2 collections can be modified while iterating over them (the exception is calling .remove() on the iterator object, and not all collections support that).
(I don't think the key order can be affected when a key is *deleted*.)
Probably also true for the Hashtable that is backing our PyDictionary, but I'd rather not depend too much on it being true. [Tim]
That latter is true but specific to this implementation. "Can't mutate the dict period" is easier to keep straight, and probably harmless in practice (if not, it could be relaxed later).
Agree.
Recall that a similar trick is played during list.sort(), replacing the list's type pointer for the duration (to point to an internal "immutable list" type, same as the list type except the "dangerous" slots point to a function that raises an "immutable list" TypeError). Then no runtime expense is incurred for regular lists to keep checking flags. I thought of this as an elegant use for switching types at runtime; you may still be appalled by it, though!
Changing the type of a type? Yuck! I might very likely be reading the CPython sources wrongly, but it seems this trick will cause a BadInternalCall if some other C extension tries to modify a list while it is frozen by the type-switching trick. I imagine this would happen if the extension called:

    PyList_SetItem(myList, 0, aValue);

I guess Jython could support this from the python side, but it's hard to ensure from the java side without adding an additional PyList_Check(..) to all list methods. It just doesn't feel like the right way to go, since it would cause slower access to all mutable objects. regards, finn
On Tue, Jan 30, 2001 at 05:34:10PM +0000, Finn Bock wrote:
Recall that a similar trick is played during list.sort(), replacing the list's type pointer for the duration (to point to an internal "immutable list" type, same as the list type except the "dangerous" slots point to a function that raises an "immutable list" TypeError). Then no runtime expense is incurred for regular lists to keep checking flags. I thought of this as an elegant use for switching types at runtime; you may still be appalled by it, though!
Changing the type of a type? Yuck!
No, the typeobject itself isn't changed -- that would freeze *all* dicts/lists/whatever, not just the one we want. We'd be changing the type of an object (or 'type instance', if you want, but not "type 'instance'"), not the type of a type.
I might very likely be reading the CPython sources wrongly, but it seems this trick will cause a BadInternalCall if some other C extension tries to modify a list while it is frozen by the type-switching trick. I imagine this would happen if the extension called:
PyList_SetItem(myList, 0, aValue);
Only if PyList_SetItem refuses to handle 'frozen' lists. In my eyes, 'frozen' lists should still pass PyList_Check(), but also PyList_Frozen() (or whatever), and methods/operations that modify the listobject would have to check if the list is frozen, and raise an appropriate error if so. This might throw 'unexpected' errors, but only in situations that can't happen right now! -- Thomas Wouters <thomas@xs4all.net> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Only if PyList_SetItem refuses to handle 'frozen' lists. In my eyes, 'frozen' lists should still pass PyList_Check(), but also PyList_Frozen() (or whatever), and methods/operations that modify the listobject would have to check if the list is frozen, and raise an appropriate error if so. This might throw 'unexpected' errors.
did someone just subscribe me to the perl-porters list? -1 on "modal freeze" (it's madness) -0 on an "immutable dictionary" type in the core
On Tue, Jan 30, 2001 at 11:45:16PM +0100, Fredrik Lundh wrote:
did someone just subscribe me to the perl-porters list?
-1 on "modal freeze" (it's madness) -0 on an "immutable dictionary" type in the core
I'm glad I'm not the only one who had that feeling. I agree with your votes too. Neil
[Finn Bock]
Changing the type of a type? Yuck!
No, it temporarily changes the type of the single list being sorted, like so, where "self" is a pointer to a PyListObject (which is a list, not a list *type* object):

    self->ob_type = &immutable_list_type;
    err = samplesortslice(self->ob_item,
                          self->ob_item + self->ob_size,
                          compare);
    self->ob_type = &PyList_Type;

immutable_list_type is "just like" PyList_Type, except that the slots for mutating methods point to a function that raises an "immutable list" TypeError. Before this drastic step came years of increasingly ugly hacks trying to stop core dumps when people mutated a list during the sort. Python's sort is very complex, and lots of pointers are tucked away -- having the size of the array, or its position in memory, or the set of objects it contains, change as a side effect of doing a compare, would be difficult and expensive to recover from -- and by "difficult" read "nobody ever managed to get it right before this" <0.5 wink>.
I might very likely be reading the CPython sources wrongly, but it seems this trick will cause an BadInternalCall if some other C extension are trying to modify a list while it is freezed by the type switching trick. I imagine this would happen if the extension called:
PyList_SetItem(myList, 0, aValue);
Well, in CPython it's not "legal" for any other thread to use the C API while the sort is in progress, because the thread doing the sort holds the global interpreter lock for the duration. So this could happen "legally" only if a comparison function called by the sort called out to a C extension attempting to mutate the list. In that case, fine, it *is* a bad call: mutation is not allowed during list sorting, so they deserve whatever they get -- and far better a "bad internal call" than a core dump. If the immutable_list_type were used more generally, it would require more general support (but I see Thomas already talked about that -- thanks).
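For what it's worth, the "mutation during sort is an error" rule survives to this day, though the mechanism changed: modern CPython detaches the list's contents for the duration of the sort and raises ValueError if the list was mutated meanwhile. A sketch of today's behavior (not the 2.1-era type-switching code):

```python
# Mutating a list from inside a sort (here via the key function)
# raises ValueError in modern CPython.
lst = [3, 1, 2]

def sneaky_key(x):
    lst.append(0)  # mutate the list mid-sort
    return x

error = None
try:
    lst.sort(key=sneaky_key)
except ValueError as exc:
    error = exc
print(error)  # list modified during sort
```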
[Ping]
dict[key] = 1 if key in dict: ... for key in dict: ...
[Guido]
No chance of a time-machine escape, but I *can* say that I agree that Ping's proposal makes a lot of sense. This is a reversal of my previous opinion on this matter. (Take note -- those don't happen very often! :-)
First to submit a working patch gets a free copy of 2.1a2 and subsequent releases,
Thomas since submitted a patch to do the "if key in dict" part (which I reviewed and accepted, pending resolution of doc issues).
It does not do the "for key in dict" part. It's not entirely clear whether you intended to approve that part too (I've simplified away many layers of quoting in the above <wink>). In any case, nobody is working on that part.
WRT that part, Ping produced some stats in:
http://mail.python.org/pipermail/python-dev/2001-January/012106.html
How often do you write 'dict.has_key(x)'?        (std lib says: 206)
How often do you write 'for x in dict.keys()'?   (std lib says: 49)
How often do you write 'x in dict.values()'?     (std lib says: 0)
How often do you write 'for x in dict.values()'? (std lib says: 3)
However, he did not report on occurrences of
for k, v in dict.items()
I'm not clear exactly which files he examined in the above, or how the counts were obtained. So I don't know how this compares: I counted 188 instances of the string ".items(" in 122 .py files, under the dist/ portion of current CVS. A number of those were assignment and return stmts, others were dict.items() in an arglist, and at least one was in a comment. After weeding those out, I was left with 153 legit "for" loops iterating over x.items(). In all:
    153 iterating over x.items()
    118     "      over x.keys()
     17     "      over x.values()
So I conclude that iterating over .items() is significantly more common than iterating over .keys().
I did a less sophisticated count but come to the same conclusion: iterations over items() are (somewhat) more common than over keys(), and values() are 1-2 orders of magnitude less common. My numbers:

    $ cd python/src/Lib
    $ grep 'for .*items():' *.py | wc -l
         47
    $ grep 'for .*keys():' *.py | wc -l
         43
    $ grep 'for .*values():' *.py | wc -l
          2
On c.l.py about an hour ago, Thomas complained that two (out of two) of his coworkers guessed wrong about what
for x in dict:
would do, but didn't say what they *did* think it would do. Since Thomas doesn't work with idiots, I'm guessing they *didn't* guess it would iterate over either values or the lines of a freshly-opened file named "dict" <wink>.
I don't attach much value to the readability argument: typically, one will write "for key in dict" or "for name in dict" and then it's obvious what is meant.
So if you did intend to approve "for x in dict" iterating over dict.keys(), maybe you want to call me out on that "approval post" I forged under your name.
But here's my dilemma. "if (k, v) in dict" is clearly useless (nobody has even asked me for a has_item() method). I can live with "x in list" checking the values and "x in dict" checking the keys. But I can *not* live with "x in dict" equivalent to "dict.has_key(x)" if "for x in dict" would mean "for x in dict.items()". I also think that defining "x in dict" but not "for x in dict" will be confusing. So we need to think more. How about:

    for key in dict: ...        # ... over keys
    for key:value in dict: ...  # ... over items

This is syntactically unambiguous (a colon is currently illegal in that position). This also suggests:

    for index:value in list: ...  # ... over zip(range(len(list)), list)

which doesn't strike me as bad or ugly, and would fulfill my brother's dearest wish. (And why didn't we think of this before?)

--Guido van Rossum (home page: http://www.python.org/~guido/)
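(As a hedged aside: none of the colon forms were ultimately adopted. For reference, the spellings that later Python versions settled on for the same jobs:)

```python
# How later Python spells the iterations proposed above.
d = {'x': 1, 'y': 2}

assert 'x' in d                        # "if key in dict" (the approved part)
keys = [k for k in d]                  # "for key in dict" iterates over keys
pairs = list(d.items())                # instead of "for key:value in dict"
indexed = list(enumerate(['a', 'b']))  # instead of "for index:value in list"
print(keys, pairs, indexed)
```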
On Mon, Jan 29, 2001 at 09:48:22AM -0500, Guido van Rossum wrote:
How about:
for key in dict: ... # ... over keys
for key:value in dict: ... # ... over items
This is syntactically unambiguous (a colon is currently illegal in that position).
I won't comment on the syntax right now, I need to look at it for a while first :-) However, what about MAL's point about dict ordering, internally? Wouldn't FOR_LOOP be forced to generate a list of keys anyway, to avoid skipping keys? I know currently the dict implementation doesn't do any reordering except during adds/deletes, but there is nothing in the language ref that supports that -- it's an implementation detail. Would we make a future enhancement where (some form of) gc would 'clean up' large dictionaries impossible?

-- Thomas Wouters <thomas@xs4all.net>
[Guido]
I did a less sophisticated count but come to the same conclusion: iterations over items() are (somewhat) more common than over keys(), and values() are 1-2 orders of magnitude less common. My numbers:
    $ cd python/src/Lib
    $ grep 'for .*items():' *.py | wc -l
         47
    $ grep 'for .*keys():' *.py | wc -l
         43
    $ grep 'for .*values():' *.py | wc -l
          2
I like my larger sample and anal methodology better <wink>. A closer look showed that it may have been unduly biased by the mass of files in Lib/encodings/, where

    encoding_map = {}
    for k, v in decoding_map.items():
        encoding_map[v] = k

is at the end of most files (btw, MAL, that's the answer to your question: people would expect "the same" ordering you expected there, i.e. none in particular).
... I don't attach much value to the readability argument: typically, one will write "for key in dict" or "for name in dict" and then it's obvious what is meant.
Well, "fiddlesticks" comes to mind <0.9 wink>. If I've got a dict mapping phone numbers to names, "for name in dict" is dead backwards.

    for vevent in keydefs.keys():
    for x in self.subdirs.keys():
    for name in lsumdict.keys():
    for locale in self.descriptions.keys():
    for name in attrs.keys():
    for func in other.top_level.keys():
    for func in target.keys():
    for i in u2.keys():
    for s in d.keys():
    for url in self.bad.keys():

are other cases in the CVS tree where I don't think the name makes it obvious in the absence of ".keys()". But I don't personally give any weight to whether people can guess what something does at first glance. My rule is that it doesn't matter, provided it's (a) easy to learn; and (especially), (b) hard to *forget* once you've learned it. A classic example is Python's "points between elements" treatment of slice indices: few people guess right what that does at first glance, but once they "get it" they're delighted and rarely mess up again. And I think this is "like that".
... But here's my dilemma. "if (k, v) in dict" is clearly useless (nobody has even asked me for a has_item() method).
Yup.
I can live with "x in list" checking the values and "x in dict" checking the keys. But I can *not* live with "x in dict" equivalent to "dict.has_key(x)" if "for x in dict" would mean "for x in dict.items()".
That's why I brought it up -- it's not entirely clear what's to be done here.
I also think that defining "x in dict" but not "for x in dict" will be confusing.
So we need to think more.
The hoped-for next step indeed.
How about:
for key in dict: ... # ... over keys
for key:value in dict: ... # ... over items
This is syntactically unambiguous (a colon is currently illegal in that position).
Cool! Can we resist adding

    if key:value in dict

for "parallelism"? (I know I can ...) 2/3rds of these are marginally more attractive:

    for key: in dict:      # over dict.keys()
    for :value in dict:    # over dict.values()
    for : in dict:         # a delay loop
This also suggests:
for index:value in list: ... # ... over zip(range(len(list)), list)
which doesn't strike me as bad or ugly, and would fulfill my brother's dearest wish.
You mean besides the one that you fry in hell for not adding "for ... indexing"? Ya, probably.
(And why didn't we think of this before?)
Best guess: we were focused exclusively on sequences, and a colon just didn't suggest itself in that context. Second-best guess: having finally approved one of these gimmicks, you finally got desperate enough to make it work <wink>. ponderingly y'rs - tim
This is all PEP material now. Tim, do you want to own the PEP? It seems just up your alley!
Cool! Can we resist adding
if key:value in dict
for "parallelism"? (I know I can ...)
That's easy to resist because, unlike ``for key:value in dict'', it's not unambiguous: ``if key:value in dict'' is already legal syntax currently, with 'key' as the condition and 'value in dict' as the (not particularly useful) body of the if statement.
(And why didn't we think of this before?)
Best guess: we were focused exclusively on sequences, and a colon just didn't suggest itself in that context. Second-best guess: having finally approved one of these gimmicks, you finally got desperate enough to make it work <wink>.
I'm certainly more comfortable with just ``for key in dict'' than with the whole slew of extensions using colons. But, again, that's for the PEP to fight over. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
This is all PEP material now.
Yup.
Tim, do you want to own the PEP?
Not really. Available time is finite, and this isn't at the top of the list of things I'd like to see (resuming the discussion of generators + coroutines + iteration protocol comes to mind first).
Cool! Can we resist adding
if key:value in dict
for "parallelism"? (I know I can ...)
That's easy to resist because, unlike ``for key:value in dict'', it's not unambiguous:
But

    if (key:value) in dict

is. Just trying to help whoever *does* want the PEP <wink>.
... I'm certainly more comfortable with just ``for key in dict'' than with the whole slew of extensions using colons.
What about just the

    for key:value in dict
    for index:value in sequence

extensions? The degenerate forms (omitting x or y or both in x:y) are mechanical variations so are likely to get raised.
But, again, that's for the PEP to fight over.
PEPs are easier if you Pronounce on things you hate early so that those can get recorded in the "BDFL Pronouncements" section without further ado. whatever-this-may-look-like-it's-not-a-pep-discussion<wink>-ly y'rs - tim
[Tim Peters on adding yet more syntactic sugar]
Available time is finite, and this isn't at the top of the list of things I'd like to see (resuming the discussion of generators + coroutines + iteration protocol comes to mind first).
What's the chances of getting generators into 2.2? The implementation should not be hard. Didn't Steven Majewski have something years ago? Why do we always get sidetracked on trying to figure out how to do coroutines and continuations? Generators would add real power to the language and are simple enough that most users could benefit from them. Also, it should be possible to design an interface that does not preclude the addition of coroutines or continuations later. I'm not volunteering to champion the cause just yet. I just want to know if there is some issue I'm missing. Neil
"NS" == Neil Schemenauer <nas@arctrix.com> writes:
NS> What's the chances of getting generators into 2.2? The NS> implementation should not be hard. Didn't Steven Majewski NS> have something years ago? Why do we always get sidetracked on NS> trying to figure out how to do coroutines and continuations? I'd be +1 on someone wrestling PEP 220 from Gordon's icy claws, renaming it just "Generators" and filling it out for the 2.2 time frame. If we want to address coroutines and continuations later, we can write separate PEPs for them. Send me a draft. -Barry
I'd be +1 on someone wrestling PEP 220 from Gordon's icy claws, renaming it just "Generators" and filling it out for the 2.2 time frame. If we want to address coroutines and continuations later, we can write separate PEPs for them.
I think it's better not to re-use PEP 220 for that. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Tim Peters on adding yet more syntactic sugar]
Available time is finite, and this isn't at the top of the list of things I'd like to see (resuming the discussion of generators + coroutines + iteration protocol comes to mind first).
What's the chances of getting generators into 2.2? The implementation should not be hard. Didn't Steven Majewski have something years ago? Why do we always get sidetracked on trying to figure out how to do coroutines and continuations?
I think there's a very good chance of getting them into 2.2. But it *is* true that coroutines are a very attractive piece of land "just next door". On the other hand, continuations are a mirage, so don't try to go there. :-)
Generators would add real power to the language and are simple enough that most users could benefit from them. Also, it should be possible to design an interface that does not preclude the addition of coroutines or continuations later.
I'm not volunteering to champion the cause just yet. I just want to know if there is some issue I'm missing.
There are different ways to do iterators. Here is a very "tame" proposal (and definitely in the realm of 2.2), that doesn't require any coroutine-like tricks. Let's propose that

    for var in expr: ...do something with var...

will henceforth be translated into

    __iter = iterator(expr)
    while __iter.more():
        var = __iter.next()
        ...do something with var...

-- or some variation that combines more() and next() (I don't care). Then a new built-in function iterator() is needed that creates an iterator object. It should try two things:

(1) If the object implements __iterator__() (or a C API equivalent), call that and be done; this way arbitrary iterators can be created.

(2) If the object smells like a sequence (how to test???), use an iterator sort of like this:

    class Iterator:
        def __init__(self, sequence):
            self.sequence = sequence
            self.index = 0
        def more(self):
            # Store the item so that each index is tried exactly once
            try:
                self.item = self.sequence[self.index]
            except IndexError:
                return 0
            else:
                self.index = self.index + 1
                return 1
        def next(self):
            return self.item

(I don't necessarily mean that all those instance variables should be publicly available.)

The built-in sequence types can use a very fast built-in iterator type that uses a C int for the index and doesn't store the item in the iterator. (This should be as fast as Marc-Andre's for loop optimization using a C counter.) Dictionaries can define an appropriate iterator that uses PyDict_Next(). If the argument to iterator() is itself an iterator (how to test???), it returns the argument unchanged, so that one can also write

    for var in iterator(obj): ...do something with var...

Files of course should have iterators that return the next input line. We could build filtering and mapping iterators that take an iterator argument and do certain manipulations with the elements; this would effectively introduce the notion of lazy evaluation on sequences. Etc., etc.
This does not come close to Icon generators -- but it doesn't require any coroutine-like capabilities, unlike those. --Guido van Rossum (home page: http://www.python.org/~guido/)
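Guido's sketch above can be made runnable with only cosmetic changes. A hedged sketch: the names iterator(), __iterator__(), more() and next() come from his proposal, not from any API Python actually grew, and the hasattr probes stand in for his two "how to test???" questions:

```python
class SequenceIterator:
    """Case (2): the object merely smells like a sequence."""
    def __init__(self, sequence):
        self.sequence = sequence
        self.index = 0

    def more(self):
        # Store the item so that each index is tried exactly once.
        try:
            self.item = self.sequence[self.index]
        except IndexError:
            return False
        self.index += 1
        return True

    def next(self):
        return self.item

def iterator(obj):
    if hasattr(obj, '__iterator__'):  # case (1): object supplies its own
        return obj.__iterator__()
    if hasattr(obj, 'more'):          # crude answer to "is it an iterator?"
        return obj                    # pass it through unchanged
    return SequenceIterator(obj)      # case (2): sequence protocol

# The proposed translation of "for var in expr: <body>":
def for_loop(expr, body):
    __iter = iterator(expr)
    while __iter.more():
        body(__iter.next())

result = []
for_loop('abc', result.append)
print(result)  # ['a', 'b', 'c']
```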
On Tue, 30 Jan 2001 19:49:24 -0500, Guido van Rossum <guido@digicool.com> wrote:
There are different ways to do iterators.
Here is a very "tame" proposal (and definitely in the realm of 2.2), that doesn't require any coroutine-like tricks. Let's propose that
for var in expr: ...do something with var...
will henceforth be translated into
    __iter = iterator(expr)
    while __iter.more():
        var = __iter.next()
        ...do something with var...
I'm +1 on that... but Tim's "try to use that to write something that will return the nodes of a binary tree" still haunts me. Personally, though, I'd thin down the interface to

    while 1:
        try:
            var = __iter.next()
        except NoMoreError:
            break  # pseudo-break?

With the usual caveat that this is a lie as far as "else" is concerned (IOW, pseudo-break gets into the else)
Then a new built-in function iterator() is needed that creates an iterator object. It should try two things:
(1) If the object implements __iterator__() (or a C API equivalent), call that and be done; this way arbitrary iterators can be created.
(2) If the object smells like a sequence (how to test???), use an iterator sort of like this:
Why not, "if the object doesn't have __iterator__, try this. If it won't work, we'll find out by the exception that will be thrown in our face".

    class Iterator:
        def __init__(self, seq):
            self.seq = seq
            self.index = 0
        def next(self):
            try:
                try:
                    return self.seq[self.index]  # <- smells like
                except IndexError:
                    raise NoMoreError(self.index)
            finally:
                self.index += 1
(I don't necessarily mean that all those instance variables should be publicly available.)
But what about your poor brother? <wink> Er....I mean, this would make implementing "indexing" really about just getting the index from the iterator.
If the argument to iterator() is itself an iterator (how to test???),
No idea, and this looks problematic. I see your point -- but it's still problematic. -- Moshe Zadka <sig@zadka.site.co.il> This is a signature anti-virus. Please stop the spread of signature viruses! Fingerprint: 4BD1 7705 EEC0 260A 7F21 4817 C7FC A636 46D0 1BD6
Moshe Zadka <moshez@zadka.site.co.il>:
Tim's "try to use that to write something that will return the nodes of a binary tree" still haunts me.
Instead of an iterator protocol, how about a generator protocol? Now that we're getting nested scopes, it should be possible to arrange it so that

    for x in thing: ...stuff...

gets compiled as something like

    def _body(x):
        ...stuff...
    thing.__generate__(_body)

(Actually it would be more complicated than that - for backward compatibility you'd want a new bytecode that would look for a __generate__ attribute and emulate the old iteration protocol otherwise.)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+
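A minimal runnable sketch of Greg's callback ("internal iteration") protocol; the __generate__ name is his, everything else here is illustrative:

```python
class Countdown:
    def __init__(self, n):
        self.n = n

    def __generate__(self, body):
        # The object drives the loop, calling body once per element.
        i = self.n
        while i > 0:
            body(i)
            i -= 1

def for_each(thing, stuff):
    # what "for x in thing: ...stuff..." would compile to
    thing.__generate__(stuff)

seen = []
for_each(Countdown(3), seen.append)
print(seen)  # [3, 2, 1]
```

Note the trade-off this design makes: the *collection* owns the loop, which is why interleaving two such traversals (as in Tim's two-tree fringe comparison) is awkward without coroutine-like machinery.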
Guido van Rossum <guido@digicool.com>:
But it *is* true that coroutines are a very attractive piece of land "just next door".
Unfortunately there's a big high fence in between topped with barbed wire and patrolled by vicious guard dogs. :-(

Greg Ewing <greg@cosc.canterbury.ac.nz>
On Thu, 1 Feb 2001, Greg Ewing wrote:
Guido van Rossum <guido@digicool.com>:
But it *is* true that coroutines are a very attractive piece of land "just next door".
Unfortunately there's a big high fence in between topped with barbed wire and patrolled by vicious guard dogs. :-(
Perhaps you meant, lightly killed and topped with quintuple-smooth, treble milk chocolate? :) -- ?!ng "PS: tongue is firmly in cheek PPS: regrettably, that's my tongue in my cheek" -- M. H.
[Neil Schemenauer]
What's the chances of getting generators into 2.2?
Unknown. IMO it has more to do with generalizing the iteration protocol than with generators per se (a generator object that doesn't play nice with "for" is unpleasant to use; otoh, a generator object that can't be used divorced from "for" is frustrating too (like when comparing the fringes of two trees efficiently, which requires interleaving two distinct traversals, each naturally recursive on its own)).
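The fringe problem Tim mentions is exactly what generators divorced from "for" make easy. For illustration, here it is in the generator syntax Python eventually grew (long after this thread):

```python
from itertools import zip_longest

def fringe(tree):
    """Yield the leaves of a nested-list tree, left to right."""
    if isinstance(tree, list):
        for subtree in tree:
            yield from fringe(subtree)
    else:
        yield tree

def same_fringe(t1, t2):
    # Interleaves the two traversals; neither tree is flattened up front.
    sentinel = object()
    return all(a == b for a, b in
               zip_longest(fringe(t1), fringe(t2), fillvalue=sentinel))

ok = same_fringe([1, [2, [3]]], [[1], 2, 3])
bad = same_fringe([1, [2, [3]]], [[1], 2, 4])
print(ok, bad)  # True False
```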
The implementation should not be hard. Didn't Steven Majewski have something years ago?
Yes, but Guido also sketched out a nearly complete implementation within the last year or so.
Why do we always get sidetracked on trying to figure out how to do coroutines and continuations?
Sorry, I've been failing to find a good answer to that question for a decade <0.4 wink>. I should note, though, that Guido's current notion of "generator" is stronger than Icon/CLU/Sather's (which are "strictly stack-like"), and requires machinery more elaborate than StevenM (or Guido) sketched before.
Generators would add real power to the language and are simple enough that most users could benefit from them. Also, it should be possible to design an interface that does not preclude the addition of coroutines or continuations later.
Agreed.
I'm not volunteering to champion the cause just yet. I just want to know if there is some issue I'm missing.
microthreads have an enthusiastic and possibly growing audience. That gets into (C) stacklessness, though, as do coroutines. I'm afraid that once you go beyond "simple" (Icon) generators, a whole world of other stuff gets pulled in. The key trick to implementing simple generators in current Python is simply to decline decrementing the frame's refcount upon a "suspend" (of course the full details are more involved than *just* that, but they mostly follow *from* just that). everything-is-the-enemy-of-something-ly y'rs - tim
Neil Schemenauer <nas@arctrix.com>:
[Tim Peters on adding yet more syntactic sugar]
Available time is finite, and this isn't at the top of the list of things I'd like to see (resuming the discussion of generators + coroutines + iteration protocol comes to mind first).
What's the chances of getting generators into 2.2? The implementation should not be hard. Didn't Steven Majewski have something years ago? Why do we always get sidetracked on trying to figure out how to do coroutines and continuations?
Generators would add real power to the language and are simple enough that most users could benefit from them. Also, it should be possible to design an interface that does not preclude the addition of coroutines or continuations later.
I agree. I think this is a very important growth direction for the language. -- <a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a> The whole aim of practical politics is to keep the populace alarmed (and hence clamorous to be led to safety) by menacing it with an endless series of hobgoblins, all of them imaginary. -- H.L. Mencken
Not really. Available time is finite, and this isn't at the top of the list of things I'd like to see (resuming the discussion of generators + coroutines + iteration protocol comes to mind first).
OK, get going on that one then!
Cool! Can we resist adding
if key:value in dict
for "parallelism"? (I know I can ...)
That's easy to resist because, unlike ``for key:value in dict'', it's not unambiguous:
But
if (key:value) in dict
is. Just trying to help whoever *does* want the PEP <wink>.
OK, I'll pronounce -1 on this one. It looks ugly to me -- too reminiscent of C's if (...) required parentheses. Also it suggests that (key:value) is a new tuple notation that might be useful in other contexts -- which it's not.
... I'm certainly more comfortable with just ``for key in dict'' than with the whole slew of extensions using colons.
What about just the
for key:value in dict for index:value in sequence
extensions?
I'm not against these -- I'd say +0.5.
The degenerate forms (omitting x or y or both in x:y) are mechanical variations so are likely to get raised.
For those, +0.2.
But, again, that's for the PEP to fight over.
PEPs are easier if you Pronounce on things you hate early so that those can get recorded in the "BDFL Pronouncements" section without further ado.
At your service -- see above. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Tue, Jan 30, 2001 at 07:28:44PM -0500, Guido van Rossum wrote:
What about just the
for key:value in dict for index:value in sequence
extensions?
I'm not against these -- I'd say +0.5.
What, fractions? Isn't that against the whole idea of (+|-)(0|1)? :)

But since we are voting, I'm -0 on this right now, and might end up -1 or +0, depending on the implementation; I still can't *see* this, though I wouldn't be myself if I hadn't tried to implement it anyway :) And I ran into some fairly mind-boggling issues. The worst bit is 'how the f*ck does FOR_LOOP know if something's a dict or a list'. And the almost-as-bad bit is 'WTF to do for user classes, extension types and almost-list/almost-dict practically-builtin types (arrays, the *dbm's, etc.)'. After some sleep-deprived consideration I gave up and decided we need an iteration/generator protocol first.

However, my life's been busy (or rather, my work has been) with all kinds of small and not so small details, and I haven't been getting much sleep in the last week or so, so I might be overlooking something very simple. That's why I can go either way based on implementation -- it might prove me wrong :)

Until my boss is back and I stop being 'responsible' (end of this week, start of next week) and I get a chance to get rid of about 2 months of work backlog (the time he was away) I won't have time to champion or even contribute to such a PEP. Then again, by that time I might be preparing for IPC9 (_if_ my boss sends me there) or even my ApacheCon US presentation (which got accepted today, yay!)

So, if that other message was an attempt to drop the PEP on me, Guido, the answer is the same as I tend to give to suits that show up next to my desk wanting to discuss something important (to them) right away: "b'gg'r 'ff" :)

I'll-save-my-answer-to-PR-officers-doing-the-same-for-when-you-do-something- -*really*-offensive-ly <wink> y'rs

-- Thomas Wouters <thomas@xs4all.net>
On Wed, 31 Jan 2001, Thomas Wouters wrote:
I still can't *see* this, though I wouldn't be myself if I hadn't tried to implement it anyway :) And I ran into some fairly mind-boggling issues. The worst bit is 'how the f*ck does FOR_LOOP know if something's a dict or a list'.
I believe the Pythonic answer to that is "see if the appropriate method is available". The best definition of "sequence-like" or "mapping-like" i can come up with is:

    x is sequence-like if it provides __getitem__() but not keys()
    x is mapping-like if it provides __getitem__() and keys()

But in our case, since we need iteration, we can look for specific methods that have to do with just what we need for iteration and nothing else. Thus, e.g. a mapping-like class without a values() method is no problem if we never ask to iterate over values.
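Ping's heuristic can be sketched directly in today's Python, where the built-in types do expose __getitem__ as an attribute (they did not in the Python of this thread, which is exactly Skip's complaint below):

```python
# Ping's sequence-like / mapping-like heuristic, as a sketch.
def smells_like(x):
    if not hasattr(x, '__getitem__'):
        return 'neither'
    # __getitem__ plus keys() smells like a mapping; without keys(),
    # a sequence.
    return 'mapping' if hasattr(x, 'keys') else 'sequence'

print(smells_like({}))  # mapping
print(smells_like([]))  # sequence
print(smells_like(42))  # neither
```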
And the almost-as-bad bit is 'WTF to do for user classes, extension types and almost-list/almost-dict practically-builtin types
I think it can be done; the draft PEP at http://www.lfw.org/python/pep-iterators.html is a best-attempt at supporting everything just as you would expect. Let me know if you think there are important cases it doesn't cover. I know, the table

    mp_iteritems     __iteritems__, __iter__, items, __getitem__
    mp_iterkeys      __iterkeys__, __iter__, keys, __getitem__
    mp_itervalues    __itervalues__, __iter__, values, __getitem__
    sq_iter          __iter__, __getitem__

might look a little frightening, but it's not so bad, and i think it's about as simple as you can make it while continuing to support existing pseudo-lists and pseudo-dictionaries. No instance should ever provide __iter__ at the same time as any of the other __iter*__ methods anyway.

-- ?!ng

"The only `intuitive' interface is the nipple. After that, it's all learned." -- Bruce Ediger, on user interfaces
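The fallback chain in that table can be sketched in pure Python to make the lookup order concrete. (`get_key_iterator` is an invented name; this is only an illustration of the draft's mp_iterkeys row, ignoring the C-level slot dispatch entirely.)

```python
def get_key_iterator(x):
    """Illustrative lookup order for iterating over keys, per the draft:
    __iterkeys__ -> __iter__ -> keys -> __getitem__."""
    if hasattr(x, "__iterkeys__"):
        return x.__iterkeys__()
    if hasattr(x, "__iter__"):
        return iter(x)
    if hasattr(x, "keys"):
        return iter(x.keys())
    if hasattr(x, "__getitem__"):
        # Sequence-style last resort: the "keys" are 0, 1, 2, ...
        # up to the first IndexError.
        def indices(seq):
            i = 0
            while True:
                try:
                    seq[i]
                except IndexError:
                    return
                yield i
                i += 1
        return indices(x)
    raise TypeError("not iterable")

class KeysOnly:
    def keys(self):
        return ["a", "b"]

class SeqOnly:
    def __getitem__(self, i):
        if i >= 2:
            raise IndexError
        return "item%d" % i

assert list(get_key_iterator({"x": 1})) == ["x"]
assert list(get_key_iterator(KeysOnly())) == ["a", "b"]
assert list(get_key_iterator(SeqOnly())) == [0, 1]
```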
Ping> x is sequence-like if it provides __getitem__() but not keys()

So why does this barf?

    >>> [].__getitem__
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: __getitem__

(Obviously, lists *do* understand __getitem__ at some level. Why isn't it exposed in the method table?)

Skip
On Wed, 31 Jan 2001, Skip Montanaro wrote:
Ping> x is sequence-like if it provides __getitem__() but not keys()
So why does this barf?
>>> [].__getitem__
I was describing how to tell if instances are sequence-like. Before we get to make that judgement, first we have to look at the C method table. So:

    x is sequence-like if it has tp_as_sequence;
    all instances have tp_as_sequence;
    an instance is sequence-like if it has __getitem__() but not keys()

    x is mapping-like if it has tp_as_mapping;
    all instances have tp_as_mapping;
    an instance is mapping-like if it has both __getitem__() and keys()

The "in" operator is implemented this way.

    x customizes "in" if it has sq_contains;
    all instances have sq_contains;
    an instance customizes "in" if it has __contains__()

If sq_contains is missing, or if an instance has no __contains__ method, we supply the default behaviour by comparing the operand to each member of x in turn. This default behaviour is implemented twice: once in PyObject_Contains, and once in instance_contains.

So i proposed this same structure for sq_iter and __iter__.

    x customizes "for ... in x" if it has sq_iter;
    all instances have sq_iter;
    an instance customizes "for ... in x" if it has __iter__()

If sq_iter is missing, or if an instance has no __iter__ method, we supply the default behaviour by calling PyObject_GetItem on x and incrementing the index until IndexError.

-- ?!ng
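The default behaviour Ping describes for "in" (compare the operand to each member until IndexError) can be sketched in Python. (`contains` here is an illustrative stand-in for PyObject_Contains, not real interpreter code.)

```python
def contains(x, item):
    """Sketch of the 'in' dispatch: use __contains__ when the object
    customizes it, else fall back to scanning x[0], x[1], ..."""
    if hasattr(x, "__contains__"):
        return x.__contains__(item)
    # Default behaviour: compare item to each member in turn
    # until __getitem__ raises IndexError.
    i = 0
    while True:
        try:
            member = x[i]
        except IndexError:
            return False
        if member == item:
            return True
        i += 1

class Evens:
    def __contains__(self, n):       # customizes "in"
        return n % 2 == 0

class Seq:
    def __getitem__(self, i):        # gets the default scan
        if i >= 3:
            raise IndexError
        return i * 10

assert contains(Evens(), 4) and not contains(Evens(), 3)
assert contains(Seq(), 20) and not contains(Seq(), 5)
```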
[Ping]
x is sequence-like if it provides __getitem__() but not keys()
[Skip]
So why does this barf?
    >>> [].__getitem__
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: __getitem__
(Obviously, lists *do* understand __getitem__ at some level. Why isn't it exposed in the method table?)
The old type/class split: list is a type, and types spell their "method tables" in ways that have little in common with how classes do it. See PyObject_GetItem in abstract.c for gory details (e.g., dicts spell their version of getitem via ->tp_as_mapping->mp_subscript(...), while lists spell it ->tp_as_sequence->sq_item(...); neither has any binding to the attr "__getitem__"; instance objects fill in both the tp_as_mapping and tp_as_sequence slots, then map both the mp_subscript and sq_item slots to classobject.c's instance_item, which in turn looks up "__getitem__").

bet-you're-sorry-you-asked<wink>-ly y'rs - tim
"Tim" == Tim Peters <tim.one@home.com> writes:
>> (Obviously, lists *do* understand __getitem__ at some level. Why
>> isn't it exposed in the method table?)

Tim> The old type/class split: list is a type, and types spell their
Tim> "method tables" in ways that have little in common with how classes
Tim> do it.

The problem that rolls around in the back of my mind from time to time is that since Python doesn't currently support interfaces, checking for specific methods seems to be the only reasonable way to determine if an object does what you want or not.

What would break if we decided to simply add __getitem__ (and other sequence methods) to the list object's method table? Would they foul something up or would they simply sit around quietly waiting for hasattr to notice them?

Skip
On Wed, 31 Jan 2001, Skip Montanaro wrote:
What would break if we decided to simply add __getitem__ (and other sequence methods) to the list object's method table? Would they foul something up or would they simply sit around quietly waiting for hasattr to notice them?
That would work for lists, but not for any extension types that use the sq_* protocol to behave like sequences. For now, anyway, we're stuck with the two separate protocols whether we like it or not.

-- ?!ng

Two links diverged in a Web, and i --
i took the one less travelled by.
    -- with apologies to Robert Frost
>> What would break if we decided to simply add __getitem__ (and other
>> sequence methods) to list object's method table?

Ping> That would work for lists, but not for any extension types that
Ping> use the sq_* protocol to behave like sequences.

Could extension writers add those methods to their modules? I know I'm really getting off-topic here, but the whole visible interface idea crops up from time to time. I guess I'm just nibbling around the edges a bit to try and understand the problem better.

Skip
[Skip Montanaro]
The problem that rolls around in the back of my mind from time-to-time is that since Python doesn't currently support interfaces, checking for specific methods seems to be the only reasonable way to determine if a object does what you want or not.
Except that-- alas! --"what I want" is almost always for it to respond to some specific methods. For example, I don't believe I've *ever* written a class that responds to all the "number" methods (in particular, I'm almost certain not to bother implementing a notion of "shift"). It's also rare for me to define a class that implements all the "sequence" or "mapping" methods. So if we had an Interface.Sequence, all my code would still check for individual sequence operations anyway. Take it to the extreme, and each method becomes an Interface unto itself, which then get grouped into collections in different ways by different people, and in the end I *still* check for specific methods rather than fight with umpteen competing hierarchies.

The most interesting "interfaces" to me are things like EuclideanDomain: a set of guarantees about how methods *interact*, and almost nothing to do with which methods a thing supports. A simpler example is TotalOrdering: there is no method unique to total orderings; instead it's a guarantee about how cmp *behaves*.

If you want to know whether an object x supports slicing, *trying* x[:0] is as direct as it gets. You just hope that x isn't an instance of

    class Human:
        def __getslice__(self, lo, hi):
            """Return a list of activities planned for human self.

            lo and hi bound the timespan of activities to be returned,
            in seconds from the epoch.  If lo is less than the birthdate
            of self, treat lo as if it were self's birthdate.  If hi is
            greater than the expected lifetime of self, treat hi as if
            it were the expected lifetime of self, but also send an
            execution order to ensure that self does not live beyond
            that time (this may seem drastic, but the alternative was
            complaints from customers who exceeded their expected
            lifetimes, and then demanded to know why "the stupid
            software" cut off their calendars "early" -- hey, we'll
            implement infinite memory when humans are immortal).
            """

don't-think-it-hasn't-happened<wink>-ly y'rs - tim
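Tim's "just try it" check can be sketched as an EAFP helper. (`supports_slicing` is an invented name for the example; side effects like the one in class Human are exactly why this style is a gamble.)

```python
def supports_slicing(x):
    """EAFP check: the most direct test for slicing is to attempt it."""
    try:
        x[:0]            # a harmless-looking empty slice... usually
    except (TypeError, AttributeError):
        return False
    return True

assert supports_slicing([1, 2, 3])
assert supports_slicing("abc")
assert not supports_slicing(42)
```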
Tim Peters <tim.one@home.com>:
The old type/class split: list is a type, and types spell their "method tables" in ways that have little in common with how classes do it.
Maybe as a first step towards type/class unification one day, we could add __xxx__ attributes to all the builtin types, and start to think of the method table as the definitive source of all methods, with the tp_xxx slots being a sort of cache for the most commonly used ones.

Greg Ewing, Computer Science Dept,   +--------------------------------------+
University of Canterbury,            | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand            | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz           +--------------------------------------+
The old type/class split: list is a type, and types spell their "method tables" in ways that have little in common with how classes do it.
Maybe as a first step towards type/class unification one day, we could add __xxx__ attributes to all the builtin types, and start to think of the method table as the definitive source of all methods, with the tp_xxx slots being a sort of cache for the most commonly used ones.
Yes, I've often thought that we should be able to heal the split for 95% by using a few well-aimed tricks like this. Later... --Guido van Rossum (home page: http://www.python.org/~guido/)
On Sun, Feb 04, 2001 at 11:47:26PM -0500, Guido van Rossum wrote:
Yes, I've often thought that we should be able to heal the split for 95% by using a few well-aimed tricks like this. Later...
I was playing around this weekend with the class/type problem. Without too much effort I had an interpreter that could do things like this:

    >>> class MyInt(type(1)):
    ...     pass
    ...
    >>> i = MyInt(10)
    >>> i
    10
    >>> i + 1
    11

The major changes were allowing PyClassObject to subclass types (ie. changing PyClass_Check(op) to (PyClass_Check(op) || PyType_Check(op))), writing a _PyType_Lookup function, and making class_lookup use it.

The experiment has convinced me that we can allow subclasses of types quite easily without major changes. It has also given me some ideas on "the right way" to solve this problem. The rough scheme I came up with yesterday goes like this:

    PyObject {
        int ob_refcnt;
        PyClass ob_class;
    }

    PyClass {
        PyObject_HEAD
        char *cl_name;
        getattrfunc cl_getattr;
        PyMethodTable *cl_methods;
    }

    PyMethodTable {
        binaryfunc nb_add;
        binaryfunc nb_sub;
        ...
    }

When calling a method on an object the interpreter would first check for a direct method and if that does not exist then call cl_getattr. Obviously there are still a few details to be worked out. :-)

Neil
On Sun, Feb 04, 2001 at 11:47:26PM -0500, Guido van Rossum wrote:
Yes, I've often thought that we should be able to heal the split for 95% by using a few well-aimed tricks like this. Later...
I was playing around this weekend with the class/type problem. Without too much effort I had an interpreter that could do things like this:
    >>> class MyInt(type(1)):
    ...     pass
    ...
    >>> i = MyInt(10)
    >>> i
    10
    >>> i + 1
    11
Now, can you do things like this:

    >>> from types import *
    >>> class MyInt(IntType):
    ...     # add a method
    ...     def add1(self):
    ...         return self+1
    ...
    >>> i = MyInt(10)
    >>> i.add1()
    11
    >>>

and like this:

    >>> class MyInt(IntType):
    ...     # override division
    ...     def __div__(self, other):
    ...         return float(self) / other
    ...     def __rdiv__(self, other):
    ...         return other / float(self)
    ...
    >>> i = MyInt(10)
    >>> i/3
    0.33333333333333331
    >>>

I'm not asking for adding new instance variables (slots), but that of course would be the next step of difficulty up.
The major changes were allowing PyClassObject to subclass types (ie. changing PyClass_Check(op) to (PyClass_Check(op) || PyType_Check(op))), writing a _PyType_Lookup function, and making class_lookup use it.
Yeah, but that's still nasty. We should strive for unifying PyClass and PyType instead of having both around.
The experiment has convinced me that we can allow subclasses of types quite easily without major changes. It has also given me some ideas on "the right way" to solve this problem. The rough scheme I came up with yesterday goes like this:
PyObject {
    int ob_refcnt;
    PyClass ob_class;
(plus type-specific fields I suppose)
}
PyClass {
    PyObject_HEAD
    char *cl_name;
    getattrfunc cl_getattr;
    PyMethodTable *cl_methods;
}
PyMethodTable {
    binaryfunc nb_add;
    binaryfunc nb_sub;
    ...
}
When calling a method on an object the interpreter would first check for a direct method and if that does not exist then call cl_getattr. Obviously there are still a few details to be worked out. :-)
Yeah... Like you should be able to ask for ListType.append and get an unbound built-in method back, which can be applied to a list:

    ListType.append([], 1)  ===  [].append(1)

And ditto for operators:

    IntType.__add__(1, 2)  ===  1+2

And a C API like PyNumber_Add(x, y) should be equivalent to using x.__add__(y), too.

--Guido van Rossum (home page: http://www.python.org/~guido/)
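Checked against a modern Python (this is essentially how things turned out after the 2.2 type/class unification), the equivalences Guido asks for do hold:

```python
import operator

# The method fetched from the type works as an unbound function:
lst = []
list.append(lst, 1)              # same effect as lst.append(1)
assert lst == [1]

# Ditto for operators:
assert int.__add__(1, 2) == 1 + 2

# And operator.add mirrors the + operator (the Python-level face of
# the C API's PyNumber_Add):
assert operator.add(1, 2) == 3
```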
On Mon, Feb 05, 2001 at 01:37:39PM -0500, Guido van Rossum wrote:
Now, can you do things like this: [example cut]
No, it would have to be written like this:

    >>> from types import *
    >>> class MyInt(IntType):
    ...     # add a method
    ...     def add1(self):
    ...         return self.value+1
    ...
    >>> i = MyInt(10)
    >>> i.add1()
    11
    >>>

Note the value attribute. The IntType.__init__ method is basically:

    def __init__(self, value):
        self.value = value
PyObject { int ob_refcnt; PyClass ob_class;
(plus type-specific fields I suppose)
Yes, the instance attributes. In this scheme all objects are instances of some class.
Yeah... Like you should be able to ask for ListType.append and get an unbound built-in method back, which can be applied to a list:
ListType.append([], 1) === [].append(1)
Right. My changes on the weekend were quite close to making this work.

Neil
On Mon, Feb 05, 2001 at 01:37:39PM -0500, Guido van Rossum wrote:
Now, can you do things like this: [example cut]
No, it would have to be written like this:
    >>> from types import *
    >>> class MyInt(IntType):
    ...     # add a method
    ...     def add1(self):
    ...         return self.value+1
>>> i = MyInt(10) >>> i.add1() 11 >>>
Note the value attribute. The IntType.__init__ method is basically:

    def __init__(self, value):
        self.value = value
So, "class MyInt(IntType)" acts as a sort-of automagical "UserInt" class creation? (Analogous to UserList etc.) I'm not sure I like that. Why do we have to have this? --Guido van Rossum (home page: http://www.python.org/~guido/)
On Mon, Feb 05, 2001 at 03:24:19PM -0500, Guido van Rossum wrote:
So, "class MyInt(IntType)" acts as a sort-of automagical "UserInt" class creation? (Analogous to UserList etc.) I'm not sure I like that. Why do we have to have this?
The problem is where to store the information in the PyIntObject structure. I don't think my solution is great either. Neil
On Mon, Feb 05, 2001 at 11:04:22AM -0800, Neil Schemenauer wrote:
On Mon, Feb 05, 2001 at 01:37:39PM -0500, Guido van Rossum wrote:
Now, can you do things like this: [example cut]
No, it would have to be written like this:
    >>> from types import *
    >>> class MyInt(IntType):
    ...     # add a method
    ...     def add1(self):
    ...         return self.value+1
Why ? Couldn't IntType do with an __add__[*] method that does this ugly magic for you ? Same for __sub__, __int__ and so on.

*] Yes, yes, I know, it's a type, not a class, but you know what I mean :)

-- Thomas Wouters <thomas@xs4all.net>
On Tue, Feb 06, 2001 at 03:57:12PM +0100, Thomas Wouters wrote:
Why ? Couldn't IntType do with an __add__[*] method that does this ugly magic for you ? Same for __sub__, __int__ and so on.
You're right. I'm pretty sure my modified interpreter would handle "return self+1" just fine (I can't test it right now). If you wanted to override the __add__ method you would have to write "return IntType.__add__(self, 1)". Neil
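For what it's worth, this is how it shook out once types became subclassable (Python 2.2+): no separate value attribute is needed, and calling the base type's __add__ explicitly works exactly as Thomas suggests.

```python
class MyInt(int):
    def add1(self):
        # self *is* the integer; the base type's method can be
        # called explicitly to avoid any overridden __add__.
        return int.__add__(self, 1)

i = MyInt(10)
assert i.add1() == 11
assert i + 1 == 11        # ordinary arithmetic still works too
```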
<someone whose attribution has been lost>:
for index:value in sequence
-1, because we only construct dicts using that notation, not sequences.

Greg Ewing, greg@cosc.canterbury.ac.nz
On Mon, Jan 29, 2001 at 08:39:17PM -0500, Tim Peters wrote:
    for key: in dict:     # over dict.keys()
    for :value in dict:   # over dict.values()
    for : in dict:        # a delay loop
Wot's the last one supposed to do ? 'for unused_var in range(len(dict)):' ?

-- Thomas Wouters <thomas@xs4all.net>
    for key: in dict:     # over dict.keys()
    for :value in dict:   # over dict.values()
    for : in dict:        # a delay loop
[Thomas Wouters]
Wot's the last one supposed to do ? 'for unused_var in range(len(dict)):' ?
Well, as the preceding line said in the original:
2/3rd of these are marginally more attractive [than "if key:value in dict"]:
I think you've guessed which 2/3 those are <wink>. I don't see that the last line has any visible semantics whatsoever, so Python can do whatever it likes, provided it doesn't do anything visible. You still hang out on c.l.py! So you gotta know that if something of the form x:y is suggested, people will line up to suggest meanings for the 3 obvious variations, along with x::y and x:-:y and x lambda y too <0.9 wink>.
participants (14)

- barry@digicool.com
- bckfnn@worldonline.dk
- Eric S. Raymond
- Fredrik Lundh
- Greg Ewing
- Guido van Rossum
- Ka-Ping Yee
- M.-A. Lemburg
- Mark Hammond
- Moshe Zadka
- Neil Schemenauer
- Skip Montanaro
- Thomas Wouters
- Tim Peters