a feature i'd like to see in python #1: better iteration control
many times writing somewhat complex loops over lists i've found the need to sometimes delete an item from the list. currently there's no easy way to do so; basically, you have to write something like i = 0 while i < len(list): el = list[i] ...do something... if el should be deleted: del list[i] else: i += 1 note that you can't write for x in list: or even for i in xrange(len(list)): note also that you need to do some trickiness to adjust the index appropriately when deleting. i'd much rather see something like: for x:iter in list: ...do something... if x should be deleted: iter.delete() the idea is that you have a way of retrieving both the element itself and the iterator for the element, so that you can then call methods on the iterator. it shouldn't be too hard to implement iter.delete(), as well as iter.insert() and similar functions. (the recent changes to the generator protocol in 2.5 might help.) the only question then is how to access the iterator. the syntax i've proposed, with `x:iter', seems fairly logical (note that parallels with slice notation, which also uses a colon) and doesn't introduce any new operators. (comma is impossible since `for x,iter in list:' already has a meaning) btw someone is probably going to come out and say "why don't you just use a list comprehension with an `if' clause? the problems are [1] i'd like this to be destructive; [2] i'd like this to work over non-lists as well, e.g. hash-tables; [3] list comprehensions work well in simple cases, but in more complicated cases when you may be doing various things on each step, and might not know whether you need to delete or insert an element until after you've done various other things, a list comprehension would not fit naturally; [4] this mechanism is extendible to insertions, replacements and other such changes as well as just filterings. ben
On Sun, Dec 03, 2006 at 06:24:17AM -0600, Ben Wing wrote:
many times writing somewhat complex loops over lists i've found the need to sometimes delete an item from the list. currently there's no easy way to do so; basically, you have to write something like
i = 0 while i < len(list): el = list[i] ...do something... if el should be deleted: del list[i] else: i += 1
note that you can't write
for x in list:
or even
for i in xrange(len(list)):
note also that you need to do some trickiness to adjust the index appropriately when deleting.
i'd much rather see something like:
for x:iter in list: ...do something... if x should be deleted: iter.delete()
the idea is that you have a way of retrieving both the element itself and the iterator for the element, so that you can then call methods on the iterator. it shouldn't be too hard to implement iter.delete(), as well as iter.insert() and similar functions. (the recent changes to the generator protocol in 2.5 might help.)
the only question then is how to access the iterator. the syntax i've proposed, with `x:iter', seems fairly logical (note that parallels with slice notation, which also uses a colon) and doesn't introduce any new operators. (comma is impossible since `for x,iter in list:' already has a meaning)
btw someone is probably going to come out and say "why don't you just use a list comprehension with an `if' clause? the problems are [1] i'd like this to be destructive;
Just use slice assignment. l = list(xrange(100)) l2 = [x for x in l if x > 50] l[:] = l2[:]
[2] i'd like this to work over non-lists as well, e.g. hash-tables;
There in is the sucky part; iterator protocol is simple; what you're proposing is extending iterators so that they recall the last value (else iter.delete() would not do anything), which... eh. Think it sucks, to say the least ;) Simple example of where this gets ugly is in iterating over a file.
[3] list comprehensions work well in simple cases, but in more complicated cases when you may be doing various things on each step, and might not know whether you need to delete or insert an element until after you've done various other things, a list comprehension would not fit naturally;
Slice assignments work fine here.
[4] this mechanism is extendible to insertions, replacements and other such changes as well as just filterings.
Yes it is, except the new 'iterator' isn't as extendable to arbitrary sequences after. Also is ignoring the fact that doing in place deletion of lists as you walk can get massively costly quick; worst case, quad for removal (hence collections.deque existing). ~harring
Ben Wing schrieb:
i'd much rather see something like:
for x:iter in list: ...do something... if x should be deleted: iter.delete()
You can easily implement that feature yourself if you need it, at least for lists (or sequences that support integer indexing): class deletable_iter: def __init__(self, sequence): self.sequence = sequence self.last = -1 def __iter__(self): return self def next(self): self.last += 1 try: return self.sequence[self.last] except IndexError: raise StopIteration def delete_last(self): del self.sequence[self.last] self.last -= 1 You use this class like this: x = [1,2,3,4,5,6,7,8] y = deletable_iter(x) for i in y: print i if i%2 == 0: y.delete_last() print x This cannot be generalized for the iteration protocol, because it might not be possible for the iterator to delete an element after it has returned it. Notice that this version "supports" multiple invocations of delete_last in a single step; it may be useful to allow at most one deletion, and raise an exception if a second deletion is attempted. Even if the deletion was added to the iterator protocol (which would require a PEP, IMO), I don't think the syntax should be changed. If you want access to the iterator in the loop, you should explicitly assign it to a variable before entering the loop. Regards, Martin
On 12/3/06, Ben Wing <ben@666.com> wrote:
many times writing somewhat complex loops over lists i've found the need to sometimes delete an item from the list. currently there's no easy way to do so; basically, you have to write something like
As I don't believe there's any need for a language extension, I've posted my approach to this on comp.lang.python: http://groups.google.com/group/comp.lang.python/browse_thread/thread/724aa6b... Note that deleting from the list as you iterate over it tends to have very poor performance for larger lists. My post includes some timings, demonstrating this. -- Adam Olsen, aka Rhamphoryncus
"Ben Wing" <ben@666.com> wrote in message news:4572C1F1.1050600@666.com...
many times writing somewhat complex loops over lists i've found the need to sometimes delete an item from the list. currently there's no easy way to do so; basically, you have to write something like
i = 0 while i < len(list): el = list[i] ...do something... if el should be deleted: del list[i] else: i += 1
My view: while loops are the standard construct for iteration. They give you complete control over the control variable(s). Hence no need for 'better iteration control'. The above is easy enough and quite clear.
note that you can't write
for x in list:
or even
for i in xrange(len(list)):
For loops are for the common case of straightforwardly iterating over a sequence. They are efficient both to write and execute. Let's not mess with them. -1
note also that you need to do some trickiness to adjust the index appropriately when deleting.
Iterate in reverse and no index adjustment is needed Terry Jan Reedy
Brian Harring wrote:
On Sun, Dec 03, 2006 at 06:24:17AM -0600, Ben Wing wrote:
many times writing somewhat complex loops over lists i've found the need to sometimes delete an item from the list. currently there's no easy way to do so; basically, you have to write something like
i = 0 while i < len(list): el = list[i] ...do something... if el should be deleted: del list[i] else: i += 1
note that you can't write
for x in list:
or even
for i in xrange(len(list)):
note also that you need to do some trickiness to adjust the index appropriately when deleting.
i'd much rather see something like:
for x:iter in list: ...do something... if x should be deleted: iter.delete()
the idea is that you have a way of retrieving both the element itself and the iterator for the element, so that you can then call methods on the iterator. it shouldn't be too hard to implement iter.delete(), as well as iter.insert() and similar functions. (the recent changes to the generator protocol in 2.5 might help.)
the only question then is how to access the iterator. the syntax i've proposed, with `x:iter', seems fairly logical (note that parallels with slice notation, which also uses a colon) and doesn't introduce any new operators. (comma is impossible since `for x,iter in list:' already has a meaning)
btw someone is probably going to come out and say "why don't you just use a list comprehension with an `if' clause? the problems are [1] i'd like this to be destructive;
Just use slice assignment.
l = list(xrange(100)) l2 = [x for x in l if x > 50] l[:] = l2[:]
[2] i'd like this to work over non-lists as well, e.g. hash-tables;
There in is the sucky part; iterator protocol is simple; what you're proposing is extending iterators so that they recall the last value (else iter.delete() would not do anything), which... eh.
Think it sucks, to say the least ;)
Simple example of where this gets ugly is in iterating over a file.
with some more thought i see that a language extension to `for' isn't needed, as you can write something like it = iter(foo) for x in it: ... but i still don't see why supporting iter.delete() is so wrong. clearly it doesn't need to work on files or other such things where it doesn't make sense. before you diss this completely, note that java supports exactly the same thing: http://java.sun.com/j2se/1.4.2/docs/api/java/util/Iterator.html remove public void *remove*() Removes from the underlying collection the last element returned by the iterator (optional operation). This method can be called only once per call to next. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method. *Throws:* |UnsupportedOperationException <../../java/lang/UnsupportedOperationException.html>| - if the remove operation is not supported by this Iterator. |IllegalStateException <../../java/lang/IllegalStateException.html>| - if the next method has not yet been called, or the remove method has already been called after the last call to the next method.
On Sun, Dec 03, 2006 at 08:35:58PM -0600, Ben Wing wrote:
but i still don't see why supporting iter.delete() is so wrong. clearly it doesn't need to work on files or other such things where it doesn't make sense.
before you diss this completely, note that java supports exactly the same thing:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/Iterator.html
Not all iterators would support remove; that right there is a bit of an issue since right now, the only exception you need to expect for iterator protocol is StopIteration being thrown when the iterator has nothing more to yield. So, it's no longer simpler, which is a bit of a con in my opinion. Question is, where _would_ it work? Doesn't really make much sense for generators (doable with 2.5, but most generators are just that, generators, not modifiable views), doesn't make sense for itertools.* for the most part, since it's combination of iterators. For dict; it actually *cannot* work. You can't remove keys from a dict as you're iterating over it (can change the val of a key, but not remove the key). So iter.delete would require fair bit of changes internally to dict, either tracking what it's yielded already, or forcing iterkeys to actually be iter(keys()) (creating an intermediate list), which is worse for memory usage and general performance. Set's suffer the same thing; can't change what it contains while iterating, have to restart the iteration after a removal/addition. Tuples are immutable, so end of discusion there. Leaves lists... which personally, I view as a mostly bad thing to be doing anyways. Trying to pop an item out of the middle of a list results in shifting everything right of it one spot to the left; this sucks from a performance standpoint, again, worst case, quad. Now... occasionally, have to do it admittedly. But it's not something you actaully want to be doing in your code all that much- admittedly generating a new list to avoid that hit also sucks somewhat, but the worst case there is far more behaved, a temp trade of space vs runtime. What I'm trying to get at is that what iter(list).next().delete() would do isn't a good thing to paper over, it makes the op look like it costs nothing when it can cost a _lot_. Unless I'm missing something, the only real usage of this is a list (could do it on files also, although that would suffer the same issue as a list, just worse via truncate calls). Would work better on collections.deque, but deque is a linked list and used rather selectively. So... why add it, if it's basically for one major type, and it's not a good idea to blindly do such an action on that type in the first place? ~harring
Brian Harring wrote:
On Sun, Dec 03, 2006 at 08:35:58PM -0600, Ben Wing wrote:
but i still don't see why supporting iter.delete() is so wrong. clearly it doesn't need to work on files or other such things where it doesn't make sense.
before you diss this completely, note that java supports exactly the same thing:
http://java.sun.com/j2se/1.4.2/docs/api/java/util/Iterator.html
Not all iterators would support remove; that right there is a bit of an issue since right now, the only exception you need to expect for iterator protocol is StopIteration being thrown when the iterator has nothing more to yield.
So, it's no longer simpler, which is a bit of a con in my opinion.
well, it's not like adding a new exception to any existing iterator method. it would only occur if you call iter.delete() in the wrong context.
For dict; it actually *cannot* work. You can't remove keys from a dict as you're iterating over it (can change the val of a key, but not remove the key). So iter.delete would require fair bit of changes internally to dict, either tracking what it's yielded already, or forcing iterkeys to actually be iter(keys()) (creating an intermediate list), which is worse for memory usage and general performance.
Set's suffer the same thing; can't change what it contains while iterating, have to restart the iteration after a removal/addition.
Leaves lists... which personally, I view as a mostly bad thing to be doing anyways. Trying to pop an item out of the middle of a list results in shifting everything right of it one spot to the left; this sucks from a performance standpoint, again, worst case, quad.
Now... occasionally, have to do it admittedly. But it's not something you actaully want to be doing in your code all that much- admittedly generating a new list to avoid that hit also sucks somewhat, but the worst case there is far more behaved, a temp trade of space vs runtime.
What I'm trying to get at is that what iter(list).next().delete() would do isn't a good thing to paper over, it makes the op look like it costs nothing when it can cost a _lot_.
i do see your point. i was trying to remember what i ended up doing when i ran into this issue before, and in fact i ended up just making a new list and appending all the elements i didn't want to delete. i see you'd get N^2 behavior if you deleted lots of elements from a list, but you get the same thing currently if you call `del list[i]' a lot; it's not obvious that iter.delete() actually "papers over" the cost any more than `del list[i]' does.
Unless I'm missing something, the only real usage of this is a list (could do it on files also, although that would suffer the same issue as a list, just worse via truncate calls). Would work better on collections.deque, but deque is a linked list and used rather selectively.
i actually think now this would be best for hash tables and such. copying a large hash table is a real drag, and using the iterator protocol is the *only* way to iterate over the elements of a hash table. in fact, i imagine this is exactly why java added this functionality. certainly in lisp, the `maphash' function explicitly allows you to delete the current element being iterated over, for the same reason. however, for hash tables there's no reason to use iter.delete(). you should just be able to write `del hash[x]'. is this disallowed currently? if so, it seems like something that should be fixed.
So... why add it, if it's basically for one major type, and it's not a good idea to blindly do such an action on that type in the first place?
ok, i see your point and i retract the suggestion. ben
Brian Harring schrieb:
For dict; it actually *cannot* work. You can't remove keys from a dict as you're iterating over it (can change the val of a key, but not remove the key).
I think this is incorrect. The implementation could well support it, putting a dummy object into the deleted key (which deletion needs to do, anyway).
So iter.delete would require fair bit of changes internally to dict, either tracking what it's yielded already, or forcing iterkeys to actually be iter(keys()) (creating an intermediate list), which is worse for memory usage and general performance.
I don't think either is necessary; deletion could occur "directly".
Set's suffer the same thing; can't change what it contains while iterating, have to restart the iteration after a removal/addition.
Again, I think that's incorrect.
Tuples are immutable, so end of discusion there.
True.
Now... occasionally, have to do it admittedly. But it's not something you actaully want to be doing in your code all that much- admittedly generating a new list to avoid that hit also sucks somewhat, but the worst case there is far more behaved, a temp trade of space vs runtime.
Also true. Regards, Martin
Ben Wing schrieb:
i do see your point. i was trying to remember what i ended up doing when i ran into this issue before, and in fact i ended up just making a new list and appending all the elements i didn't want to delete. i see you'd get N^2 behavior if you deleted lots of elements from a list, but you get the same thing currently if you call `del list[i]' a lot; it's not obvious that iter.delete() actually "papers over" the cost any more than `del list[i]' does.
No, it wouldn't. OTOH, copying into a new list and then performing slice-assignment is O(N). So performance-wise, it is (surprisingly) asymptotically better to create a new list.
however, for hash tables there's no reason to use iter.delete(). you should just be able to write `del hash[x]'. is this disallowed currently?
Try for yourself: py> x={1:2,3:4} py> for k in x: ... if k < 2: ... del x[k] ... Traceback (most recent call last): File "<stdin>", line 1, in ? RuntimeError: dictionary changed size during iteration
if so, it seems like something that should be fixed.
It's difficult to fix: deleting an item may cause a resizing/rehashing of the entire dictionary, hence the dictionary is looked while being iterated over. I *think* that one could support deletion (but not addition); this would need further investigation. Regards, Martin
On Mon, Dec 04, 2006 at 07:15:35PM +0100, "Martin v. L??wis" wrote:
Brian Harring schrieb:
For dict; it actually *cannot* work. You can't remove keys from a dict as you're iterating over it (can change the val of a key, but not remove the key).
I think this is incorrect. The implementation could well support it, putting a dummy object into the deleted key (which deletion needs to do, anyway).
The implementation already uses a sentinel (NULL)- point was that it does not support iteration over a dict that's being deleted from *currently* though. One thing to note; delitem is the easy case. Allowing for mutating the mapping as you're iterating via delitem implies that setitem should work also; setitem however can trigger a resize. Finally, if dicts were ever modified to shrink based on load, the resize there would be an issue. Mind you I've not seen proposals of that sort, just pointing out the potential.
So iter.delete would require fair bit of changes internally to dict, either tracking what it's yielded already, or forcing iterkeys to actually be iter(keys()) (creating an intermediate list), which is worse for memory usage and general performance.
I don't think either is necessary; deletion could occur "directly".
Set's suffer the same thing; can't change what it contains while iterating, have to restart the iteration after a removal/addition.
Again, I think that's incorrect.
Again, was refering to existing implementation (and long standing rules/conventions regarding it). Set suffers the same issue with setitem meanwhile. In my opinion, no point in doing the deltitem modification without a matching setitem. Not saying I think the modification is worth it mind you, just that there should be symmetry ;) ~harring
Brian Harring schrieb:
I think this is incorrect. The implementation could well support it, putting a dummy object into the deleted key (which deletion needs to do, anyway).
The implementation already uses a sentinel (NULL)-
That' not the full truth. The implementation has a separate object (called the dummy object, see dictobject.c:140 in the trunk) to fill put in for deleted objects. You can't use NULL here because then the open addressing would break. NULL is used as a sentinel for the open addressing; the dummy object indicates that you have to continue searching.
point was that it does not support iteration over a dict that's being deleted from *currently* though.
Yes, but I believe that's out of fear that you have to do resizing; iteration cannot be supported in the presence of resizing. Deletion does not involve resizing, instead, ma_fill won't decrease during deletion, so the next addition may trigger resizing if the fill level goes above 2/3.
One thing to note; delitem is the easy case. Allowing for mutating the mapping as you're iterating via delitem implies that setitem should work also; setitem however can trigger a resize.
So that implication is wrong, no? Of course, setitem for an existing key could be allowed; that's also a meaningful operation while you are iterating over a dictionary.
Finally, if dicts were ever modified to shrink based on load, the resize there would be an issue. Mind you I've not seen proposals of that sort, just pointing out the potential.
You'd have to defer the resizing while the dictionary is locked.
In my opinion, no point in doing the deltitem modification without a matching setitem. Not saying I think the modification is worth it mind you, just that there should be symmetry ;)
IMO, it is clear that this symmetry is *not* necessary, for the reason that we are discussing: if iterators where to support a delete operation, then it would be possible to provide that for dict iterators. It wouldn't be necessary to support any other updates; it wouldn't even be necessary to support del d[k], even if k is the key returned from the iterator's .next(). Regards, Martin
[Ben Wing ]
many times writing somewhat complex loops over lists i've found the need to sometimes delete an item from the list. currently there's no easy way to do so; basically, you have to write something like
[Adam Olsen]
As I don't believe there's any need for a language extension, I've posted my approach to this on comp.lang.python:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/724aa6b...
Note that deleting from the list as you iterate over it tends to have very poor performance for larger lists. My post includes some timings, demonstrating this.
Well said. And even if performance considerations could be addressed, it would be a crummy idea , leading people into knotty code. When looping over a structure, the act of actively inserting and deleting entries, can introduce unexpected feedback between loop control and the loop body. This is a recipe for bugs and hard-to-maintain code. Overall, feature #1 is a non-starter and hopefully this thread will die quickly or move off-list. Raymond
participants (6)
-
"Martin v. Löwis"
-
Adam Olsen
-
Ben Wing
-
Brian Harring
-
Raymond Hettinger
-
Terry Reedy