Re: [Python-ideas] clear() method for lists

What about another method, clone() (or copy())? I think this may be useful too.

------------------ Original ------------------
From: "average"<dreamingforward@gmail.com>;
Date: Mon, Feb 8, 2010 09:14 AM
To: "Gerald Britton"<gerald.britton@gmail.com>;
Cc: "Python-Ideas"<python-ideas@python.org>;
Subject: Re: [Python-ideas] clear() method for lists

On Fri, Feb 5, 2010 at 1:39 PM, Gerald Britton <gerald.britton@gmail.com> wrote:
In the abstract it seems like such a method should be part of the Container ABC, since the idea of a container would imply a method to clear its contents.

mark
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas

On Wed, Feb 10, 2010 at 9:12 AM, Mathias Panzenböck <grosser.meister.morti@gmx.net> wrote:
Yes, I plan to ask for copy() as well, when the bug tracker opens up for 3.3, 3.4, etc. The issue is not, "Is there already a way to do this?" but rather, "Can we have consistent interfaces in the sequence types and collections where possible and appropriate?" dict() and set() already support both clear() and copy() methods. Previous posters have pointed to the disconnect and showed the problem of having to test if a given iterable supports the clear() method before calling it, in functions that take any iterable.

Also, for what it's worth:

    s1 = set()
    s2 = s1.copy()

is faster than

    s1 = set()
    s2 = set(s1)

(and also for dict()) probably because the first is specifically-written for the copy operation whereas the second actually iterates over s1, one item at a time. (At least I think that's what's going on). I suppose that a list().copy() method might also be faster than the other two approaches to copy a list.

Lastly, for completeness, I suppose copy() might be appropriate for both tuple and deque as well.

-- Gerald Britton

On Feb 10, 2010, at 6:54 AM, Gerald Britton wrote:
Use the copy module.
I question your timing skills. Both call the same routine to do the work of copying entries:

    set_copy() calls make_new_set() which calls set_update_internal()
    set_init() calls set_update_internal()

If there is any difference at all, it is the constant overhead of passing an argument to set(), not the implementation itself. The actual set building work is the same.
You need to read some code, learn about ref counts, etc. There's more to list copying than a memcpy(). If list.copy() were added, it would use that same underlying code as list(s) and s[:]. There would be no speed-up.
Lastly, for completeness, I suppose copy() might be appropriate for both tuple and deque as well.
Tuples? Really? An immutable collection is its own copy. Raymond
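
For anyone who wants to repeat the measurement being disputed here, the following is a minimal sketch using the standard timeit module. The function name compare_set_copy and its parameters are made up for illustration; the numbers it returns depend on the interpreter, the machine, and the set size, so no particular outcome is claimed.

```python
import timeit


def compare_set_copy(n=1000, repeat=2000):
    """Time s.copy() against set(s) for a set of n integers.

    Hypothetical helper, not part of the thread. Returns a pair of
    elapsed times in seconds: (copy_method, constructor).
    """
    s1 = set(range(n))
    # Bound method call: s1.copy()
    t_method = timeit.timeit(s1.copy, number=repeat)
    # Constructor call: set(s1)
    t_ctor = timeit.timeit(lambda: set(s1), number=repeat)
    return t_method, t_ctor


if __name__ == "__main__":
    t_method, t_ctor = compare_set_copy()
    print("s1.copy(): %.6f s" % t_method)
    print("set(s1):   %.6f s" % t_ctor)
```

Note that the lambda wrapper adds a small call overhead of its own to the set(s1) figure, so small differences should not be over-interpreted.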

On Wed, Feb 10, 2010 at 12:10 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Thanks for the feedback. I also question my timing skills (and most other skills that I think I have). That's what's good about bouncing ideas around here. Silly ones get shot down, and rightly so! -- Gerald Britton

On Thu, Feb 11, 2010 at 3:08 AM, Simon Brunning <simon@brunningonline.net> wrote:
Say you had a problem where you started with a basic tuple, then needed to add items to it to produce some result. Now suppose you want to do that repeatedly. You don't want to disturb the basic tuple, so you make a copy of it before extending it. e.g.
If tuple() had a copy() method, I could write:

    country_state = country.copy() + ("NY",)

etc. Not that this is necessarily "better" in some way. I'm just thinking about consistency across the built-in types. If dict() and set() have copy(), why not list() and tuple()?

On the other hand, if the consensus is _not_ to add the copy() method to lists and tuples, why not deprecate the method in sets and dicts and encourage folks to use the copy module or just use "newdict = dict(olddict)" and "newset = set(oldset)" to build a new dictionary or set from an existing one?
-- Gerald Britton

On Thu, Feb 11, 2010 at 09:51:42AM -0500, Gerald Britton wrote:
You can never "disturb" a tuple - it's a read-only object. Oleg. -- Oleg Broytman http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.

On Thu, Feb 11, 2010 at 03:21:27PM +0000, Matthew Russell wrote:
It's not a bug. += is not obliged to increase (extend) objects in place. In case of read-only objects += creates a new extended object and returns it:
You don't suppose that 2 magically became 3, do you? Instead += replaces an integer object pointed to by a with a different integer object. The same is true for tuples. The original tuple of len 2 was replaced by a completely new tuple of len 3. If you hold a reference to the original tuple you can find it's still intact:
Oleg. -- Oleg Broytman http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.

On Thu, Feb 11, 2010 at 10:21, Matthew Russell <matt.horizon5@gmail.com>wrote:
a is a list; augmented assignment mutates it, but it's still the same object.
b is a tuple; augmented assignment creates a new object and re-binds "b" to it. -- Tim Lesher <tlesher@gmail.com>
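
The distinction described above can be checked with identity tests. This is a small sketch; the function name and the alias variables are invented for illustration.

```python
def augmented_assignment_demo():
    """Show that += mutates a list in place but rebinds a tuple name."""
    a = [1, 2]
    a_alias = a          # second reference to the same list
    a += [3]             # list extends itself in place

    b = (1, 2)
    b_alias = b          # second reference to the same tuple
    b += (3,)            # builds a new tuple and rebinds the name b

    return {
        "list_same_object": a is a_alias,    # the alias sees the change
        "list_alias": a_alias,
        "tuple_same_object": b is b_alias,   # b now names a different object
        "tuple_alias": b_alias,              # original tuple is intact
        "tuple_new": b,
    }


if __name__ == "__main__":
    for key, value in augmented_assignment_demo().items():
        print(key, "=", value)
```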

On Thu, Feb 11, 2010 at 10:38 AM, Tim Lesher <tlesher@gmail.com> wrote:
Thanks all for helping me understand this better. The subtlety above is something I missed. I searched the doc for a description of it but couldn't readily find it. Tim's simple one-line statement and the example above do it very nicely.

Switching gears for a moment, what is the feeling regarding the copy() methods for dictionaries and sets? Are they truly redundant? Should they be deprecated? Should users be encouraged to use the copy module or just use "newdict = dict(olddict)" and "newset = set(oldset)" to build a new dictionary or set from an existing one?

-- Gerald Britton

On 2/11/2010 12:35 PM, Gerald Britton wrote:
On Thu, Feb 11, 2010 at 10:38 AM, Tim Lesher<tlesher@gmail.com> wrote:
I did not even know that they exist and do not know why they exist. In my opinion, set(x) should special-case x being a set/frozenset, and maybe even a dict, and do whatever set.copy does now. Ditto for dict.

Terry Jan Reedy

Gerald Britton writes:
It's in the language reference. It is only two lines (the definition of "immutable" and the description of assignment semantics), so easy to miss. :-) There probably is some discussion in the tutorial.
I think they are redundant. new = type(old) should be the standard idiom for an efficient shallow copy. If that doesn't serve your application's needs, use the copy module. The responsibility for discrimination is the application programmer's. Superficially this might seem to violate TOOWTDI, but actually it does not. Shallow copies and deep copies are two very different "Its", and have to be decided by the app author in any case. I don't see what .copy can add.

.clear is another matter, in terms of semantics. However, the same effect can be achieved at the cost of indirection and extra garbage:

    class DictWithClear(object):
        def __init__(self):
            self.clear()
        def clear(self):
            # Replace the wrapped dict with a fresh empty one.
            self.d = {}
        # Implement other dict methods here.

This is obviously wasteful if all you want to do is add .clear to a "bare" dictionary. However, in many cases the dictionary is an attribute of a larger structure already and the only direct reference to the dictionary is from that structure. Then clearing by replacing the obsolete dictionary with a fresh empty one is hardly less efficient than clearing the obsolete contents.

There are other arguments *for* the .clear method (e.g., it would be a possibly useful optimization if instead of a class with a dictionary attribute, the class inherited from the dictionary).
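
To make the shallow/deep distinction concrete, here is a small sketch. It reads "type(old)" in the idiom above as a stand-in for calling the container's constructor on the old object, which is an assumption about the intended meaning; the function and variable names are illustrative only.

```python
import copy


def shallow_vs_deep():
    """Compare a constructor-based shallow copy with copy.deepcopy()."""
    old = {"langs": ["python"], "count": 1}

    shallow = type(old)(old)     # same as dict(old): new dict, shared values
    deep = copy.deepcopy(old)    # new dict, recursively copied values

    # Mutate the original after copying.
    old["langs"].append("c")     # visible through the shallow copy
    old["count"] = 2             # rebinding a key is not visible in either copy

    return shallow, deep


if __name__ == "__main__":
    shallow, deep = shallow_vs_deep()
    print("shallow:", shallow)
    print("deep:   ", deep)
```

Which of the two is wanted depends on whether the copies are supposed to share their nested objects, which is the decision the message above leaves to the application author.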

Speaking of new potential list methods, how about list.get(index, default=None) à la dict.get? I'm sure this must have come up at some point but I can't find it ATM.

George

George Sakkis wrote:
I believe it runs afoul of the moratorium, but a getitem() builtin might be a better idea (since it would then work for any class that implements __getitem__). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

On 12 Feb 2010, at 10:58 , Nick Coghlan wrote:
Maybe just extending operator.itemgetter with a "default" kwarg? It wouldn't run afoul of the moratorium, and would be quite a nice extension to itemgetter. Though I'm not sure it's a very good idea for lists. Semantically, lists are to be iterated, not really to be indexed.

Masklinn wrote:
Yeah, a kw-only arg for itemgetter and attrgetter could definitely work. It would be somewhat clunky to use though.
Though I'm not sure it's a very good idea for lists. Semantically, lists are to be iterated, not really to be indexed.
Using short lists as record sets happens all the time (especially with things like str.split and other parsing operations that build up their results incrementally). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Masklinn wrote:
Though I'm not sure it's a very good idea for lists. Semantically, lists are to be iterated, not really to be indexed.
I don't think I agree that lists are meant primarily for iteration. Indexing them is a perfectly legitimate and useful thing to do. However, I would agree that a list.get() operation with a default seems to be a rather rare requirement. Usually when you index a list, the index is generated by some algorithm that guarantees it's within range. I can't remember ever wanting a list.get() myself, and if I ever did, I would be quite happy to write my own. -- Greg
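
As a rough idea of what "writing my own" might look like, here is a minimal sketch. The name list_get and its signature are hypothetical; nothing like it is being claimed to exist in the standard library.

```python
def list_get(seq, index, default=None):
    """Return seq[index], or default if the index is out of range.

    Hypothetical helper for sequences with integer indexes. It checks
    the bounds explicitly instead of catching IndexError, so an error
    raised for any other reason is not swallowed.
    """
    if -len(seq) <= index < len(seq):
        return seq[index]
    return default


if __name__ == "__main__":
    fields = "a:b".split(":")
    print(list_get(fields, 1))           # second field
    print(list_get(fields, 2, "n/a"))    # missing third field
```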

Nick Coghlan wrote:
Not quite the same way, though. The dict get() method knows about the internals of the object, so it can work very efficiently and without danger of masking bugs by catching the wrong exception. A generic getitem() wouldn't be able to do that. -- Greg
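
The masking risk mentioned above can be illustrated with a sketch of a generic helper built on try/except. Both getitem and the BuggyRecord class are invented for this example; the class contains a deliberate typo so that its own __getitem__ raises KeyError for a key that is supposed to be present.

```python
def getitem(obj, key, default=None):
    """Hypothetical generic helper: obj[key], or default on lookup failure."""
    try:
        return obj[key]
    except (IndexError, KeyError):
        # Cannot tell "key is absent" apart from an IndexError/KeyError
        # raised by a bug somewhere inside obj.__getitem__.
        return default


class BuggyRecord(object):
    """Illustrative container with an internal bug."""

    def __init__(self):
        self._fields = {"name": "spam"}
        self._aliases = {"title": "nmae"}   # deliberate typo for "name"

    def __getitem__(self, key):
        field = self._aliases.get(key, key)
        return self._fields[field]          # KeyError caused by the typo


if __name__ == "__main__":
    record = BuggyRecord()
    print(getitem(record, "name", "n/a"))    # works as intended
    print(getitem(record, "title", "n/a"))   # bug silently becomes the default
```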

Thanks Tim. The dict and set types _do_ have clear() methods, but not the list() type. I first ran into this some time ago when a question was posted about it. It intrigued me because I saw what I thought was a gap. Basically I like things to be consistent.

I was also wondering about garbage collection. If I have a humongous list, for example, and "clear" it with:

    mylist = []

does the old content not need to be garbage collected? Might it not continue to occupy its memory for a while? OTOH do dict.clear() and set.clear() immediately free their memory or does it just get queued for garbage collection?

On Thu, Feb 11, 2010 at 9:54 PM, Stephen J. Turnbull <turnbull@sk.tsukuba.ac.jp> wrote:
-- Gerald Britton

Gerald Britton writes:
    thislist = generate_me_a_humongous_list()
    thatlist = thislist
    thatlist = []

Definitely no garbage collection. The starting point of using garbage collection is that in general you don't know *locally* whether something is reachable or not. So you need to do a global analysis.
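
A small sketch of the difference between rebinding a name and emptying the list object itself. The function name is made up, and range(5) stands in for the humongous list.

```python
def rebind_vs_clear_in_place():
    """Contrast `name = []` with emptying the shared list object."""
    thislist = list(range(5))
    thatlist = thislist          # two names, one list object

    thatlist = []                # rebinds thatlist only
    after_rebind = list(thislist)    # data still reachable via thislist

    alias = thislist
    del alias[:]                 # empties the shared object in place
    after_in_place = list(thislist)  # every reference now sees an empty list

    return after_rebind, after_in_place


if __name__ == "__main__":
    print(rebind_vs_clear_in_place())
```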
OTOH do dict.clear() and set.clear() immediately free their memory or does it just get queued for garbage collection?
This is covered in the manuals, but the gist is that every Python object knows how many other objects are pointing to it (called a refcount). When an object's refcount drops to zero, it gets collected (immediately, IIRC). However ...

    thislist = []
    thatlist = [thislist]
    thislist.append(thatlist)

and you have a reference cycle. These cycles are also collected, but this requires more effort, and so it is done only occasionally.

Stephen J. Turnbull wrote:
This description applies to CPython (the one from python.org), since that uses refcounting with cyclic garbage collection. Other Python implementations work differently (e.g. Jython and IronPython rely on the garbage collector in their underlying VMs).

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
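
The refcount and cycle behaviour described above can be observed on CPython with weak references. This is a sketch that assumes CPython; on other implementations the first two results may differ. The Node class is a hypothetical stand-in, used because plain lists cannot be weakly referenced.

```python
import gc
import weakref


class Node(object):
    """Hypothetical helper object that supports weak references."""

    def __init__(self):
        self.other = None


def refcount_demo():
    """Return (freed_at_once, cycle_survives_del, cycle_freed_by_gc)."""
    was_enabled = gc.isenabled()
    gc.disable()                 # keep the cyclic collector from running early
    try:
        # No cycle: the object goes away when its last reference does.
        a = Node()
        ref_a = weakref.ref(a)
        del a
        freed_at_once = ref_a() is None

        # Reference cycle: each object keeps the other alive.
        b, c = Node(), Node()
        b.other, c.other = c, b
        ref_b = weakref.ref(b)
        del b, c
        cycle_survives_del = ref_b() is not None

        gc.collect()             # explicit run of the cycle collector
        cycle_freed_by_gc = ref_b() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_at_once, cycle_survives_del, cycle_freed_by_gc


if __name__ == "__main__":
    print(refcount_demo())
```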

On 11 February 2010 14:51, Gerald Britton <gerald.britton@gmail.com> wrote:
Say you had a problem where you started with a basic tuple, then needed to add items to it to produce some result.
Bzzzz! Tuples are immutable - you can't add items to them. -- Cheers, Simon B.

On 11 February 2010 14:51, Gerald Britton <gerald.britton@gmail.com> wrote:
You do know that tuples are immutable, don't you? Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information.
It sounds like you could do with reading the Python documentation a bit more closely before proposing changes... Paul

On Thu, Feb 11, 2010 at 3:51 PM, Gerald Britton <gerald.britton@gmail.com> wrote:
Note that for a tuple T

    tuple(T) == T

So you can already write:

    country_state = country + ("NY",)

and it will already have exactly the same effect that tuple(country) or your proposed country.copy() would have.

-- André Engels, andreengels@gmail.com
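
A short sketch of the point above: concatenation already leaves the original tuple untouched, so no copy is needed first. The value "US" is an invented placeholder, since the thread's original example is not preserved.

```python
def extend_tuple_without_copy():
    """Build extended tuples from a base tuple without copying it first."""
    country = ("US",)                      # placeholder base tuple
    country_state = country + ("NY",)      # new tuple; country is unchanged
    other_state = country + ("CA",)        # the base can be reused freely
    return country, country_state, other_state


if __name__ == "__main__":
    print(extend_tuple_without_copy())
```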

Mathias Panzenböck wrote:
On 02/10/2010 11:39 AM, wxyarv wrote:
what about another method clone() (or copy())?
Last time copying lists was discussed, I seem to remember there were considered to be too many ways of doing it already, so I can't see another one being added. -- Greg

participants (17)
- Andre Engels
- George Sakkis
- Gerald Britton
- Greg Ewing
- Masklinn
- Mathias Panzenböck
- Matthew Russell
- Nick Coghlan
- Oleg Broytman
- Paul Moore
- Raymond Hettinger
- Simon Brunning
- Stephen J. Turnbull
- Stephen J. Turnbull
- Terry Reedy
- Tim Lesher
- wxyarv