Inconsistent API for sets.Set and built-in set
I've been looking at the API for sets.Set and built-in set objects in Python 2.4.1 and I think I have found some minor inconsistencies.

Background: we have an object that is very similar to "sets" and we originally modeled the API after sets.Set since we started with Python 2.3. Now I'm trying to update the API so that it's consistent with Python 2.4's built-in set object and I've noticed the following discrepancies.

sets.Set has both an .update() method and a .union_update() method. They appear to be completely identical, accepting either a Set object or an arbitrary sequence. This is the case despite the docstring difference for these two methods and the fact that Set.update() isn't documented on the texinfo page.

Built-in set has only .update(), but it acts just like the sets.Set methods above. Note that of all these methods, only Set.update() is documented to take an iterable argument.

These inconsistencies could prove a bit problematic when porting Py2.3 applications using sets.Set to Py2.4 using built-in set. I'd like to fix this for Python 2.4.2, and I think the changes are pretty minimal. If there are no objections, I propose to do the following (only in Python 2.4 and 2.5):

* Add set.union_update() as an alias for set.update().
* Add to docstrings for all methods that 't' can be any iterable.
* Update texinfo documentation to add Set.update() and set.union_update() and explain that all can take iterables

I consider this a bug in 2.4, not a new feature, because without it, it makes more work in porting applications.

Thoughts? I'm willing to Just Fix It, but if someone wants to see a patch first, I'll be happy to generate it and post it to SF.

-Barry
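For readers who want to see the first proposed change concretely, here is a minimal sketch of the alias. It is written as a subclass so that it runs on a current Python, where the sets module no longer exists; the proposal itself was to add the alias to the built-in type. The class name CompatSet is hypothetical and appears nowhere in the thread.

```python
class CompatSet(set):
    """Built-in set plus the old sets.Set spelling (hypothetical shim)."""

    # union_update is just a second name for update, mirroring sets.Set.
    union_update = set.update


s = CompatSet([1, 2])
s.union_update([2, 3])         # accepts any iterable, not only sets
s.update(x for x in (4, 5))    # same behaviour under the shorter name
print(sorted(s))
```

Code written against sets.Set that calls union_update() would keep working with such a shim, which is the porting convenience being asked for.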
On Thu, Jun 30, 2005, Barry Warsaw wrote:
If there are no objections, I propose to do the following (only in Python 2.4 and 2.5):
* Add set.union_update() as an alias for set.update().
* Add to docstrings for all methods that 't' can be any iterable.
* Update texinfo documentation to add Set.update() and set.union_update() and explain that all can take iterables
I consider this a bug in 2.4, not a new feature, because without it, it makes more work in porting applications.
+0 (I'm only not +1 because I don't use sets much -- I'm still mired in Python 2.2 -- but I'm always happy to see inconsistencies resolved)

I'll guess that Raymond will probably want 2.5 to have set.union_update() get a PendingDeprecationWarning.

--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

f u cn rd ths, u cn gt a gd jb n nx prgrmmng.
If there are no objections, I propose to do the following (only in Python 2.4 and 2.5):
* Add set.union_update() as an alias for set.update().
No. It was intentional to drop the duplicate method with the hard-to-use name. There was some thought given to deprecating sets.union_update() but that would have just caused unnecessary grief.
* Add to docstrings for all methods that 't' can be any iterable.
* Update texinfo documentation to add Set.update() and set.union_update() and explain that all can take iterables
Feel free to assign a doc patch to me.
I consider this a bug in 2.4, not a new feature, because without it, it makes more work in porting applications.
Bah. It's just one of the handful of search/replace steps:

    Set --> set
    ImmutableSet --> frozenset
    union_update --> update
I'm willing to Just Fix It,
Please don't. All of the differences between set and Set were intentional improvements (e.g. the hash algorithms are different).

Raymond
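The three search/replace steps Raymond lists can be sketched as a small text filter. This is only an illustration under stated assumptions: port_source is a hypothetical helper, it works on raw text (so it would also rewrite matching words inside strings and comments), and import lines such as "from sets import Set" still need to be removed by hand.

```python
import re

# Hypothetical helper: the renames from the message above, applied in order.
_RENAMES = [
    (re.compile(r"\bsets\."), ""),                   # drop the module prefix
    (re.compile(r"\bImmutableSet\b"), "frozenset"),
    (re.compile(r"\bSet\b"), "set"),
    (re.compile(r"\.union_update\b"), ".update"),
]


def port_source(text):
    """Return text with the sets.Set spellings replaced by built-in ones."""
    for pattern, replacement in _RENAMES:
        text = pattern.sub(replacement, text)
    return text


old = "s = sets.Set([1]); s.union_update(t); f = ImmutableSet(s)"
print(port_source(old))
```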
On Thu, 2005-06-30 at 13:37, Raymond Hettinger wrote:
If there are no objections, I propose to do the following (only in Python 2.4 and 2.5):
* Add set.union_update() as an alias for set.update().
No. It was intentional to drop the duplicate method with the hard-to-use name. There was some thought given to deprecating sets.union_update() but that would have just caused unnecessary grief.
Oh, okay. Did you run out of clever abbreviations after s/union_update/update/ or do you think that symmetric_difference_update is already easy enough to use? ;) Why was "update" chosen when you have two other forms of longer *_update() methods on sets? This is, after all, a union, and it's arguably more confusing not to have that in the name (especially given the "easy-to-use" other *_update() methods).
I consider this a bug in 2.4, not a new feature, because without it, it makes more work in porting applications.
Bah. It's just one of the handful of search/replace steps:
    Set --> set
    ImmutableSet --> frozenset
    union_update --> update
But an unnecessary one, IMO.
I'm willing to Just Fix It,
Please don't. All of the differences between set and Set were intentional improvements (e.g. the hash algorithms are different).
I don't care about the implementation, I'm sure it's vastly superior. I'm concerned with the API. I don't agree that dropping union_update() is necessarily an improvement, but I guess I had my chance when the PEP was being debated, so I'll drop it.

I do think you owe it to users to describe the differences in PLR §2.3.7 to aid people in the transition process.

-Barry
Barry Warsaw wrote:
I've been looking at the API for sets.Set and built-in set objects in Python 2.4.1 and I think I have found some minor inconsistencies.
This comment reminds me of another small inconsistency/annoyance. Should copy and clear functions be added to lists, to be more consistent with dict and set, and to ease writing generic code? (Sorry if this was discussed before; I haven't found anything about it in the archives.)

Regards,
Nicolas
[Nicolas Fleury]
This comment reminds me of another small inconsistency/annoyance.
Should copy and clear functions be added to lists, to be more consistent with dict and set, and easing generic code?
I think not.

Use copy.copy() for generic copying -- it works across a wide range of objects. Alternatively, use the constructor as a generic way to make duplicates:

    dup = set(s)
    dup = list(l)
    dup = dict(d)
    dup = tuple(t)   # note, the duplicate is the original object here :-)

I would think that generic clearing is a lark. First, it only applies to mutable objects. Second, it would likely only be useful in the presence of a generic method for adding to the cleared container (as opposed to the existing append(), add(), and setitem() methods for lists, sets, and dictionaries respectively).

So, for lists, stick with the current idiom:

    mylist[:] = []   # clear

Raymond
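A short sketch of the two generic copying approaches described above, written for a current Python. The sample values are arbitrary. The tuple identity shown on the last line is CPython behaviour for exact tuples rather than a language guarantee, so it is printed, not asserted.

```python
import copy

s, l, d, t = {1, 2}, [1, 2], {"a": 1}, (1, 2)

for original in (s, l, d):
    via_copy = copy.copy(original)         # generic shallow copy
    via_ctor = type(original)(original)    # the constructor used as a copier
    assert via_copy == original and via_copy is not original
    assert via_ctor == original and via_ctor is not original

# Tuples are immutable, so no new object is needed for a "copy".
print(tuple(t) is t, copy.copy(t) is t)
```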
On Thursday 30 June 2005 17:26, Raymond Hettinger wrote:
the current idiom:
mylist[:] = [] # clear
Unless you happen to prefer the other current idiom:

    del mylist[:]

-Fred

--
Fred L. Drake, Jr. <fdrake at acm.org>
[Raymond Hettinger]
the current idiom:
mylist[:] = [] # clear
[Fred L. Drake, Jr.]
Unless you happen to prefer the other current idiom:
del mylist[:]
Or my personal favorite,

    while mylist:
        del mylist[::2]

Then the original index positions with the most consecutive trailing 1 bits survive the longest, which is important to avoid ZODB cache bugs <wink>.
"Tim Peters" <tim.peters@gmail.com> wrote in message news:1f7befae05063015207acb85dc@mail.gmail.com...
Or my personal favorite,
while mylist: del mylist[::2]
Then the original index positions with the most consecutive trailing 1 bits survive the longest, which is important to avoid ZODB cache bugs <wink>.
This is a joke, hopefully, and in that case, I fell for it. If not, please provide a url with related discussion (for educational purposes :)
[Tim Peters]
Or my personal favorite,
while mylist: del mylist[::2]
Then the original index positions with the most consecutive trailing 1 bits survive the longest, which is important to avoid ZODB cache bugs <wink>.
[Christos Georgiou]
This is a joke, hopefully, and in that case, I fell for it. If not, please provide a url with related discussion (for educational purposes :)
Fell for what? It's certainly true that the code snippet allows the original index positions with the most consecutive trailing 1 bits to survive the longest (on the first iteration, all even index positions (no trailing 1 bits) are deleted, and you don't really need a URL to figure out what happens on the i'th iteration). The idea that this is helpful in avoiding anything's "cache bugs" is purely <wink>-worthy, though.
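The claim about trailing 1 bits can be checked mechanically. The sketch below fills a list with its own index positions, runs the loop, and records the iteration in which each original index is deleted. The function names are made up for this illustration.

```python
def trailing_ones(i):
    """Count the consecutive 1 bits at the low end of i."""
    count = 0
    while i & 1:
        count += 1
        i >>= 1
    return count


def deletion_rounds(n):
    """Map each original index to the loop iteration that deletes it."""
    mylist = list(range(n))    # each element is its own original index
    rounds = {}
    iteration = 0
    while mylist:
        iteration += 1
        for index in mylist[::2]:
            rounds[index] = iteration
        del mylist[::2]
    return rounds


rounds = deletion_rounds(16)
# An index with k trailing 1 bits is deleted on iteration k + 1.
assert all(rounds[i] == trailing_ones(i) + 1 for i in range(16))
print(max(rounds, key=rounds.get))   # the longest-surviving original index
```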
Raymond Hettinger wrote:
I would think that that generic clearing is a lark. First, it only applies to mutable objects. Second, it would likely only be useful in the presence of a generic method for adding to the cleared container (as opposed to the existing append(), add(), and setitem() methods for lists, sets, and dictionaries respectively). So, for lists, stick with the current idiom:
mylist[:] = [] # clear
Pros of list.clear:
+ easy to find in documentation and help()
+ readability & clarity of intention in code
+ commonality with other mutable collections
+ easier to search on "clear()" (well, at least for me...)

Cons of list.clear:
+ Yet another method on list
+ Three ways to do the same thing.
    mylist[:] = []
    del mylist[:]
    mylist.clear()
  (Although the implementation will use one of the slice operators, so I guess that depends on how you count ;)

I would agree generic clearing is a lark in terms of programming feature. However, I have been asked how to clear a list more than a handful of times. Personally, my opinion is that having a list.clear method would be a net win, especially since it can be implemented via __setitem__ or __delitem__.

Are there more Cons than those I have listed?
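The three spellings listed above can be compared side by side. This sketch assumes Python 3.3 or later, where list.clear() was eventually added; at the time of this thread only the two slice idioms existed. The wrapper function names are hypothetical. All three clear the list in place, so other references to the same list see it emptied, which is what distinguishes them from rebinding the name.

```python
def clear_by_slice_assignment(lst):
    lst[:] = []


def clear_by_slice_deletion(lst):
    del lst[:]


def clear_by_method(lst):
    lst.clear()    # assumes Python 3.3+, where list.clear() exists


for clear in (clear_by_slice_assignment, clear_by_slice_deletion, clear_by_method):
    mylist = [1, 2, 3]
    alias = mylist             # second reference to the same list object
    clear(mylist)
    assert mylist == [] and alias is mylist and alias == []

# Rebinding is not clearing: the old list object keeps its contents.
mylist = [1, 2, 3]
alias = mylist
mylist = []
print(alias)
```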
[Shane Holloway]
I would agree generic clearing is a lark in terms of programming feature. However, I have been asked how to clear a list more than a handful of times.
list.clear() does not solve the root problem. The question is symptomatic of not understanding slicing. Avoidance of that knowledge doesn't help the person at all. Slicing is part of Python 101 and is basic to the language. There is a reason this subject is presented right at the beginning of the tutorial.

By the time a person is writing apps that modify lists in-place (clearing and rebuilding), then they need to know how lists work. So, a better solution is to submit a doc patch to improve the tutorial's presentation on the subject.

IMO, there is a much stronger case for Fredrik's proposed join() builtin. That addresses an issue that feels warty on the first day you learn it and still feels that way years later.

Raymond
Raymond Hettinger wrote:
[Shane Holloway]
I would agree generic clearing is a lark in terms of programming feature. However, I have been asked how to clear a list more than a handful of times.
list.clear() does not solve the root problem. The question is symptomatic of not understanding slicing. Avoidance of that knowledge doesn't help the person at all. Slicing is part of Python 101 and is basic to the language. There is a reason this subject is presented right at the beginning of the tutorial.
I tend to not agree. I don't think it is symptomatic of "not understanding slicing", but instead of "not being in a slicing state of mind". What I see day to day is that when programmers want to clear a list, they are sometimes in a situation where they have some persistent list that needs to be cleared, so the first reflex is not to think about slicing. They end up in a situation where they only use slicing to clear the lists they use, when their problem has nothing to do with slicing IMHO.

But I understand that would add yet another way to clear a list, while the function is necessary for sets and dictionaries.

Regards,
Nicolas
Raymond Hettinger wrote:
Use copy.copy() for generic copying -- it works across a wide range of objects. Alternatively, use the constructor as a generic way to make duplicates:

    dup = set(s)
    dup = list(l)
    dup = dict(d)
    dup = tuple(t)   # note, the duplicate is the original object here :-)
I know all this, but why then is there a copy method for sets and dictionaries? What justification is valid for sets and dictionaries that doesn't apply to lists? Regards, Nicolas
Raymond Hettinger wrote:
Use copy.copy() for generic copying -- it works across a wide range of objects. Alternatively, use the constructor as a generic way to make duplicates:

    dup = set(s)
    dup = list(l)
    dup = dict(d)
    dup = tuple(t)   # note, the duplicate is the original object here :-)
[Nicolas Fleury]
I know all this, but why then is there a copy method for sets and dictionaries? What justification is valid for sets and dictionaries that doesn't apply to lists?
Several thoughts:

* maybe it's a Dutch thing;
* dict.copy() pre-dated dict(d) and (IIRC) copy.copy();
* sets and dicts don't have the [:] syntax available to them;
* the __copy__() method is a new way to make things copy.copyable without fattening the apparent API or rewriting the copy module;
* because generic copying isn't important enough to add more ways to do the same thing;
* and because Guido believes beginners tend to copy too much (that is one reason why copy.copy is not a builtin) and that the language should encourage correct behavior.

Raymond
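To illustrate the __copy__() point from the list above, here is a minimal sketch of a class that supports copy.copy() without exposing a public copy() method. The Bag class is hypothetical and exists only for this example.

```python
import copy


class Bag:
    """Hypothetical container used only for illustration."""

    def __init__(self, items=()):
        self.items = list(items)

    def __copy__(self):
        # Called by copy.copy(); returns a new Bag with its own item list.
        return Bag(self.items)


original = Bag([1, 2, 3])
duplicate = copy.copy(original)
duplicate.items.append(4)      # does not affect the original's list
print(original.items, duplicate.items)
```

The visible API of Bag stays small, yet generic code that calls copy.copy() still works on it.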
Raymond Hettinger wrote:
Several thoughts:
As I told you in a private discussion, you have convinced me about copy.

About clear, however, I feel I have to defend my colleagues and myself, who almost all wasted time once (but only once) searching for how to clear a list. Improving the docs (like adding an additional example in the table at http://www.python.org/doc/2.4.1/lib/typesseq-mutable.html) would be good.

To me, "del mylist[:]" and "mylist[:] = []" are not "how to clear a list" but "how to clear a list using slicing". That's why I think it's counter-intuitive, since you end up using slicing in a situation that has nothing to do with slicing.

We agree there's no need for generic clearing. It's only about consistency and ease of learning/self-documentation. So let's look at the reasons to not do it:

- It's only useful for new Python programmers (I mean first-time clearers); once you know it, you know it.
- That would be a third way to clear a list. However, I don't like this argument in this specific case, because IMO the current ways are just slicing capabilities, as "<< 1" and "* 2" can be the same on an int.
- All APIs trying to emulate a list would end up incomplete. I have difficulty judging that one. A method addition doesn't sound so bad to me. If it is the showstopper, maybe a Python 3000 thing?

Overall, I think the addition of clear would be an improvement to the language, particularly in the autocompletion world of ours ;)

Regards,
Nicolas