Add OrderedSet now that OrderedDict is in collections
There was a somewhat ancient discussion on OrderedDict and OrderedSet before: http://mail.python.org/pipermail/python-dev/2005-March/051915.html The resolution seemed to be that neither of them should be in stdlib. Now that OrderedDict is in and Raymond Hettinger has created a solid OrderedSet implementation: http://code.activestate.com/recipes/576694/ , could the latter also be included in collections? Here's a very generic use-case: def _update_key(dct, key, val): """ Update a key in dict *dct*. If they key already exists in *dct* but the value doesn't, a set of previous values is created and the value added to it. """ if key in dct: if dct[key] == val: return s = set(dct[key]) s.update(val) dct[key] = s else: dct[key] = val The problem is that I both need to remove duplicates and retain insertion order like list.append().
On Sun, Apr 12, 2009 at 1:22 AM, Mart Sõmermaa <mrts.pydev@gmail.com> wrote:
There was a somewhat ancient discussion on OrderedDict and OrderedSet before: http://mail.python.org/pipermail/python-dev/2005-March/051915.html
The resolution seemed to be that neither of them should be in stdlib. Now that OrderedDict is in and Raymond Hettinger has created a solid OrderedSet implementation: http://code.activestate.com/recipes/576694/ , could the latter also be included in collections?
So, let's review what we have in terms of data structures: Structure Stable Unique Python type ------------------------------------------- Multiset no no - Set no yes set Map no yes dict List yes no list, tuple Ordered set yes yes - Ordered map yes yes collections.OrderedDict where "stable" means that input order is retained. As Multiset is arguably quite useless, only Ordered set is missing from "total" coverage of data structures. And it is practical as well. Am I really the only one who would like to see this in stdlib?
On Fri, May 8, 2009 at 10:58 AM, Mart Sõmermaa <mrts.pydev@gmail.com> wrote:
So, let's review what we have in terms of data structures:
s/data structures/container data structures/
from "total" coverage of data structures. And it is practical as well.
s/"total" coverage of data structures/"total" coverage of container data structures based on ordering and uniqueness criteria/
On Fri, May 8, 2009 at 4:14 PM, Arnaud Delobelle <arnodel@googlemail.com> wrote:
2009/5/8 Mart Sõmermaa <mrts.pydev@gmail.com>:
As Multiset is arguably quite useless[...]
Arguably, it is far from useless.
-- Arnaud
Can you perhaps provide a use case where a list (an "ordered multiset") does not suffice?
Mart Sõmermaa wrote:
On Fri, May 8, 2009 at 4:14 PM, Arnaud Delobelle <arnodel@googlemail.com> wrote:
2009/5/8 Mart Sõmermaa <mrts.pydev@gmail.com>:
As Multiset is arguably quite useless[...] Arguably, it is far from useless.
Can you perhaps provide a use case where a list (an "ordered multiset") does not suffice?
Raymond Hettinger has created a Counter class instead of a strict multiset class. It's useful for efficiently counting how many times things occur. Without it you would have to build a dict of counts (I've done it myself). Practicality beats purity, etc.
Le Fri, 8 May 2009 17:24:41 +0300, Mart Sõmermaa <mrts.pydev@gmail.com> s'exprima ainsi:
On Fri, May 8, 2009 at 4:14 PM, Arnaud Delobelle <arnodel@googlemail.com> wrote:
2009/5/8 Mart Sõmermaa <mrts.pydev@gmail.com>:
As Multiset is arguably quite useless[...]
Arguably, it is far from useless.
-- Arnaud
Can you perhaps provide a use case where a list (an "ordered multiset") does not suffice?
I feel the same. Unorder (like in dict or set) is not a feature, I guess. Set is useful for it ensures uniqueness of items, not because of unorder. A kind of "unique-item" sequence would do as well. Unorder is rather a consequence of hash implementation for performance, but I cannot find any use case where it would be a useful feature. Examples welcome. Denis ------ la vita e estrany
On Sat, May 9, 2009 at 5:17 AM, spir <denis.spir@free.fr> wrote:
I feel the same. Unorder (like in dict or set) is not a feature, I guess. Set is useful for it ensures uniqueness of items, not because of unorder. A kind of "unique-item" sequence would do as well. Unorder is rather a consequence of hash implementation for performance, but I cannot find any use case where it would be a useful feature. Examples welcome.
Unorder is an absence of a feature, so it will never be "useful". However, unorder frees up the implementation to be more efficient since there is less information to keep track of. Also, insertion order isn't always the desired order. For example, maintaining a sorted list of unique items is often handy. In other cases, a different ordering may be useful. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
Daniel Stutzbach wrote:
On Sat, May 9, 2009 at 5:17 AM, spir <denis.spir@free.fr <mailto:denis.spir@free.fr>> wrote:
I feel the same. Unorder (like in dict or set) is not a feature, I guess. Set is useful for it ensures uniqueness of items, not because of unorder. A kind of "unique-item" sequence would do as well. Unorder is rather a consequence of hash implementation for performance, but I cannot find any use case where it would be a useful feature. Examples welcome.
Unorder is an absence of a feature, so it will never be "useful". However, unorder frees up the implementation to be more efficient since there is less information to keep track of.
Also, insertion order isn't always the desired order. For example, maintaining a sorted list of unique items is often handy. In other cases, a different ordering may be useful.
There's also the question of whether the order is significant. Does a == b if a and b contain exactly the same items, but in a different order? If they are lists, then yes; if they are multisets, then no.
On Sat, 9 May 2009 11:53:42 pm MRAB wrote:
There's also the question of whether the order is significant. Does a == b if a and b contain exactly the same items, but in a different order? If they are lists, then yes; if they are multisets, then no.
I'm afraid you have made a mistake. List equality does not ignore order.
[1, 2, 3] == [2, 1, 3] False
-- Steven D'Aprano
Steven D'Aprano wrote:
On Sat, 9 May 2009 11:53:42 pm MRAB wrote:
There's also the question of whether the order is significant. Does a == b if a and b contain exactly the same items, but in a different order? If they are lists, then yes; if they are multisets, then no.
I'm afraid you have made a mistake. List equality does not ignore order.
[1, 2, 3] == [2, 1, 3] False
Oops! Wrong way round. I meant: If they are multisets, then yes; if they are lists, then no.
On 9 May 2009, at 11:17, spir wrote:
I feel the same. Unorder (like in dict or set) is not a feature, I guess. Set is useful for it ensures uniqueness of items, not because of unorder.
That is one reason why sets are useful. Another reason, which is at least as important, is that {1,2}=={2,1} is true - and that's precisely because the elements are not ordered. So absence of order is definitely a useful feature. -- Arnaud
Mart Sõmermaa wrote:
On Sun, Apr 12, 2009 at 1:22 AM, Mart Sõmermaa <mrts.pydev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
There was a somewhat ancient discussion on OrderedDict and OrderedSet before: http://mail.python.org/pipermail/python-dev/2005-March/051915.html
The resolution seemed to be that neither of them should be in stdlib. Now that OrderedDict is in and Raymond Hettinger has created a solid OrderedSet implementation: http://code.activestate.com/recipes/576694/ , could the latter also be included in collections?
So, let's review what we have in terms of data structures:
Structure Stable Unique Python type ------------------------------------------- Multiset no no - Set no yes set Map no yes dict List yes no list, tuple Ordered set yes yes - Ordered map yes yes collections.OrderedDict
where "stable" means that input order is retained.
As Multiset is arguably quite useless, only Ordered set is missing from "total" coverage of data structures. And it is practical as well.
Am I really the only one who would like to see this in stdlib?
What are the use cases? An 'ordered set' is basically a list + set.
On Fri, May 8, 2009 at 9:12 PM, Terry Reedy <tjreedy@udel.edu> wrote:
What are the use cases? An 'ordered set' is basically a list + set.
A list with no duplicate values (can be alternatively called unique list) is a common problem judging on the following: http://www.google.com/codesearch?q=lang%3Apython+list+unique http://www.google.com/codesearch?q=lang%3Apython+list+duplicates http://www.peterbe.com/plog/uniqifiers-benchmark/ http://code.activestate.com/recipes/52560/ http://www.google.ee/search?q=python+unique+list My use case stems from http://bugs.python.org/issue5877 , see http://github.com/mrts/qparams/blob/bf1b29ad46f9d848d5609de6de0bfac1200da310...
So, let's review what we have in terms of data structures:
Structure Stable Unique Python type ------------------------------------------- Multiset no no - Set no yes set Map no yes dict List yes no list, tuple Ordered set yes yes - Ordered map yes yes collections.OrderedDict
where "stable" means that input order is retained.
<slightly off topic> And where do Multimaps fit in? And Ordered Mulitmaps? Example of an Ordered Multimap in the wild: the headers of an email message, where multiple values may be attached to a single key, and the ordering of key-value pairs is important (and not all key-value pairs with the same key are necessarily adjacent). But basically it is anything that behaves like a sequence of key-value pairs. Come to think of it, 'headers' in lots of network protocols are like this. (ps: +0 on adding ordered sets)
participants (8)
-
Arnaud Delobelle
-
Daniel Stutzbach
-
Jan Kanis
-
Mart Sõmermaa
-
MRAB
-
spir
-
Steven D'Aprano
-
Terry Reedy