[Python-ideas] Add orderedset as set(iterable, *, ordered=False) and similarly for frozenset.

Jim Baker jim.baker at python.org
Sun Feb 8 17:30:37 CET 2015


Andrew,

You're absolutely right - the semantics matter, as they always do. There
are significant choices in the Java ecosystem because of this. If we are
preserving insertion order, vs a comparator based ordering, skip lists and
the specific ConcurrentSkipListMap implementation will not work.

As often the case, it's good to try things out. I looked at PyPy 2.5.0 this
morning, and both set and dict are insertion order preserving.
Interestingly, when creating a set:

>>> x = {1,2,3,4,5,6,7}

the resulting set is the reverse of its input, which I find surprising:

>>>> x
set([7, 6, 5, 4, 3, 2, 1])

Note that this is not the case if initializing from a sequence:

>>>> set([1,2,3,4,5,6,7])
set([1, 2, 3, 4, 5, 6, 7])

Using LinkedHashSet from Jython also does what is expected:

>>> from java.util import LinkedHashSet as LHS
>>> LHS([1,2,3,4,5,6,7])
[1, 2, 3, 4, 5, 6, 7]

(Iterators, etc, are consistent with the representations that are printed
in these cases from PyPy and Jython.)

But there's one more important thing we have to consider. Jython needs
ConcurrentMap semantics, so that we can support our interpretation of the
Python memory model (sequential consistency, plus weakly consistent
iteration is also depended on, see also
http://www.jython.org/jythonbook/en/1.0/Concurrency.html#python-memory-model).
There's some interesting work out there to support a
ConcurrentLinkedHashMap implementation (
https://code.google.com/p/concurrentlinkedhashmap/), the design of which
was used as the basis of the design of the LRU cache in Google Guava, but
it would require some further investigation to see how this would impact
Jython.

Trying out an alternative implementation in Jython would still be on the
order of a few lines of code changes, however :), which gives us a nice low
cost for such experiments.

- Jim

On Sun, Feb 8, 2015 at 12:31 AM, Andrew Barnert <
abarnert at yahoo.com.dmarc.invalid> wrote:

> On Feb 7, 2015, at 20:02, Jim Baker <jim.baker at python.org> wrote:
>
> Interesting. It's more than a couple of lines of code to change for Jython
> (I tried), but now that we require Java 7, we should be able to implement
> identical ordered behavior to PyPy for dict, __dict__, set, and frozenset
> using ConcurrentSkipListMap instead of our current usage of
> ConcurrentHashMap.
>
>
> How? A skip list is a sorted collection--kept in the natural order of the
> keys, not the insertion order. Of course you can transform each key into a
> (next_index(), key) pair at insert time to keep insertion order instead of
> natural sorting order, but then you lose the logarithmic lookup time when
> searching just by key.
>
> And even if I'm missing something stupid and there is a way to make this
> work, you're still only going to get logarithmic performance, not
> linear. And you're probably going to change the requirement on dict keys
> from hashable to fully ordered (which, on top of being different, is
> substantially harder to detect at runtime, and also doesn't map cleanly to
> immutable among Python's standard data types). So, I don't think it would
> be a good idea even if it were possible.
>
> (There's a reason that Java, as well as C++ and other languages, provides
> sets and maps based on both hash tables and sorted logarithmic
> structures--because they're not interchangeable.)
>
> (Other collections like defaultdict and weakref collections use Google
> Guava's MapMaker, which doesn't have an ordered option.)
>
> I think this is a great idea from a user experience perspective.
>
> On Thu, Feb 5, 2015 at 3:04 PM, Amaury Forgeot d'Arc <amauryfa at gmail.com>
> wrote:
>
>> 2015-02-05 23:02 GMT+01:00 Chris Barker <chris.barker at noaa.gov>:
>>
>>> Interesting -- somehow this message came through on the google group
>>> mirror of  python-ideas only -- so replying to it barfed.
>>>
>>> He's my reply again, with the proper address this time:
>>>
>>> On Wed, Feb 4, 2015 at 1:14 PM, Neil Girdhar <mistersheik at gmail.com>
>>>  wrote:
>>>
>>>> Hundreds of people want an orderedset (
>>>> http://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set)
>>>> and yet the ordered sets that are in PyPI (oset, ordered-set) are
>>>> incomplete.  Please consider adding an ordered set that has *all* of the
>>>> same functionality as set.
>>>>
>>>
>>>
>>>> (If anyone wants to add a complete ordered set to PyPI that would also
>>>> be very useful for me!)
>>>>
>>>
>>> It seems like a good way to go here is to contribute to one of the
>>> projects already on PyPI -- and once you like it, propose that it be added
>>> to the standard library.
>>>
>>> And can you leverage the OrderedDict implementation already in the std
>>> lib? Or maybe better, the new one in PyPy?
>>>
>>
>> PyPy dicts and sets (the built-in ones) are already ordered by default.
>>
>> --
>> Amaury Forgeot d'Arc
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>
> --
> - Jim
>
> jim.baker@{colorado.edu|python.org|rackspace.com|zyasoft.com}
> twitter.com/jimbaker
> github.com/jimbaker
> bitbucket.com/jimbaker
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>


-- 
- Jim

jim.baker@{colorado.edu|python.org|rackspace.com|zyasoft.com}
twitter.com/jimbaker
github.com/jimbaker
bitbucket.com/jimbaker
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150208/76dc1030/attachment-0001.html>


More information about the Python-ideas mailing list