[Python-ideas] Add orderedset as set(iterable, *, ordered=False) and similarly for frozenset.

Sun Feb 8 17:41:16 CET 2015

Thanks, Raymond.

My only question is when you say:

Perhaps, I wasn't clear.  The PyPy folks have offered to port their work
back into Python.  They're offering a nice combination of faster
performance (especially for iteration), better memory utilization, and as a
by-product retaining insertion order.  If their offer comes to fruition,
then this whole discussion is moot.

Do you mean it will be moot because :

a) OrderedSet will be declared as a synonym for "set"

b) The Python language specification will be changed so that set is always
ordered, or

c) something else?

Thanks

Neil

On Sat, Feb 7, 2015 at 10:21 PM, Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:

>
> > * It isn't entirely clear what the set-to-set operations should do to
> maintain order, whether equality tests should require matching order,
> whether we care about loss of commutativity, and whether we're willing to
> forgo the usual optimizations on set-to-set operations (i.e. the
> intersection algorithm loops over the smaller of the two input sets and
> checks membership in the larger set -- that is fast, but it would lose
> order if the second set were the smaller of the two).
> >
> >> I agree it's not obvious, but I think that with a little debate we
> could come together on a reasonable choice.
>
> "In the face of ambiguity, refuse the temptation to guess."   Some of the
> participants on python-ideas need to develop a much stronger aversion to
> opening a can of worms ;-)
>
>
> >  You could even block the set-to-set operations for now.  The main use
> of OrderedSet just requires add, update, and iteration as your recipe
> illustrates.
>
> Since you don't seem have any specific requirements for the order created
> by set-to-set operations, then just use the existing set operations on
> OrderedDict.keys():
>
>     >>> a = OrderedDict.fromkeys(['raymond', 'rachel', 'matthew'])
>     >>> b = OrderedDict.fromkeys(['matthew', 'calvin', 'rachel'])
>     >>> a.keys() & b.keys()
>     {'rachel', 'matthew'}
>     >>> a.keys() | b.keys()
>     {'rachel', 'calvin', 'matthew', 'raymond'}
>     >>> a.keys() - b.keys()
>     {'raymond'}
>     >>> a.keys() ^ b.keys()
>     {'calvin', 'raymond'}
>     >>> list(a.keys())
>     ['raymond', 'rachel', 'matthew']
>
>
> >
> > * The standard library should provide fast, clean fundamental building
> blocks and defer all of the myriad of variants and derivations to PyPI.
>  We already provide OrderedDict and MutableSet so that rolling your own
> OrderedSet isn't hard (see the recipe below).
> >
> > I think that OrderedSet is a fundamental building block :)
>
> The rule for "fundamental" that I've used for itertools is whether the
> thing you want can easily be built out of what you've already got.   For
> example, you can't make accumulate() out of the other itertools, but you
> can make tabulate(func) from map(func, count()).  Likewise, you can easily
> make an OrderedSet from an OrderedDict and MutableSet, or if you're needs
> are less, just use the existing set operations on dict views as shown above.
>
>
> > * We're likely to soon have an OrderedDict written in C (I believe Eric
> Snow has been working on that for some time), so anything build on top of
> an OrderedDict will inherit all the speed benefits.
> >
> > Sure.  It's also easy to have an OrderedSet written in C (especially
> once OrderedDict is done.)
>
> You're profoundly under-estimating how much work it is to build and
> maintain a separate C implementation.  I've maintained the code for
> collections and sets for over a decade and can assure you that the
> maintenance burden has been enormous.
>
>
> > Also, there is some chance that PyPy folks will contribute their work
> making regular sets and dicts ordered by default; in which case, having a
> separate OrderedSet would have been a waste and the question would be moot.
> >
> > I think that using dicts and sets and assuming that they're ordered
> because you plan on using PyPy is a terrible error.
>
> Perhaps, I wasn't clear.  The PyPy folks have offered to port their work
> back into Python.  They're offering a nice combination of faster
> performance (especially for iteration), better memory utilization, and as a
> by-product retaining insertion order.  If their offer comes to fruition,
> then this whole discussion is moot.
>
> >
> > To be honest, my main issue is my personal need.
>
> The usual solution for personal needs is to have a personal utils module
> (it sounds like you've been using my recipe for over a year).  Decisions
> about what to put in the standard library have far reaching impact and we
> need to consider what is best for most of the users most of the time (and
> to consider our own maintenance burden).
>
> I thought ordered sets were interesting enough to write an implementation
> six years ago, but frankly there are many other other things that have a
> much better case for being added to collections (for example, some sort of
> graph structure, some kind of binary tree, some kind of trie, or perhaps a
> bloom filter).
>
>
> > If there was a complete PyPI recipe, I would be very happy.
>
> Then, that is what we should do.
>
>
> > I don't think I'm the only one, but maybe you're right and I am.
> >
> >
> > * One thing I've used to test the waters is to include a link in the
> documentation to an OrderedSet recipe (it's in the "See also" section at
> the very bottom of the page at
> https://docs.python.org/2/library/collections.html ).  As far as I can
> tell, there has been nearly zero uptake based on this link.  That gives us
> a good indication that there is very little real-world need for an
> OrderedSet.
> >
> > I've been using your recipe for at least a year (thank you).  How are
> you measuring uptake?
>
>
> That is a very difficult question to answer (there is no precise tool for
> saying X people care about thing Y).  The inputs I do have are imprecise
> and incomplete but never-the-less still informative.  When google's code
> search was still alive, there was a direct way of seeing what people used
> in published code, and I frequently searched to see what people were using
> in the python world and how they were using it (for example, that is how I
> was able to determine that str.swapcase() was nearly useless and propose
> that it be removed for Python 3).
>
> Since then, I rely on a number of other sources such as reading the python
> bug tracker every day for the last 15 years to know what people are
> requesting.  Likewise, as frequent contributor to stack overflow you can
> see what people are asking about.  As a person who does python code reviews
> and teaches python courses for a living, I get to see into the code base
> for many companies.  Also, I spend time reading the source for popular
> python tools to see what their requirements are. Another source is
> reviewing what is included in toolsets from Continuum and Enthought to see
> what their client's have asked for.  None of this is precise or
> sufficiently inclusive, but it does give me some basis for finding out what
> people care about.
>
>
> >>
> >> What I suggest is adding the recipe (like the one shown below) directly
> to the docs in this section:
> >>
> https://docs.python.org/2/library/collections.html#ordereddict-examples-and-recipes
> >> There we already have code for a LastUpdatedOrderedDict() and an
> OrderedCounter().
> >>
> >> In the past, I've used these recipes as a way to gauge interest.  When
> the interest was sufficient, it got promoted from a recipe in the docs to a
> full implementation living in a standard library module.  For example,
> itertools.combinations_with_replacement() and itertools.accumulate()
> started out as recipes in the docs.
> >>
> >> In general, there shouldn't be a rush to push more things in the
> standard library.  We get many suggestions in the form, "it might be nice
> to have XXX in the standard library", however, in many cases there has been
> so little uptake in real-world code that it is isn't worth the maintenance
> burden or for the burden on the working programmer for having too many
> choices in the standard library (I encounter very few people who actually
> know most of what is available in the standard library).
> >
> > I agree.
>
> Sounds good.  I think we can put an OrderedSet() recipe in the docs and
> another out on PyPI, and be done with it.
>
>
> >
> >
> > PyPI is a successful secondary source of code and has become part of the
> working programmer's life.  We're gradually making it easier and easier to
> pull that code on-demand.  IMO, that is where most of the collections
> variants should live. Otherwise, it would be easy to fill the collections
> module with dozens of rarely used dict/set variants, red-black trees,
> pairing heaps, tries, graph implementations, bitarrays, bitsets, persistent
> collections, etc.)
> >
> > my-two-cents-ly,
> >
> >
> >
> > Agreed.
> >
> > It would be nice to complete one of the recipes (oset or ordered-set) —
> at least for me.
>
> Completing a recipe is more reasonable than pushing for this to be in the
> standard library.
>
>
> Raymond
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150208/7dd6bf73/attachment.html>