[Python-ideas] Add orderedset as set(iterable, *, ordered=False) and similarly for frozenset.
Neil Girdhar
mistersheik at gmail.com
Sun Feb 8 17:41:16 CET 2015
Thanks, Raymond.
My only question is when you say:
Perhaps, I wasn't clear. The PyPy folks have offered to port their work
back into Python. They're offering a nice combination of faster
performance (especially for iteration), better memory utilization, and as a
by-product retaining insertion order. If their offer comes to fruition,
then this whole discussion is moot.
Do you mean it will be moot because :
a) OrderedSet will be declared as a synonym for "set"
b) The Python language specification will be changed so that set is always
ordered, or
c) something else?
Thanks
Neil
On Sat, Feb 7, 2015 at 10:21 PM, Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:
>
> > * It isn't entirely clear what the set-to-set operations should do to
> maintain order, whether equality tests should require matching order,
> whether we care about loss of commutativity, and whether we're willing to
> forgo the usual optimizations on set-to-set operations (i.e. the
> intersection algorithm loops over the smaller of the two input sets and
> checks membership in the larger set -- that is fast, but it would lose
> order if the second set were the smaller of the two).
> >
> >> I agree it's not obvious, but I think that with a little debate we
> could come together on a reasonable choice.
>
> "In the face of ambiguity, refuse the temptation to guess." Some of the
> participants on python-ideas need to develop a much stronger aversion to
> opening a can of worms ;-)
>
>
> > You could even block the set-to-set operations for now. The main use
> of OrderedSet just requires add, update, and iteration as your recipe
> illustrates.
>
> Since you don't seem have any specific requirements for the order created
> by set-to-set operations, then just use the existing set operations on
> OrderedDict.keys():
>
> >>> a = OrderedDict.fromkeys(['raymond', 'rachel', 'matthew'])
> >>> b = OrderedDict.fromkeys(['matthew', 'calvin', 'rachel'])
> >>> a.keys() & b.keys()
> {'rachel', 'matthew'}
> >>> a.keys() | b.keys()
> {'rachel', 'calvin', 'matthew', 'raymond'}
> >>> a.keys() - b.keys()
> {'raymond'}
> >>> a.keys() ^ b.keys()
> {'calvin', 'raymond'}
> >>> list(a.keys())
> ['raymond', 'rachel', 'matthew']
>
>
> >
> > * The standard library should provide fast, clean fundamental building
> blocks and defer all of the myriad of variants and derivations to PyPI.
> We already provide OrderedDict and MutableSet so that rolling your own
> OrderedSet isn't hard (see the recipe below).
> >
> > I think that OrderedSet is a fundamental building block :)
>
> The rule for "fundamental" that I've used for itertools is whether the
> thing you want can easily be built out of what you've already got. For
> example, you can't make accumulate() out of the other itertools, but you
> can make tabulate(func) from map(func, count()). Likewise, you can easily
> make an OrderedSet from an OrderedDict and MutableSet, or if you're needs
> are less, just use the existing set operations on dict views as shown above.
>
>
> > * We're likely to soon have an OrderedDict written in C (I believe Eric
> Snow has been working on that for some time), so anything build on top of
> an OrderedDict will inherit all the speed benefits.
> >
> > Sure. It's also easy to have an OrderedSet written in C (especially
> once OrderedDict is done.)
>
> You're profoundly under-estimating how much work it is to build and
> maintain a separate C implementation. I've maintained the code for
> collections and sets for over a decade and can assure you that the
> maintenance burden has been enormous.
>
>
> > Also, there is some chance that PyPy folks will contribute their work
> making regular sets and dicts ordered by default; in which case, having a
> separate OrderedSet would have been a waste and the question would be moot.
> >
> > I think that using dicts and sets and assuming that they're ordered
> because you plan on using PyPy is a terrible error.
>
> Perhaps, I wasn't clear. The PyPy folks have offered to port their work
> back into Python. They're offering a nice combination of faster
> performance (especially for iteration), better memory utilization, and as a
> by-product retaining insertion order. If their offer comes to fruition,
> then this whole discussion is moot.
>
> >
> > To be honest, my main issue is my personal need.
>
> The usual solution for personal needs is to have a personal utils module
> (it sounds like you've been using my recipe for over a year). Decisions
> about what to put in the standard library have far reaching impact and we
> need to consider what is best for most of the users most of the time (and
> to consider our own maintenance burden).
>
> I thought ordered sets were interesting enough to write an implementation
> six years ago, but frankly there are many other other things that have a
> much better case for being added to collections (for example, some sort of
> graph structure, some kind of binary tree, some kind of trie, or perhaps a
> bloom filter).
>
>
> > If there was a complete PyPI recipe, I would be very happy.
>
> Then, that is what we should do.
>
>
> > I don't think I'm the only one, but maybe you're right and I am.
> >
> >
> > * One thing I've used to test the waters is to include a link in the
> documentation to an OrderedSet recipe (it's in the "See also" section at
> the very bottom of the page at
> https://docs.python.org/2/library/collections.html ). As far as I can
> tell, there has been nearly zero uptake based on this link. That gives us
> a good indication that there is very little real-world need for an
> OrderedSet.
> >
> > I've been using your recipe for at least a year (thank you). How are
> you measuring uptake?
>
>
> That is a very difficult question to answer (there is no precise tool for
> saying X people care about thing Y). The inputs I do have are imprecise
> and incomplete but never-the-less still informative. When google's code
> search was still alive, there was a direct way of seeing what people used
> in published code, and I frequently searched to see what people were using
> in the python world and how they were using it (for example, that is how I
> was able to determine that str.swapcase() was nearly useless and propose
> that it be removed for Python 3).
>
> Since then, I rely on a number of other sources such as reading the python
> bug tracker every day for the last 15 years to know what people are
> requesting. Likewise, as frequent contributor to stack overflow you can
> see what people are asking about. As a person who does python code reviews
> and teaches python courses for a living, I get to see into the code base
> for many companies. Also, I spend time reading the source for popular
> python tools to see what their requirements are. Another source is
> reviewing what is included in toolsets from Continuum and Enthought to see
> what their client's have asked for. None of this is precise or
> sufficiently inclusive, but it does give me some basis for finding out what
> people care about.
>
>
> >>
> >> What I suggest is adding the recipe (like the one shown below) directly
> to the docs in this section:
> >>
> https://docs.python.org/2/library/collections.html#ordereddict-examples-and-recipes
> >> There we already have code for a LastUpdatedOrderedDict() and an
> OrderedCounter().
> >>
> >> In the past, I've used these recipes as a way to gauge interest. When
> the interest was sufficient, it got promoted from a recipe in the docs to a
> full implementation living in a standard library module. For example,
> itertools.combinations_with_replacement() and itertools.accumulate()
> started out as recipes in the docs.
> >>
> >> In general, there shouldn't be a rush to push more things in the
> standard library. We get many suggestions in the form, "it might be nice
> to have XXX in the standard library", however, in many cases there has been
> so little uptake in real-world code that it is isn't worth the maintenance
> burden or for the burden on the working programmer for having too many
> choices in the standard library (I encounter very few people who actually
> know most of what is available in the standard library).
> >
> > I agree.
>
> Sounds good. I think we can put an OrderedSet() recipe in the docs and
> another out on PyPI, and be done with it.
>
>
> >
> >
> > PyPI is a successful secondary source of code and has become part of the
> working programmer's life. We're gradually making it easier and easier to
> pull that code on-demand. IMO, that is where most of the collections
> variants should live. Otherwise, it would be easy to fill the collections
> module with dozens of rarely used dict/set variants, red-black trees,
> pairing heaps, tries, graph implementations, bitarrays, bitsets, persistent
> collections, etc.)
> >
> > my-two-cents-ly,
> >
> >
> >
> > Agreed.
> >
> > It would be nice to complete one of the recipes (oset or ordered-set) —
> at least for me.
>
> Completing a recipe is more reasonable than pushing for this to be in the
> standard library.
>
>
> Raymond
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20150208/7dd6bf73/attachment.html>
More information about the Python-ideas
mailing list