Adding a thin wrapper class around the functions in stdlib.heapq
Nothing so bombastic this time. The heapq functions are basically all named "heapsomething", and basically all take a "heap" for their first argument, with supplementary args coming after. It's a textbook example of the (hypothetical) Object Oriented Manifesto™ where defining a class increases type safety and programmers' conceptual clarity. There're practically no drawbacks, and the code to be added would be very simple. Updating the tests and docs would probably be harder. In pure Python, such a class would look like this: class Heap(list): def __init__(self, iterable=None): if iterable: super().__init__(iterable) else: super().__init__() self.heapify() push = heapq.heappush pop = heapq.heappop pushpop = heapq.heappushpop replace = heapq.heapreplace heapify = heapq.heapify # This could be a simple wrapper as well, but I had the following thoughts anyways, so here they are def nsmallest(self, n, key=None): # heapq.nsmallest makes a *max* heap of the first n elements, # while we know that self is already a min heap, so we can # make the max heap construction faster self[:n] = reversed(self[:n]) return heapq.nsmallest(n, self, key) # do we define nlargest on a minheap?? Wrapping around the C builtin functions (which aren't descriptors) would be a bit harder, but not much so: from functools import partial class Heap(list): def __init__(self, iterable=None): if iterable: super().__init__(iterable) else: super().__init__() self.heapify = partial(heapq.heapify, self) self.push = partial(heapq.heappush, self) ... self.heapify() Thoughts?
This has been brought up multiple times. Last time was on this thread https://mail.python.org/pipermail/python-ideas/2016-October/043024.html. On Tue, Nov 21, 2017 at 3:13 AM, bunslow <bunslow@gmail.com> wrote:
Nothing so bombastic this time. The heapq functions are basically all named "heapsomething", and basically all take a "heap" for their first argument, with supplementary args coming after. It's a textbook example of the (hypothetical) Object Oriented Manifesto™ where defining a class increases type safety and programmers' conceptual clarity. There're practically no drawbacks, and the code to be added would be very simple. Updating the tests and docs would probably be harder.
In pure Python, such a class would look like this:
class Heap(list):
def __init__(self, iterable=None): if iterable: super().__init__(iterable) else: super().__init__()
self.heapify()
push = heapq.heappush pop = heapq.heappop pushpop = heapq.heappushpop replace = heapq.heapreplace heapify = heapq.heapify
# This could be a simple wrapper as well, but I had the following thoughts anyways, so here they are def nsmallest(self, n, key=None): # heapq.nsmallest makes a *max* heap of the first n elements, # while we know that self is already a min heap, so we can # make the max heap construction faster self[:n] = reversed(self[:n]) return heapq.nsmallest(n, self, key)
# do we define nlargest on a minheap??
Wrapping around the C builtin functions (which aren't descriptors) would be a bit harder, but not much so:
from functools import partial
class Heap(list): def __init__(self, iterable=None): if iterable: super().__init__(iterable) else: super().__init__()
self.heapify = partial(heapq.heapify, self) self.push = partial(heapq.heappush, self) ...
self.heapify()
Thoughts?
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Sebastian Kreft
Perhaps such repetition is a sign that *something* needs to be done... Thanks for the link though. I'm new enough to the community that it didn't even occur to me to search for prior discussions. On Mon, Nov 20, 2017 at 8:38 PM, Sebastian Kreft <skreft@gmail.com> wrote:
This has been brought up multiple times. Last time was on this thread https://mail.python.org/pipermail/python-ideas/2016-October/043024.html.
On Tue, Nov 21, 2017 at 3:13 AM, bunslow <bunslow@gmail.com> wrote:
Nothing so bombastic this time. The heapq functions are basically all named "heapsomething", and basically all take a "heap" for their first argument, with supplementary args coming after. It's a textbook example of the (hypothetical) Object Oriented Manifesto™ where defining a class increases type safety and programmers' conceptual clarity. There're practically no drawbacks, and the code to be added would be very simple. Updating the tests and docs would probably be harder.
In pure Python, such a class would look like this:
class Heap(list):
def __init__(self, iterable=None): if iterable: super().__init__(iterable) else: super().__init__()
self.heapify()
push = heapq.heappush pop = heapq.heappop pushpop = heapq.heappushpop replace = heapq.heapreplace heapify = heapq.heapify
# This could be a simple wrapper as well, but I had the following thoughts anyways, so here they are def nsmallest(self, n, key=None): # heapq.nsmallest makes a *max* heap of the first n elements, # while we know that self is already a min heap, so we can # make the max heap construction faster self[:n] = reversed(self[:n]) return heapq.nsmallest(n, self, key)
# do we define nlargest on a minheap??
Wrapping around the C builtin functions (which aren't descriptors) would be a bit harder, but not much so:
from functools import partial
class Heap(list): def __init__(self, iterable=None): if iterable: super().__init__(iterable) else: super().__init__()
self.heapify = partial(heapq.heapify, self) self.push = partial(heapq.heappush, self) ...
self.heapify()
Thoughts?
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Sebastian Kreft
bunslow writes:
Perhaps such repetition is a sign that *something* needs to be done...
It's not. Most such repetition is due to new people who have not read the last 20 years of archives. :-) (In case the smiley isn't clear enough, parse that as "all the new people who aren't cray-cray".) That doesn't mean there's nothing to the idea, it just means that *mere repetition* of certain proposals is only a sign that "great[sic] minds think alike," while Python development is attracting new people at a high rate. Repetition of certain arguments *in the same thread*, on the other hand, is often a sign that a meeting of minds has not be reached, and that can be symptomatic of problems (though it's not necessarily an indication of the necessity of a proposal). Regards, Steve
Maybe, that suffices: https://pypi.python.org/pypi/xheap On 21.11.2017 03:46, bunslow wrote:
Perhaps such repetition is a sign that *something* needs to be done...
Thanks for the link though. I'm new enough to the community that it didn't even occur to me to search for prior discussions.
On Mon, Nov 20, 2017 at 8:38 PM, Sebastian Kreft <skreft@gmail.com <mailto:skreft@gmail.com>> wrote:
This has been brought up multiple times. Last time was on this thread https://mail.python.org/pipermail/python-ideas/2016-October/043024.html <https://mail.python.org/pipermail/python-ideas/2016-October/043024.html>.
On Tue, Nov 21, 2017 at 3:13 AM, bunslow <bunslow@gmail.com <mailto:bunslow@gmail.com>> wrote:
Nothing so bombastic this time. The heapq functions are basically all named "heapsomething", and basically all take a "heap" for their first argument, with supplementary args coming after. It's a textbook example of the (hypothetical) Object Oriented Manifesto™ where defining a class increases type safety and programmers' conceptual clarity. There're practically no drawbacks, and the code to be added would be very simple. Updating the tests and docs would probably be harder.
In pure Python, such a class would look like this:
class Heap(list):
def __init__(self, iterable=None): if iterable: super().__init__(iterable) else: super().__init__()
self.heapify()
push = heapq.heappush pop = heapq.heappop pushpop = heapq.heappushpop replace = heapq.heapreplace heapify = heapq.heapify
# This could be a simple wrapper as well, but I had the following thoughts anyways, so here they are def nsmallest(self, n, key=None): # heapq.nsmallest makes a *max* heap of the first n elements, # while we know that self is already a min heap, so we can # make the max heap construction faster self[:n] = reversed(self[:n]) return heapq.nsmallest(n, self, key)
# do we define nlargest on a minheap??
Wrapping around the C builtin functions (which aren't descriptors) would be a bit harder, but not much so:
from functools import partial
class Heap(list): def __init__(self, iterable=None): if iterable: super().__init__(iterable) else: super().__init__()
self.heapify = partial(heapq.heapify, self) self.push = partial(heapq.heappush, self) ...
self.heapify()
Thoughts?
_______________________________________________ Python-ideas mailing list Python-ideas@python.org <mailto:Python-ideas@python.org> https://mail.python.org/mailman/listinfo/python-ideas <https://mail.python.org/mailman/listinfo/python-ideas> Code of Conduct: http://python.org/psf/codeofconduct/ <http://python.org/psf/codeofconduct/>
-- Sebastian Kreft
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Tue, Nov 21, 2017 at 4:16 PM, Sven R. Kunze <srkunze@mail.de> wrote:
Maybe, that suffices: https://pypi.python.org/pypi/xheap
I still think the heapq.heap* functions are atrocious and they should immediately be packaged with *no additional features* into a stdlib object for reasons along the line of https://mail.python.org/pipermail/python-ideas/2017-October/047514.html If the improvements in xheap are converging on boringly-stable, maybe bring them in at a later date.
On Tue, Nov 21, 2017 at 04:56:27PM -0600, Nick Timkovich wrote:
On Tue, Nov 21, 2017 at 4:16 PM, Sven R. Kunze <srkunze@mail.de> wrote:
Maybe, that suffices: https://pypi.python.org/pypi/xheap
I still think the heapq.heap* functions are atrocious and they should immediately be packaged with *no additional features* into a stdlib object for reasons along the line of https://mail.python.org/pipermail/python-ideas/2017-October/047514.html
I think you pasted the wrong URL. That link is about pip, and the discoverability of third-party libraries. It says nothing about why functions are "atrocious" and why wrapping them into an object is better. But generally, Python's APIs are not "pure object oriented" in the Java sense, and we don't needlessly create objects just for the sake of ensuring everything is an object. Functions are fine too, and if the only difference between a function and method is the syntax you use to call it: function(value, arguments) versus value.function(arguments) then that's a difference that makes no difference, and there is no advantage to using an object wrapper. See also: http://steve-yegge.blogspot.com.au/2006/03/execution-in-kingdom-of-nouns.htm... for a perspective on how the emphasis on objects is harmful. What advantage is there to making the heap functions into Heap methods? -- Steve
On Wed, Nov 22, 2017 at 11:45 AM, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Nov 21, 2017 at 04:56:27PM -0600, Nick Timkovich wrote:
On Tue, Nov 21, 2017 at 4:16 PM, Sven R. Kunze <srkunze@mail.de> wrote:
Maybe, that suffices: https://pypi.python.org/pypi/xheap
I still think the heapq.heap* functions are atrocious and they should immediately be packaged with *no additional features* into a stdlib object for reasons along the line of https://mail.python.org/pipermail/python-ideas/2017-October/047514.html
I think you pasted the wrong URL. That link is about pip, and the discoverability of third-party libraries. It says nothing about why functions are "atrocious" and why wrapping them into an object is better.
But generally, Python's APIs are not "pure object oriented" in the Java sense, and we don't needlessly create objects just for the sake of ensuring everything is an object. Functions are fine too, and if the only difference between a function and method is the syntax you use to call it:
function(value, arguments)
versus
value.function(arguments)
then that's a difference that makes no difference, and there is no advantage to using an object wrapper.
See also:
http://steve-yegge.blogspot.com.au/2006/03/execution-in-kingdom-of-nouns.htm...
for a perspective on how the emphasis on objects is harmful.
What advantage is there to making the heap functions into Heap methods?
A heap is an actual thing. We don't have dict methods implemented as functions operating on a "blob of memory" object; we have dict as a type. So the most simple and obvious way to work with a heap would be something like: from heaps import Heap h = Heap() h.push(blah) h.pop() So the question is more: why, with Python being the way it is, do the heap functions operate on a list? I think heapq.heapify is the answer: in linear time, it heapifies a list *in place*. I don't think there's any reason to have *both* interfaces to the heap functionality, but it certainly isn't illogical to try to treat a heap as a thing, and therefore have a Heap type. ChrisA
Chris Angelico wrote:
So the question is more: why, with Python being the way it is, do the heap functions operate on a list? I think heapq.heapify is the answer: in linear time, it heapifies a list *in place*.
There's no reason a Heap object couldn't accomodate that case. E.g. the constructor could optionally accept an existing list to heapify and wrap. -- Greg
On Wed, Nov 22, 2017 at 3:55 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Chris Angelico wrote:
So the question is more: why, with Python being the way it is, do the heap functions operate on a list? I think heapq.heapify is the answer: in linear time, it heapifies a list *in place*.
There's no reason a Heap object couldn't accomodate that case. E.g. the constructor could optionally accept an existing list to heapify and wrap.
Yeah but if it's wrapping an existing list, it's not really constructing a new object. Every other constructor in Python will make something independent of its origin (if you call list() on a list, you get back a brand new copy, for instance), but to take advantage of in-place heapification, it would have to mutate the input. I think that would be even more surprising. ChrisA
On Tue, Nov 21, 2017 at 6:45 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Nov 21, 2017 at 04:56:27PM -0600, Nick Timkovich wrote:
On Tue, Nov 21, 2017 at 4:16 PM, Sven R. Kunze <srkunze@mail.de> wrote:
Maybe, that suffices: https://pypi.python.org/pypi/xheap
I still think the heapq.heap* functions are atrocious and they should immediately be packaged with *no additional features* into a stdlib object for reasons along the line of https://mail.python.org/pipermail/python-ideas/2017-October/047514.html
I think you pasted the wrong URL. That link is about pip, and the discoverability of third-party libraries.
Probably straining the reference, but akin to "don't teach JSON with Python, teach XML-RPC because it supports that better in the stdlib", why not "don't teach heaps, just use lists with pop and insert+sort".
But generally, Python's APIs are not "pure object oriented" in the Java sense, and we don't needlessly create objects just for the sake of ensuring everything is an object. Functions are fine too...
Functions are great. I'm a big fan of functions. However, 1. Once there are several that all have the same thing as an argument: thing_operation1(thing, arg), thing_operation2(thing, arg)...it's about time to bind them together. 2. And especially for the heap "soft-datatype": once it's heapified, naively modifying it with other methods will ruin the heap invariant. **The actual list you pass around can't be treated as a list.**
On 22.11.2017 07:22, Nick Timkovich wrote:
Functions are great. I'm a big fan of functions. However,
1. Once there are several that all have the same thing as an argument: thing_operation1(thing, arg), thing_operation2(thing, arg)...it's about time to bind them together. 2. And especially for the heap "soft-datatype": once it's heapified, naively modifying it with other methods will ruin the heap invariant. **The actual list you pass around can't be treated as a list.**
Two important aspects to which I want to add some thoughts. The reason why I created xheap was because I actually needed a more complex datastructure which required more bookkeeping: 1) custom orders on the same data (list) 2) fast removal from within the heap The plain "Heap" implementation (object wrapper of heapq) was just for the sake of completion. Cheers, Sven PS: I tend to think that implementing xheap was easier BECAUSE the stdlib provided a function-only heap implementation. Nonetheless, both ways do have their merits.
On Wed, 22 Nov 2017 17:11:53 +1100 Chris Angelico <rosuav@gmail.com> wrote:
On Wed, Nov 22, 2017 at 3:55 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Chris Angelico wrote:
So the question is more: why, with Python being the way it is, do the heap functions operate on a list? I think heapq.heapify is the answer: in linear time, it heapifies a list *in place*.
There's no reason a Heap object couldn't accomodate that case. E.g. the constructor could optionally accept an existing list to heapify and wrap.
Yeah but if it's wrapping an existing list, it's not really constructing a new object. Every other constructor in Python will make something independent of its origin (if you call list() on a list, you get back a brand new copy, for instance), but to take advantage of in-place heapification, it would have to mutate the input. I think that would be even more surprising.
In any case, copying a list a pretty cheap, so in-place heapification isn't a very important optimization. Regards Antoine.
On Wed, 22 Nov 2017 00:22:00 -0600 Nick Timkovich <prometheus235@gmail.com> wrote:
Functions are great. I'm a big fan of functions. However,
1. Once there are several that all have the same thing as an argument: thing_operation1(thing, arg), thing_operation2(thing, arg)...it's about time to bind them together. 2. And especially for the heap "soft-datatype": once it's heapified, naively modifying it with other methods will ruin the heap invariant. **The actual list you pass around can't be treated as a list.**
A third reason: documentation and discoverability. If I type help(some_heapified_list) at the prompt, I get the usual documentation for list methods, not the documentation of heapq functions... Regards Antoine.
I thought the purpose of heapq was to have a ready-made example for instructors on how an API can be improved by applying object-oriented techniques. ;-) I think adding a HeapQueue class would be a great idea. Obviously the existing functions would need to continue existing for backward compatibility. Stephan 2017-11-22 12:09 GMT+01:00 Antoine Pitrou <solipsis@pitrou.net>:
On Wed, 22 Nov 2017 00:22:00 -0600 Nick Timkovich <prometheus235@gmail.com> wrote:
Functions are great. I'm a big fan of functions. However,
1. Once there are several that all have the same thing as an argument: thing_operation1(thing, arg), thing_operation2(thing, arg)...it's about time to bind them together. 2. And especially for the heap "soft-datatype": once it's heapified, naively modifying it with other methods will ruin the heap invariant.
**The
actual list you pass around can't be treated as a list.**
A third reason: documentation and discoverability. If I type help(some_heapified_list) at the prompt, I get the usual documentation for list methods, not the documentation of heapq functions...
Regards
Antoine.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Chris Angelico wrote:
Yeah but if it's wrapping an existing list, it's not really constructing a new object.
That depends on what you consider the object to be. There are existing examples of objects that wrap other objects and mutate them, e.g. TextIOWrapper. If it would make anyone happier, there could be two classes, Heap and HeapWrapper (with one probably being implemented as a subclass of the other). -- Greg
On 22 November 2017 at 11:00, Chris Angelico <rosuav@gmail.com> wrote:
So the question is more: why, with Python being the way it is, do the heap functions operate on a list? I think heapq.heapify is the answer: in linear time, it heapifies a list *in place*.
I don't think there's any reason to have *both* interfaces to the heap functionality, but it certainly isn't illogical to try to treat a heap as a thing, and therefore have a Heap type.
Right, the parallel here is that we have a "heapq" module for the same reason that we have list.sort(), sorted(), and the bisect module, rather than a single "SortedList" type. https://code.activestate.com/recipes/577197-sortedcollection/ then provides an example of how to combine those into a "SortedCollection" type. That said, I'm still in favour of adding an object-oriented wrapper to either `collections` or the `heapq` module for all the classic OO reasons: - it makes it easier to reliably maintain the heap invariant (just drop your list reference after passing it to the heap wrapper) - it gives both human readers and static code analysers more information to work with ("this is a heap" rather than "this is a list") - it provides a hook for improved interactive help on heap instances I don't have any great concerns about potential confusion - the OO wrapper will be easy enough to use that anyone that's unsure will likely gravitate towards that, while the lower level `heapq` functions will remain available for folks writing their own heap implementations. This effect would likely be even more pronounced if the OO wrapper were made available as `collections.Heap` (`collections` already imports the `heapq` module for use in the Counter implementation). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Honestly, I don't see the value in a thin object-oriented wrapper around heapq functions. I'm a big -1 on the idea. I'm the author of sortedcontainers ( https://pypi.python.org/pypi/sortedcontainers/) so I interact with a lot of people using sorted collections types. My observations show folk's needs tend to fit a bimodal distribution. At one end are those who get by with list.sort, bisect, or heapq and they seem to appreciate the simple function-based approach those modules provide. At the other end are those who want a SortedList data type and we have some good options on PyPI and some good building-blocks in the standard library. Personally, I think "sorted", "bisect" and "heapq" in the standard library are brilliant examples of the Python-way or "zen." I've learned a lot by studying their code and I encourage others to do the same. Just because something can be object-oriented doesn't mean it should be. There's a lot to be said for simplicity. I also think Nick's arguments are valid but I don't find them convincing. What I think would be sufficient is a "See also:" blurb like that under https://docs.python.org/3/library/bisect.html#bisect.insort which also references SortedContainers at http://www.grantjenks.com/docs/sortedcontainers/ and the same blurb on heapq. I think that would be a reasonable next-step before we include any new data type in the standard library. Grant On Wed, Nov 22, 2017 at 8:05 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 22 November 2017 at 11:00, Chris Angelico <rosuav@gmail.com> wrote:
So the question is more: why, with Python being the way it is, do the heap functions operate on a list? I think heapq.heapify is the answer: in linear time, it heapifies a list *in place*.
I don't think there's any reason to have *both* interfaces to the heap functionality, but it certainly isn't illogical to try to treat a heap as a thing, and therefore have a Heap type.
Right, the parallel here is that we have a "heapq" module for the same reason that we have list.sort(), sorted(), and the bisect module, rather than a single "SortedList" type. https://code.activestate.com/ recipes/577197-sortedcollection/ then provides an example of how to combine those into a "SortedCollection" type.
That said, I'm still in favour of adding an object-oriented wrapper to either `collections` or the `heapq` module for all the classic OO reasons:
- it makes it easier to reliably maintain the heap invariant (just drop your list reference after passing it to the heap wrapper) - it gives both human readers and static code analysers more information to work with ("this is a heap" rather than "this is a list") - it provides a hook for improved interactive help on heap instances
I don't have any great concerns about potential confusion - the OO wrapper will be easy enough to use that anyone that's unsure will likely gravitate towards that, while the lower level `heapq` functions will remain available for folks writing their own heap implementations.
This effect would likely be even more pronounced if the OO wrapper were made available as `collections.Heap` (`collections` already imports the `heapq` module for use in the Counter implementation).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Something *should* be object oriented with the functions in question all operate on the same data type, and in particular, those functions/verbs *are only well defined for that type*. heapq.heappush(list-not-heap, item) is perfectly valid code in the current interface, but doesn't make any sense at all. And list.sort is *not* function based, it's an object oriented method. sorted() provides the same functionality for other types, given that it's a well defined function for a wide variety of sequences (unlike heappush). (list.sort is a method because unlike sorted(), it operates inplace, and is thus only meaningful for mutable, "complete" (all in memory, not "lazy") sequences -- i.e., a list.) I've never used bisect, so I'll refrain from commenting on it. At the end of the day, the patch proposed is merely a wrapper around the functional approach; you are welcome to continue using it as you like, it's not going anywhere. I would propose that the docs put the OOP version first though. On Wed, Nov 22, 2017 at 11:11 PM, Grant Jenks <grant.jenks@gmail.com> wrote:
Honestly, I don't see the value in a thin object-oriented wrapper around heapq functions. I'm a big -1 on the idea.
I'm the author of sortedcontainers (https://pypi.python.org/pypi/ sortedcontainers/) so I interact with a lot of people using sorted collections types. My observations show folk's needs tend to fit a bimodal distribution. At one end are those who get by with list.sort, bisect, or heapq and they seem to appreciate the simple function-based approach those modules provide. At the other end are those who want a SortedList data type and we have some good options on PyPI and some good building-blocks in the standard library.
Personally, I think "sorted", "bisect" and "heapq" in the standard library are brilliant examples of the Python-way or "zen." I've learned a lot by studying their code and I encourage others to do the same. Just because something can be object-oriented doesn't mean it should be. There's a lot to be said for simplicity. I also think Nick's arguments are valid but I don't find them convincing.
What I think would be sufficient is a "See also:" blurb like that under https://docs.python.org/3/library/bisect.html#bisect.insort which also references SortedContainers at http://www.grantjenks.com/ docs/sortedcontainers/ and the same blurb on heapq. I think that would be a reasonable next-step before we include any new data type in the standard library.
Grant
On Wed, Nov 22, 2017 at 8:05 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 22 November 2017 at 11:00, Chris Angelico <rosuav@gmail.com> wrote:
So the question is more: why, with Python being the way it is, do the heap functions operate on a list? I think heapq.heapify is the answer: in linear time, it heapifies a list *in place*.
I don't think there's any reason to have *both* interfaces to the heap functionality, but it certainly isn't illogical to try to treat a heap as a thing, and therefore have a Heap type.
Right, the parallel here is that we have a "heapq" module for the same reason that we have list.sort(), sorted(), and the bisect module, rather than a single "SortedList" type. https://code.activestate.com/r ecipes/577197-sortedcollection/ then provides an example of how to combine those into a "SortedCollection" type.
That said, I'm still in favour of adding an object-oriented wrapper to either `collections` or the `heapq` module for all the classic OO reasons:
- it makes it easier to reliably maintain the heap invariant (just drop your list reference after passing it to the heap wrapper) - it gives both human readers and static code analysers more information to work with ("this is a heap" rather than "this is a list") - it provides a hook for improved interactive help on heap instances
I don't have any great concerns about potential confusion - the OO wrapper will be easy enough to use that anyone that's unsure will likely gravitate towards that, while the lower level `heapq` functions will remain available for folks writing their own heap implementations.
This effect would likely be even more pronounced if the OO wrapper were made available as `collections.Heap` (`collections` already imports the `heapq` module for use in the Counter implementation).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On 23 November 2017 at 15:22, bunslow <bunslow@gmail.com> wrote:
Something *should* be object oriented with the functions in question all operate on the same data type, and in particular, those functions/verbs *are only well defined for that type*. heapq.heappush(list-not-heap, item) is perfectly valid code in the current interface, but doesn't make any sense at all. And list.sort is *not* function based, it's an object oriented method. sorted() provides the same functionality for other types, given that it's a well defined function for a wide variety of sequences (unlike heappush). (list.sort is a method because unlike sorted(), it operates inplace, and is thus only meaningful for mutable, "complete" (all in memory, not "lazy") sequences -- i.e., a list.)
I've never used bisect, so I'll refrain from commenting on it.
At the end of the day, the patch proposed is merely a wrapper around the functional approach; you are welcome to continue using it as you like, it's not going anywhere. I would propose that the docs put the OOP version first though.
Ensuring the docs are clearly separated is part of why I think "collections.Heap" would make more sense as a proposal than "heapq.Heap" - collections already imports heapq for use in the Counter implementation, and adding another primitive container type to collections is a smaller conceptual change than updating heapq to being more than just a heap algorithm library. I'd also note that a collections.Heap class that accepts the underlying list to use as its sole parameter would compose nicely with the list builtin to allow creation of a heap from an arbitrary iterable: heap = collections.Heap(list(iterable)) rather than the current: heap = list(iterable) heapq.heapify(heap) That's akin to the difference between: data = sorted(iterable) and: data = list(iterable) data.sort() I don't think it would be a disaster if we continued to leave this out, but I don't think it would be a major maintenance burden or future barrier to learning to add it, either. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Wed, Nov 22, 2017 at 9:22 PM, bunslow <bunslow@gmail.com> wrote:
Something *should* be object oriented with the functions in question all operate on the same data type, and in particular, those functions/verbs *are only well defined for that type*.
But here you are missing something that I think is important. These functions are valid on *any* data type that has the methods of a mutable sequence and has had heapq.heapify called on it. This is part of the brilliance of heapq. Suppose we wanted to create a list-like data-type backed by a filesystem or database. I have done exactly such a thing like: from collections import MutableSequence class MyFileSystemList(MutableSequence): ... data = MyFileSystemList('dir-path') heapq.heapify(data) heapq.heappush(data, 'text') And it will work! The heap algorithm is exposed through a high-level functional interface so that you can take advantage of duck-typing. This is an important Python feature. heapq.heappush(list-not-heap, item) is perfectly valid code in the current
interface, but doesn't make any sense at all. And list.sort is *not* function based, it's an object oriented method. sorted() provides the same functionality for other types, given that it's a well defined function for a wide variety of sequences (unlike heappush). (list.sort is a method because unlike sorted(), it operates inplace, and is thus only meaningful for mutable, "complete" (all in memory, not "lazy") sequences -- i.e., a list.)
Personally, I find it irritating that list.sort will not similarly work until I inherit from list. I would prefer if it worked on mutable sequences. But regardless, if you inherit from "list", define all your own methods to store data in a file system, database, whatever then "list.sort" will work just fine as if it were a function: class MyFileSystemList(list): ... # Define methods to read/write from file system. data = MyFileSystemList() list.sort(data) So it's not the case that list.sort is a method and must only be used as such. Rather, list.sort captures the algorithm known as sorting and works on any mutable sequence (with the unfortunate additional requirement that it inherits from list). I've never used bisect, so I'll refrain from commenting on it.
At the end of the day, the patch proposed is merely a wrapper around the functional approach; you are welcome to continue using it as you like, it's not going anywhere. I would propose that the docs put the OOP version first though.
On Wed, Nov 22, 2017 at 11:11 PM, Grant Jenks <grant.jenks@gmail.com> wrote:
Honestly, I don't see the value in a thin object-oriented wrapper around heapq functions. I'm a big -1 on the idea.
I'm the author of sortedcontainers (https://pypi.python.org/pypi/ sortedcontainers/) so I interact with a lot of people using sorted collections types. My observations show folk's needs tend to fit a bimodal distribution. At one end are those who get by with list.sort, bisect, or heapq and they seem to appreciate the simple function-based approach those modules provide. At the other end are those who want a SortedList data type and we have some good options on PyPI and some good building-blocks in the standard library.
Personally, I think "sorted", "bisect" and "heapq" in the standard library are brilliant examples of the Python-way or "zen." I've learned a lot by studying their code and I encourage others to do the same. Just because something can be object-oriented doesn't mean it should be. There's a lot to be said for simplicity. I also think Nick's arguments are valid but I don't find them convincing.
What I think would be sufficient is a "See also:" blurb like that under https://docs.python.org/3/library/bisect.html#bisect.insort which also references SortedContainers at http://www.grantjenks.com/d ocs/sortedcontainers/ and the same blurb on heapq. I think that would be a reasonable next-step before we include any new data type in the standard library.
Grant
Grant Jenks wrote:
The heap algorithm is exposed through a high-level functional interface so that you can take advantage of duck-typing.
This would also be true of a HeapWrapper class that wrapped an existing sequence. There's a difference between the heap functions and sorted(). The latter is just a single function that does its job and returns a result. But the heap interface is a collection of functions that have to be used together in the right way to maintain the heap invariants. That suggests some encapsulation is in order. -- Greg
On 23 November 2017 at 17:13, Grant Jenks <grant.jenks@gmail.com> wrote:
And it will work! The heap algorithm is exposed through a high-level functional interface so that you can take advantage of duck-typing. This is an important Python feature.
Nobody is proposing taking the functional interface away (just as nobody is proposing to take the bisect module away). Instead, the argument is: - the heapq functions need to be combined with concrete storage to be useful - a simple wrapper class that combines them with a given list (or other mutable sequence) is short and easy to maintain - such a wrapper class is also useful in its own right for basic heap use cases - so let's provide one in the collections module The potential arguments against that are: - this means yet-another-data-type in the collections module, so it will make the collections module harder to learn - people will be confused as to whether they should use collections.Heap or the heapq functions The risks of both of those outcomes seem low, since the main current source of confusion appears to be folks looking for a concrete "Heap" container type, and being surprised when they get told they either need to be build a DIY one from a list + the heapq functions, or else install a 3rd party dependency to get a preassembled one. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tue, 21 Nov 2017 at 14:57 Nick Timkovich <prometheus235@gmail.com> wrote:
On Tue, Nov 21, 2017 at 4:16 PM, Sven R. Kunze <srkunze@mail.de> wrote:
Maybe, that suffices: https://pypi.python.org/pypi/xheap
I still think the heapq.heap* functions are atrocious and they should immediately be packaged with *no additional features* into a stdlib object for reasons along the line of https://mail.python.org/pipermail/python-ideas/2017-October/047514.html
I wanted to address the general tone people should aim for on this list so that people don't inadvertently insult people. Now I'm going to preface this as I'm not explicitly calling out Nick here on this, I'm just using his email as an illustrative example as it's what happened to trigger me to write this email. So Nick said above that the design of the heapq module is "atrocious", to the extent that it should be "immediately" fixed with an OO wrapper. So obviously Nick doesn't like the design of the heapq module. ;) And that's okay! And he's totally within his rights to express the feeling that the heapq module as it stands doesn't meet his needs. But calling it "atrocious" and so bad that it needs to be fixed "immediately" as if it's a blight upon the stdlib is unnecessarily insulting to those that have worked on the module. To convey the feeling that you think an OO wrapper would be helpful as the current design doesn't work for you, you could just phrase it as I just did to get the same point across without insulting anyone. Basically if you wouldn't like your own work called "atrocious" by someone you respect, then it's probably best to not use that phrasing when talking about a stranger's code either. If you want more context as to why this all matters in order to keep open source sustainable, you can watch my 15 minute keynote from JupyterCon this year for a more in-depth, verbal explanation: https://www.youtube.com/watch?v=y19s6vPpGXA . And BTW, the heapq module is 15 years old, has a history, and there are explicit reasons it's designed the way it is, so from that perspective I would argue it has a good design.
+1 Basically, if you're posting here, you probably want something to change. And that means that somebody has to make that change. And someone has to approve it. Etc. And you are trying to influence those people to make/approve/etc. that change, since for them it's less work if nothing change. And how do you influence people? Not by opening with an insult. (Also if you think that none of this matters because we're all rational actors, grow up. :-) --Guido On Mon, Nov 27, 2017 at 12:17 PM, Brett Cannon <brett@python.org> wrote:
On Tue, 21 Nov 2017 at 14:57 Nick Timkovich <prometheus235@gmail.com> wrote:
On Tue, Nov 21, 2017 at 4:16 PM, Sven R. Kunze <srkunze@mail.de> wrote:
Maybe, that suffices: https://pypi.python.org/pypi/xheap
I still think the heapq.heap* functions are atrocious and they should immediately be packaged with *no additional features* into a stdlib object for reasons along the line of https://mail.python.org/ pipermail/python-ideas/2017-October/047514.html
I wanted to address the general tone people should aim for on this list so that people don't inadvertently insult people. Now I'm going to preface this as I'm not explicitly calling out Nick here on this, I'm just using his email as an illustrative example as it's what happened to trigger me to write this email.
So Nick said above that the design of the heapq module is "atrocious", to the extent that it should be "immediately" fixed with an OO wrapper. So obviously Nick doesn't like the design of the heapq module. ;) And that's okay! And he's totally within his rights to express the feeling that the heapq module as it stands doesn't meet his needs.
But calling it "atrocious" and so bad that it needs to be fixed "immediately" as if it's a blight upon the stdlib is unnecessarily insulting to those that have worked on the module. To convey the feeling that you think an OO wrapper would be helpful as the current design doesn't work for you, you could just phrase it as I just did to get the same point across without insulting anyone. Basically if you wouldn't like your own work called "atrocious" by someone you respect, then it's probably best to not use that phrasing when talking about a stranger's code either.
If you want more context as to why this all matters in order to keep open source sustainable, you can watch my 15 minute keynote from JupyterCon this year for a more in-depth, verbal explanation: https://www.youtube.com/watch?v=y19s6vPpGXA .
And BTW, the heapq module is 15 years old, has a history, and there are explicit reasons it's designed the way it is, so from that perspective I would argue it has a good design.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido)
On Mon, Nov 27, 2017 at 8:17 PM, Brett Cannon <brett@python.org> wrote:
But calling it "atrocious" and so bad that it needs to be fixed "immediately" as if it's a blight upon the stdlib is unnecessarily insulting to those that have worked on the module. To convey the feeling that you think an OO wrapper would be helpful as the current design doesn't work for you, you could just phrase it as I just did to get the same point across without insulting anyone. Basically if you wouldn't like your own work called "atrocious" by someone you respect, then it's probably best to not use that phrasing when talking about a stranger's code either.
Sorry for the curt tone, I did lose some sight on the code being designed by people rather than a faceless organization. My intention wasn't to disparage the original authors but sprung more out of my frustration and perception from that thread and those before that the status quo would not change and that if a contribution was proffered, would simply be dismissed or ignored. To motivate any change, there must be some argument levied against the status quo, but hopefully I can articulate it better. That little corner is something I'm interested in, and not having contributed to CPython before, I'm unsure how it "really works". The steps at https://devguide.python.org/stdlibchanges/ suggest trying to elicit community feedback from the lists as a step, so negative feedback tends to kill the enthusiasm to actually make the PR. In the absence of code, concrete arguments are almost impossible as we're discussing the shape of clouds. Nick
On 27 November 2017 at 21:59, Nick Timkovich <prometheus235@gmail.com> wrote:
On Mon, Nov 27, 2017 at 8:17 PM, Brett Cannon <brett@python.org> wrote:
But calling it "atrocious" and so bad that it needs to be fixed "immediately" as if it's a blight upon the stdlib is unnecessarily insulting to those that have worked on the module. To convey the feeling that you think an OO wrapper would be helpful as the current design doesn't work for you, you could just phrase it as I just did to get the same point across without insulting anyone. Basically if you wouldn't like your own work called "atrocious" by someone you respect, then it's probably best to not use that phrasing when talking about a stranger's code either.
Sorry for the curt tone, I did lose some sight on the code being designed by people rather than a faceless organization. My intention wasn't to disparage the original authors but sprung more out of my frustration and perception from that thread and those before that the status quo would not change and that if a contribution was proffered, would simply be dismissed or ignored. To motivate any change, there must be some argument levied against the status quo, but hopefully I can articulate it better.
That little corner is something I'm interested in, and not having contributed to CPython before, I'm unsure how it "really works". The steps at https://devguide.python.org/stdlibchanges/ suggest trying to elicit community feedback from the lists as a step, so negative feedback tends to kill the enthusiasm to actually make the PR. In the absence of code, concrete arguments are almost impossible as we're discussing the shape of clouds.
In my experience (and this reiterates Brett's point) the proposals that get the best responses are those that are presented positively - instead of focusing on the (perceived) problems with the current situation, describe the benefits that will come from the proposed change. If you can't do that, then it's unlikely there is enough justification for a change. Certainly, negative feedback can be demotivating, and when you have a great idea and all you hear is "but what if...?" it's hard to remain positive. But you're not going to get better feedback if you criticise - at best, people will stop listening, and you'll have avoided some of the arguments, but at the cost of no-one being willing to support your proposal and so it dies. Your perception isn't wrong, by the way. It *is* hard to persuade people that the status quo needs to change. But that's not because there's no interest in change. Rather, it's because there's a strong sense among both the core developers and the frequent contributors on this list, of the significant impact any change will have - it's hard to conceive of just how many people will be affected by even the smallest change we make, and that's a big responsibility. So while it's often hard, focusing on the positives (and being willing to accept that the status quo is sufficient for many people) really is the only way to gain support. Paul
On Mon, Nov 27, 2017 at 4:36 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 27 November 2017 at 21:59, Nick Timkovich <prometheus235@gmail.com> wrote:
On Mon, Nov 27, 2017 at 8:17 PM, Brett Cannon <brett@python.org> wrote:
But calling it "atrocious" and so bad that it needs to be fixed "immediately" as if it's a blight upon the stdlib is unnecessarily
insulting
to those that have worked on the module. To convey the feeling that you think an OO wrapper would be helpful as the current design doesn't work for you, you could just phrase it as I just did to get the same point across without insulting anyone. Basically if you wouldn't like your own work called "atrocious" by someone you respect, then it's probably best to not use that phrasing when talking about a stranger's code either.
Sorry for the curt tone, I did lose some sight on the code being designed by people rather than a faceless organization. My intention wasn't to disparage the original authors but sprung more out of my frustration and perception from that thread and those before that the status quo would not change and that if a contribution was proffered, would simply be dismissed or ignored. To motivate any change, there must be some argument levied against the status quo, but hopefully I can articulate it better.
That little corner is something I'm interested in, and not having contributed to CPython before, I'm unsure how it "really works". The steps at https://devguide.python.org/stdlibchanges/ suggest trying to elicit community feedback from the lists as a step, so negative feedback tends to kill the enthusiasm to actually make the PR. In the absence of code, concrete arguments are almost impossible as we're discussing the shape of clouds.
In my experience (and this reiterates Brett's point) the proposals that get the best responses are those that are presented positively - instead of focusing on the (perceived) problems with the current situation, describe the benefits that will come from the proposed change. If you can't do that, then it's unlikely there is enough justification for a change. Certainly, negative feedback can be demotivating, and when you have a great idea and all you hear is "but what if...?" it's hard to remain positive. But you're not going to get better feedback if you criticise - at best, people will stop listening, and you'll have avoided some of the arguments, but at the cost of no-one being willing to support your proposal and so it dies.
My first submission to this list was predicated on what I'd read in PEPs -- and many of those, since they recommend major-enough changes to require a PEP, have sections (often lengthy) dedicated to "what's wrong with the status quo". My attempt to imitate that obviously crossed some boundaries in retrospect, and of course now that it's brought up here I see that spinning it as "what can be done to make it better" is psychologically much more effective than "why the current way sucks" (because semantically these are either approximately or exactly the same). But that's where it came from, at least with some of my earlier threads, and I suspect the author of the topic message of the OP will have a similar sentiment. (One major example I can point to is PEP 465 -- because it proposed such a major change to the language, literally half its text amounts to "what's wrong with the status quo", quantifiably and repeatedly. It was also a highly persuasive PEP due in no small part to its "why current things suck" section.)
On Mon, Nov 27, 2017 at 7:22 PM, bunslow <bunslow@gmail.com> wrote:
My first submission to this list was predicated on what I'd read in PEPs -- and many of those, since they recommend major-enough changes to require a PEP, have sections (often lengthy) dedicated to "what's wrong with the status quo". My attempt to imitate that obviously crossed some boundaries in retrospect, and of course now that it's brought up here I see that spinning it as "what can be done to make it better" is psychologically much more effective than "why the current way sucks" (because semantically these are either approximately or exactly the same). But that's where it came from, at least with some of my earlier threads, and I suspect the author of the topic message of the OP will have a similar sentiment.
To quote Brett's original email:
So obviously Nick doesn't like the design of the heapq module. ;) And that's okay! And he's totally within his rights to express the feeling that the heapq module as it stands doesn't meet his needs. But calling it "atrocious" and so bad that it needs to be fixed "immediately" as if it's a blight upon the stdlib is unnecessarily insulting to those that have worked on the module.
You can and should talk about problems with the status quo! But it's totally possible to do this without insulting anyone. Brett's talking about tone, not content.
(One major example I can point to is PEP 465 -- because it proposed such a major change to the language, literally half its text amounts to "what's wrong with the status quo", quantifiably and repeatedly. It was also a highly persuasive PEP due in no small part to its "why current things suck" section.)
Maybe, but you won't find the word "suck" anywhere in that section :-). And of course, the nice thing about PEP 465 is that it's complaining about a missing feature, which sort of by definition means that it's not complaining about anyone in particular's work. Nonetheless, an earlier draft of PEP 465 did inadvertently talk about an old PEP in an overly-flippant manner, and I ended up apologizing to the author and fixing it. (Which of course also made the PEP stronger.) It's cool, no-one's perfect. If you think you've made a mistake, then apologize and try to do better, that's all. -n -- Nathaniel J. Smith -- https://vorpus.org
I certainly didn't take away the right lesson! And lesson well learned, hopefully. On Tue, Nov 28, 2017 at 12:55 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Nov 27, 2017 at 7:22 PM, bunslow <bunslow@gmail.com> wrote:
My first submission to this list was predicated on what I'd read in PEPs -- and many of those, since they recommend major-enough changes to require a PEP, have sections (often lengthy) dedicated to "what's wrong with the status quo". My attempt to imitate that obviously crossed some boundaries in retrospect, and of course now that it's brought up here I see that spinning it as "what can be done to make it better" is psychologically much more effective than "why the current way sucks" (because semantically these are either approximately or exactly the same). But that's where it came from, at least with some of my earlier threads, and I suspect the author of the topic message of the OP will have a similar sentiment.
So obviously Nick doesn't like the design of the heapq module. ;) And
To quote Brett's original email: that's okay! And he's totally within his rights to express the feeling that the heapq module as it stands doesn't meet his needs.
But calling it "atrocious" and so bad that it needs to be fixed "immediately" as if it's a blight upon the stdlib is unnecessarily insulting to those that have worked on the module.
You can and should talk about problems with the status quo! But it's totally possible to do this without insulting anyone. Brett's talking about tone, not content.
(One major example I can point to is PEP 465 -- because it proposed such a major change to the language, literally half its text amounts to "what's wrong with the status quo", quantifiably and repeatedly. It was also a highly persuasive PEP due in no small part to its "why current things suck" section.)
Maybe, but you won't find the word "suck" anywhere in that section :-). And of course, the nice thing about PEP 465 is that it's complaining about a missing feature, which sort of by definition means that it's not complaining about anyone in particular's work.
Nonetheless, an earlier draft of PEP 465 did inadvertently talk about an old PEP in an overly-flippant manner, and I ended up apologizing to the author and fixing it. (Which of course also made the PEP stronger.) It's cool, no-one's perfect. If you think you've made a mistake, then apologize and try to do better, that's all.
-n
-- Nathaniel J. Smith -- https://vorpus.org
This discussion started on Python-Ideas (q.v.), and is also somewhat applicable to Python-Dev. I think further discussion belongs on Core-Mentorship, though. Cc'd and reply-to set. bunslow writes:
My first submission to this list was predicated on what I'd read in PEPs -- and many of those, since they recommend major-enough changes to require a PEP, have sections (often lengthy) dedicated to "what's wrong with the status quo".
That's a good point! However, even a for modest enhancement, you do need to advocate from "what's wrong with the status quo". But that is most persuasive when it can be phrased as "you can't do X", or at least "you can't do X without Y", and this change allow that feature. And Y usually should not be something that most Python programmers do in the ordinary course of writing code, such as defining functions or classes. This is quite a fine point though. I note that Nick defended a *very* short context manager (the "null" context manager) on the grounds that typical Python programmers think of context managers as being a bit magical. They use them all the time in the recommended idioms for file handling and the like, but they very rarely write them. Again, "writing loop statements" is usually considered something that doesn't qualify as a "Y", but we got comprehensions and then generators. I'm not sure what the lesson is here. Maybe it's that crossing the statement/expression boundary is a Y. Or maybe it's that comprehensions and generators are imports of features successful in other languages, so were relatively easy to accept. Regards, Steve
On 28 November 2017 at 13:22, bunslow <bunslow@gmail.com> wrote:
My first submission to this list was predicated on what I'd read in PEPs -- and many of those, since they recommend major-enough changes to require a PEP, have sections (often lengthy) dedicated to "what's wrong with the status quo". My attempt to imitate that obviously crossed some boundaries in retrospect, and of course now that it's brought up here I see that spinning it as "what can be done to make it better" is psychologically much more effective than "why the current way sucks" (because semantically these are either approximately or exactly the same). But that's where it came from, at least with some of my earlier threads, and I suspect the author of the topic message of the OP will have a similar sentiment.
Yeah, by the time someone reaches the point of writing a PEP, there's usually some level of existing awareness along the lines of "The status quo might not be OK any more, so let's explicitly document the risks and benefits associated with a possible change". That means part of the role of the PEP is to summarise the relevant problems with the status quo, such that future readers can understand why any change is being proposed at all. In cases where the proposed change is relatively simple at a technical level, like PEP 479 (converting an unhandled StopIteration to RuntimeError), PEP 538 (coercing the C locale to a UTF-8 based locale), or PEP 565 (tweaking the way we handle DeprecationWarning), the motivation & rationale may end up being the majority of the PEP, since the actual change to be made is relatively minor, but the potential consequences aren't necessarily obvious. By contrast, python-ideas threads usually start at a point earlier in the decision making process: asking ourselves the question "Is the status quo still OK?". In most cases the answer is "Yeah, it's still fine", but we keep asking, because sometimes the answer is "Actually, we could probably improve it by doing...". The easiest trap to fall into on that front is to think to ourselves "The status quo doesn't solve my problems, therefore it doesn't solve anyone's problems", which usually isn't a productive mindset. A more productive framing is typically "The problems that the status quo solves are not the problems that I currently have". It may seem like a small change, but in the second version, we're thinking: - the status quo solves problems for someone, just not for me - whatever I propose should try to avoid making the status quo worse at what it already does - I need to explain the problem I have, not just the potential solution I see and that ends up coming through in the way we write. I'll also note that nobody expects perfection on this front - most of us are thoroughly familiar with the lure of "But someone is *wrong* on the internet" [1], and we know that sometimes it's too late by the time we finally think "Oh, I really should have toned that down a bit before hitting Send...". We just strive to ensure our typical approach is to be respectful of each other, and of past contributors. Cheers, Nick. [1] https://xkcd.com/386/ P.S. For a longer version of the "What problem does it solve?" question in relation to the respective APIs of the requests and urllib modules, folks may be interested in https://www.curiousefficiency.org/posts/2016/08/what-problem-does-it-solve.h... -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
As it turns out, there actually is a wrapper class in the stdlib which encapsulates heapq: queue.PriorityQueue, at least for the most part. Although its intention is not to be a wrapper class for heapq, it is one by nature. But for the sake of its actual purpose, threading, it comes with overhead. This overhead is actually quite noticeable, but not so bad that I would say it is unusable. It also lacks several features, namely: - No equivalent for heapify i.e. no constructor accepting iterable arguments (you *could* heapify first, then queue.put the items to ensure it is O(n)). - No equivalent for peek, pushpop, or replace i.e. expensive push + pop variants need to be used instead. Despite these flaws, the queue.PriorityQueue is quite usable in many cases and offers a wrapper class, which is clearly for many (but not everyone) quite a nice plus, potentially worth the threading overhead. From this, we can also see that a Heap class is actually not the correct base class, but rather that a Queue is probably more appropriate, unless the underlying list is wanted for some reason (for most cases, I would assume it is not, but revealing it should not be a problem either). I can also imagine that optimizations for heapq are possible if moved this way. Currently heapq accepts any MutableSequence, but this comes with some cost I think. If it were a part of the builtin list, or better yet made into its own class, and then written in C, I would assume it would be faster. I think it is safe to say that most people do not use heapq with anything other than lists. In fact I think it is safe to say that most people do not even care about the underlying list when using heapq. The list is more of an implementation detail than anything. A big issue that I see with this suggestion, however, is that it would be confusing to lack or include the peek, pushpop, and replace methods. If they are missing, users will be confused when comparing it to heapq. If they are included, the inconsistency with the queue module will be confusing. I suppose there wouldn't be any harm with adding them to the queue module, but one might question whether or not such is worth it. I can see the exact semantics of peek, pushpop, and replace being confusing in the context of threading. Though a non-issue, the name "pushpop" also doesn't align with queue, and I think "putget" does not sound so nice. Another issue is where would these belong. We already have the queue module, so it may be easy for users to get confused seeing the two. Should it go in collections then? Should it stay in heapq? Although I don't see it happening, I would prefer to have the others inside of concurrent.queue and have queue dedicated to non-concurrent base classes. But of course this would destroy the backwards compatibility of queue, so I can't see that happening. Another issue I see is that queue is meant to support concurrency, and thus has additional features such as queue.task_done(). Although a base Queue class does not necessarily need to implement such, the fact that queue.PriorityQueue has methods such as these makes it confusing to users not interested in concurrency.
participants (17)
-
Antoine Pitrou
-
Brett Cannon
-
bunslow
-
Chris Angelico
-
Grant Jenks
-
Greg Ewing
-
Guido van Rossum
-
jackyeenguyen@gmail.com
-
Nathaniel Smith
-
Nick Coghlan
-
Nick Timkovich
-
Paul Moore
-
Sebastian Kreft
-
Stephan Houben
-
Stephen J. Turnbull
-
Steven D'Aprano
-
Sven R. Kunze