Add pop_callback to deque signature

My initial proposal would have been to add a data structure to allow something like deque, but allowing cheaper random accesses. However I realized it was a use case more suitable for an external lib. Thinking about how I would implement such a new data structure, I imagined several possibilities, one of them would be to couple a deque with a dict or a list. However, sharing data between a deque and another data structure is hard because, while you can easily hook on the element going in (since you put them in yourself), there is no efficient way to get back an element on its way out. On lists or dicts, if you remove an element, you can pop() it and you get back the removed element. On deque, if you set a maxlen of 5 and add a 6th element, if you want to get the element that has been removed, you need to check if the maxlen has been reached, and if yes, get a reference to the first element, then add the new one. It's inconvenient and of course slower than it needs to be given that deque are quite fast. So my more modest and realistic proposal would be to add a callback on deque signature: collections.deque(iterable, maxlen, pop_callback) pop_callback would accept any callable, and call it everytime an element is removed from the deque, allowing third party libraries to then do whatever they need with it.

On Tue, Oct 03, 2017 at 10:18:13AM +0200, Michel Desmoulin wrote:
My initial proposal would have been to add a data structure to allow something like deque, but allowing cheaper random accesses.
That would be a list :-) Most data structures trade off performance in one area for performance in another. Lists have good random access, but slow insertion and deletion at the start. Deques have fast insertion and deletion at both ends, but slower random access. If it were possible to have a lightweight sequence data structure with fast O(1) random access AND insertion/deletion at the same time, that would be a candidate to replace both lists and deques in the stdlib. If you have such a data structure in mind, you should propose it. [...]
Adding callbacks to basic data structures like this seems like a hypergeneralisation. How often are they going to be used? Not often enough to make it worthwhile, I think. But if we did something like that, I think that following the design of dicts would be a closer match to Pythonic style than a callback. Dicts support a special __missing__ method in subclasses. Perhaps deque could do something similar? Subclass deque and give it a __dropped__ method, and whenever an item is dropped from the deque the method will be called with the value dropped (and perhaps a flag telling you which end it was dropped from). To me, that sounds like a more Pythonic interface than a callback. -- Steve

On Tue, 3 Oct 2017 22:04:46 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
Daniel Stutzbach's blist is well-known at this point: http://stutzbachenterprises.com/performance-blist See http://legacy.python.org/dev/peps/pep-3128/ (rejected) and https://mail.python.org/pipermail/python-ideas/2014-September/029434.html (inconclusive) Regards Antoine.

On Tue, Oct 3, 2017 at 5:22 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
https://mail.python.org/pipermail/python-ideas/2014-September/029434.html (inconclusive)
that was some years ago -- I wonder how much use it's seen? but I note: "I'm really reluctant to include `blist` as a dependency, given that it would basically mean my package wouldn't be pip-installable on Windows machines. " With binary wheels and all, this isn't the same issue it used to be -- so a third party package is more viable. Though on PyPi it hasn't been touched for 4 years -- so maybe didn't turn out to be that useful (Or jsut not that well known) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Tue, 3 Oct 2017 10:13:53 -0600 Chris Barker <chris.barker@noaa.gov> wrote:
You can go a surprisingly long way with Python's built-in and stdlib containers, so I'm not surprised it's not very widely used.
I don't know what wheels are supposed to change here. You could already build binary packages for Windows before wheels existed. The problem as I understand it is that you need a Windows machine (or VM) together with the required set of compilers, and have to take the time to run the builds. With conda-forge though, one could simply submit a recipe and have all builds done automatically in the cloud. Your users then have to use conda (rather than pip and virtualenv). Regards Antoine.

Exactly — what are the odds that list or deque performance is your bottleneck? However, the barrier to entry for a third party package is still quite a bit higher than the Stdlib. So if a third party package that provides nothing but a performance boost isn’t used much — that doesn’t mean it wouldn’t be well-used if it were in the stdlib. Note: I am not advocating anything— I haven’t looked at blist at all. I don't know what wheels are supposed to change here. You could already
build binary packages for Windows before wheels existed.
Yes, but with a different ecosystem an no virtual environment support. Being able to pip install binary wheels does make things easier for Windows users. The problem as
Yup — still the case. Also with Mac and OS-X. Distributing a package with a compiled component is still a lot more work. With conda-forge though, one could simply submit a recipe and have all
builds done automatically in the cloud. Your users then have to use conda (rather than pip and
I’m a big fan of conda and conda-forge. But a similar auto-build system could support binary wheels and pypi as well. And indeed, the scipy folks are doing just that. My point was that the infrastructure for delivering complied packaged is much better than it was even a few years ago. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 4 October 2017 at 04:24, Chris Barker <chris.barker@noaa.gov> wrote:
The broad availability & popularity of AppVeyor's free tier is another relevant change compared to a few years ago, and https://github.com/joerick/cibuildwheel is designed to work with that to help projects automate their artifact builds without need to maintain their own cross-platform build infrastructure. So yeah, we're definitely to a point where adding new data structures to the standard library, or new features to existing data structures, is mainly going to be driven by standard library use cases that benefit from them, rather than "binary dependencies are still too hard to publish & manage". (For example, __missing__ made defaultdict easy to implement). For deque specifically, I like Steven D'Aprano's suggestion of a "__dropped__" or "__discard__" subclassing API that makes it straightforward to change the way that queue overruns are handled (especially if raising an exception from the new subclass method can prevent the collection modification entirely - that way you could readily change the deque semantics in a subclass such that if the queue fills up, submitters start getting errors instead of silently discarding older messages, allowing backpressure to be more easily propagated through a system of queues). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, 4 Oct 2017 08:47:46 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:
You would only do that for subtypes, so when Py_TYPE(self) is different from the base type. This is a simple pointer comparison. Nick's idea sounds nice to me, as long as there's an actual use case :-) Regards Antoine.

On 4 October 2017 at 15:47, Serhiy Storchaka <storchaka@gmail.com> wrote:
Aye, that would need to be considered, and may push the API towards callback registration on the instance over using a subclassing API. The performance considerations in the dict case are different, since the default behaviour is to raise KeyError, where the cost of checking for the method doesn't matter much either because the command is about to terminate, or else because it is still much faster than instantiating the caught exception. However, most of the performance impact could also be avoided through a PyCheck_Exact that only checks for the method for subclasses, and not for regular deque instances. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Oct 03, 2017 at 10:18:13AM +0200, Michel Desmoulin wrote:
My initial proposal would have been to add a data structure to allow something like deque, but allowing cheaper random accesses.
That would be a list :-) Most data structures trade off performance in one area for performance in another. Lists have good random access, but slow insertion and deletion at the start. Deques have fast insertion and deletion at both ends, but slower random access. If it were possible to have a lightweight sequence data structure with fast O(1) random access AND insertion/deletion at the same time, that would be a candidate to replace both lists and deques in the stdlib. If you have such a data structure in mind, you should propose it. [...]
Adding callbacks to basic data structures like this seems like a hypergeneralisation. How often are they going to be used? Not often enough to make it worthwhile, I think. But if we did something like that, I think that following the design of dicts would be a closer match to Pythonic style than a callback. Dicts support a special __missing__ method in subclasses. Perhaps deque could do something similar? Subclass deque and give it a __dropped__ method, and whenever an item is dropped from the deque the method will be called with the value dropped (and perhaps a flag telling you which end it was dropped from). To me, that sounds like a more Pythonic interface than a callback. -- Steve

On Tue, 3 Oct 2017 22:04:46 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
Daniel Stutzbach's blist is well-known at this point: http://stutzbachenterprises.com/performance-blist See http://legacy.python.org/dev/peps/pep-3128/ (rejected) and https://mail.python.org/pipermail/python-ideas/2014-September/029434.html (inconclusive) Regards Antoine.

On Tue, Oct 3, 2017 at 5:22 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
https://mail.python.org/pipermail/python-ideas/2014-September/029434.html (inconclusive)
that was some years ago -- I wonder how much use it's seen? but I note: "I'm really reluctant to include `blist` as a dependency, given that it would basically mean my package wouldn't be pip-installable on Windows machines. " With binary wheels and all, this isn't the same issue it used to be -- so a third party package is more viable. Though on PyPi it hasn't been touched for 4 years -- so maybe didn't turn out to be that useful (Or jsut not that well known) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Tue, 3 Oct 2017 10:13:53 -0600 Chris Barker <chris.barker@noaa.gov> wrote:
You can go a surprisingly long way with Python's built-in and stdlib containers, so I'm not surprised it's not very widely used.
I don't know what wheels are supposed to change here. You could already build binary packages for Windows before wheels existed. The problem as I understand it is that you need a Windows machine (or VM) together with the required set of compilers, and have to take the time to run the builds. With conda-forge though, one could simply submit a recipe and have all builds done automatically in the cloud. Your users then have to use conda (rather than pip and virtualenv). Regards Antoine.

Exactly — what are the odds that list or deque performance is your bottleneck? However, the barrier to entry for a third party package is still quite a bit higher than the Stdlib. So if a third party package that provides nothing but a performance boost isn’t used much — that doesn’t mean it wouldn’t be well-used if it were in the stdlib. Note: I am not advocating anything— I haven’t looked at blist at all. I don't know what wheels are supposed to change here. You could already
build binary packages for Windows before wheels existed.
Yes, but with a different ecosystem an no virtual environment support. Being able to pip install binary wheels does make things easier for Windows users. The problem as
Yup — still the case. Also with Mac and OS-X. Distributing a package with a compiled component is still a lot more work. With conda-forge though, one could simply submit a recipe and have all
builds done automatically in the cloud. Your users then have to use conda (rather than pip and
I’m a big fan of conda and conda-forge. But a similar auto-build system could support binary wheels and pypi as well. And indeed, the scipy folks are doing just that. My point was that the infrastructure for delivering complied packaged is much better than it was even a few years ago. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 4 October 2017 at 04:24, Chris Barker <chris.barker@noaa.gov> wrote:
The broad availability & popularity of AppVeyor's free tier is another relevant change compared to a few years ago, and https://github.com/joerick/cibuildwheel is designed to work with that to help projects automate their artifact builds without need to maintain their own cross-platform build infrastructure. So yeah, we're definitely to a point where adding new data structures to the standard library, or new features to existing data structures, is mainly going to be driven by standard library use cases that benefit from them, rather than "binary dependencies are still too hard to publish & manage". (For example, __missing__ made defaultdict easy to implement). For deque specifically, I like Steven D'Aprano's suggestion of a "__dropped__" or "__discard__" subclassing API that makes it straightforward to change the way that queue overruns are handled (especially if raising an exception from the new subclass method can prevent the collection modification entirely - that way you could readily change the deque semantics in a subclass such that if the queue fills up, submitters start getting errors instead of silently discarding older messages, allowing backpressure to be more easily propagated through a system of queues). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, 4 Oct 2017 08:47:46 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:
You would only do that for subtypes, so when Py_TYPE(self) is different from the base type. This is a simple pointer comparison. Nick's idea sounds nice to me, as long as there's an actual use case :-) Regards Antoine.

On 4 October 2017 at 15:47, Serhiy Storchaka <storchaka@gmail.com> wrote:
Aye, that would need to be considered, and may push the API towards callback registration on the instance over using a subclassing API. The performance considerations in the dict case are different, since the default behaviour is to raise KeyError, where the cost of checking for the method doesn't matter much either because the command is about to terminate, or else because it is still much faster than instantiating the caught exception. However, most of the performance impact could also be avoided through a PyCheck_Exact that only checks for the method for subclasses, and not for regular deque instances. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (6)
-
Antoine Pitrou
-
Chris Barker
-
Michel Desmoulin
-
Nick Coghlan
-
Serhiy Storchaka
-
Steven D'Aprano