deque: Allow efficient operations

I was playing around with deque today, and there were a couple of operations I wanted to do, that can't really be done efficiently with deque because of its implementation. I was iterating through items of a deque, and in some cases I wanted to delete an item that I've found. As far as I understand, this is an operation that should be O(1) in a linked list, but Python only provides an O(N) way to do that, which is `del d[i]`. The same can be said for inserting items in the middle. What do you think about adding this to `deque`? The API will be tricky, admittedly, because you'll have to save some kind of reference to a cell in the deque. Thanks, Ram.

Hello, On Wed, 29 Apr 2020 13:42:05 +0300 Ram Rachum <ram@rachum.com> wrote:
Deque is a data structure which allows efficient insertion/deletion from 2 ends of the structure, period. If you want a different data structure, like linked list, or doubly-linked list, with different set of operations, implement such a data structure (it's trivial, and being done by other people all the time).
Thanks, Ram.
-- Best regards, Paul mailto:pmiscml@gmail.com

On Wed, Apr 29, 2020 at 1:54 PM Paul Sokolovsky <pmiscml@gmail.com> wrote:
If you want a different data structure, [...] implement such a data structure
And if I wanted an answer like "it's the way that it is because it's the way that it is" I wouldn't be posting on python-ideas. Is there a strategic decision to limit deque to certain operations of doubly-linked lists and not others? That would make sense. Is that decision written somewhere with the rationale behind it? That would be interesting to read.

On Wed, Apr 29, 2020 at 9:01 PM Ram Rachum <ram@rachum.com> wrote:
Is there a strategic decision to limit deque to certain operations of doubly-linked lists and not others? That would make sense. Is that decision written somewhere with the rationale behind it? That would be interesting to read.
"""Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.""" I'm not seeing any evidence that this is a doubly-linked list. It might happen to be implemented using one in current versions of CPython, but that's not mandated anywhere. Is there a particular reason to mandate this for all Pythons for all time? As Paul says, you can always implement your own doubly linked list. ChrisA

Hello, On Wed, 29 Apr 2020 13:58:15 +0300 Ram Rachum <ram@rachum.com> wrote:
Deque is not doubly-linked list, they are completely different data structures. Even if a particular Python implementation uses doubly-linked list as an internal implementation strategy. There're other Python implementations which realize deque in other ways, which guarantee performance only for operations defined for deque and not for any other data structure. (E.g., an obvious implementation is an array with head/tail pointers).
-- Best regards, Paul mailto:pmiscml@gmail.com

A deque is a data structure that exists outside of Python and the only guarantee is that adding an element at the top or the front is O(1). Giving more guarantee would force trade-offs and impose them to other implementations. If you want fast deletion, you should use another data structure. Also, removing an element from a doubly-linked list is not a O(1) operation, it's O(n). It's O(1) once you have a pointer to the element but you have to iterate over the list to get it and that would take O(n) operations, so making deque a doubly-linked list would not be faster anyway. I think the data structure that you are looking for would be a skipped-list, or alternatively (but the implementation is more involved) a red-black tree.

On 29 Apr 2020, at 12:36, remi.lapeyre@henki.fr wrote:
Also, removing an element from a doubly-linked list is not a O(1) operation, it's O(n). It's O(1) once you have a pointer to the element but you have to iterate over the list to get it and that would take O(n) operations, so making deque a doubly-linked list would not be faster anyway.
In the cases where a DLL is the reasonable design choice remove is O(1) in my experience. The reason is that you have the element in your hand by other means then scanning the DLL. This was a heavily used pattern when I worked on DEC VMS device drivers. Async I/O requests could be cancelled and removed from the list of outstanding I/O for a device in O(1). Indeed the VAX had instructions to implement this pattern in hardware. Barry

_collectionsmodule.c has this note, so it sounds like this is intended to be a possibility if there is enough interest: /* insert(), remove(), and delitem() are implemented in terms of rotate()
https://github.com/python/cpython/blob/d16a1520c2b22c3cbde7547937ba283db3e88... --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Wed, Apr 29, 2020 at 8:02 AM Ram Rachum <ram@rachum.com> wrote:
Thanks everybody for your answers, and especially Ricky for finding this note.
which seems to indicate that if one were to implement this code more efficiently, it would be considered for inclusion. I will note that for the most part, the Python specs are for the API, and do not guarantee particular performance characteristics. Dicts are efficient because they need to be for reasonable performance of the entire interpreter, but it isn't part of the Mapping ABC. Is there a single instance of the SAME ABC being implemented in two ways, with different performance characteristics? I've wondered about Linked Lists for a while, but while there are many versions on PyPi, I can't find one that seems to be mature and maintained. Which seems to indicate that there isn't much demand for them. -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Apr 29, 2020, at 08:33, Christopher Barker <pythonchb@gmail.com> wrote:
I've wondered about Linked Lists for a while, but while there are many versions on PyPi, I can't find one that seems to be mature and maintained. Which seems to indicate that there isn't much demand for them.
I think there’s lots of demand for them, but there are so many different variants that can’t substitute for each other (try taking any nontrivial sample code using Haskell’s single-linked, no-handle, immutable tail-sharing list and rewriting it with C++’s doubly-linked handled mutable list, or vice-versa), and most of the key operations fit so poorly with Python’s sequence/iterable API, and they’re all so easy to build, that people just build the one they need whenever they need it. I do have a few different linked lists in my toolbox that have come up often enough that I stashed them (an immutable cons, a handled double-linked list, a cffi wrapper for a common style of C internally-linked lists, probably others), but half the time I reach for one I have to modify it anyway, so I haven’t bothered to turn them into a package I just import and use. And, while I did add the whole (Mutable)Sequence API to each one (because it’s convenient for debugging and REPL exploration to be able to list(xs), or to get a repr that’s written in terms of a from_iter classmethod so I can eval it back, etc.), I usually don’t use that API for anything but debugging. When you’re dealing with linked lists, you usually need to deal with the nodes directly. For example, one big reason to use linked lists is constant-time splicing, but you can’t splice in constant time if all you have is the head/handle and/or an opaque iterator that only knows how to go forward; you need the node before the splice point (or, for doubly-linked, after is fine too). Another reason to use (Lisp/Haskell-style) linked lists is that they automatically release nodes as you iterate unless you keep a reference to the head, but that’s clumsy to do with Python-style APIs. And so on.

Thanks for all the details -- it really makes the point I was trying to make. On Apr 29, 2020, at 08:33, Christopher Barker <pythonchb@gmail.com> wrote:
Isn't much demand for a *generic* linked list. It would probably be a good recipe though -- so users could have a starting point for their custom version. Also -- as you point out, while it may be useful for a linked list to be a Sequence, the API shouldn't be only that -- it'd almost defeat the purpose of a linked list at all. -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Apr 29, 2020, at 12:03, Christopher Barker <pythonchb@gmail.com> wrote:
Isn't much demand for a *generic* linked list. It would probably be a good recipe though -- so users could have a starting point for their custom version.
I think what would be really handy would be a HOWTO on linked lists that showed the different options and tradeoffs and how to implement and use at least a few different ones, and showed why they’re useful with examples. (And also showed why the Sequence/Iterable API can be helpful but also why it’s not sufficient.) Then the collections module (and the tutorial?) could both just have a sentence saying “Python doesn’t have a linked list type because there are so many useful kinds of linked lists and they’re all easy to build but very different—see the Linked Lists HOWTO for details.” But if I wrote it, it would probably be 4x as long as any novice would want to read. (I think I wrote some blog posts on linked lists in Python years ago, and ended up building a Haskell-style lazy list out of a trigger function and then showing how to do Fibonacci numbers by recursively zipping it, or something crazy like that.) In the old days we could probably just post three different simple recipes on ActiveState and link to them from the docs and let people build on the examples there, rather than try to write it all up-front and fit it into the Python docs style, but that doesn’t work so well anymore.

On 29/04/2020 20:54, Andrew Barnert via Python-ideas wrote:
Isn't the point that you should be approaching a datastructure in Python thinking about what you want it to do, not how it's implemented underneath? That sort of micromanagement smacks of premature optimisation. -- Rhodri James *-* Kynesim Ltd

On Thu, Apr 30, 2020 at 5:56 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Summary, for novices: "Just use a built-in list and don't worry about it. Performance of built-in types is usually 'good enough' even if it isn't perfect." It's only the experts who need this sort of thing, so I'm not too bothered if novices can't implement LLs efficiently in Python. ChrisA

In case you're interested, the pure python OrderedDict code uses a doubly linked list augmented by a dictionary to quickly find individual links. It may be worth taking at look.¹ The implementation was mostly obvious. The only trick was to use weakrefs for the backlink to avoid creating a reference cycle — the original version just lets GC do the clean-up, but users wanted to avoid cycles entirely. Raymond ¹ https://github.com/python/cpython/blob/3.8/Lib/collections/__init__.py#L78

On 30/04/20 3:32 am, Christopher Barker wrote:
I think this is probably because a linked list is more of a design pattern than a concrete data structure. In situations where you want a linked list in particular, rather than just a container that supports certain operations, you really need the references implementing the links to be embedded in the objects being kept in the list. This makes it difficult to abstract, especially if your objects need to be part of more than one list at a time. -- Greg

On Wed, Apr 29, 2020 at 01:42:05PM +0300, Ram Rachum wrote:
I was iterating through items of a deque, and in some cases I wanted to delete an item that I've found.
Deleting items from a container while iterating through it is an anti-pattern, easy to get wrong, prone to bugs, and in a high-level language like Python, very likely to be slower than creating a new container and copying it across. I haven't tried it with dequeues, but some versions ago I tried to find the cross-over point where it was more efficient to delete items from a list in place, instead of copying: mylist[:] = [x from x in mylist if not condition] versus: for i in range(len(mylist)-1, -1, -1): if condition: del mylist[i] My reasoning was that *eventually* for a big enough list, the cost of making the copy would outweigh the cost of moving the items. I don't know if my reasoning was valid or not, but I was unable to find any such cutoff on my computer, up to hundreds of millions of items. Deletion was always slower. But, if you insist on doing it, dequeues support deletion.
Dequeues are not linked lists. I don't believe that they are implemented as linked lists, and I know that they certainly do not offer a linked list API. You could try rotating the dequeue, popping it, then rotating it back. That might be faster, if rotation is fast. But if you are only deleting one or two items, is that deletion a bottle-neck in your code?
The same can be said for inserting items in the middle.
If you want to insert and delete items in the middle, you are probably better off using a list. Dequeues are explicitly optimised for insertion and deletion at the ends, not the middle. It's a tradeoff.
What do you think about adding this to `deque`?
Not much. If you have a clever implementation in mind that will make deletion more efficient, that's an implementation detail that will depend on (1) whether you have someone willing and able to do the work and (2) whether or not that implementation will rule out other possible future improvements.
I think you mean the implementation will be tricky. The API will surely be trivial: either `del mydeque[i]`, as now, or `mydequeue.delete(i)`. If you have some other API in mind, please describe it. -- Steven

Hello, On Wed, 29 Apr 2020 13:42:05 +0300 Ram Rachum <ram@rachum.com> wrote:
Deque is a data structure which allows efficient insertion/deletion from 2 ends of the structure, period. If you want a different data structure, like linked list, or doubly-linked list, with different set of operations, implement such a data structure (it's trivial, and being done by other people all the time).
Thanks, Ram.
-- Best regards, Paul mailto:pmiscml@gmail.com

On Wed, Apr 29, 2020 at 1:54 PM Paul Sokolovsky <pmiscml@gmail.com> wrote:
If you want a different data structure, [...] implement such a data structure
And if I wanted an answer like "it's the way that it is because it's the way that it is" I wouldn't be posting on python-ideas. Is there a strategic decision to limit deque to certain operations of doubly-linked lists and not others? That would make sense. Is that decision written somewhere with the rationale behind it? That would be interesting to read.

On Wed, Apr 29, 2020 at 9:01 PM Ram Rachum <ram@rachum.com> wrote:
Is there a strategic decision to limit deque to certain operations of doubly-linked lists and not others? That would make sense. Is that decision written somewhere with the rationale behind it? That would be interesting to read.
"""Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.""" I'm not seeing any evidence that this is a doubly-linked list. It might happen to be implemented using one in current versions of CPython, but that's not mandated anywhere. Is there a particular reason to mandate this for all Pythons for all time? As Paul says, you can always implement your own doubly linked list. ChrisA

Hello, On Wed, 29 Apr 2020 13:58:15 +0300 Ram Rachum <ram@rachum.com> wrote:
Deque is not doubly-linked list, they are completely different data structures. Even if a particular Python implementation uses doubly-linked list as an internal implementation strategy. There're other Python implementations which realize deque in other ways, which guarantee performance only for operations defined for deque and not for any other data structure. (E.g., an obvious implementation is an array with head/tail pointers).
-- Best regards, Paul mailto:pmiscml@gmail.com

A deque is a data structure that exists outside of Python and the only guarantee is that adding an element at the top or the front is O(1). Giving more guarantee would force trade-offs and impose them to other implementations. If you want fast deletion, you should use another data structure. Also, removing an element from a doubly-linked list is not a O(1) operation, it's O(n). It's O(1) once you have a pointer to the element but you have to iterate over the list to get it and that would take O(n) operations, so making deque a doubly-linked list would not be faster anyway. I think the data structure that you are looking for would be a skipped-list, or alternatively (but the implementation is more involved) a red-black tree.

On 29 Apr 2020, at 12:36, remi.lapeyre@henki.fr wrote:
Also, removing an element from a doubly-linked list is not a O(1) operation, it's O(n). It's O(1) once you have a pointer to the element but you have to iterate over the list to get it and that would take O(n) operations, so making deque a doubly-linked list would not be faster anyway.
In the cases where a DLL is the reasonable design choice remove is O(1) in my experience. The reason is that you have the element in your hand by other means then scanning the DLL. This was a heavily used pattern when I worked on DEC VMS device drivers. Async I/O requests could be cancelled and removed from the list of outstanding I/O for a device in O(1). Indeed the VAX had instructions to implement this pattern in hardware. Barry

_collectionsmodule.c has this note, so it sounds like this is intended to be a possibility if there is enough interest: /* insert(), remove(), and delitem() are implemented in terms of rotate()
https://github.com/python/cpython/blob/d16a1520c2b22c3cbde7547937ba283db3e88... --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Wed, Apr 29, 2020 at 8:02 AM Ram Rachum <ram@rachum.com> wrote:
Thanks everybody for your answers, and especially Ricky for finding this note.
which seems to indicate that if one were to implement this code more efficiently, it would be considered for inclusion. I will note that for the most part, the Python specs are for the API, and do not guarantee particular performance characteristics. Dicts are efficient because they need to be for reasonable performance of the entire interpreter, but it isn't part of the Mapping ABC. Is there a single instance of the SAME ABC being implemented in two ways, with different performance characteristics? I've wondered about Linked Lists for a while, but while there are many versions on PyPi, I can't find one that seems to be mature and maintained. Which seems to indicate that there isn't much demand for them. -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Apr 29, 2020, at 08:33, Christopher Barker <pythonchb@gmail.com> wrote:
I've wondered about Linked Lists for a while, but while there are many versions on PyPi, I can't find one that seems to be mature and maintained. Which seems to indicate that there isn't much demand for them.
I think there’s lots of demand for them, but there are so many different variants that can’t substitute for each other (try taking any nontrivial sample code using Haskell’s single-linked, no-handle, immutable tail-sharing list and rewriting it with C++’s doubly-linked handled mutable list, or vice-versa), and most of the key operations fit so poorly with Python’s sequence/iterable API, and they’re all so easy to build, that people just build the one they need whenever they need it. I do have a few different linked lists in my toolbox that have come up often enough that I stashed them (an immutable cons, a handled double-linked list, a cffi wrapper for a common style of C internally-linked lists, probably others), but half the time I reach for one I have to modify it anyway, so I haven’t bothered to turn them into a package I just import and use. And, while I did add the whole (Mutable)Sequence API to each one (because it’s convenient for debugging and REPL exploration to be able to list(xs), or to get a repr that’s written in terms of a from_iter classmethod so I can eval it back, etc.), I usually don’t use that API for anything but debugging. When you’re dealing with linked lists, you usually need to deal with the nodes directly. For example, one big reason to use linked lists is constant-time splicing, but you can’t splice in constant time if all you have is the head/handle and/or an opaque iterator that only knows how to go forward; you need the node before the splice point (or, for doubly-linked, after is fine too). Another reason to use (Lisp/Haskell-style) linked lists is that they automatically release nodes as you iterate unless you keep a reference to the head, but that’s clumsy to do with Python-style APIs. And so on.

Thanks for all the details -- it really makes the point I was trying to make. On Apr 29, 2020, at 08:33, Christopher Barker <pythonchb@gmail.com> wrote:
Isn't much demand for a *generic* linked list. It would probably be a good recipe though -- so users could have a starting point for their custom version. Also -- as you point out, while it may be useful for a linked list to be a Sequence, the API shouldn't be only that -- it'd almost defeat the purpose of a linked list at all. -CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Apr 29, 2020, at 12:03, Christopher Barker <pythonchb@gmail.com> wrote:
Isn't much demand for a *generic* linked list. It would probably be a good recipe though -- so users could have a starting point for their custom version.
I think what would be really handy would be a HOWTO on linked lists that showed the different options and tradeoffs and how to implement and use at least a few different ones, and showed why they’re useful with examples. (And also showed why the Sequence/Iterable API can be helpful but also why it’s not sufficient.) Then the collections module (and the tutorial?) could both just have a sentence saying “Python doesn’t have a linked list type because there are so many useful kinds of linked lists and they’re all easy to build but very different—see the Linked Lists HOWTO for details.” But if I wrote it, it would probably be 4x as long as any novice would want to read. (I think I wrote some blog posts on linked lists in Python years ago, and ended up building a Haskell-style lazy list out of a trigger function and then showing how to do Fibonacci numbers by recursively zipping it, or something crazy like that.) In the old days we could probably just post three different simple recipes on ActiveState and link to them from the docs and let people build on the examples there, rather than try to write it all up-front and fit it into the Python docs style, but that doesn’t work so well anymore.

On 29/04/2020 20:54, Andrew Barnert via Python-ideas wrote:
Isn't the point that you should be approaching a datastructure in Python thinking about what you want it to do, not how it's implemented underneath? That sort of micromanagement smacks of premature optimisation. -- Rhodri James *-* Kynesim Ltd

On Thu, Apr 30, 2020 at 5:56 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Summary, for novices: "Just use a built-in list and don't worry about it. Performance of built-in types is usually 'good enough' even if it isn't perfect." It's only the experts who need this sort of thing, so I'm not too bothered if novices can't implement LLs efficiently in Python. ChrisA

In case you're interested, the pure python OrderedDict code uses a doubly linked list augmented by a dictionary to quickly find individual links. It may be worth taking at look.¹ The implementation was mostly obvious. The only trick was to use weakrefs for the backlink to avoid creating a reference cycle — the original version just lets GC do the clean-up, but users wanted to avoid cycles entirely. Raymond ¹ https://github.com/python/cpython/blob/3.8/Lib/collections/__init__.py#L78

On 30/04/20 3:32 am, Christopher Barker wrote:
I think this is probably because a linked list is more of a design pattern than a concrete data structure. In situations where you want a linked list in particular, rather than just a container that supports certain operations, you really need the references implementing the links to be embedded in the objects being kept in the list. This makes it difficult to abstract, especially if your objects need to be part of more than one list at a time. -- Greg

On Wed, Apr 29, 2020 at 01:42:05PM +0300, Ram Rachum wrote:
I was iterating through items of a deque, and in some cases I wanted to delete an item that I've found.
Deleting items from a container while iterating through it is an anti-pattern, easy to get wrong, prone to bugs, and in a high-level language like Python, very likely to be slower than creating a new container and copying it across. I haven't tried it with dequeues, but some versions ago I tried to find the cross-over point where it was more efficient to delete items from a list in place, instead of copying: mylist[:] = [x from x in mylist if not condition] versus: for i in range(len(mylist)-1, -1, -1): if condition: del mylist[i] My reasoning was that *eventually* for a big enough list, the cost of making the copy would outweigh the cost of moving the items. I don't know if my reasoning was valid or not, but I was unable to find any such cutoff on my computer, up to hundreds of millions of items. Deletion was always slower. But, if you insist on doing it, dequeues support deletion.
Dequeues are not linked lists. I don't believe that they are implemented as linked lists, and I know that they certainly do not offer a linked list API. You could try rotating the dequeue, popping it, then rotating it back. That might be faster, if rotation is fast. But if you are only deleting one or two items, is that deletion a bottle-neck in your code?
The same can be said for inserting items in the middle.
If you want to insert and delete items in the middle, you are probably better off using a list. Dequeues are explicitly optimised for insertion and deletion at the ends, not the middle. It's a tradeoff.
What do you think about adding this to `deque`?
Not much. If you have a clever implementation in mind that will make deletion more efficient, that's an implementation detail that will depend on (1) whether you have someone willing and able to do the work and (2) whether or not that implementation will rule out other possible future improvements.
I think you mean the implementation will be tricky. The API will surely be trivial: either `del mydeque[i]`, as now, or `mydequeue.delete(i)`. If you have some other API in mind, please describe it. -- Steven
participants (13)
-
Andrew Barnert
-
Barry Scott
-
Chris Angelico
-
Christopher Barker
-
Greg Ewing
-
Paul Sokolovsky
-
Ram Rachum
-
Raymond Hettinger
-
remi.lapeyre@henki.fr
-
Rhodri James
-
Ricky Teachey
-
Serhiy Storchaka
-
Steven D'Aprano