Hi, I'm curious. If I append an item to a list from the left using `list.insert`, will Python always move the entire list one item to the right (which can be super-slow) or will it check first to see whether it can just allocate more memory to the left of the list and put the item there, saving a lot of resources? Thanks, Ram.
Oops, my mistake, I meant to post in python-list but accidentally posted here because it was in my favorites. Sorry, I'll post there. On Friday, February 7, 2014 1:58:05 AM UTC+2, Ram Rachum wrote:
Hi,
I'm curious. If I append an item to a list from the left using `list.insert`, will Python always move the entire list one item to the right (which can be super-slow) or will it check first to see whether it can just allocate more memory to the left of the list and put the item there, saving a lot of resources?
Thanks, Ram.
Okay, now that I've posted on python-list, and among the 80% of joke posts in my thread, there was a 20% of posts that assured me that Python's list has no such behavior and that I should use deque, I now ask on Python-ideas: Is there a good reason why `list.insert(whatever, 0)` doesn't opportunistically try to allocate more space at the left side of the list, so as to save the expensive operation of moving all the items? I'm not saying it should reserve space there, just check if that space is available, and if so use it. Is there any reason why not? Thanks, Ram. On Friday, February 7, 2014 1:58:49 AM UTC+2, Ram Rachum wrote:
Oops, my mistake, I meant to post in python-list but accidentally posted here because it was in my favorites. Sorry, I'll post there.
On Friday, February 7, 2014 1:58:05 AM UTC+2, Ram Rachum wrote:
Hi,
I'm curious. If I append an item to a list from the left using `list.insert`, will Python always move the entire list one item to the right (which can be super-slow) or will it check first to see whether it can just allocate more memory to the left of the list and put the item there, saving a lot of resources?
Thanks, Ram.
On 02/07/2014 11:14 AM, Ram Rachum wrote:
Okay, now that I've posted on python-list, and among the 80% of joke posts in my thread, there was a 20% of posts that assured me that Python's list has no such behavior and that I should use deque, I now ask on Python-ideas: Is there a good reason why `list.insert(whatever, 0)` doesn't opportunistically try to allocate more space at the left side of the list, so as to save the expensive operation of moving all the items? I'm not saying it should reserve space there, just check if that space is available, and if so use it. Is there any reason why not?
It is not possible, because it would change the pointer (to the actual "stock" of items), that must point to the first item, and be also known by the memory allocator as such. In principle, one can have a "fake" pointer with some part of the data actually positioned *before* the address it points to. This is in fact the way "Pascal arrays" (and strings) are commonly implemented, with their sizes prefixed to the first item. Bit this introduces some complication (eg one must recompute the original pointer for dealloc) and must be designed right from the start at the lowest level. The only practicle way would be to systematically reserve memory space before the start item [*], for such cases. It is not worth it for a very specific operation like like.insert_anywhere (even less list.insert_at_start), which is not (in my view) the proper point of lists (arrays, in fact). We should use proper collections whenever we need inserting (and removing) at arbitrary locations. 99% lists do not need that. As I see it, lists are for, in order of importance: * put new item (at the end) (also when used as a stack, =push) * iterate (in order) * read item (anywhere) * change item (anywhere) * take or remove last item (only when used only as a stack, =pop) The deal may possibly be different if arrays were python's only collection (like link lists in lisp or tables in lua). d [*] I sometimes wish strings would reserve place for exactly one more code at the _end_, for cases when one must ensure a string (often read from file) terminates with a newline ;-)
Thanks for your answers everyone! It was interesting for me. On Feb 7, 2014 12:37 PM, "spir" <denis.spir@gmail.com> wrote:
On 02/07/2014 11:14 AM, Ram Rachum wrote:
Okay, now that I've posted on python-list, and among the 80% of joke posts in my thread, there was a 20% of posts that assured me that Python's list has no such behavior and that I should use deque, I now ask on Python-ideas: Is there a good reason why `list.insert(whatever, 0)` doesn't opportunistically try to allocate more space at the left side of the list, so as to save the expensive operation of moving all the items? I'm not saying it should reserve space there, just check if that space is available, and if so use it. Is there any reason why not?
It is not possible, because it would change the pointer (to the actual "stock" of items), that must point to the first item, and be also known by the memory allocator as such. In principle, one can have a "fake" pointer with some part of the data actually positioned *before* the address it points to. This is in fact the way "Pascal arrays" (and strings) are commonly implemented, with their sizes prefixed to the first item. Bit this introduces some complication (eg one must recompute the original pointer for dealloc) and must be designed right from the start at the lowest level.
The only practicle way would be to systematically reserve memory space before the start item [*], for such cases. It is not worth it for a very specific operation like like.insert_anywhere (even less list.insert_at_start), which is not (in my view) the proper point of lists (arrays, in fact). We should use proper collections whenever we need inserting (and removing) at arbitrary locations. 99% lists do not need that. As I see it, lists are for, in order of importance:
* put new item (at the end) (also when used as a stack, =push) * iterate (in order)
* read item (anywhere) * change item (anywhere)
* take or remove last item (only when used only as a stack, =pop)
The deal may possibly be different if arrays were python's only collection (like link lists in lisp or tables in lua).
d
[*] I sometimes wish strings would reserve place for exactly one more code at the _end_, for cases when one must ensure a string (often read from file) terminates with a newline ;-) _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
--
--- You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. To unsubscribe from this topic, visit https://groups.google.com/d/ topic/python-ideas/G0O5NN9DjSM/unsubscribe. To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
Ram Rachum writes:
Okay, now that I've posted on python-list, and among the 80% of joke posts in my thread, there was a 20% of posts that assured me that Python's list has no such behavior and that I should use deque,
For complexity of collections the best reference I know is https://wiki.python.org/moin/TimeComplexity
Is there a good reason why `list.insert(whatever, 0)` doesn't opportunistically try to allocate more space at the left side of the list, so as to save the expensive operation of moving all the items? I'm not saying it should reserve space there, just check if that space is available, and if so use it.
The problem is that it would have to have unholy carnal knowledge of OS internals (eg, of malloc). First off, availability itself is non-portable, depending on a lot of things (eg, placement of malloc metadata and Python object metadata). I would imagine those things are placed at the beginning of a data structure, but your OS may vary. Even if somehow they were placed at the end of the allocation block, you'd have to ask malloc if there is an empty block before it, and then futz with the malloc metadata if you were to try to snarf that block. Sounds like a great way to crash Python to me. I'll take collections.deque, thankyouverymuch. A general note: implementations of Python builtins are generally quite well-tuned, but not at the expense of excessive complexity. Python's threshold for "excessive" is pretty low, but nonetheless most builtins are very efficient by now. P.S. ISTR posting via Googlegroups is deprecated.
On Fri, 07 Feb 2014 20:05:41 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
Is there a good reason why `list.insert(whatever, 0)` doesn't opportunistically try to allocate more space at the left side of the list, so as to save the expensive operation of moving all the items? I'm not saying it should reserve space there, just check if that space is available, and if so use it.
The problem is that it would have to have unholy carnal knowledge of OS internals (eg, of malloc). First off, availability itself is non-portable, depending on a lot of things (eg, placement of malloc metadata and Python object metadata).
??? I don't understand what you're talking about. It is perfectly possible while being portable. The proof is that bytearray has a limited variant of that optimization (not for insertions, but for deletions at the front). Regards Antoine.
Antoine Pitrou writes:
On Fri, 07 Feb 2014 20:05:41 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
Is there a good reason why `list.insert(whatever, 0)` doesn't opportunistically try to allocate more space at the left side of the list, so as to save the expensive operation of moving all the items? I'm not saying it should reserve space there, just check if that space is available, and if so use it.
The problem is that it would have to have unholy carnal knowledge of OS internals (eg, of malloc). First off, availability itself is non-portable, depending on a lot of things (eg, placement of malloc metadata and Python object metadata).
??? I don't understand what you're talking about.
Obviously.
It is perfectly possible while being portable. The proof is that bytearray has a limited variant of that optimization (not for insertions, but for deletions at the front).
What about "list.insert(whatever, 0)" looks like a deletion operation to you? Certainly, it would be possible to keep an additional start pointer, and advance that for deletions at position 0, then use that space for insertions at 0 (or perhaps "early" in the list) if available. But the OP refers to *allocation*, and specifically disallows "reserving space". So it's clear he's not talking about a feature like that, he's talking about finding *new* space.
On Sat, 08 Feb 2014 06:50:29 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
Certainly, it would be possible to keep an additional start pointer, and advance that for deletions at position 0, then use that space for insertions at 0 (or perhaps "early" in the list) if available.
Possible, and quite reasonable actually.
But the OP refers to *allocation*, and specifically disallows "reserving space".
Ok, so you're arguing about a misunderstanding by the OP about how memory allocation works. That doesn't mean that overallocating at the front is any more difficult than overallocating at the end is (the latter being of course already implemented by the list datatype). Regards Antoine.
Antoine Pitrou writes:
On Sat, 08 Feb 2014 06:50:29 +0900 "Stephen J. Turnbull" <stephen@xemacs.org> wrote:
But the OP refers to *allocation*, and specifically disallows "reserving space".
Ok, so you're arguing about a misunderstanding by the OP about how memory allocation works.
Yes. Why did you think I was doing anything but responding to the OP's message?
On 2/7/14 5:14 AM, Ram Rachum wrote:
Okay, now that I've posted on python-list, and among the 80% of joke posts in my thread, there was a 20% of posts that assured me that Python's list has no such behavior and that I should use deque, I now ask on Python-ideas: Is there a good reason why `list.insert(whatever, 0)` doesn't opportunistically try to allocate more space at the left side of the list, so as to save the expensive operation of moving all the items? I'm not saying it should reserve space there, just check if that space is available, and if so use it. Is there any reason why not?
"check if that space is available": this is not a simple operation. Only the memory allocator knows what blocks of memory are allocated and which are not. Memory allocators typically don't support the operation of "extend my memory block downward if possible." Why is deque not the right answer for your problem? --Ned.
On Fri, Feb 7, 2014 at 10:15 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
On 2/7/14 5:14 AM, Ram Rachum wrote:
Okay, now that I've posted on python-list, and among the 80% of joke posts in my thread, there was a 20% of posts that assured me that Python's list has no such behavior and that I should use deque, I now ask on Python-ideas: Is there a good reason why `list.insert(whatever, 0)` doesn't opportunistically try to allocate more space at the left side of the list, so as to save the expensive operation of moving all the items? I'm not saying it should reserve space there, just check if that space is available, and if so use it. Is there any reason why not?
"check if that space is available": this is not a simple operation. Only the memory allocator knows what blocks of memory are allocated and which are not. Memory allocators typically don't support the operation of "extend my memory block downward if possible."
Checking if space is available could be done without any fiddling with malloc, though. All it requires is an optimization of pop(0) followed by insert(0,foo). That is, when you poop(0), the list just marks itself as having one junk entry before the first entry (presumably it already keeps track of spare space at the end, so this would be just one more integer), which can then be reused later. But apart from maybe reducing the memory copying (the same optimization could mean that repeated pop(0) calls would incur less copying, too), there's not a huge gain. ChrisA
On Fri, 7 Feb 2014 22:45:33 +1100 Chris Angelico <rosuav@gmail.com> wrote:
But apart from maybe reducing the memory copying (the same optimization could mean that repeated pop(0) calls would incur less copying, too), there's not a huge gain.
If you think switching from O(n**2) to O(n) isn't a huge gain, then indeed :-) Regards Antoine.
On Sat, Feb 8, 2014 at 1:59 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 7 Feb 2014 22:45:33 +1100 Chris Angelico <rosuav@gmail.com> wrote:
But apart from maybe reducing the memory copying (the same optimization could mean that repeated pop(0) calls would incur less copying, too), there's not a huge gain.
If you think switching from O(n**2) to O(n) isn't a huge gain, then indeed :-)
Sure, that would be a huge gain. But anyone who's repeatedly calling pop(0) followed by insert(0,x) should be using deque rather than list. It's like optimizing str + str, only less common. Yes, there's an algorithmic complexity improvement, to be sure; but there's an alternative that has all the same improvement already there. A scheme such as I described would have a constant-time penalty for *every list*, so the benefit to a small number of lists is that much less likely to actually improve overall performance. ChrisA
On Sat, 8 Feb 2014 05:00:24 +1100 Chris Angelico <rosuav@gmail.com> wrote:
On Sat, Feb 8, 2014 at 1:59 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 7 Feb 2014 22:45:33 +1100 Chris Angelico <rosuav@gmail.com> wrote:
But apart from maybe reducing the memory copying (the same optimization could mean that repeated pop(0) calls would incur less copying, too), there's not a huge gain.
If you think switching from O(n**2) to O(n) isn't a huge gain, then indeed :-)
Sure, that would be a huge gain. But anyone who's repeatedly calling pop(0) followed by insert(0,x) should be using deque rather than list.
Indeed, since deque exists, it is the reasonable answer. This whole discussion happens in a hypothetical setting where deque wouldn't exist and there would be an opportunity to make list a one-size-fits-all sequence type. (note that my personal opinion is that built-in Python data types should be sufficiently powerful to cater for most use cases, rather than build a whole array of specialized collection types as in Java or C++) Regards Antoine.
On Sat, Feb 8, 2014 at 5:13 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Sure, that would be a huge gain. But anyone who's repeatedly calling pop(0) followed by insert(0,x) should be using deque rather than list.
Indeed, since deque exists, it is the reasonable answer.
This whole discussion happens in a hypothetical setting where deque wouldn't exist and there would be an opportunity to make list a one-size-fits-all sequence type.
Well, yes. In the absence of deque, this would be a more plausible optimization; if nothing else, it would make destructive iteration far more efficient, even without the insert() optimization. It'd still require some deep analysis to figure out just how often the optimization would help, vs how often the complexity would add unnecessary cost, but I can imagine a way of doing it that wouldn't cost too much. (Just advance the base pointer and increment a "pre-list spare space" value. Then regular operations can ignore the spare space; only deallocation/resize need to worry about it (to get the real pointer for free()), and insert() can try to take advantage. It could even guess that it might be worth adding some wasted space, to cope with multiple insert()s.) And if someone proposed that, someone else would then say "It'd be so much more efficient if we explicitly tell the list constructor that it should do this, rather than have it do it all the time", and there you are, right back at deque. :) This, btw, is why I like Python's data type model ("have lots of 'em") rather than PHP's ("the Array will do everything, so use it"). If my code is slow because I used OrderedDict instead of list, I can fix my code. If my code is slow because I used Array instead of Array..... s'not a lot I can do about that. ChrisA
The reason C++ has so many data types is for performance reasons. If you have a constant list of items, you use a tuple. If you know the list isn't going to exceed a certain length, you use arrays. If you're mainly going to be iterating through it, you use list. Otherwise, you use vector. The reason Python doesn't need to worry about that is because it has entirely different goals. Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, Feb 8, 2014 at 1:59 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Fri, 7 Feb 2014 22:45:33 +1100 Chris Angelico <rosuav@gmail.com> wrote:
But apart from maybe reducing the memory copying (the same optimization could mean that repeated pop(0) calls would incur less copying, too), there's not a huge gain.
If you think switching from O(n**2) to O(n) isn't a huge gain, then indeed :-)
Sure, that would be a huge gain. But anyone who's repeatedly calling pop(0) followed by insert(0,x) should be using deque rather than
On Sat, 8 Feb 2014 05:00:24 +1100 Chris Angelico <rosuav@gmail.com> wrote: list.
Indeed, since deque exists, it is the reasonable answer.
This whole discussion happens in a hypothetical setting where deque wouldn't exist and there would be an opportunity to make list a one-size-fits-all sequence type.
(note that my personal opinion is that built-in Python data types should be sufficiently powerful to cater for most use cases, rather than build a whole array of specialized collection types as in Java or C++)
Regards
Antoine.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Ram Rachum <ram.rachum@gmail.com> writes:
I'm curious.
That kind of question is better on the general Python discussion forum <URL:http://www.python.org/community/lists/#comp-lang-python>. The ‘python-ideas’ forum is for discussing ideas to *change* Python behaviour in future versions. -- \ “Read not to contradict and confute, nor to believe and take | `\ for granted … but to weigh and consider.” —Francis Bacon | _o__) | Ben Finney
participants (10)
-
Antoine Pitrou
-
Ben Finney
-
Chris Angelico
-
Ned Batchelder
-
Ram Rachum
-
Ram Rachum
-
Ryan
-
spir
-
Stephen J. Turnbull
-
Stephen J. Turnbull