pop multiple elements of a list at once

Hi. As recommended here: http://bugs.python.org/issue9218 I am posting this to this list. I am currently working with buffer in an USB device and pyusb. So when i read from the buffer of endpoint, i get an array.Array() list. I handle this chunk of data with a thread to send a receive the information that i need. In this thread, i load a list with all the information that is read from the USB device, and another layer with pop this information from the threads buffer. The thing i found is that, to pop a variable chunk of data from this buffer without copying it and deleting the elements, i have to pop one element at the time. def get_chunk(self, size): for x in range(size): yield self.recv_buffer.pop() I guess that it would be improved if i can just pop a defined number of elements, like this: pop self.recv_buffer[:-size] or self.recv_buffer.pop(,-size) That would be... "pop from (the last element minus size) to (the last element)" in that way there is only one memory transaction. The new list (or maybe a tuple) points to the old memory address and the recv_buffer is advanced to a one new address. Data is not moved. Note that i like the idea of using "pop" as the "del" operator for lists, but i am concient that this would not be backward compatible. Thanks. Diego

On Sun, Jul 11, 2010 at 10:58, Diego Jacobi <jacobidiego@gmail.com> wrote:
Why can't you do ``del self.recv_buffer[-size:]``?
Note that i like the idea of using "pop" as the "del" operator for lists, but i am concient that this would not be backward compatible.
Too specialized, so that will never fly. -Brett

I think you overestimate how standardised we could make this across all platforms and data structures. Under the hood, any such expansion to the .pop API would almost certainly be defined as equivalent to: def pop(self, index): result = self[index] del self[index] return result such that slice objects could be passed in as well as integers (or integer equivalents). (Currently pop on builtin objects rejects slice objects, as it only works with integers) In the meantime, if you want to manipulate memory while minimising copying, then the 2.7 memoryview object may be for you (assuming you can switch to the later version). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 7/11/2010 1:58 PM, Diego Jacobi wrote:
In CPython, popping copies a reference and them deletes it from the list. The item popped is not copied. It is a convenience, which I proposed, but not a necessity. You can easily write a function that returns a slice after deleting it. def pop_slice(lis, n): tem = lis[:-n] del lis[:-n] return tem I expect this to run faster than popping more than a few items one at a time. -- Terry Jan Reedy

On Sun, Jul 11, 2010 at 7:58 PM, Diego Jacobi <jacobidiego@gmail.com> wrote:
I think yo misunderstand the implementation of lists (and the underlying malloc()). You can't break the memory used for the list elements into two pieces and give new ownership to a (leading) section of it. However you also seem to be worrying about "copying" too much -- the only things that get copied are the *pointers* to the objects popped off the stack, which is very cheap compared to the rest of the operation. It is true that to pop off a whole slice there is a more efficient way than calling pop() repeatedly -- but there's no need for a new primitive operation, as it can already be done by copying and then deleting the slice (again, the copying only copies the pointers). Try reading up on Python's memory model for objects, it will be quite enlightening. -- --Guido van Rossum (python.org/~guido)

On Mon, Jul 12, 2010 at 4:35 PM, Guido van Rossum <guido@python.org> wrote:
Note that the original poster was apparently talking about array.array() rather than an actual list (at least, that's the way I interpreted the phrase "array,Array() list"). In that context, the desire to avoid copying when invoking pop() makes a lot more sense than it does when using a builtin list. I agree that the suggestion of reassigning ownership of a chunk of an array is based on a misunderstanding of the way memory allocation works at the pymalloc and OS levels though. For the record, neither pymalloc nor the OS support breaking a chunk of already allocated memory in two that way - you need some master object to maintain control of it, and then use other pointers to look at subsections. Since memoryview objects in 3.x and 2.7 are designed specifically to provide a window onto a chunk of memory owned by another object (such as the storage underlying an array object) without copying, it seems like that is the kind of thing the original poster is actually looking for. (That said, supporting slice objects in pop() still doesn't strike me as an insane idea, although I'd probably want to see some use cases before we went to the hassle of adding it). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi. I apologize if i am having difficulties to explain myself in English. Also i just realise that i wasnt answering to the list, but directly to Brett Cannon. Sorry for that. Also i am taking into account that you are right and my concept of how memory is handled here for lists is way too different as how i am more used to work in firmwares. Anyway, the concept of my idea comes of my understanding of an low-leveled array. I do understand that poping returns only a pointer to that element. I didnt understand that every element inside a list is also a pointer to a python type, so the data is not copied, but the pointer is. The thing is that the elements on my list are just bytes. (unsigned char), and i think that an array of this type of data is called in python immutable, which means that i may not be using the right datatype. Anyway. slice support on pop(), and/or the ability to skip elements in a for loop without restarting the iteration will clear up a lot of code. If my scenario is yet no clear i give below some more details: When i think on how this will work on memory: def pop_slice(lis, n): tem = lis[:-n] del lis[:-n] return tem I think in "copying the elements of an array": BYTE* pop_slice(BYTE* list, unsigned int n){ BYTE* temp = malloc(n*sizeof(BYTE)); int i; for(i=0 ; i < n ; i++) { temp[i] = list[i]; } free(list, n); return temp; } Most python books and tutorials clearly says that this operation L[start:end] copies the elements requested. And being copy i understand the above behavior, which is less efficient than advancing the pointer. But i wanted to do (with "pop multiple bytes at once") is: typedef unsigned char BYTE; BYTE array[BIG_SIZE]; BYTE* incomming_buffer_pointer = &array[0]; BYTE* incomming_packet_pointer = &array[0]; BYTE* pop_slice(BYTE* list, unsigned int n){ BYTE* temp; temp = list; list = &list[n]; return temp; } .. incomming_packet_pointer = pop_slice( incomming_buffer_pointer , PACKET_SIZE) if (parse_packet_is_not_corrupt( incomming_packet_pointer ) ) parse_new_packet( incomming_packet_pointer ); else .... .. Thanks for analizing the idea. Jacobi Diego

On Tue, Jul 13, 2010 at 1:13 AM, Diego Jacobi <jacobidiego@gmail.com> wrote:
For builtin lists and most other Python container types, think more along the lines of a list of pointers rather than a list of numbers. However, for the behaviour of array.array (rather than the builtin list), your understanding of the cost of slicing was actually pretty close to correct (since storing actual values rather than pointers is the way array.array gains its comparative memory efficiency). Python doesn't actually excel at memory efficiency when manipulating large data sets using conventional syntax (e.g. slicing). The old buffer objects were an initial attempt at providing memory efficient access to segments of data buffers, while the recently added memoryview objects are a more sophisticated (and safer) approach to the same idea. The NumPy extension, on the other hand, is able to very efficiently provide multiple views of the same region of memory without requiring copying (NumPy itself isn't particularly small though). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Jul 11, 2010 at 10:58, Diego Jacobi <jacobidiego@gmail.com> wrote:
Why can't you do ``del self.recv_buffer[-size:]``?
Note that i like the idea of using "pop" as the "del" operator for lists, but i am concient that this would not be backward compatible.
Too specialized, so that will never fly. -Brett

I think you overestimate how standardised we could make this across all platforms and data structures. Under the hood, any such expansion to the .pop API would almost certainly be defined as equivalent to: def pop(self, index): result = self[index] del self[index] return result such that slice objects could be passed in as well as integers (or integer equivalents). (Currently pop on builtin objects rejects slice objects, as it only works with integers) In the meantime, if you want to manipulate memory while minimising copying, then the 2.7 memoryview object may be for you (assuming you can switch to the later version). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 7/11/2010 1:58 PM, Diego Jacobi wrote:
In CPython, popping copies a reference and them deletes it from the list. The item popped is not copied. It is a convenience, which I proposed, but not a necessity. You can easily write a function that returns a slice after deleting it. def pop_slice(lis, n): tem = lis[:-n] del lis[:-n] return tem I expect this to run faster than popping more than a few items one at a time. -- Terry Jan Reedy

On Sun, Jul 11, 2010 at 7:58 PM, Diego Jacobi <jacobidiego@gmail.com> wrote:
I think yo misunderstand the implementation of lists (and the underlying malloc()). You can't break the memory used for the list elements into two pieces and give new ownership to a (leading) section of it. However you also seem to be worrying about "copying" too much -- the only things that get copied are the *pointers* to the objects popped off the stack, which is very cheap compared to the rest of the operation. It is true that to pop off a whole slice there is a more efficient way than calling pop() repeatedly -- but there's no need for a new primitive operation, as it can already be done by copying and then deleting the slice (again, the copying only copies the pointers). Try reading up on Python's memory model for objects, it will be quite enlightening. -- --Guido van Rossum (python.org/~guido)

On Mon, Jul 12, 2010 at 4:35 PM, Guido van Rossum <guido@python.org> wrote:
Note that the original poster was apparently talking about array.array() rather than an actual list (at least, that's the way I interpreted the phrase "array,Array() list"). In that context, the desire to avoid copying when invoking pop() makes a lot more sense than it does when using a builtin list. I agree that the suggestion of reassigning ownership of a chunk of an array is based on a misunderstanding of the way memory allocation works at the pymalloc and OS levels though. For the record, neither pymalloc nor the OS support breaking a chunk of already allocated memory in two that way - you need some master object to maintain control of it, and then use other pointers to look at subsections. Since memoryview objects in 3.x and 2.7 are designed specifically to provide a window onto a chunk of memory owned by another object (such as the storage underlying an array object) without copying, it seems like that is the kind of thing the original poster is actually looking for. (That said, supporting slice objects in pop() still doesn't strike me as an insane idea, although I'd probably want to see some use cases before we went to the hassle of adding it). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi. I apologize if i am having difficulties to explain myself in English. Also i just realise that i wasnt answering to the list, but directly to Brett Cannon. Sorry for that. Also i am taking into account that you are right and my concept of how memory is handled here for lists is way too different as how i am more used to work in firmwares. Anyway, the concept of my idea comes of my understanding of an low-leveled array. I do understand that poping returns only a pointer to that element. I didnt understand that every element inside a list is also a pointer to a python type, so the data is not copied, but the pointer is. The thing is that the elements on my list are just bytes. (unsigned char), and i think that an array of this type of data is called in python immutable, which means that i may not be using the right datatype. Anyway. slice support on pop(), and/or the ability to skip elements in a for loop without restarting the iteration will clear up a lot of code. If my scenario is yet no clear i give below some more details: When i think on how this will work on memory: def pop_slice(lis, n): tem = lis[:-n] del lis[:-n] return tem I think in "copying the elements of an array": BYTE* pop_slice(BYTE* list, unsigned int n){ BYTE* temp = malloc(n*sizeof(BYTE)); int i; for(i=0 ; i < n ; i++) { temp[i] = list[i]; } free(list, n); return temp; } Most python books and tutorials clearly says that this operation L[start:end] copies the elements requested. And being copy i understand the above behavior, which is less efficient than advancing the pointer. But i wanted to do (with "pop multiple bytes at once") is: typedef unsigned char BYTE; BYTE array[BIG_SIZE]; BYTE* incomming_buffer_pointer = &array[0]; BYTE* incomming_packet_pointer = &array[0]; BYTE* pop_slice(BYTE* list, unsigned int n){ BYTE* temp; temp = list; list = &list[n]; return temp; } .. incomming_packet_pointer = pop_slice( incomming_buffer_pointer , PACKET_SIZE) if (parse_packet_is_not_corrupt( incomming_packet_pointer ) ) parse_new_packet( incomming_packet_pointer ); else .... .. Thanks for analizing the idea. Jacobi Diego

On Tue, Jul 13, 2010 at 1:13 AM, Diego Jacobi <jacobidiego@gmail.com> wrote:
For builtin lists and most other Python container types, think more along the lines of a list of pointers rather than a list of numbers. However, for the behaviour of array.array (rather than the builtin list), your understanding of the cost of slicing was actually pretty close to correct (since storing actual values rather than pointers is the way array.array gains its comparative memory efficiency). Python doesn't actually excel at memory efficiency when manipulating large data sets using conventional syntax (e.g. slicing). The old buffer objects were an initial attempt at providing memory efficient access to segments of data buffers, while the recently added memoryview objects are a more sophisticated (and safer) approach to the same idea. The NumPy extension, on the other hand, is able to very efficiently provide multiple views of the same region of memory without requiring copying (NumPy itself isn't particularly small though). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (6)
-
Antoine Pitrou
-
Brett Cannon
-
Diego Jacobi
-
Guido van Rossum
-
Nick Coghlan
-
Terry Reedy