Packing a long list of numbers into memory
Hello everyone!

I need to pack a long list of numbers into shared memory, so I thought about using `struct.pack_into`. Its signature is:

    struct.pack_into(format, buffer, offset, v1, v2, ...)

I have a long list of nums (several million), and ended up doing the following:

    struct.pack_into(f'{len(nums)}Q', buf, 0, *nums)

However, passing all nums as `*args` is very inefficient [0]. So I started wondering why we don't have something like:

    struct.pack_into(format, buffer, offset, values=values)

which would receive the list of values directly.

Is that because my particular case is very uncommon? Or maybe we *do* want this but we don't have it yet? Or do we already have a better way of doing this?

Thanks!

[0] https://linkode.org/#95ZZtVCIVtBbx72dURK7a4

--
.   Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org.ar/
Twitter: @facundobatista
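[Editorial note: the pattern under discussion can be reproduced at small scale; the sizes below are illustrative stand-ins for the multi-million-item list in the post.]

```python
import struct

nums = list(range(10_000))       # small stand-in for the real multi-million list
buf = bytearray(8 * len(nums))   # 'Q' is an 8-byte unsigned long long

# Every item becomes a separate positional argument -- this is the
# *args expansion the post measures as inefficient for huge lists.
struct.pack_into(f'{len(nums)}Q', buf, 0, *nums)
```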
On Sun, Oct 10, 2021 at 11:50, Serhiy Storchaka (storchaka@gmail.com) wrote:
On 10.10.21 17:19, Facundo Batista wrote:
I have a long list of nums (several million), and ended up doing the following:
struct.pack_into(f'{len(nums)}Q', buf, 0, *nums)
Why not use array('Q', nums)?
You mean `array` from the `array` module? The only way I see using it is like the following:
    shm = shared_memory.SharedMemory(create=True, size=total_size)
    a = array.array('Q', nums)
    shm.buf[l_offset:r_offset] = a.tobytes()
But I don't like it because of the `tobytes` call, which will produce a huge bytearray only to insert it in the shared memory buffer.

That's why I liked `pack_into`: it writes directly into the memory view.

Or am I missing something?

Thanks!

--
.   Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org.ar/
Twitter: @facundobatista
Maybe instead of tobytes() you can use memoryview().

On Sun, Oct 10, 2021 at 08:21 Facundo Batista <facundobatista@gmail.com> wrote:
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Code of Conduct: http://python.org/psf/codeofconduct/
--
--Guido (mobile)
You can take a memory view of the array directly:

    memoryview(array.array("Q", range(1000)))

If your exact use-case is writing to a SharedMemory, then I don't think there is any simple way to do it without creating some intermediate memory buffer (other than using struct.pack_into, or packing the values completely manually).

On 10/10/2021 16:18, Facundo Batista wrote:
On 10.10.21 18:18, Facundo Batista wrote:
    shm.buf[l_offset:r_offset].cast('Q')[:] = a

or

    shm.buf[l_offset:r_offset] = memoryview(a).cast('B')
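[Editorial note: both one-liners can be checked with a plain bytearray standing in for shm.buf; the SharedMemory setup is omitted to keep the sketch self-contained.]

```python
import array

nums = list(range(16))
a = array.array('Q', nums)

# First form: cast the destination view to 'Q' and slice-assign the array.
buf1 = bytearray(8 * len(nums))
memoryview(buf1).cast('Q')[:] = a

# Second form: cast the source array to raw bytes and assign those.
buf2 = bytearray(8 * len(nums))
buf2[:] = memoryview(a).cast('B')

assert buf1 == buf2 == a.tobytes()
```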
This still isn't a completely direct write - you're still creating an array in between which is then copied into shm, whereas struct.pack_into writes directly into the shared memory with no intermediate buffer.

And unfortunately you can't do

    shm.buf[l_offset:r_offset].cast('Q')[:] = nums

where nums is a plain Python iterable; it has to already be an array, so it still requires a copy.

On 10/10/2021 16:57, Serhiy Storchaka wrote:
On 10.10.21 19:05, Patrick Reader wrote:
Yes, I tried this too. Maybe we will add support for arbitrary iterables. There may be problems with this because we don't know the length of the iterable, and we can't guarantee that all items are convertible without converting them and saving the results in a temporary array.

If you do not want to spend additional memory, iterate nums and set items one by one. Or iterate and set them by chunks:

    target = shm.buf[l_offset:r_offset].cast('Q')
    for i in range(0, len(nums), chunksize):
        j = min(i + chunksize, len(nums))
        target[i:j] = array.array('Q', nums[i:j])
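[Editorial note: the chunked loop above runs as-is once `target` exists; here is a self-contained version using a bytearray in place of the shm slice.]

```python
import array

nums = list(range(1000))
chunksize = 256
buf = bytearray(8 * len(nums))        # stand-in for shm.buf[l_offset:r_offset]
target = memoryview(buf).cast('Q')

for i in range(0, len(nums), chunksize):
    j = min(i + chunksize, len(nums))
    # Only `chunksize` items are materialized in the temporary array.
    target[i:j] = array.array('Q', nums[i:j])

target.release()
```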
On Sun, Oct 10, 2021 at 7:25 AM Facundo Batista <facundobatista@gmail.com> wrote:
My first reaction on seeing things like this is "Why not use a numpy.array?"

Does what you have really need to be a long list? If so, that's already a huge amount of Python object storage as it is. Is it possible for your application to have kept that in a numpy array for the entirety of the data's lifetime? https://numpy.org/doc/stable/reference/routines.array-creation.html

I'm not saying the stdlib shouldn't have a better way to do this by not abusing *args as an API, just that other libraries solve the larger problem of data-memory-inefficiency in their own way already.

*(neat tricks from others regarding stdlib array, shm, & memoryview even if... not ideal)*

-gps
On 2021-10-10 19:20, Gregory P. Smith wrote:
Maybe what's needed is to add, say, '*' to the format string to indicate that multiple values should come from an iterable, e.g.:

    struct.pack_into(f'{len(nums)}*Q', buf, 0, nums)

in this case taking len(nums) values from the nums argument.
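[Editorial note: the proposed '*' format does not exist in `struct`; the helper below, with a hypothetical name, only approximates the intended calling convention in pure Python, keeping the temporary buffers bounded instead of expanding the whole list into *args.]

```python
import array
import struct

def pack_iter_into(code, buf, offset, values, chunksize=4096):
    # Hypothetical helper approximating the proposed '*' format:
    # pack a sequence without one giant *args expansion.
    itemsize = struct.calcsize(code)
    view = memoryview(buf)
    for i in range(0, len(values), chunksize):
        # Each temporary array holds at most `chunksize` items.
        chunk = array.array(code, values[i:i + chunksize])
        end = offset + len(chunk) * itemsize
        view[offset:end] = memoryview(chunk).cast('B')
        offset = end

nums = list(range(2000))
buf = bytearray(8 * len(nums))
pack_iter_into('Q', buf, 0, nums)
```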
On Sun, Oct 10, 2021 at 15:20, Gregory P. Smith (greg@krypto.org) wrote:
Is that because my particular case is very uncommon? Or maybe we *do* want this but we don't have it yet? Or do we already have a better way of doing this?
Thanks!
My first reaction on seeing things like this is "Why not use a numpy.array?"
Yes, there is a plan to use `numpy.array` in another example (I'm trying to just use the stdlib for pedagogical purposes).

Thanks!

--
.   Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org.ar/
Twitter: @facundobatista
On Sun, 10 Oct 2021 11:19:44 -0300 Facundo Batista <facundobatista@gmail.com> wrote:
Just use `numpy.frombuffer` with your shared memory buffer and write into the Numpy array?
https://numpy.org/doc/stable/reference/generated/numpy.frombuffer.html

When you're looking to efficiently handle large volumes of primitive values such as integers, chances are Numpy already has the solution for you.

Regards

Antoine.
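[Editorial note: a sketch of that frombuffer route, assuming numpy is installed. One pitfall worth knowing: the numpy view must be dropped before the segment is closed, or `close()` raises BufferError because the memoryview still has exported buffers.]

```python
import numpy as np
from multiprocessing import shared_memory

nums = list(range(1000))

shm = shared_memory.SharedMemory(create=True, size=8 * len(nums))
arr = np.frombuffer(shm.buf, dtype=np.uint64)  # zero-copy view of the segment
arr[:] = nums                                  # numpy converts and writes in C
total = int(arr.sum())

del arr        # release the exported view before closing the segment
shm.close()
shm.unlink()
```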
On Sun, Oct 10, 2021 at 4:23 PM Facundo Batista <facundobatista@gmail.com> wrote:
struct.pack_into(format, buffer, offset, v1, v2, ...)
I've encountered this wart with pack and pack_into too. The current interface makes sense when v1, v2 are a small number of items from a data record, but it becomes a bit silly when v1, v2, ... are the elements of an array of 10,000 integers, for example. The option to take a list-like argument like you suggest is a good idea, but maybe this should have its own function, or is just outside the scope of the built-in struct module.

Multiprocessing sort of added support for this via multiprocessing.Array -- see https://stackoverflow.com/questions/9754034/can-i-create-a-shared-multiarray.... I haven't looked at what multiprocessing.Array does under the hood.

Summary of the StackOverflow answer for those who don't feel like clicking:

    mp_arr = mp.Array(c.c_double, size)
    # then in each new process create a new numpy array using:
    arr = np.frombuffer(mp_arr.get_obj())

- Simon
On Mon, Oct 11, 2021 at 06:50, Simon Cross (hodgestar+pythondev@gmail.com) wrote:
Multiprocessing sort of added support for this via multiprocessing.Array -- see https://stackoverflow.com/questions/9754034/can-i-create-a-shared-multiarray.... I haven't looked at what multiprocessing.Array does under the hood.
Summary of the StackOverflow answer for those who don't feel like clicking:
    mp_arr = mp.Array(c.c_double, size)
    # then in each new process create a new numpy array using:
    arr = np.frombuffer(mp_arr.get_obj())
Right, this is very close to what I had in mind for the "example with numpy.array" that I want to code next (as I just said in the response to Gregory, this is a series of implementations for pedagogical purposes, trying to teach parallel processing).

Thanks!

--
.   Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org.ar/
Twitter: @facundobatista
Sorry for the spam! A bunch of these were backed up in the moderation queue. I used the UI to set the list to auto-discard future messages from this address, but then clicked "Accept" in the mistaken sense of "yes, accept my request to auto-nuke this clown". But it took "Accept" to mean "sure thing, boss! I'll send this to everyone ASAP!!".

Computer software. Everyone, please, get a useful job instead ;-)

On Wed, Oct 20, 2021 at 10:43 PM <joeevansjoe6@gmail.com> wrote:
that is great to see this post . https://bit.ly/3C551OO
participants (10)

- Antoine Pitrou
- Facundo Batista
- Gregory P. Smith
- Guido van Rossum
- joeevansjoe6@gmail.com
- MRAB
- Patrick Reader
- Serhiy Storchaka
- Simon Cross
- Tim Peters