On Fri, Feb 21, 2020 at 12:43 AM Steven D'Aprano <steve@pearwood.info> wrote:
> On Thu, Feb 20, 2020 at 02:19:13PM -0800, Stephan Hoyer wrote:
> > Strong +1 for an array.zeros() constructor, and/or a lower-level
> > array.empty() which doesn't pre-fill values.
>
> So it'd be a shorthand for something like this?
>
>     >>> array.array("i", bytes(64))
>     array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
>
> It'd be convenient to specify the size as a number of array elements
> rather than bytes. But I'm not a heavy user of array.array() so I won't
> say either way as to whether this is needed.
Yes, exactly.
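For context, until such a constructor exists, a zero-filled array sized in elements rather than bytes can be spelled as a small wrapper. This is only a sketch; the name `zeros` here is hypothetical, mirroring the proposal:

```python
import array

def zeros(typecode, count):
    # Hypothetical helper mirroring the proposed array.zeros():
    # the size is given in elements, not bytes.
    itemsize = array.array(typecode).itemsize
    return array.array(typecode, bytes(count * itemsize))

a = zeros("i", 16)  # 16 zeroed ints, regardless of the platform's itemsize
```

Note that this wrapper still creates the intermediate bytes() object, so it doesn't address the double-allocation concern raised below.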
> > The main problem with array.array("i", bytes(64)) is that memory gets
> > allocated twice, first to create the bytes() object and then to make
> > the array(). This makes it unsuitable for high performance
> > applications.
>
> Got some actual measurements to demonstrate that initialising the array
> is a bottleneck? Especially for something as small as 64, it seems
> unlikely. If it were 64MB, that might be another story.
That's right, the real use-case is quickly deserializing large amounts of data (e.g., hundreds of MB) from a wire format into a form suitable for fast analysis with NumPy or pandas. Unfortunately I can't share an actual code example, but this is a pretty common scenario in the data-processing world, reminiscent of the use-cases for PEP 574 (https://www.python.org/dev/peps/pep-0574/). The concern is not just speed (which I agree is probably not hurt much by an extra copy) but also memory overhead: if the resulting array is 500 MB and deserialization can be done in a streaming fashion, I don't want to wastefully allocate another 500 MB just to do a memory copy.
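The streaming pattern described here can be done today with a single up-front allocation by reading into a preallocated bytearray and taking a typed memoryview over it, rather than going through an intermediate bytes() object. A minimal sketch (the helper name `read_ints` is hypothetical, and it assumes a binary file-like object supporting readinto()):

```python
import io
import struct

def read_ints(f, count, itemsize=4):
    # Hypothetical helper: stream `count` native-order 4-byte ints from a
    # binary file-like object, allocating the destination buffer only once.
    buf = bytearray(count * itemsize)   # the single allocation, up front
    view = memoryview(buf)
    pos = 0
    while pos < len(buf):
        n = f.readinto(view[pos:])      # fills buf in place, no extra copy
        if not n:
            raise EOFError("truncated stream")
        pos += n
    return view.cast("i")               # zero-copy typed view over buf

data = struct.pack("=4i", 1, 2, 3, 4)   # native byte order, matching cast("i")
ints = read_ints(io.BytesIO(data), 4)
```

The resulting memoryview can be handed to NumPy (e.g. numpy.frombuffer) without copying, though converting it to an array.array via frombytes() would still copy, which is exactly the gap an array.zeros()/array.empty() constructor would close.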