On Fri, Feb 21, 2020 at 12:43 AM Steven D'Aprano <steve@pearwood.info> wrote:

On Thu, Feb 20, 2020 at 02:19:13PM -0800, Stephan Hoyer wrote:

> > > Strong +1 for an array.zeros() constructor, and/or a lower level
> > > array.empty() which doesn't pre-fill values.
> >
> > So it'd be a shorthand for something like this?
> >
> > >>> array.array("i", bytes(64))
> > array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
> >
> > It'd be convenient to specify the size as a number of array elements
> > rather than bytes. But I'm not a heavy user of array.array() so I
> > won't say either way as to whether this is needed.
>
>
> Yes, exactly.
>
> The main problem with array.array("i", bytes(64)) is that memory gets
> allocated twice, first to create the bytes() object and then to make the
> array(). This makes it unsuitable for high performance applications.

Got some actual measurements to demonstrate that initialising the array
is a bottleneck? Especially for something as small as 64, it seems
unlikely. If it were 64MB, that might be another story.

That's right. The real use case is quickly deserializing large amounts of data (e.g., hundreds of MB) from a wire format into a form suitable for fast analysis with NumPy or pandas. Unfortunately I can't share an actual code example, but this is a pretty common scenario in the data processing world, and it is reminiscent of the use cases for PEP 574 (https://www.python.org/dev/peps/pep-0574/).

The concern is not just speed (which I agree is probably not hurt much by an extra copy) but also memory overhead. If the resulting array is 500 MB and deserialization can be done in a streaming fashion, I don't want to allocate a second 500 MB buffer just to copy from it.
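
To make the pattern concrete, here is a rough sketch of what I have in mind, using only what the array module offers today. The names zeros() and read_into_array() are hypothetical helpers invented for this example, not existing or proposed APIs, and the claim that repeating a one-element array avoids a full-size temporary is my understanding of CPython's behavior rather than something I have measured.

```python
import array
import io


def zeros(typecode, n):
    """Hypothetical helper: n zero-filled elements, sized in elements, not bytes.

    Repeating a one-element array avoids building a bytes(n * itemsize)
    temporary first (assumption: true for CPython's array repeat).
    """
    return array.array(typecode, [0]) * n


def read_into_array(stream, typecode, n):
    """Hypothetical helper: fill a preallocated array from a binary stream.

    The stream writes straight into the array's buffer via readinto(),
    so no second full-size buffer is needed.
    """
    buf = zeros(typecode, n)
    # View the array's memory as raw bytes so readinto() can write to it.
    with memoryview(buf).cast("B") as view:
        filled = 0
        while filled < len(view):
            got = stream.readinto(view[filled:])
            if not got:
                raise EOFError("stream ended before the array was filled")
            filled += got
    return buf


# Usage: round-trip 1000 ints through an in-memory stream.
source = array.array("i", range(1000))
result = read_into_array(io.BytesIO(source.tobytes()), "i", 1000)
assert result == source
```

This still zero-fills the buffer before overwriting it, which is the cost an array.empty() constructor would remove.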