[Numpy-discussion] Fast way to convert (nested) list to numpy object array?

Sebastian Berg sebastian at sipsolutions.net
Sat Jul 5 11:11:03 EDT 2014


On Fri, 2014-07-04 at 17:32 +0200, Marc Hulsman wrote:
> On 07/03/2014 02:44 PM, Sebastian Berg wrote:
> > True and true. I don't see a problem with fromiter being more general,
> > just someone has to sit down and add new error checks/cleanup stuff
> > for the object case. The assignment could probably also be optimized.
> > I'm not sure how hard that is, but I would expect it isn't that hard.
> > As usual, someone just needs to find the time and the interest to
> > actually do it ;). - Sebastian
> 
> I looked at the code of FromIter below.
> 
>     /*
>      * We would need to alter the memory RENEW code to decrement any
>      * reference counts before throwing away any memory.
>      */
>     if (PyDataType_REFCHK(dtype)) {
>         PyErr_SetString(PyExc_ValueError,
>                 "cannot create object arrays from iterator");
>         goto done;
>     }
> 
> 
> However, the memory renew code (which just reallocs the memory to
> increase the array size) uses a simple realloc. It seems to me that it
> is not necessary to adapt reference counts in this case (as the incref
> from the new memory compensates the decref from the memory that is
> removed)? For the addition of elements to the array, everything seems
> to be ok anyway, as setitem is used, which does the incref already.
> So I think it should be possible to just remove this check?
> 

Yes and no. I agree that the comment was just being overly careful,
since the renew only copies the pointers and never touches the reference
counts. However, you *will* need to add new error cleanup logic for the
case where the iterator raises an error or you run out of memory, since
you then need to decref everything that was already stored.
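
To make the bookkeeping concrete, here is a toy model in pure Python.
None of this is NumPy code: `Obj`, `grow` and `fill_from_iter` are
hypothetical names, and the explicit `refs` counter only stands in for
what Py_INCREF/Py_DECREF would do in C. It is a sketch of the idea that
growing the buffer just moves references around, while a failing
iterator leaves references in the buffer that have to be released.

```python
class Obj:
    """Stand-in for a PyObject with an explicit reference count."""

    def __init__(self, value):
        self.value = value
        self.refs = 1  # the reference held by whoever produced the object


def grow(buf, new_size):
    """Model of realloc: slots are moved as-is, no counts are touched."""
    return buf + [None] * (new_size - len(buf))


def fill_from_iter(iterable):
    """Collect items into a growing buffer, releasing them all on failure."""
    buf, n = [None] * 2, 0
    try:
        for item in iterable:
            if n == len(buf):
                buf = grow(buf, 2 * len(buf))
            item.refs += 1  # what setitem does: the buffer now owns a reference
            buf[n] = item
            n += 1
    except Exception:
        # Error cleanup: drop every reference the buffer took so far,
        # whether the iterator failed or growing the buffer failed.
        for item in buf[:n]:
            item.refs -= 1
        raise
    return buf[:n]
```

Without the `except` branch, every item stored before the failure would
keep its extra reference forever, which is the leak the real C code
would have to avoid.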

> I did not yet look at the assignment issue. I had some difficulty finding
> the correct place in the code. Does anyone have any pointers on where
> to look?
> 

This is handled by PyArray_CopyObject in arrayobject.c. The actual logic
is probably done by PyArray_GetArrayParamsFromObject in ctors.c. That is
a public function, so my guess is that you would have to create a new one
which allows passing in a maximum ndim, and then make the old one call
the new one with NPY_MAXDIMS (or whatever it was).
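
The shape of that refactoring could look roughly like the pure-Python
sketch below. It is not how the C function works internally:
`discover_shape` and `discover_shape_default` are hypothetical names,
the value 32 for NPY_MAXDIMS is an assumption for illustration, and
ragged input is handled in a deliberately simplified way (children that
disagree on shape are just treated as opaque objects).

```python
NPY_MAXDIMS = 32  # assumed placeholder value, for illustration only


def discover_shape(obj, max_ndim):
    """Shape of nested lists/tuples, descending at most max_ndim levels."""
    if max_ndim == 0 or not isinstance(obj, (list, tuple)):
        return ()  # treat obj as a single (object) element
    if len(obj) == 0:
        return (0,)
    inner = [discover_shape(item, max_ndim - 1) for item in obj]
    # Only add further dimensions if all children agree on their shape.
    if all(shape == inner[0] for shape in inner):
        return (len(obj),) + inner[0]
    return (len(obj),)


def discover_shape_default(obj):
    """The 'old' entry point, now a thin wrapper with no effective limit."""
    return discover_shape(obj, NPY_MAXDIMS)
```

With `nested = [[1, 2], [3, 4]]`, the wrapper reports a 2-d shape, while
`discover_shape(nested, 1)` stops after the first level and would leave
the inner lists as elements of a 1-d object array.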

- Sebastian

> 
> 
> 
> >> The generic solution of adding an nmaxdim parameter to numpy.array would
> >> of course be even more ideal :)
> >>
> >>
> >>
> >> _______________________________________________
> >> NumPy-Discussion mailing list
> >> NumPy-Discussion at scipy.org
> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>
