[Numpy-discussion] Fast way to convert (nested) list to numpy object array?

Marc Hulsman m.hulsman at tudelft.nl
Thu Jul 3 08:36:17 EDT 2014


On 07/03/2014 11:43 AM, Julian Taylor wrote:
> On second thought I guess adding a short circuit to the dimension
> discovery on mismatching list length with object type should solve the
> issue too. A bit more information on the use case would still be
> useful, why do you need to use numpy arrays for this in the first place?

I use numpy as the base for a prototype data handling language (which
matches dimensions not by position, as numpy does, but by identity).
This allows SQL-like operations on complex data structures. The code has
to be generic in order to handle the corner cases. Numpy is used because
it provides fast indexing and ufuncs.

Input is often formatted using regular Python constructs. This input
data is 'unpacked' to a certain depth, which means
that it is converted to numpy arrays, to allow for generic query
operations.

This can however go wrong. Say that we have nested variable-length
lists. What sometimes happens is that part of the data has
(by chance) only fixed-length nested lists, while another part has
variable-length nested lists. If we then unpack, numpy will construct
a multi-dimensional array in the first case, and a 1-dimensional
array of nested lists in the second. If we then want to e.g. concatenate
this data using a generic operation, that operation has trouble handling
the mix of multi-dimensional and 1-dimensional arrays. The code becomes
quite a bit simpler if I know beforehand that I can expect just e.g.
a 1-dimensional array.
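
A minimal sketch of the mismatch described above, assuming dtype=object
is passed explicitly. The variable names are made up for illustration
and the data is not from my actual code:

```python
import numpy as np

# Two batches of the same kind of data: nested lists.
fixed_part = [[1, 2], [3, 4]]    # by chance, all inner lists have length 2
ragged_part = [[1, 2], [3]]      # inner lists differ in length

a = np.array(fixed_part, dtype=object)
b = np.array(ragged_part, dtype=object)

print(a.shape)  # (2, 2): numpy descended into the inner lists
print(b.shape)  # (2,):   numpy stopped and stored the inner lists as objects

# A generic concatenate now fails because the dimensions do not agree.
try:
    np.concatenate([a, b])
    concat_failed = False
except ValueError:
    concat_failed = True
print(concat_failed)
```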

This is maybe somewhat of a corner case :) However, I was still
wondering why, when assigning x[:] = k, k is still 'descended into'
further than needed given the limited dimension of x. This seems
unnecessary? It is also not really clear to me why fromiter
does not work with object dtypes. A solution for these two more general
problems would already help me a lot.
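
For reference, a sketch of the kind of workaround this leads to: fill a
pre-allocated 1-dimensional object array element by element, so that
numpy never inspects the nesting. The helper name object_array_1d is
hypothetical and not a numpy function. It loops in Python, so it only
guarantees the shape and is not the fast path I am looking for:

```python
import numpy as np

def object_array_1d(items):
    """Hypothetical helper: store each item as-is in a 1-D object array."""
    items = list(items)  # materialize, so that iterators are accepted too
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # integer-index assignment stores the object itself
    return out

fixed = object_array_1d([[1, 2], [3, 4]])      # fixed-length inner lists
ragged = object_array_1d(iter([[1, 2], [3]]))  # variable-length, from an iterator

# Both results are 1-dimensional, so a generic concatenate works.
combined = np.concatenate([fixed, ragged])
print(fixed.shape, ragged.shape, combined.shape)  # (2,) (2,) (4,)
```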

The generic solution of adding an nmaxdim parameter to numpy.array would
of course be even more ideal :)





