NEP 34 - deprecate automatic dtype=object on ragged arrays
After a few iterations with reviewers, I would like to submit NEP 34, which deprecates automatically using dtype=object for ragged arrays, along with an associated PR for the implementation.
When users create arrays from sequences-of-sequences, they sometimes mismatch the lengths of the nested sequences_. The result is commonly called a "ragged array"; here we will refer to such input as a ragged nested sequence. Creating such arrays via ``np.array([<ragged_nested_sequence>])`` with no ``dtype`` keyword argument currently defaults to an ``object``-dtype array. This NEP proposes changing the behaviour to raise a ``ValueError`` instead.
Motivation and Scope
--------------------
Users who specify lists-of-lists when creating a `numpy.ndarray` via ``np.array`` may mistakenly pass in lists of different lengths. Currently we accept this input and automatically create an array with ``dtype=object``. This can be confusing, since it is rarely what is desired (see for instance `issue 5303`_). Changing the automatic dtype detection to never return ``object`` for ragged nested sequences (defined as a recursive sequence of sequences, where not all the sequences on the same level have the same length) will force users who actually wish to create ``object`` arrays to specify that explicitly. Note that ``list``, ``tuple``, and ``np.ndarray`` objects are all sequences.
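The definition of a ragged nested sequence given above can be made concrete with a small pure-Python check. This is only a sketch of the definition, not NumPy's dtype discovery: the names ``nested_shape`` and ``is_ragged`` are hypothetical, strings are treated as scalars by assumption, and only types registered as ``collections.abc.Sequence`` (such as ``list`` and ``tuple``) are recognised, so arrays would need separate handling.

```python
from collections.abc import Sequence


def _is_sequence(obj):
    # Assumption: strings and bytes count as scalars, not nested sequences.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))


def nested_shape(obj):
    """Return the regular shape of ``obj`` as a tuple, or None if it is ragged."""
    if not _is_sequence(obj):
        return ()  # a scalar has an empty shape
    child_shapes = [nested_shape(item) for item in obj]
    # Ragged if any child is ragged, or if children on this level disagree.
    if None in child_shapes or len(set(child_shapes)) > 1:
        return None
    inner = child_shapes[0] if child_shapes else ()
    return (len(obj),) + inner


def is_ragged(obj):
    """True when the sequences on some nesting level differ in shape."""
    return nested_shape(obj) is None


print(is_ragged([[1, 2], [1]]))     # True: inner lengths 2 and 1 differ
print(is_ragged([[1, 2], [3, 4]]))  # False: regular shape (2, 2)
print(is_ragged([[1, 2], 3]))       # True: a list next to a scalar
```

Treating a mix of sequences and scalars on one level as ragged is a choice made for this sketch; it matches the ``[[1, 2], 3]`` input used later in this section.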
Usage and Impact
----------------
After this change, array creation with ragged nested sequences must explicitly define a dtype:
    >>> np.array([[1, 2], [1]])
    ValueError: cannot guess the desired dtype from the input

    >>> np.array([[1, 2], [1]], dtype=object)
    # succeeds, with no change from current behaviour
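The following is a runnable variant of the two calls above, written to tolerate different NumPy releases. It rests on assumptions not stated in this text: releases that predate the change are expected to build an ``object`` array silently, and releases in the deprecation period are expected to warn rather than raise. The helper name ``rejects_ragged_without_dtype`` is hypothetical.

```python
import warnings

import numpy as np

ragged = [[1, 2], [1]]


def rejects_ragged_without_dtype(seq):
    """Return True if np.array(seq) warns or raises instead of guessing a dtype."""
    with warnings.catch_warnings():
        # Escalate any deprecation-period warning to an exception.
        warnings.simplefilter("error")
        try:
            np.array(seq)
        except (ValueError, Warning):
            return True
    return False


# Spelling out dtype=object keeps working regardless of the release.
arr = np.array(ragged, dtype=object)
print(arr.shape, arr.dtype)  # (2,) object

# The result of this call depends on the installed NumPy release.
print(rejects_ragged_without_dtype(ragged))
```

Passing ``dtype=object`` explicitly is the portable spelling: it yields a one-dimensional array holding the two lists whether or not the automatic fallback is still available.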
The deprecation will affect any call that internally calls ``np.asarray``. For instance, the ``assert_equal`` family of functions calls ``np.asarray``, so users will have to change code like::
    np.testing.assert_equal(a, [[1, 2], 3])

to::

    np.testing.assert_equal(a, np.array([[1, 2], 3], dtype=object))