[Numpy-discussion] Added atleast_nd, request for clarification/cleanup of atleast_3d

Nathaniel Smith njs at pobox.com
Wed Jul 6 18:22:01 EDT 2016

On Wed, Jul 6, 2016 at 1:56 PM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
> On Wed, Jul 6, 2016 at 6:26 PM, Nathaniel Smith <njs at pobox.com> wrote:
>> On Jul 5, 2016 11:21 PM, "Ralf Gommers" <ralf.gommers at gmail.com> wrote:
>> >
>> >
>> >
>> > On Wed, Jul 6, 2016 at 7:06 AM, Nathaniel Smith <njs at pobox.com> wrote:
>> >
>> >> On Jul 5, 2016 9:09 PM, "Joseph Fox-Rabinovitz"
>> >> <jfoxrabinovitz at gmail.com> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I have generalized np.atleast_1d, np.atleast_2d, np.atleast_3d with a
>> >> > function np.atleast_nd in PR#7804
>> >> > (https://github.com/numpy/numpy/pull/7804).
>> >> >
>> >> > As a result of this PR, I have a couple of questions about
>> >> > `np.atleast_3d`. `np.atleast_3d` appears to do something weird with
>> >> > the dimensions: If the input is 1D, it prepends and appends a size-1
>> >> > dimension. If the input is 2D, it appends a size-1 dimension. This is
>> >> > inconsistent with `np.atleast_2d`, which always prepends (as does
>> >> > `np.atleast_nd`).
>> >> >
>> >> >   - Is there any reason for this behavior?
>> >> >   - Can it be cleaned up (e.g., by reimplementing `np.atleast_3d` in
>> >> > terms of `np.atleast_nd`, which is actually much simpler)? This would
>> >> > be a slight API change since the output would not be exactly the
>> >> > same.
>> >>
>> >> Changing atleast_3d seems likely to break a bunch of stuff...
>> >>
>> >> Beyond that, I find it hard to have an opinion about the best design
>> >> for these functions, because I don't think I've ever encountered a situation
>> >> where they were actually what I wanted. I'm not a big fan of coercing
>> >> dimensions in the first place, for the usual "refuse to guess" reasons. And
>> >> then generally if I do want to coerce an array to another dimension, then I
>> >> have some opinion about where the new dimensions should go, and/or I have
>> >> some opinion about the minimum acceptable starting dimension, and/or I have
>> >> a maximum dimension in mind. (E.g. "coerce 1d inputs into a column matrix;
>> >> 0d or 3d inputs are an error" -- atleast_2d is zero-for-three on that
>> >> requirements list.)
>> >>
>> >> I don't know how typical I am in this. But it does make me wonder if
>> >> the atleast_* functions act as an attractive nuisance, where new users take
>> >> their presence as an implicit recommendation that they are actually a useful
>> >> thing to reach for, even though they... aren't that. And maybe we should be
>> >> recommending folk move away from them rather than trying to extend them
>> >> further?
>> >>
>> >> Or maybe they're totally useful and I'm just missing it. What's your
>> >> use case that motivates atleast_nd?
>> >
>> > I think you're just missing it:) atleast_1d/2d are used quite a bit in
>> > Scipy and Statsmodels (those are the only ones I checked), and in the large
>> > majority of cases it's the best thing to use there. There's a bunch of
>> > atleast_2d calls with a transpose appended because the input needs to be
>> > treated as columns instead of rows, but that's still efficient and readable
>> > enough.
>> I know people *use* it :-). What I'm confused about is in what situations
>> you would invent it if it didn't exist. Can you point me to an example or
>> two where it's "the best thing"? I actually had statsmodels in mind with my
>> example of wanting the semantics "coerce 1d inputs into a column matrix; 0d
>> or 3d inputs are an error". I'm surprised if there are places where you
>> really want 0d arrays converted into 1x1,
> Scalar to shape (1,1) is less common, but 1-D to 2-D or scalar to shape (1,)
> is very common.

That's ravel, though, not atleast_*, right?

> Example is at the top of scipy/stats/stats.py: the
> _chk_asarray functions (used in many other functions)

I feel like this actually argues for my point :-). scipy.stats needs
some uniform prepping of input, so there's a helper function to do
that, and the helper function's semantics are not at all the semantics
of atleast_*. And they don't even use atleast_* in any necessary way
-- the only thing they do is

if arr.ndim == 0:
    arr = np.atleast_1d(arr)

but this could be written just as well as

if arr.ndim == 0:
    arr = arr[np.newaxis]

(In any case, atleast_1d definitely makes more sense to me than any of
the others, since it so obviously corresponds to exactly that 2-line
incantation as the only reasonable implementation.)
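
A minimal sketch of that equivalence, together with the atleast_2d and
atleast_3d shapes described at the top of the thread. The variable names
are illustrative only, and the shapes in the comments follow the
documented behavior of these functions:

```python
import numpy as np

scalar = np.array(5.0)               # 0-d array, shape ()

# For 0-d input the two spellings give the same result.
via_atleast = np.atleast_1d(scalar)  # shape (1,)
via_newaxis = scalar[np.newaxis]     # shape (1,)

vec = np.arange(3)                   # shape (3,)
mat = np.ones((2, 3))                # shape (2, 3)

# atleast_2d prepends a size-1 axis to 1-D input.
shape_2d_from_1d = np.atleast_2d(vec).shape   # (1, 3)

# atleast_3d prepends and appends for 1-D input, and appends for 2-D input.
shape_3d_from_1d = np.atleast_3d(vec).shape   # (1, 3, 1)
shape_3d_from_2d = np.atleast_3d(mat).shape   # (2, 3, 1)
```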

> take care to never
> return scalar arrays because those are plain annoying to deal with. If that
> sounds weird to you, you're probably one of those people who was never
> surprised by this:
> In [3]: x0 = np.array(1)
> In [4]: x1 = np.array([1])
> In [5]: x0[0]
> ---------------------------------------------------------------------------
> IndexError                                Traceback (most recent call last)
> <ipython-input-5-6a57e371ca72> in <module>()
> ----> 1 x0[0]
> IndexError: too many indices for array
> In [6]: x1[0]
> Out[6]: 1

I was surprised by it the first time I hit it, but then thought it
over and decided that it was better than the alternatives :-).

(It does strike me as really odd, I would even say a bug, that e.g.
scipy.stats.mode returns a 1d array for 1d input, and a 2d array (!)
for 2d input. Mode is semantically a reduction operation, and we have
pretty strong conventions for how those work -- drop a dimension
unless keepdims=True. Obviously this is old existing code, and that's
fine, but it's not how we'd recommend people write new code, I think?
I guess this is orthogonal to the whole atleast_* discussion anyway.)
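
For reference, a small sketch of the reduction convention mentioned
above, using np.sum as a stand-in for any reduction (it says nothing
about how scipy.stats.mode itself is implemented):

```python
import numpy as np

x = np.arange(6).reshape(2, 3)        # shape (2, 3)

# Default: the reduced axis is dropped.
dropped = x.sum(axis=0)               # shape (3,)

# keepdims=True: the reduced axis is kept with length 1.
kept = x.sum(axis=0, keepdims=True)   # shape (1, 3)
```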

>> or want to allow high dimensional arrays to pass through - and if you do
>> want to allow high dimensional arrays to pass through, then transposing
>> might help with 2d cases but will silently mangle high-d cases, right?
> Handling of >2-D input is usually irrelevant. The vast majority of cases are
> "function that accepts scalar and 1-D array" or "function that accepts 1-D
> and 2-D arrays".

So maybe we should have functions that actually handle those cases,
instead of recommending atleast_*?
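
As a rough sketch of what such a function might look like for the
"coerce 1d inputs into a column matrix; 0d or 3d inputs are an error"
case from earlier in the thread: as_column_matrix is a hypothetical
name, not an existing NumPy or SciPy function, and the choice of
ValueError is an assumption.

```python
import numpy as np


def as_column_matrix(x):
    """Hypothetical helper (not part of NumPy): accept 1-D or 2-D input only.

    1-D input of shape (n,) becomes a column of shape (n, 1).
    2-D input is returned unchanged.
    Anything else (0-d, 3-D, ...) raises ValueError instead of guessing.
    """
    arr = np.asarray(x)
    if arr.ndim == 1:
        return arr[:, np.newaxis]
    if arr.ndim == 2:
        return arr
    raise ValueError(f"expected 1-D or 2-D input, got {arr.ndim}-D")


column = as_column_matrix([1.0, 2.0, 3.0])   # shape (3, 1)

# For comparison, atleast_2d followed by a transpose gives the same shape
# for 1-D input, but reverses all axes of 3-D input without complaint.
transposed_1d = np.atleast_2d(np.arange(3)).T.shape        # (3, 1)
transposed_3d = np.atleast_2d(np.ones((2, 3, 4))).T.shape  # (4, 3, 2)
```

The point of the sketch is only that the accepted dimensions and the
position of the new axis are stated explicitly rather than inferred.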


Nathaniel J. Smith -- https://vorpus.org