[Numpy-discussion] loadtxt and usecols

Sebastian Berg sebastian at sipsolutions.net
Tue Nov 10 03:19:33 EST 2015

On Mon, 2015-11-09 at 20:36 +0100, Ralf Gommers wrote:
> On Mon, Nov 9, 2015 at 7:42 PM, Benjamin Root <ben.v.root at gmail.com>
> wrote:
>         My personal rule for flexible inputs like that is that it
>         should be encouraged so long as it does not introduce
>         ambiguity. Furthermore, allowing a scalar as an input doesn't
>         create a cognitive disconnect for the user about how to
>         specify multiple columns. Therefore, I'd give this a +1.
>         On Mon, Nov 9, 2015 at 4:15 AM, Irvin Probst
>         <irvin.probst at ensta-bretagne.fr> wrote:
>                 Hi,
>                 I've recently seen many students, coming from Matlab,
>                 struggling with the usecols argument of loadtxt.
>                 Most of them tried something like
>                 loadtxt("foo.bar", usecols=2), and the ones with better
>                 documentation-reading skills tried loadtxt("foo.bar",
>                 usecols=(2)), but none of them understood that they
>                 had to write usecols=[2] or usecols=(2,).
>                 Is there a policy in numpy stating that this kind of
>                 argument must be a sequence?
> There isn't. In many/most cases it's array_like, which means scalar,
> sequence or array.
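
For reference, a minimal sketch of why `usecols=(2)` does not help: in
Python the parentheses alone do not create a tuple, so `(2)` is the same
object as the scalar `2`. The in-memory text below is a hypothetical
stand-in for "foo.bar", and only the sequence spellings are passed to
loadtxt, since those are the ones the original poster says work.

```python
import io
import numpy as np

# Hypothetical stand-in for "foo.bar": three rows, three columns.
text = "1 2 3\n4 5 6\n7 8 9\n"

# Parentheses alone do not make a tuple; the trailing comma does.
paren_only = (2)
one_tuple = (2,)
assert isinstance(paren_only, int)
assert isinstance(one_tuple, tuple)

# Both sequence spellings select the third column of the file.
col_tuple = np.loadtxt(io.StringIO(text), usecols=(2,))
col_list = np.loadtxt(io.StringIO(text), usecols=[2])
```

Both calls return the same one-dimensional array here, because loadtxt
squeezes the result by default.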

Agreed. I think we have, or should have, two types of things there (well,
three, since we certainly also have "must be a sequence").
There are args such as "axes", which is typically just one value, so we
allow a scalar, but which can often be generalized to a sequence. And
there are things that are array-likes (and broadcast).

If this is an array-like, however, the "correct" result could differ
between `1` and `(1,)` through broadcasting, analogous to indexing the
full array with usecols:

usecols=1 result:
array([2, 3, 4, 5])

usecols=(1,) result [1]:
array([[2, 3, 4, 5]])

since for the scalar a single row (so just one row) is read and not a 2D
array. I tend to say it should be an array-like argument and not a
generalized sequence argument; I just wanted to note that, since I am not
sure what Matlab does.

- Sebastian

[1] One could go further and pass `usecols=[[1]]` to get
`array([[[2, 3, 4, 5]]])`
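
The array-like reading above can be sketched with plain indexing. This is
only an illustration of the broadcasting argument, not what loadtxt does:
the `columns` array is made-up data chosen so that column 1 matches the
values shown, and `select` is a hypothetical helper built on `np.take`.

```python
import numpy as np

# Made-up data, stored one file column per row so that selecting
# along axis 0 picks a column of the file.
columns = np.array([[1, 2, 3, 4],
                    [2, 3, 4, 5],
                    [3, 4, 5, 6]])

def select(usecols):
    # np.take treats `usecols` as an array-like, so the shape of the
    # index carries over into the shape of the result.
    return np.take(columns, usecols, axis=0)

scalar = select(1)      # shape (4,)
seq = select((1,))      # shape (1, 4)
nested = select([[1]])  # shape (1, 1, 4)
```

Under this reading each extra level of nesting in usecols adds a leading
dimension to the result, which is the difference between the two outputs
shown above.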

>                 I think that being able to pass an int or a sequence
>                 when a single column is needed would make this
>                 function a bit more user friendly for beginners. I
>                 would gladly submit a PR if no one disagrees.
> +1
> Ralf
