[Numpy-discussion] loadtxt and usecols

Irvin Probst irvin.probst at ensta-bretagne.fr
Tue Nov 10 10:07:13 EST 2015


On 10/11/2015 14:17, Sebastian Berg wrote:
> Actually, it is the "sequence special case" type ;). (matlab does not
> have this, since matlab always returns 2-D I realized).
>
> As I said, if usecols is like indexing, the result should mimic:
>
> arr = np.loadtxt(f)
> arr = arr[usecols]
>
> in which case a 1-D array is returned if you put in a scalar into
> usecols (and you could even generalize usecols to higher dimensional
> array-likes).
> The way you implemented it -- which is fine, but I want to stress that
> there is a real decision being made here --, you always see it as a
> sequence but allow a scalar for convenience (i.e. always return a 2-D
> array). It is a `sequence of ints or int` type argument and not an
> array-like argument in my opinion.

I think we have two separate problems here:

The first one is whether loadtxt should always return a 2-D array or 
should match the shape of the usecols argument. From a CS guy's point of 
view I do understand your concern here. Now from a teacher's point of 
view I know many people expect to get a "matrix" (thank you Matlab...) 
and the "purity" of matching the dimension of the usecols variable will 
be seen by many people [1] as useless nerdy heaviness no one cares about 
(no offense). So whatever you, seasoned numpy devs from this mailing 
list, decide, I think it should be explained in the docstring in very 
clear wording.

My own opinion on this first problem is that loadtxt() should always 
return a 2-D array, no less, no more. If I write np.loadtxt(f)[42] it 
means I want to read the whole file and then I explicitly ask for the 
2-D array loadtxt() returned to be turned into a 1-D array. OTOH if I 
write loadtxt(f, usecols=42) it means I don't want to read the other 
columns and I want only this one, but it does not mean that I want to 
change the returned array from 2-D to 1-D. I know this new behavior 
might break a lot of existing code, as usecols=(42,) used to return a 
1-D array, but usecols=((((42,)))) also returns a 1-D array, so the 
current behavior is not consistent imho.
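
As a minimal sketch of the shapes being discussed (the inline data is 
made up for the example, and io.StringIO stands in for a real file): 
asking for two columns gives a 2-D array, while asking for a single 
column gives a 1-D array. Note also that in Python the extra parentheses 
add no nesting, so ((((1,)))) is simply the tuple (1,).

```python
import io
import numpy as np

# Made-up sample data: two rows, three columns.
text = "1 2 3\n4 5 6\n"

# Two columns requested: the result is 2-D.
two_cols = np.loadtxt(io.StringIO(text), usecols=(0, 1))

# One column requested as a 1-tuple: the result is squeezed to 1-D.
one_col = np.loadtxt(io.StringIO(text), usecols=(1,))

# Redundant parentheses do not create nested tuples.
same_tuple = ((((1,)))) == (1,)

print(two_cols.shape, one_col.shape, same_tuple)
```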

The second problem is about the wording in the docstring. When I see 
"sequence of int or int" I understand that I will have to cast whatever 
wicked N-dimensional object I use to store my column indexes into a 1-D 
Python list, or hope list(my_object) will do it fine. On the other 
hand, when I read "array-like" the function is telling me I don't have 
to worry about my object: as long as numpy knows how to cast it into an 
array it will be fine.
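
To make the "array-like" reading concrete, here is a rough sketch of the 
kind of normalization I have in mind. normalize_usecols is a 
hypothetical helper written for this mail, not something loadtxt 
provides, and it assumes the nesting is regular (non-ragged).

```python
import numpy as np

def normalize_usecols(usecols):
    """Hypothetical helper: turn an int or array-like of ints into a flat list."""
    # np.asarray accepts scalars, lists, tuples and arrays of any dimension,
    # as long as the nesting is regular (non-ragged).
    cols = np.asarray(usecols, dtype=int)
    # ravel() flattens any number of dimensions (including 0-d) into 1-D.
    return cols.ravel().tolist()

print(normalize_usecols(3))
print(normalize_usecols((0, 2)))
print(normalize_usecols([[0, 2], [1, 3]]))
```

With something like this the caller never has to care whether the 
column indexes are stored as an int, a tuple or a 2-D array.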

Anyway, I think something like this:

import numpy as np
a=[[[2,],[],[],],[],[],[]]
foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a)

should just work and return a 2-D (or 1-D if you like) array with the 
data I asked for, and I don't think "a" here is an int or a sequence of 
int (but it's a good example of why loadtxt() should not match the shape 
of the usecols argument).
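
For a ragged object like this one, a sketch of what "just work" could 
mean is to collect every int found anywhere in the nesting. 
flatten_columns is a hypothetical helper for illustration only; it is 
not what loadtxt does.

```python
def flatten_columns(obj):
    """Hypothetical helper: collect every int found in a nested sequence."""
    if isinstance(obj, int):
        return [obj]
    flat = []
    for item in obj:
        # Recurse into each element; empty sequences contribute nothing.
        flat.extend(flatten_columns(item))
    return flat

a = [[[2,], [], [],], [], [], []]
cols = flatten_columns(a)
print(cols)
```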

To make it short, let the reading function read the data in a consistent 
and predictable way and then let the user explicitly change the data's 
shape into anything he likes.
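
A minimal sketch of that workflow, using loadtxt's existing ndmin 
parameter to keep the result 2-D and then reshaping by hand (the inline 
data is made up, and io.StringIO stands in for a real file):

```python
import io
import numpy as np

# Made-up sample data: two rows, three columns.
text = "1 2 3\n4 5 6\n"

# Read only the last column, but keep the result 2-D with ndmin=2.
data = np.loadtxt(io.StringIO(text), usecols=(2,), ndmin=2)

# The user then explicitly asks for a 1-D view of that column.
column = data[:, 0]

print(data.shape, column.shape)
```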

Regards.

[1] read: non-CS people trying to switch to numpy/scipy


