[Numpy-discussion] loadtxt and usecols

Benjamin Root ben.v.root at gmail.com
Tue Nov 10 10:24:40 EST 2015


Just pointing out np.loadtxt(..., ndmin=2) will always return a 2D array.
Notice that without that option, the result is effectively squeezed. So if
you don't specify that option, and you load up a CSV file with only one
row, you will get a very differently shaped array than if you load up a CSV
file with two rows.

Ben Root

On Tue, Nov 10, 2015 at 10:07 AM, Irvin Probst <
irvin.probst at ensta-bretagne.fr> wrote:

> On 10/11/2015 14:17, Sebastian Berg wrote:
>
>> Actually, it is the "sequence special case" type ;). (matlab does not
>> have this, since matlab always returns 2-D I realized).
>>
>> As I said, if usecols is like indexing, the result should mimic:
>>
>> arr = np.loadtxt(f)
>> arr = arr[usecols]
>>
>> in which case a 1-D array is returned if you put in a scalar into
>> usecols (and you could even generalize usecols to higher dimensional
>> array-likes).
>> The way you implemented it -- which is fine, but I want to stress that
>> there is a real decision being made here --, you always see it as a
>> sequence but allow a scalar for convenience (i.e. always return a 2-D
>> array). It is a `sequence of ints or int` type argument and not an
>> array-like argument in my opinion.
>>
>
> I think we have two separate problems here:
>
> The first one is whether loadtxt should always return a 2D array or should
> it match the shape of the usecol argument. From a CS guy point of view I do
> understand your concern here. Now from a teacher point of view I know many
> people expect to get a "matrix" (thank you Matlab...) and the "purity" of
> matching the dimension of the usecol variable will be seen by many people
> [1] as a nerdy useless heavyness noone cares of (no offense). So whatever
> you, seadoned numpy devs from this mailing list, decide I think it should
> be explained in the docstring with a very clear wording.
>
> My own opinion on this first problem is that loadtxt() should always
> return a 2D array, no less, no more. If I write np.loadtxt(f)[42] it means
> I want to read the whole file and then I explicitely ask for transforming
> the 2-D array loadtxt() returned into a 1-D array. Otoh if I write
> loadtxt(f, usecol=42) it means I don't want to read the other columns and I
> want only this one, but it does not mean that I want to change the returned
> array from 2-D to 1-D. I know this new behavior might break a lot of
> existing code as usecol=(42,) used to return a 1-D array, but
> usecol=((((42,)))) also returns a 1-D array so the current behavior is not
> consistent imho.
>
> The second problem is about the wording in the docstring, when I see
> "sequence of int or int" I uderstand I will have to cast into a 1-D python
> list whatever wicked N-dimensional object I use to store my column indexes,
> or hope list(my_object) will do it fine. On the other hand when I read
> "array-like" the function is telling me I don't have to worry about my
> object, as long as numpy knows how to cast it into an array it will be fine.
>
> Anyway I think something like that:
>
> import numpy as np
> a=[[[2,],[],[],],[],[],[]]
> foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a)
>
> should just work and return me a 2-D (or 1-D if you like) array with the
> data I asked for and I don't think "a" here is an int or a sequence of int
> (but it's a good example of why loadtxt() should not match the shape of the
> usecol argument).
>
> To make it short, let the reading function read the data in a consistent
> and predictible way and then let the user explicitely change the data's
> shape into anything he likes.
>
> Regards.
>
> [1] read non CS people trying to switch to numpy/scipy
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20151110/b4ff47e5/attachment.html>


More information about the NumPy-Discussion mailing list