Mailman 3 unpack argument in loadtxt/genfromtxt does not work as documented - NumPy-Discussion

unpack argument in loadtxt/genfromtxt does not work as documented

Chris Fonnesbeck

Oct. 7, 2010

2:48 a.m.

The documentation for loadtxt and genfromtxt states that the unpack argument functions as follows:

If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...).

In practice, this does not always occur. I have a csv file of mixed data types, and try importing it via:

genfromtxt("progestogens.csv", delimiter=",", names=True,
           dtype=dtype([('id', int),('study', '|S25'),('year', int),('treat', int),('drug', '|S25'),('form', '|S10'),('ptb', int),('mgest', int),('lab', int),('n', int),('y', int),('var', '|S5'),('wt', int),('sdwt', int)]),
           unpack=True)

With unpack=True, I would expect the data to be presented by columns; however, the resulting array is by rows:

array([(1, 'Meis', 2003, 1, '17P', 'IM', 1, 0, 0, 306, 111, 'ptb'),
       (1, 'Meis', 2003, 0, '17P', 'IM', 1, 0, 0, 153, 84, 'ptb'),
       (2, 'Rai', 2009, 1, 'Progesterone', 'Oral', 1, 0, 0, 74, 29, 'ptb'),
       (2, 'Rai', 2009, 0, 'Progesterone', 'Oral', 1, 0, 0, 74, 44, 'ptb'),
       ...

The same behaviour occurs using loadtxt. Moreover, this array is untransposeable, so I am stuck with having to iterate over all the rows, making genfromtxt no better than csv.reader.
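For reference, the documented behaviour can be reproduced when every column shares one dtype. The following is a minimal sketch, assuming a small in-memory table; the data and variable names are illustrative and do not come from the thread.

```python
from io import StringIO

import numpy as np

# Hypothetical whitespace-delimited table: 2 rows, 3 numeric columns.
text = "1 2 3\n4 5 6\n"

# With one homogeneous dtype the result is a 2-D array, so unpack=True
# can transpose it and hand back one 1-D array per column.
x, y, z = np.loadtxt(StringIO(text), unpack=True)

print(x)  # values of the first column
print(y)  # values of the second column
print(z)  # values of the third column
```

Each of x, y and z holds one column of the file, which is the result the original poster expected for the mixed-type CSV as well.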

Pierre GM

October 2010

9 a.m.

On Oct 7, 2010, at 4:48 AM, Chris Fonnesbeck wrote:

> The documentation for loadtxt and genfromtxt states that the unpack argument functions as follows:
>
> If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...).

Provided that all the columns have the same dtype.

> In practice, this does not always occur. I have a csv file of mixed data types, and try importing it via:
>
> genfromtxt("progestogens.csv", delimiter=",", names=True, dtype=dtype([('id', int),('study', '|S25'),('year', int),('treat', int),('drug', '|S25'),('form', '|S10'),('ptb', int),('mgest', int),('lab', int),('n', int),('y', int),('var', '|S5'),('wt', int),('sdwt', int)]), unpack=True)
>
> With unpack=True, I would expect the data to be presented by columns, however the resulting array is by rows:

Well, you have a complex dtype, so your result array is 1D, each row corresponding to a tuple of elements with different dtypes.

> array([(1, 'Meis', 2003, 1, '17P', 'IM', 1, 0, 0, 306, 111, 'ptb'),
>        (1, 'Meis', 2003, 0, '17P', 'IM', 1, 0, 0, 153, 84, 'ptb'),
>        (2, 'Rai', 2009, 1, 'Progesterone', 'Oral', 1, 0, 0, 74, 29, 'ptb'),
>        (2, 'Rai', 2009, 0, 'Progesterone', 'Oral', 1, 0, 0, 74, 44, 'ptb'),
>        ...
>
> The same behaviour occurs using loadtxt. Moreover, this array is untransposeable, so I am stuck with having to iterate over all the rows, making genfromtxt no better than csv.reader.

Once again, your array is 1D, so you can't transpose it. Now, you should be able to get each column through

[a[_] for _ in a.dtype.names]
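The workaround above can be tried on a small array. This is a minimal sketch, assuming a hand-built structured array standing in for the genfromtxt result (what Pierre calls a "complex dtype" is NumPy's structured dtype); the three fields and two rows are a hypothetical subset of the poster's data.

```python
import numpy as np

# Hypothetical stand-in for the array genfromtxt returns for mixed-type data.
dt = np.dtype([("id", int), ("study", "U25"), ("year", int)])
a = np.array([(1, "Meis", 2003), (2, "Rai", 2009)], dtype=dt)

# Only one axis exists: each element is a whole record, so .T changes nothing.
print(a.ndim, a.shape, a.T.shape)

# Indexing by field name yields one 1-D array per column.
columns = [a[name] for name in a.dtype.names]
ident, study, year = columns
print(study)
print(year)
```

Each extracted column keeps its own dtype, so no row-by-row iteration is needed.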

Benjamin Root

1:49 p.m.

On Thu, Oct 7, 2010 at 4:00 AM, Pierre GM <pgmdevlist@gmail.com> wrote:

> On Oct 7, 2010, at 4:48 AM, Chris Fonnesbeck wrote:
>
>> The documentation for loadtxt and genfromtxt states that the unpack argument functions as follows:
>>
>> If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...).
>
> Provided that all the columns have the same dtype
>
>> In practice, this does not always occur. I have a csv file of mixed data types, and try importing it via:
>>
>> genfromtxt("progestogens.csv", delimiter=",", names=True, dtype=dtype([('id', int),('study', '|S25'),('year', int),('treat', int),('drug', '|S25'),('form', '|S10'),('ptb', int),('mgest', int),('lab', int),('n', int),('y', int),('var', '|S5'),('wt', int),('sdwt', int)]), unpack=True)
>>
>> With unpack=True, I would expect the data to be presented by columns, however the resulting array is by rows:
>
> Well, you have a complex dtype, so your result array is 1D, each row corresponding to a tuple of elements with different dtypes.
>
>> array([(1, 'Meis', 2003, 1, '17P', 'IM', 1, 0, 0, 306, 111, 'ptb'),
>>        (1, 'Meis', 2003, 0, '17P', 'IM', 1, 0, 0, 153, 84, 'ptb'),
>>        (2, 'Rai', 2009, 1, 'Progesterone', 'Oral', 1, 0, 0, 74, 29, 'ptb'),
>>        (2, 'Rai', 2009, 0, 'Progesterone', 'Oral', 1, 0, 0, 74, 44, 'ptb'),
>>        ...
>>
>> The same behaviour occurs using loadtxt. Moreover, this array is untransposeable, so I am stuck with having to iterate over all the rows, making genfromtxt no better than csv.reader.
>
> Once again, your array is 1D, so you can't transpose it. Now, you should be able to get each column through
>
> [a[_] for _ in a.dtype.names]

I understand the technicalities of why this occurs, but from a user's perspective, he is asking for distinct numpy arrays of specified types. The transposing seems to be almost an unimportant implementation detail because the user is asking for the data to be split up by columns. Personally, I think that this should be transparent to the user and should be able to work -- although I am not exactly sure if one should just simply return a list of numpy arrays with the column names dropped, or a list of one-column record arrays.

If it can't quite work from within the framework for genfromtxt/loadtxt, then maybe another text loading function that is designed to have the data format known a priori would be suitable?

Note that such a function might also be sufficient in addressing my long-standing qualm with loadtxt()'s squeeze behavior for files with only one line of data.

Ben Root
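The squeeze behaviour is not spelled out in the thread. The sketch below assumes it refers to a one-line file coming back as a 1-D array instead of a 1-row 2-D array. It also assumes a NumPy version whose loadtxt accepts the ndmin keyword, which is not mentioned anywhere in this discussion.

```python
from io import StringIO

import numpy as np

two_lines = "1 2 3\n4 5 6\n"
one_line = "1 2 3\n"

# Two rows of three values give the expected 2-D result.
print(np.loadtxt(StringIO(two_lines)).shape)  # (2, 3)

# A single row is squeezed down to one dimension.
row = np.loadtxt(StringIO(one_line))
print(row.shape)  # (3,)

# Assumption: ndmin is available; it keeps the result at least 2-D.
row2d = np.loadtxt(StringIO(one_line), ndmin=2)
print(row2d.shape)  # (1, 3)
```

Code that loops over rows breaks on the squeezed result, because iterating a 1-D array yields scalars instead of rows.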

Pierre GM

1:59 p.m.

On Oct 7, 2010, at 3:49 PM, Benjamin Root wrote:

> I understand the technicalities of why this occurs, but from a user's perspective, he is asking for distinct numpy arrays of specified types. The transposing seems to be almost an unimportant implementation detail because the user is asking for the data to be split up by columns. Personally, I think that this should be transparent to the user and should be able to work -- although I am not exactly sure if one should just simply return a list of numpy arrays with the column names dropped, or a list of one-column record arrays.

Well, it is easy enough to output a list of arrays for each column, be they of the same dtype or with different ones.

> If it can't quite work from within the framework for genfromtxt/loadtxt, then maybe another text loading function that is designed to have the data format known a priori would be suitable?

Not needed. The unpack argument is used at the very end of the function anyway. Anyhow, could you open a ticket to that effect (else I'm quite likely to forget about it)?

> Note that such a function might also be sufficient in addressing my long-standing qualm with loadtxt()'s squeeze behavior for files with only one line of data.

Mind opening a second ticket?
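The behaviour being proposed, one array per column whether or not the dtypes match, could look roughly like the sketch below. The helper name unpack_columns is hypothetical and this is not NumPy's implementation; it only illustrates a final unpacking step applied to an array that has already been loaded.

```python
import numpy as np


def unpack_columns(arr):
    """Hypothetical helper: return a list with one 1-D array per column."""
    if arr.dtype.names is not None:
        # Structured array: every field is already a column.
        return [arr[name] for name in arr.dtype.names]
    # Plain 2-D array: the rows of the transpose are the columns.
    return list(arr.T)


# Homogeneous case (illustrative data).
plain = np.array([[1, 2, 3], [4, 5, 6]])
p0, p1, p2 = unpack_columns(plain)

# Mixed-type case (illustrative data).
dt = np.dtype([("id", int), ("study", "U25")])
mixed = np.array([(1, "Meis"), (2, "Rai")], dtype=dt)
ident, study = unpack_columns(mixed)

print(p0, p1, p2)
print(ident, study)
```

Because the step only inspects the finished array, it fits Pierre's remark that unpack is handled at the very end of the loading function.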

Chris Fonnesbeck

2:03 p.m.

On Thu, Oct 7, 2010 at 8:59 AM, Pierre GM <pgmdevlist@gmail.com> wrote:

> Not needed. The unpack argument is used at the very end of the function anyway. Anyhow, could you open a ticket to that effect (else I'm quite likely to forget about it)?

I can open this one.

Benjamin Root

2:09 p.m.

On Thu, Oct 7, 2010 at 8:59 AM, Pierre GM <pgmdevlist@gmail.com> wrote:

> On Oct 7, 2010, at 3:49 PM, Benjamin Root wrote:
>
>> I understand the technicalities of why this occurs, but from a user's perspective, he is asking for distinct numpy arrays of specified types. The transposing seems to be almost an unimportant implementation detail because the user is asking for the data to be split up by columns. Personally, I think that this should be transparent to the user and should be able to work -- although I am not exactly sure if one should just simply return a list of numpy arrays with the column names dropped, or a list of one-column record arrays.
>
> Well, easy enough to output a list of arrays for each column, be they of the same dtype or with different ones.
>
>> If it can't quite work from within the framework for genfromtxt/loadtxt, then maybe another text loading function that is designed to have the data format known a priori would be suitable?
>
> Not needed. The unpack argument is used at the very end of the function anyway. Anyhow, could you open a ticket to that effect (else I'm quite likely to forget about it).
>
>> Note that such a function might also be sufficient in addressing my long-standing qualm with loadtxt()'s squeeze behavior for files with only one line of data.
>
> Mind opening a second ticket?

Already been open for a little while now: http://projects.scipy.org/numpy/ticket/1562

Ben Root

Chris Fonnesbeck

2:01 p.m.

On Thu, Oct 7, 2010 at 4:00 AM, Pierre GM <pgmdevlist@gmail.com> wrote:

> On Oct 7, 2010, at 4:48 AM, Chris Fonnesbeck wrote:
>
>> The documentation for loadtxt and genfromtxt states that the unpack argument functions as follows:
>>
>> If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...).
>
> Provided that all the columns have the same dtype

Aha, I see. Unfortunately that detail is not in the docstrings. This is a pretty fundamental limitation of the function, I think, since it is rare that a multi-column table of data will be of the same type. I wonder if it would be possible to allow an 'obj' type array that could be transposed? The way it is now, you have a 1d array representing what is fundamentally 2d information.
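The 'obj' idea can be approximated by hand today. The following is a minimal sketch, assuming a small structured array with hypothetical fields and values; converting it to an object array gives a genuinely 2-D table that can be transposed, at the cost of losing the per-column dtypes.

```python
import numpy as np

# Hypothetical structured array, as a mixed-type load would produce.
dt = np.dtype([("id", int), ("study", "U25"), ("year", int)])
a = np.array([(1, "Meis", 2003), (2, "Rai", 2009)], dtype=dt)

# tolist() gives a list of plain tuples; dtype=object makes it a 2-D table.
table = np.array(a.tolist(), dtype=object)
print(table.shape)    # (2, 3)
print(table.T.shape)  # (3, 2)

# Unpacking the transpose yields one object array per column.
ident, study, year = table.T
print(study)
```

Every column of the object array holds generic Python objects, so numeric columns would need converting back (for example with astype) before fast arithmetic.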

Pierre GM

2:04 p.m.

On Oct 7, 2010, at 4:01 PM, Chris Fonnesbeck wrote:

> On Thu, Oct 7, 2010 at 4:00 AM, Pierre GM <pgmdevlist@gmail.com> wrote:
>
>> On Oct 7, 2010, at 4:48 AM, Chris Fonnesbeck wrote:
>>
>>> The documentation for loadtxt and genfromtxt states that the unpack argument functions as follows:
>>>
>>> If True, the returned array is transposed, so that arguments may be unpacked using x, y, z = loadtxt(...).
>>
>> Provided that all the columns have the same dtype
>
> Aha, I see. Unfortunately that detail is not in the docstrings. This is a pretty fundamental limitation of the function, I think, since it is rare that a multi-column table of data will be of the same type. I wonder if it would be possible to allow an 'obj' type array that could be transposed? The way it is now, you have a 1d array representing what is fundamentally 2d information.

As I stated in a previous email, please open a ticket to that effect. In the meantime, please use the trick I gave you to unpack the 1D array with several fields into a list of 1D arrays (one for each field).
