Re: [Numpy-discussion] unpack argument in loadtxt/genfromtxt does not work as documented

Oct. 7, 2010

On Thu, Oct 7, 2010 at 4:00 AM, Pierre GM <pgmdevlist@gmail.com> wrote:
...
On Oct 7, 2010, at 4:48 AM, Chris Fonnesbeck wrote:
...
The documentation for loadtxt and genfromtxt states that the unpack
argument functions as follows:
If True, the returned array is transposed, so that arguments may be
unpacked using x, y, z = loadtxt(...).
Provided that all the columns have the same dtype
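As Pierre notes, the documented unpacking does work when every column shares a single dtype. Here is a minimal sketch with a made-up, all-numeric input read from an in-memory buffer (the data is illustrative, not from the thread):

```python
from io import StringIO

import numpy as np

# Three numeric columns, all parsed as float64 (one homogeneous dtype),
# so loadtxt builds an ordinary 2-D array of shape (3, 3).
text = StringIO("1,2,3\n4,5,6\n7,8,9\n")

# unpack=True transposes that 2-D result, so each name receives one column.
x, y, z = np.loadtxt(text, delimiter=",", unpack=True)
print(x)  # first column of the file
```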
...
In practice, this does not always occur. I have a csv file of mixed
data types, and try importing it via:
genfromtxt("progestogens.csv", delimiter=",", names=True,
           dtype=dtype([('id', int), ('study', '|S25'), ('year', int),
                        ('treat', int), ('drug', '|S25'), ('form', '|S10'),
                        ('ptb', int), ('mgest', int), ('lab', int),
                        ('n', int), ('y', int), ('var', '|S5'),
                        ('wt', int), ('sdwt', int)]),
           unpack=True)
With unpack=True, I would expect the data to be presented by columns;
however, the resulting array is organized by rows:
Well, you have a complex dtype, so your result array is 1D, each row
corresponding to a tuple of elements with different dtypes.
...
array([(1, 'Meis', 2003, 1, '17P', 'IM', 1, 0, 0, 306, 111, 'ptb'),
       (1, 'Meis', 2003, 0, '17P', 'IM', 1, 0, 0, 153, 84, 'ptb'),
       (2, 'Rai', 2009, 1, 'Progesterone', 'Oral', 1, 0, 0, 74, 29, 'ptb'),
...
       (2, 'Rai', 2009, 0, 'Progesterone', 'Oral', 1, 0, 0, 74, 44, 'ptb'),
...
...
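Pierre's point is that a structured (compound) dtype turns each row into a single record, so the result is 1-D and there is no second axis to transpose. A small sketch on a made-up three-field input shaped like the one above (field names and values are illustrative):

```python
from io import StringIO

import numpy as np

# A tiny mixed-type CSV with a header line (values are illustrative).
text = StringIO("id,study,year\n1,Meis,2003\n2,Rai,2009\n")

# Structured dtype: every row becomes one record of (int, bytes, int).
rec_dtype = np.dtype([("id", int), ("study", "S25"), ("year", int)])
a = np.genfromtxt(text, delimiter=",", names=True, dtype=rec_dtype)

print(a.shape)    # (2,): one record per row, not a 2-D table
print(a.T.shape)  # (2,): transposing a 1-D array is a no-op
```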
The same behaviour occurs using loadtxt. Moreover, this array is
untransposable, so I am stuck iterating over all the rows, which makes
genfromtxt no better than csv.reader.
Once again, your array is 1D, so you can't transpose it.
Now, you should be able to get each column through
...
...
...
[a[_] for _ in a.dtype.names]
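Pierre's list comprehension indexes the structured array by each field name in turn; each index returns a plain 1-D array for that column, keeping the field's own dtype. A self-contained sketch on a made-up input, where `a` stands in for the array genfromtxt returns:

```python
from io import StringIO

import numpy as np

text = StringIO("id,study,year\n1,Meis,2003\n2,Rai,2009\n")
a = np.genfromtxt(text, delimiter=",", names=True,
                  dtype=[("id", int), ("study", "S25"), ("year", int)])

# a[name] selects one field across all records: a 1-D array per column.
# Iterating over a.dtype.names keeps the columns in file order, so they
# can be unpacked just like the homogeneous unpack=True case.
ids, studies, years = [a[name] for name in a.dtype.names]
print(studies)
```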
I understand the technicalities of why this occurs, but from a user's
perspective, he is asking for distinct numpy arrays of specified types.  The
transposing seems to be almost an unimportant implementation detail because
the user is asking for the data to be split up by columns.  Personally, I
think this should be transparent to the user and should just work, although
I am not sure whether one should return a list of plain numpy arrays with
the column names dropped, or a list of one-column record arrays.
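Ben's two options can be compared side by side. The sketch below uses a hypothetical helper, `split_columns` (not part of NumPy), that returns either plain per-field arrays with the names dropped or one-field structured arrays that keep their names; the second form assumes multi-field indexing with a one-element list of field names.

```python
import numpy as np


def split_columns(a, keep_names=False):
    """Hypothetical helper: split a 1-D structured array by field.

    keep_names=False -> plain arrays, field names dropped
    keep_names=True  -> one-field structured arrays, names kept
    """
    if a.dtype.names is None:
        raise ValueError("expected an array with a structured dtype")
    if keep_names:
        # a[[name]] is multi-field indexing with a single field.
        return [a[[name]] for name in a.dtype.names]
    return [a[name] for name in a.dtype.names]


a = np.array([(1, b"Meis", 2003), (2, b"Rai", 2009)],
             dtype=[("id", int), ("study", "S25"), ("year", int)])

plain = split_columns(a)
named = split_columns(a, keep_names=True)
print(named[2].dtype.names)
```

Plain arrays are the more convenient target for `x, y, z = ...` style unpacking; the one-field record arrays keep the column label, at the cost of indexing by name again to reach the values.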

If it can't quite work from within the framework for genfromtxt/loadtxt,
then maybe another text loading function that is designed to have the data
format known a priori would be suitable?  Note that such a function might
also be sufficient in addressing my long-standing qualm with loadtxt()'s
squeeze behavior for files with only one line of data.
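The squeeze behavior Ben mentions is easy to reproduce with a one-line input. A small sketch on made-up data, showing two ways to recover a 2-D result: `np.atleast_2d` after loading, and the `ndmin` argument accepted by newer NumPy releases of loadtxt:

```python
from io import StringIO

import numpy as np

# A "file" holding a single line (one row) of three values.
one_line = "1 2 3\n"

# By default loadtxt squeezes length-1 axes, so one row comes back 1-D.
squeezed = np.loadtxt(StringIO(one_line))
print(squeezed.shape)  # (3,)

# Workaround 1: promote afterwards; a 1-D array becomes a single row.
as_rows = np.atleast_2d(squeezed)
print(as_rows.shape)  # (1, 3)

# Workaround 2 (newer NumPy releases): request at least two dimensions.
direct = np.loadtxt(StringIO(one_line), ndmin=2)
print(direct.shape)  # (1, 3)
```

Note that `np.atleast_2d` has to guess: a one-column file also loads as a 1-D array, and promoting it would wrongly produce a single row, so asking loadtxt for `ndmin=2` is the less ambiguous option where it is available.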

Ben Root


Benjamin Root