On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <paul.anton.letnes@gmail.com> wrote:

On 4 May 2011, at 20:33, Benjamin Root wrote:

> On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <derek@astro.physik.uni-goettingen.de> wrote:
> On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
>
> > But: Aren't the numpy.atleast_2d and numpy.atleast_1d functions written for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will reintroduce the 'transposed' problem?
>
> Yes, good point, one could replace the
> X.shape = (X.size, ) with X = np.atleast_1d(X),
> but for the ndmin=2 case, we'd need to replace
> X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
> not sure which solution is more efficient in terms of memory access etc...
>
> Cheers,
>                                                Derek
>
>
> I can confirm that the current behavior is not sufficient for all of the original corner cases that ndmin was supposed to address.  Keep in mind that np.loadtxt takes a one-column data file and a one-row data file down to the same shape.  I don't see how the current code is able to produce the correct array shape when ndmin=2.  Do we have some sort of counter in loadtxt for counting the number of rows and columns read?  Could we use those to help guide the ndmin=2 case?
>
> I think that using atleast_1d(X) might be a bit of overkill, but it would make the code's intent very clear.  I don't think we have to worry about memory usage if we limit its use to situations where ndmin is greater than the number of dimensions of the array.  In those cases, the array is either an empty result, a scalar value (in which case memory access is trivial), or 1-d (in which case a transpose is cheap).
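
For reference, the replacement Derek describes could look roughly like the sketch below. The helper name `promote` is hypothetical and is not part of loadtxt; it only applies the atleast_Nd calls when ndmin exceeds the array's current number of dimensions, as suggested above.

```python
import numpy as np

def promote(X, ndmin):
    """Hypothetical helper: raise X to at least `ndmin` dimensions.

    1-D data becomes a column when ndmin=2, mirroring
    X.shape = (X.size, 1) from the discussion above.
    """
    if X.ndim < ndmin:
        if ndmin == 1:
            X = np.atleast_1d(X)       # scalar -> shape (1,)
        elif ndmin == 2:
            X = np.atleast_2d(X).T     # (N,) -> (1, N) -> (N, 1)
    return X

scalar = np.array(3.0)                 # 0-d array
vector = np.arange(4.0)                # 1-d array, shape (4,)
matrix = np.ones((2, 3))               # already 2-d, left untouched

print(promote(scalar, 1).shape)
print(promote(vector, 2).shape)
print(promote(matrix, 2).shape)
```

Note that this always turns 1-d data into a column, so on its own it cannot tell a single-row file from a single-column file.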

What if one does things the other way around - avoid calling squeeze until _after_ doing the atleast_Nd() magic? That way the row/column information should be conserved, right? Also, we avoid transposing, memory use, ...

Oh, and someone could conceivably have a _looong_ 1D file, but would want it read as a 2D array.

Paul



@Derek, good catch on the error in the tests. We do still need to handle the case I mentioned, however.  I have attached an example script to demonstrate the issue.  In this script, I would expect the second-to-last array to have a shape of (1, 5).  I believe that the single-row, multi-column case is actually a more common edge case for users than the others.  Therefore, I believe that this ndmin fix is not adequate until this is addressed.
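
The attached script is not reproduced here. As a rough, hypothetical sketch of the single-row versus single-column case (the inline sample data and variable names are made up for illustration):

```python
import io
import numpy as np

one_row = "1 2 3 4 5\n"          # one row, five columns
one_col = "1\n2\n3\n4\n5\n"      # five rows, one column

# Without ndmin, both inputs are squeezed down to the same 1-d shape.
row_default = np.loadtxt(io.StringIO(one_row))
col_default = np.loadtxt(io.StringIO(one_col))
print(row_default.shape, col_default.shape)

# With ndmin=2, the desired result is (1, 5) for the row file and
# (5, 1) for the column file; what is printed depends on the NumPy
# version and on whether the fix under discussion is in place.
row_2d = np.loadtxt(io.StringIO(one_row), ndmin=2)
col_2d = np.loadtxt(io.StringIO(one_col), ndmin=2)
print(row_2d.shape, col_2d.shape)
```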

@Paul, we can't call squeeze after doing the atleast_Nd() magic.  That would just undo whatever we had just done.  Also, wrt the transpose, a (1, 100000) array looks the same in memory as a (100000, 1) array, right?
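
A quick sketch of both points, using a small made-up array rather than loadtxt itself:

```python
import numpy as np

data = np.arange(5.0)                # 1-d data, shape (5,)

as_2d = np.atleast_2d(data)          # promoted to shape (1, 5)
squeezed = np.squeeze(as_2d)         # squeeze drops the length-1 axis again
print(as_2d.shape, squeezed.shape)

col = as_2d.T                        # shape (5, 1)
# The transpose is a view on the same buffer, so no data is copied.
shares = np.shares_memory(as_2d, col)
print(col.shape, shares)
```

So squeezing after atleast_2d() lands back on the 1-d shape, and going between (1, N) and (N, 1) only changes the shape/stride metadata.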

Ben Root