[Numpy-discussion] loadtxt ndmin option

Paul Anton Letnes paul.anton.letnes at gmail.com
Thu May 5 00:08:23 EDT 2011

On 4 May 2011, at 20:33, Benjamin Root wrote:

> On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <derek at astro.physik.uni-goettingen.de> wrote:
> > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
> > > But: Aren't the numpy.atleast_2d and numpy.atleast_1d functions written for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will reintroduce the 'transposed' problem?
> > Yes, good point, one could replace the
> > X.shape = (X.size, ) with X = np.atleast_1d(X),
> > but for the ndmin=2 case, we'd need to replace
> > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
> > not sure which solution is more efficient in terms of memory access etc...
> > Cheers,
> >                                                Derek
>
> I can confirm that the current behavior is not sufficient for all of the original corner cases that ndmin was supposed to address.  Keep in mind that np.loadtxt takes a one-column data file and a one-row data file down to the same shape.  I don't see how the current code is able to produce the correct array shape when ndmin=2.  Do we have some sort of counter in loadtxt for counting the number of rows and columns read?  Could we use those to help guide the ndmin=2 case?
> I think that using atleast_1d(X) might be a bit overkill, but it would be very clear as to the code's intent.  I don't think we have to worry about memory usage if we limit its use to only situations where ndmin is greater than the number of dimensions of the array.  In those cases, the array is either an empty result, a scalar value (in which memory access is trivial), or 1-d (in which a transpose is cheap).
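
To make the corner case above concrete, here is a small sketch of how squeezing erases the row/column distinction, and of what the two suggested replacements do to a 1-D or scalar result. This is not loadtxt's actual code; the sample arrays are made up for illustration and stand in for what a one-column and a one-row file would parse to.

```python
import numpy as np

# Stand-ins for parsed file contents (illustrative data, not from a real file).
column = np.array([[1.0], [2.0], [3.0]])  # one-column file, shape (3, 1)
row = np.array([[1.0, 2.0, 3.0]])         # one-row file, shape (1, 3)

# After squeezing, both end up 1-D and can no longer be told apart.
squeezed_column = np.squeeze(column)
squeezed_row = np.squeeze(row)
print(squeezed_column.shape, squeezed_row.shape)

# The two replacements discussed above, applied to a squeezed 1-D array.
X = squeezed_column
as_1d = np.atleast_1d(X)    # equivalent to X.shape = (X.size,)
as_2d = np.atleast_2d(X).T  # equivalent to X.shape = (X.size, 1)
print(as_1d.shape, as_2d.shape)

# A single value squeezes to a 0-d array; both helpers still pad it out.
scalar = np.squeeze(np.array([[4.0]]))
scalar_1d = np.atleast_1d(scalar)
scalar_2d = np.atleast_2d(scalar).T
print(scalar.shape, scalar_1d.shape, scalar_2d.shape)
```

Note that applying np.atleast_2d(X).T to squeezed_row would also give a column, which is exactly the lost information being discussed.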

What if one does things the other way around - avoid calling squeeze until _after_ doing the atleast_Nd() magic? That way the row/column information should be conserved, right? Also, we avoid transposing, memory use, ...

Oh, and someone could conceivably have a _looong_ 1D file, but would want it read as a 2D array.
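
A rough sketch of what I mean by squeezing only after the ndmin handling. The helper name finish_array is made up, and I am assuming the parsed data is already available as a 2-D (rows, columns) array at that point; this is an illustration of the ordering, not a patch against loadtxt.

```python
import numpy as np

def finish_array(X, ndmin=0):
    """Hypothetical final step: reduce a parsed 2-D array to at least ndmin dims."""
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError("expected a 2-D (rows, columns) array")
    if ndmin not in (0, 1, 2):
        raise ValueError("ndmin must be 0, 1 or 2")
    if ndmin == 2:
        # Nothing to squeeze, so the row/column layout is kept as parsed.
        return X
    X = np.squeeze(X)
    if ndmin == 1:
        # Only a single value squeezes to 0-d; pad that case back to 1-d.
        X = np.atleast_1d(X)
    return X

column = np.arange(3.0).reshape(3, 1)  # one-column file
row = np.arange(3.0).reshape(1, 3)     # one-row file
single = np.array([[7.0]])             # one value

print(finish_array(column, ndmin=2).shape, finish_array(row, ndmin=2).shape)
print(finish_array(column, ndmin=1).shape, finish_array(single, ndmin=1).shape)
print(finish_array(single, ndmin=0).shape)
```

With this ordering the long 1D file read with ndmin=2 never gets squeezed or transposed at all.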

