[Numpy-discussion] loadtxt ndmin option

Benjamin Root ben.root at ou.edu
Thu May 5 11:52:11 EDT 2011


On Thu, May 5, 2011 at 10:49 AM, Benjamin Root <ben.root at ou.edu> wrote:

>
>
> On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <
> paul.anton.letnes at gmail.com> wrote:
>
>>
>> On 4. mai 2011, at 20.33, Benjamin Root wrote:
>>
>> > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <
>> > derek at astro.physik.uni-goettingen.de> wrote:
>> > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
>> >
>> > > But: Aren't the numpy.atleast_2d and numpy.atleast_1d functions written
>> > > for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
>> > > will reintroduce the 'transposed' problem?
>> >
>> > Yes, good point, one could replace the
>> > X.shape = (X.size, ) with X = np.atleast_1d(X),
>> > but for the ndmin=2 case, we'd need to replace
>> > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
>> > not sure which solution is more efficient in terms of memory access
>> > etc...
>> >
>> > Cheers,
>> >                                                Derek
>> >
>> >
>> > I can confirm that the current behavior is not sufficient for all of the
>> > original corner cases that ndmin was supposed to address.  Keep in mind that
>> > np.loadtxt reduces a one-column data file and a one-row data file to the
>> > same shape.  I don't see how the current code is able to produce the correct
>> > array shape when ndmin=2.  Do we have some sort of counter in loadtxt for
>> > the number of rows and columns read?  Could we use those to help
>> > guide the ndmin=2 case?
>> >
>> > I think that using atleast_1d(X) might be a bit overkill, but it would
>> > make the code's intent very clear.  I don't think we have to worry about
>> > memory usage if we limit its use to situations where ndmin is greater
>> > than the number of dimensions of the array.  In those cases, the array is
>> > either an empty result, a scalar value (where memory access is trivial),
>> > or 1-d (where a transpose is cheap).
>>
>> What if one does things the other way around - avoid calling squeeze until
>> _after_ doing the atleast_Nd() magic? That way the row/column information
>> should be conserved, right? Also, we avoid transposing, memory use, ...
>>
>> Oh, and someone could conceivably have a _looong_ 1D file, but would want
>> it read as a 2D array.
>>
>> Paul
>>
>>
>>
> @Derek, good catch on the error in the tests. We do still need
> to handle the case I mentioned, however.  I have attached an example script
> to demonstrate the issue.  In this script, I would expect the second-to-last
> array to have a shape of (1, 5).  I believe that the single-row, multi-column
> case is actually a more common edge case for users than the others.
> Therefore, I believe that this ndmin fix is not adequate until this is
> addressed.
>
>
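The corner case quoted above (one-row and one-column files collapsing to the same shape) can be checked with a short script. This is a minimal sketch, assuming a NumPy release whose `np.loadtxt` accepts the `ndmin` keyword (added in NumPy 1.6) and reads from file-like objects such as `io.StringIO`; the sample data are made up for illustration, and the shapes in the comments are what a current NumPy release returns.

```python
from io import StringIO

import numpy as np

# Hypothetical sample data: the same five values laid out as one row
# and as one column.
one_row = "1 2 3 4 5\n"
one_column = "1\n2\n3\n4\n5\n"

# Default (ndmin=0): both files are squeezed to the same 1-D shape,
# so the row/column layout of the file is lost.
flat_row = np.loadtxt(StringIO(one_row))
flat_col = np.loadtxt(StringIO(one_column))
print(flat_row.shape, flat_col.shape)  # (5,) (5,)

# ndmin=2: the layout of the file is kept.
row_2d = np.loadtxt(StringIO(one_row), ndmin=2)
col_2d = np.loadtxt(StringIO(one_column), ndmin=2)
print(row_2d.shape, col_2d.shape)  # (1, 5) (5, 1)
```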
Apologies Derek, your patch does address the issue I raised.

Ben Root
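The ordering debated in this thread can be sketched as a standalone function. `ensure_ndmin` is a hypothetical name used only for illustration, not part of NumPy's API, and it assumes the parser has already produced a (rows, columns) array that has not been squeezed yet. It squeezes only when the array has more dimensions than `ndmin` asks for, then promotes with `np.atleast_1d` or `np.atleast_2d(X).T`, the replacements proposed for `X.shape = (X.size, )` and `X.shape = (X.size, 1)`.

```python
import numpy as np


def ensure_ndmin(X, ndmin=0):
    """Hypothetical helper: squeeze surplus axes, then promote to `ndmin` dims.

    Assumes X is the (rows, columns) array built from the parsed file,
    before any squeezing has been done.
    """
    if ndmin not in (0, 1, 2):
        raise ValueError(f"Illegal value of ndmin keyword: {ndmin}")
    # Squeeze only when there are more dimensions than requested, so a
    # (1, n) single-row or (n, 1) single-column array survives ndmin=2.
    if X.ndim > ndmin:
        X = np.squeeze(X)
    # Promote only when squeezing left too few dimensions.
    if X.ndim < ndmin:
        if ndmin == 1:
            X = np.atleast_1d(X)
        else:
            # atleast_2d maps (n,) to (1, n); .T turns that into a column (n, 1).
            X = np.atleast_2d(X).T
    return X


row = np.arange(5.0).reshape(1, 5)     # what a one-row file parses to
column = np.arange(5.0).reshape(5, 1)  # what a one-column file parses to

print(ensure_ndmin(row, 2).shape)     # (1, 5)
print(ensure_ndmin(column, 2).shape)  # (5, 1)
print(ensure_ndmin(row, 0).shape)     # (5,)

# Squeezing *before* promoting loses the layout: the row comes back as a column.
print(ensure_ndmin(np.squeeze(row), 2).shape)  # (5, 1)

# On a 1-D array, np.atleast_2d(x).T is a view, so no data is copied.
x = np.arange(5.0)
print(np.shares_memory(x, np.atleast_2d(x).T))  # True
```

Because the squeeze is skipped whenever the array already has enough dimensions, the transpose branch is only reached for inputs that have already lost an axis, which is where the row/column ambiguity comes from.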

