[Numpy-discussion] loadtxt ndmin option
ralf.gommers at googlemail.com
Thu May 5 15:33:04 EDT 2011
On Thu, May 5, 2011 at 9:18 PM, Benjamin Root <ben.root at ou.edu> wrote:
> On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes <
> paul.anton.letnes at gmail.com> wrote:
>> On 5. mai 2011, at 08.49, Benjamin Root wrote:
>> > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <
>> paul.anton.letnes at gmail.com> wrote:
>> > On 4. mai 2011, at 20.33, Benjamin Root wrote:
>> > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <
>> derek at astro.physik.uni-goettingen.de> wrote:
>> > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
>> > >
>> > > > But: Aren't the numpy.atleast_2d and numpy.atleast_1d functions
>> written for this? Shouldn't we reuse them? Perhaps it's overkill, and
>> perhaps it will reintroduce the 'transposed' problem?
>> > >
>> > > Yes, good point, one could replace the
>> > > X.shape = (X.size, ) with X = np.atleast_1d(X),
>> > > but for the ndmin=2 case, we'd need to replace
>> > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
>> > > not sure which solution is more efficient in terms of memory access
>> > >
>> > > Cheers,
>> > > Derek
>> > >
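The two replacements Derek proposes can be checked directly. The sketch below is illustrative only and is not the loadtxt source; the arrays `scalar` and `X` are made-up stand-ins for what loadtxt might hold after squeezing. It compares assigning to `.shape` with the `atleast_*d` calls, and checks whether `np.atleast_2d(X).T` copies any data.

```python
import numpy as np

# Hypothetical stand-ins for squeezed loadtxt results.
scalar = np.array(3.0)   # 0-d array (a file holding a single value)
X = np.arange(5.0)       # 1-d array (a single row or a single column)

# ndmin=1: reshape in place vs. atleast_1d
a = scalar.copy()
a.shape = (a.size,)
b = np.atleast_1d(scalar)

# ndmin=2, column layout: reshape in place vs. atleast_2d(...).T
c = X.copy()
c.shape = (c.size, 1)
d = np.atleast_2d(X).T

print(a.shape, b.shape)          # both 1-d with one element
print(c.shape, d.shape)          # both column-shaped
print(np.shares_memory(d, X))    # whether the transpose is a view of X
```

Both routes give the same shapes, and the `atleast_2d(...).T` route returns a view rather than a copy, so the choice between them is mostly about how clearly the code states its intent.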
>> > >
>> > > I can confirm that the current behavior is not sufficient for all of
>> the original corner cases that ndmin was supposed to address. Keep in mind
>> that np.loadtxt takes a one-column data file and a one-row data file down to
>> the same shape. I don't see how the current code is able to produce the
>> correct array shape when ndmin=2. Do we have some sort of counter in
>> loadtxt for counting the number of rows and columns read? Could we use
>> those to help guide the ndmin=2 case?
>> > >
>> > > I think that using atleast_1d(X) might be a bit overkill, but it would
>> be very clear as to the code's intent. I don't think we have to worry about
>> memory usage if we limit its use to only situations where ndmin is greater
>> than the number of dimensions of the array. In those cases, the array is
>> either an empty result, a scalar value (in which memory access is trivial),
>> or 1-d (in which a transpose is cheap).
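The corner case described above can be reproduced with two tiny in-memory files. This is a minimal sketch assuming a NumPy release in which the ndmin handling discussed in this thread has been fixed; the file contents are invented for illustration.

```python
import io
import numpy as np

one_row = "1 2 3 4 5\n"          # one row, five columns
one_col = "1\n2\n3\n4\n5\n"      # five rows, one column

# Default behaviour: both files collapse to the same 1-d shape.
row_default = np.loadtxt(io.StringIO(one_row))
col_default = np.loadtxt(io.StringIO(one_col))

# With ndmin=2 the row/column layout of the file should be preserved.
row_2d = np.loadtxt(io.StringIO(one_row), ndmin=2)
col_2d = np.loadtxt(io.StringIO(one_col), ndmin=2)

print(row_default.shape, col_default.shape)
print(row_2d.shape, col_2d.shape)
```

The single-row case is the one to watch: if the code squeezes first and then only ever builds a column, the one-row file comes back with its axes swapped.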
>> > What if one does things the other way around - avoid calling squeeze
>> until _after_ doing the atleast_Nd() magic? That way the row/column
>> information should be conserved, right? Also, we avoid transposing, memory
>> use, ...
>> > Oh, and someone could conceivably have a _looong_ 1D file, but would
>> want it read as a 2D array.
>> > Paul
>> > @Derek, good catch with noticing the error in the tests. We do still
>> need to handle the case I mentioned, however. I have attached an example
>> script to demonstrate the issue. In this script, I would expect the
>> second-to-last array to have a shape of (1, 5). I believe that the
>> single-row, multi-column case is actually a more common edge case for
>> users than the others. Therefore, I believe that this ndmin fix is not
>> adequate until this is addressed.
>> > @Paul, we can't call squeeze after doing the atleast_Nd() magic. That
>> would just undo whatever we had just done. Also, wrt the transpose, a (1,
>> 100000) array looks the same in memory as a (100000, 1) array, right?
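The memory question can be checked with a short experiment. The sketch below uses an arbitrary float64 array of the size mentioned above; it shows that transposing a (1, 100000) array yields a (100000, 1) view over the same buffer, with the elements in the same order.

```python
import numpy as np

row = np.arange(100000, dtype=np.float64).reshape(1, 100000)
col = row.T                                  # no data is copied here

print(col.shape)
print(np.shares_memory(row, col))            # same underlying buffer
print(row.tobytes() == col.tobytes())        # same element order in C order
```

Since one axis has length 1, only the shape and strides metadata differ between the two arrays.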
>> Agree. I thought more along the lines of (pseudocode-ish):
>>     if ndmin == 0:
>>         X = np.squeeze(X)
>>     elif ndmin == 1:
>>         X = np.atleast_1d(X)
>>     elif ndmin == 2:
>>         X = np.atleast_2d(X)
>>     else:
>>         I don't rightly know what would go here, maybe raise ValueError?
>> That would avoid the squeeze call before the atleast_Nd magic. But the
>> code was changed, so I think my comment doesn't make sense anymore. It's
>> probably fine the way it is!
> I have thought of that too, but the problem with that approach is that
> after reading the file, X will have 2 or 3 dimensions, regardless of how
> many singleton dims were in the file. A squeeze will always be needed.
> Also, the purpose of squeeze is opposite that of the atleast_*d()
> functions: squeeze reduces dimensions, while atleast_*d will add
> dimensions.
> Therefore, I re-iterate... the patch by Derek gets the job done. I have
> tested it for a wide variety of inputs for both regular arrays and record
> arrays. Is there room for improvements? Yes, but I think that can wait for
> later. Derek's patch however fixes an important bug in the ndmin
> implementation and should be included for the release.
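The opposite roles of squeeze and the atleast_*d functions can be seen in a few lines. This is only an illustration with made-up arrays, not the patch under discussion: squeeze removes every length-1 axis, so a row and a column become indistinguishable, and atleast_2d can only add a leading axis back.

```python
import numpy as np

row = np.arange(5.0).reshape(1, 5)
col = np.arange(5.0).reshape(5, 1)

# squeeze drops all singleton axes: both inputs end up 1-d.
row_sq = np.squeeze(row)
col_sq = np.squeeze(col)

# atleast_2d prepends an axis, so a squeezed column comes back as a row.
restored = np.atleast_2d(col_sq)

# Calling squeeze after atleast_2d simply undoes the added axis.
undone = np.squeeze(np.atleast_2d(np.arange(5.0)))

print(row_sq.shape, col_sq.shape)
print(restored.shape)
print(undone.shape)
```

Any code that squeezes first therefore needs some other record of whether the file had one row or one column, which is the case raised earlier in the thread.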
> Two questions: can you point me to the patch/ticket, and is this a