[Numpy-discussion] loadtxt/savetxt tickets
Derek Homeier
derek at astro.physik.uni-goettingen.de
Sat Mar 26 16:44:14 EDT 2011
Hi Paul,
having had a look at the other tickets you dug up,
> My opinions are my own, and in detail, they are:
> 1752:
> I attach a possible patch. FWIW, I agree with the request. The
> patch is written to be compatible with the fix in ticket #1562, but
> I did not test that yet.
Tested, see also my comments on Trac.
> 1731:
> This seems like a rather trivial feature enhancement. I attach a
> possible patch.
Agreed. Haven't tested it though.
> 1616:
> The suggested patch seems reasonable to me, but I do not have a
> full list of what objects loadtxt supports today as opposed to what
> this patch will support.
> 1562:
> I attach a possible patch. This could also be the default
> behavior to my mind, since the function caller can simply call
> numpy.squeeze if needed. Changing default behavior would probably
> break old code, however.
See comments on Trac as well.
> 1458:
> The fix suggested in the ticket seems reasonable, but I have
> never used record arrays, so I am not sure of this.
There were some issues with Python3, and I also had some general
reservations
as noted on Trac - basically, it makes 'unpack' equivalent to
transposing for 2D-arrays,
but to splitting into fields for 1D-recarrays. My question was, what's
going to happen
when you get to 2D-recarrays? Currently this is not an issue since
loadtxt can only
read 2D regular or 1D structured arrays. But this might change if the
data block
functionality (see below) were to be implemented - data could then be
returned as
3D arrays or 2D structured arrays... Still, it would probably make
most sense (or at
least give the widest functionality) to have 'unpack=True' always
return a list or iterator
over columns.
> 1445:
> Adding this functionality could break old code, as some old
> datafiles may have empty lines which are now simply ignored. I do
> not think the feature is a good idea. It could rather be implemented
> as a separate function.
> 1107:
> I do not see the need for this enhancement. In my eyes, the
> usecols kwarg does this and more. Perhaps I am misunderstanding
> something here.
Agree about #1445, and the bit about 'usecols' - 'numcols' would just
provide a
shorter call to e.g. read the first 20 columns of a file (well, not
even that much
over 'usecols=range(20)'...), don't think that justifies an extra
argument.
But the 'datablocks' provides something new, that a number of people
seem
to miss from e.g. gnuplot (including me, actually ;-). And it would
also satisfy the
request from #1445 without breaking backwards compatibility.
I've been wondering if could instead specify the separator lines
through the
parameter, e.g. "blocksep=['None', 'blank','invalid']", not sure if
that would make
it more useful...
> 1071:
> It is not clear to me whether loadtxt is supposed to support
> missing values in the fashion indicated in the ticket.
In principle it should at least allow you to, by the use of converters
as described there.
The problem is, the default delimiter is described as 'any
whitespace', which in the
present implementation obviously includes any number of blanks or
tabs. These
are therefore treated differently from delimiters like ',' or '&'. I'd
reckon there are
too many people actually relying on this behaviour to silently change it
(e.g. I know plenty of tables with columns separated by either one or
several
tabs depending on the length of the previous entry). But the tab is
apparently also
treated differently if explicitly specified with "delimiter='\t'" -
and in that case using
a converter à la {2: lambda s: float(s or 'Nan')} is working for
fields in the middle of
the line, but not at the end - clearly warrants improvement. I've
prepared a patch
working for Python3 as well.
> 1163:
> 1565:
> These tickets seem to have the same origin of the problem. I
> attach one possible patch. The previously suggested patches that
> I've seen will not correctly convert floats to ints, which I believe
> my patch will.
+1, though I am a bit concerned that prompting to raise a ValueError
for every
element could impede performance. I'd probably still enclose it into an
if issubclass(typ, np.uint64) or issubclass(typ, np.int64):
just like in npio.patch. I also thought one might switch to
int(float128(x)) in that
case, but at least for the given examples float128 cannot convert with
more
accuracy than float64 (even on PowerPC ;-).
There were some dissenting opinions that trying to read a float into
an int should
generally throw an exception though...
And Chuck just beat me...
On 26 Mar 2011, at 21:25, Charles R Harris wrote:
> I put all these patches together at https://github.com/charris/numpy/tree/loadtxt-savetxt
> . Please pull from there to continue work on loadtxt/savetxt so as
> to avoid conflicts in the patches. One of the numpy tests is
> failing, I assume from patch conflicts, and more tests for the
> tickets are needed in any case. Also, new keywords should be added
> to the end, not put in the middle of existing keywords.
>
> I haven't reviewed the patches, just tried to get them organized.
> Also, I have Derek as the author on all of them, that can be changed
> if it is decided the credit should go elsewhere ;) Thanks for the
> work you all have been doing on these tickets.
Thanks, I'll have a look at the new ticket and try to get that
organized!
Cheers,
Derek
More information about the NumPy-Discussion
mailing list