On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes <paul.anton.letnes@gmail.com> wrote:
On 26 March 2011, at 21:44, Derek Homeier wrote:
Hi Paul,
having had a look at the other tickets you dug up, here are my comments on each:
My opinions are my own, and in detail, they are:

#1752: I attach a possible patch. FWIW, I agree with the request. The patch is written to be compatible with the fix in ticket #1562, but I did not test that yet.
Tested, see also my comments on Trac.
Great!
#1731: This seems like a rather trivial feature enhancement. I attach a possible patch.
Agreed. Haven't tested it though.
Great!
#1616: The suggested patch seems reasonable to me, but I do not have a full list of what objects loadtxt supports today as opposed to what this patch will support.
Looks like you got this one. Just remember to make it compatible with #1752. Should be easy.
#1562: I attach a possible patch. To my mind, this could also be the default behavior, since the function caller can simply call numpy.squeeze if needed. Changing the default behavior would probably break old code, however.
See comments on Trac as well.
Your patch is better, but there is one thing I disagree with:

```
808    if X.ndim < ndmin:
809        if ndmin == 1:
810            X.shape = (X.size, )
811        elif ndmin == 2:
812            X.shape = (X.size, 1)
```

The last line should be:

```
812            X.shape = (1, X.size)
```

If someone wants a 2D array out, they would most likely expect a one-row file to come out as a one-row array, not the other way around. IMHO.
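To make the expectation concrete, here is a minimal illustration (using reshape to mimic what the patched ndmin=2 would do; the file contents are made up):

```python
import io
import numpy as np

# A one-row data file: the question is whether ndmin=2 should return
# shape (1, 3) (one row) or (3, 1) (one column).
one_row = io.StringIO("1.0 2.0 3.0\n")

a = np.loadtxt(one_row)  # a plain load squeezes this to shape (3,)
print(a.shape)           # -> (3,)

print(a.reshape(1, a.size).shape)  # X.shape = (1, X.size) -> (1, 3), one row
print(a.reshape(a.size, 1).shape)  # X.shape = (X.size, 1) -> (3, 1), one column
```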
#1458: The fix suggested in the ticket seems reasonable, but I have never used record arrays, so I am not sure about this.
There were some issues with Python 3, and I also had some general reservations as noted on Trac - basically, it makes 'unpack' equivalent to transposing for 2D arrays, but to splitting into fields for 1D recarrays. My question was, what's going to happen when you get to 2D recarrays? Currently this is not an issue, since loadtxt can only read 2D regular or 1D structured arrays. But this might change if the data block functionality (see below) were to be implemented - data could then be returned as 3D arrays or 2D structured arrays... Still, it would probably make most sense (or at least give the widest functionality) to have 'unpack=True' always return a list or iterator over columns.
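To spell out the distinction (a rough, hand-rolled illustration of what the #1458 behaviour amounts to, not the patch itself):

```python
import io
import numpy as np

txt = "1 2.5\n3 4.5\n"

# Regular 2D array: unpack=True is simply a transpose.
x, y = np.loadtxt(io.StringIO(txt), unpack=True)
# ...equivalent to: x, y = np.loadtxt(io.StringIO(txt)).T

# 1D structured array: the proposed 'unpack' would split by field,
# i.e. roughly this hand-rolled equivalent:
data = np.loadtxt(io.StringIO(txt),
                  dtype={'names': ('a', 'b'), 'formats': ('i4', 'f8')})
a, b = (data[name] for name in data.dtype.names)
print(a, b)  # -> [1 3] [2.5 4.5]
```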
OK, I don't know recarrays, as I said.
#1445: Adding this functionality could break old code, as some old data files may have empty lines which are now simply ignored. I do not think the feature is a good idea. It could rather be implemented as a separate function.

#1107: I do not see the need for this enhancement. In my eyes, the usecols kwarg does this and more. Perhaps I am misunderstanding something here.
Agree about #1445, and about the 'usecols' bit - 'numcols' would just provide a shorter call to e.g. read the first 20 columns of a file (well, not even much shorter than 'usecols=range(20)'...); I don't think that justifies an extra argument. But the 'datablocks' idea provides something new that a number of people seem to miss from e.g. gnuplot (including me, actually ;-). And it would also satisfy the request from #1445 without breaking backwards compatibility. I've been wondering if one could instead specify the separator lines through a parameter, e.g. "blocksep=['None', 'blank', 'invalid']" - not sure if that would make it more useful...
What about writing a separate function, e.g. loadblocktxt, and having it separate the file into chunks and call loadtxt for each chunk? Just a thought. Another possibility would be to write a function that lets you load a set of text files in a directory and returns a dict of datasets, one per file. One could write a similar save function, too. They would just need to call loadtxt/savetxt on a per-file basis.
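Something along those lines might look like this (a rough, hypothetical sketch - the name loadblocktxt and the blank-line splitting rule are placeholders, and it ignores loadtxt corner cases like comments inside blocks):

```python
import numpy as np

def loadblocktxt(fname, **kwargs):
    """Hypothetical helper: split a text file on blank lines and run
    np.loadtxt on each block, returning a list of arrays."""
    with open(fname) as f:
        content = f.read()
    # Keep only non-empty chunks; loadtxt accepts any iterable of lines.
    blocks = [b for b in content.split('\n\n') if b.strip()]
    return [np.loadtxt(b.splitlines(), **kwargs) for b in blocks]
```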
#1071: It is not clear to me whether loadtxt is supposed to support missing values in the fashion indicated in the ticket.
In principle it should at least allow you to, by the use of converters as described there. The problem is, the default delimiter is described as 'any whitespace', which in the present implementation obviously includes any number of blanks or tabs. These are therefore treated differently from delimiters like ',' or '&'. I'd reckon there are too many people actually relying on this behaviour to silently change it (e.g. I know plenty of tables with columns separated by either one or several tabs, depending on the length of the previous entry). But the tab is apparently also treated differently if explicitly specified with "delimiter='\t'" - and in that case a converter à la {2: lambda s: float(s or 'NaN')} works for fields in the middle of the line, but not at the end - that clearly warrants improvement. I've prepared a patch that works for Python 3 as well.
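To make that concrete, a minimal example of the converter workaround (assuming a numpy where explicit-delimiter handling behaves; the end-of-line case is what needs the patch):

```python
import io
import numpy as np

# Tab-delimited data with an empty (missing) field in the middle:
txt = "1\t2.0\t3.0\n4\t\t6.0\n"

data = np.loadtxt(io.StringIO(txt), delimiter='\t',
                  converters={1: lambda s: float(s or 'NaN')})
print(data)
# [[ 1.   2.   3. ]
#  [ 4.   nan  6. ]]
# A missing field at the *end* of a line ("7\t8.0\t") is the case
# that currently fails and that the patch addresses.
```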
Great!
#1163, #1565: These tickets seem to share the same root problem. I attach one possible patch. The previously suggested patches that I've seen will not correctly convert floats to ints, which I believe my patch will.
+1, though I am a bit concerned that potentially raising a ValueError for every element could impede performance. I'd probably still enclose it in an 'if issubclass(typ, np.uint64) or issubclass(typ, np.int64):' just like in npio.patch. I also thought one might switch to int(float128(x)) in that case, but at least for the given examples, float128 cannot convert with more accuracy than float64 (even on PowerPC ;-). There were some dissenting opinions, though, that trying to read a float into an int should generally throw an exception...
And Chuck just beat me...
I am sure someone has been using this functionality to convert floats to ints, and changing it will break their code. I am not sure how big a deal that would be. Also, I am of the opinion that one should _first_ write a program that works _correctly_, and only afterwards worry about performance. Even so, using numpy.int64 would be better, because it would give a sensible error message.
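For the record, the guarded converter under discussion would look roughly like this (a hypothetical sketch of the pattern, not the actual patch):

```python
import numpy as np

def _getconv_sketch(typ):
    """Pick a converter for dtype `typ`; only the 64-bit integer types
    pay for the exception-driven fallback through float."""
    if issubclass(typ, (np.int64, np.uint64)):
        def conv(x):
            try:
                return typ(x)          # fast path: "42"
            except ValueError:
                return typ(float(x))   # fallback: "42.0" -> 42
        return conv
    return typ

conv = _getconv_sketch(np.int64)
print(conv("42"), conv("42.0"))  # -> 42 42
```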
On 26 Mar 2011, at 21:25, Charles R Harris wrote:
I put all these patches together at https://github.com/charris/numpy/tree/loadtxt-savetxt. Please pull from there to continue work on loadtxt/savetxt so as to avoid conflicts in the patches. One of the numpy tests is failing, I assume from patch conflicts, and more tests for the tickets are needed in any case. Also, new keywords should be added to the end, not put in the middle of existing keywords.
I haven't reviewed the patches, just tried to get them organized. Also, I have Derek as the author on all of them; that can be changed if it is decided the credit should go elsewhere ;) Thanks for the work you all have been doing on these tickets.
Thanks, I'll have a look at the new ticket and try to get that organized!
With some luck, all the loadtxt tickets should be closed in short time :-)
Can someone bring me up to date on which tickets have been dealt with? Also, how do they divide into bug fixes and enhancements?

Chuck