genfromtxt() skips comments
I noticed that genfromtxt() did not skip comments if the keyword names is not True. If names is True, then genfromtxt() would take the first line as the names. I am proposing a fix to genfromtxt that skips all of the comments in a file, and potentially using the last comment line for names. This will allow reading files with and without comments and/or names.

The difference is here: https://github.com/arkottke/numpy/compare/my-genfromtxt

Albert

p.s. insert some disclaimer about my first pull request
On Fri, May 31, 2013 at 5:08 PM, Albert Kottke <albert.kottke@gmail.com> wrote:
I noticed that genfromtxt() did not skip comments if the keyword names is not True. If names is True, then genfromtxt() would take the first line as the names. I am proposing a fix to genfromtxt that skips all of the comments in a file, and potentially using the last comment line for names. This will allow reading files with and without comments and/or names.
The difference is here: https://github.com/arkottke/numpy/compare/my-genfromtxt
Careful with semantics here. First off, using the last comment line as the source for names might initially make sense, except when there are comments within the data file. I would suggest going for "last comment line before the first line of data". Second, sometimes the names come from an un-commented first line, but comments are still used within the file elsewhere.

Just some food for thought. I don't know if the current design is best or not.

Ben Root
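The rule suggested above, "last comment line before the first line of data", can be sketched in plain Python. This is only an illustration of the semantics under discussion, not the code from the proposed patch; the helper name `split_header_and_data` and its parameters are made up for this example.

```python
def split_header_and_data(lines, comments="#", delimiter=None):
    """Return (names, rows) using the rule discussed above.

    names comes from the last comment line that appears before the first
    data line (None if there is no such line). Comment lines found after
    the first data line are skipped and never treated as names.
    """
    names = None
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(comments):
            if not rows:
                # Still in the leading comment block: remember the latest one.
                names = [n.strip() for n in line[len(comments):].split(delimiter)]
            continue  # comments inside the data are simply dropped
        # Drop a trailing inline comment, then split the remaining fields.
        fields = line.split(comments, 1)[0].split(delimiter)
        rows.append([f.strip() for f in fields])
    return names, rows


text = "!blah\n!A:B:C\n1:2:3\n!a comment inside the data\n4:5:6\n"
names, rows = split_header_and_data(text.splitlines(), comments="!", delimiter=":")
print(names)  # ['A', 'B', 'C']
print(rows)   # [['1', '2', '3'], ['4', '5', '6']]
```

Note that the comment inside the data does not replace the names, which is the case the plain "last comment line" wording would get wrong.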
I agree that "last comment line before the first line of data" is more descriptive.

Regarding the location of the names: I thought taking them from the last comment line before the first line of data made sense because it would permit reading just the data with np.loadtxt(), but also permit creating records with np.recfromtxt().

It would also be good to consider other implementations. For example, pandas and R both use names without a comment character.

Albert
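The other convention raised in this exchange, names on a plain un-commented line with comments still allowed elsewhere in the file, might look like this in plain Python. The function name `read_with_plain_header` and the sample text are invented for illustration; this is not a description of how pandas, R, or NumPy implement it.

```python
import csv
import io


def read_with_plain_header(text, comments="#", delimiter=","):
    """Read delimited text whose first non-comment line holds the names."""
    # Filter out blank lines and full-line comments before parsing.
    kept = [
        line for line in io.StringIO(text)
        if line.strip() and not line.lstrip().startswith(comments)
    ]
    reader = csv.reader(kept, delimiter=delimiter)
    names = next(reader)  # first surviving line is the header
    return [dict(zip(names, row)) for row in reader]


sample = "# produced by some instrument\nA:B:C\n1:2:3\n# note inside the data\n4:5:6\n"
records = read_with_plain_header(sample, comments="#", delimiter=":")
print(records[0]["A"], records[1]["C"])  # 1 6
```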
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
On May 31, 2013 at 23:08:18, Albert Kottke (albert.kottke@gmail.com) wrote:

I noticed that genfromtxt() did not skip comments if the keyword names is not True. If names is True, then genfromtxt() would take the first line as the names. I am proposing a fix to genfromtxt that skips all of the comments in a file, and potentially using the last comment line for names.

I'm quite surprised, as comments are already skipped in my standard numpy version (1.7.0). For example:

    S=StringIO("!blah\n!blah\n!blah\n!A:B:C\n1:2:3\n4:5:6\n")
    np.genfromtxt(S, delimiter=":", comments="!", names=("A","B","C"))

Works as expected, even when using the default `names=None`. Comments are taken care of with the `split_line` function (an instance of `_iotools.LineSplitter`).
Now try the same thing with np.recfromcsv(). I get the following (Python 3.3):
    import io
    b = io.BytesIO(b"!blah\n!blah\n!blah\n!A:B:C\n1:2:3\n4:5:6\n")
    np.recfromcsv(b, delimiter=':', comments='!')
    ...
    ValueError: Some errors were detected !
        Line #5 (got 3 columns instead of 1)
        Line #6 (got 3 columns instead of 1)
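One plausible reading of this error follows from the behaviour described at the top of the thread: when names are requested from the file, the first line is taken as the names even if it is a comment. The toy model below reproduces the same line numbers and column counts under that assumption. It is a simplified stand-in written for this example (the function `first_line_names_model` is hypothetical), not NumPy's actual implementation.

```python
def first_line_names_model(lines, comments="#", delimiter=","):
    """Toy model: names come from line 1, even if that line is a comment."""
    # Take the first line as the header, stripping a leading comment marker.
    header = lines[0].strip()
    if header.startswith(comments):
        header = header[len(comments):]
    names = header.split(delimiter)

    errors = []
    for lineno, raw in enumerate(lines[1:], start=2):
        line = raw.split(comments, 1)[0].strip()
        if not line:
            continue  # comment-only or blank line: skipped
        ncols = len(line.split(delimiter))
        if ncols != len(names):
            errors.append((lineno, ncols, len(names)))
    return names, errors


lines = "!blah\n!blah\n!blah\n!A:B:C\n1:2:3\n4:5:6\n".splitlines()
names, errors = first_line_names_model(lines, comments="!", delimiter=":")
print(names)  # ['blah']
for lineno, got, expected in errors:
    print(f"Line #{lineno} (got {got} columns instead of {expected})")
```

In this model the single name `blah` fixes the expected width at one column, so both data lines are reported, while the remaining comment lines, including the one holding `A:B:C`, are skipped silently.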
participants (3)
- Albert Kottke
- Benjamin Root
- Pierre GM