Mailman 3 genfromtxt() skips comments - NumPy-Discussion


genfromtxt() skips comments


Albert Kottke

May 31, 2013

3:08 p.m.

I noticed that genfromtxt() did not skip comments if the keyword names is not True. If names is True, then genfromtxt() would take the first line as the names. I am proposing a fix to genfromtxt that skips all of the comments in a file, and potentially using the last comment line for names. This will allow reading files with and without comments and/or names.

The difference is here: https://github.com/arkottke/numpy/compare/my-genfromtxt

Albert

p.s. insert some disclaimer about my first pull request


Benjamin Root

May 2013

3:14 p.m.

On Fri, May 31, 2013 at 5:08 PM, Albert Kottke <albert.kottke@gmail.com> wrote:

> I noticed that genfromtxt() did not skip comments if the keyword names is not True. If names is True, then genfromtxt() would take the first line as the names. I am proposing a fix to genfromtxt that skips all of the comments in a file, and potentially using the last comment line for names. This will allow reading files with and without comments and/or names.
>
> The difference is here: https://github.com/arkottke/numpy/compare/my-genfromtxt

Careful with semantics here. First off, using the last comment line as the source for names might initially make sense, except when there are comments within the data file. I would suggest going for "last comment line before the first line of data". Second, sometimes the names come from an un-commented first line, but comments are still used within the file elsewhere.

Just some food for thought. I don't know if the current design is best or not.

Ben Root
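To make the rule "last comment line before the first line of data" concrete, here is a minimal pure-Python sketch. The helper `names_from_header` is hypothetical and written only for illustration; it is not part of NumPy and is not the code in the proposed pull request. It assumes a comment line is one that starts with the comment character.

```python
def names_from_header(lines, comments="#", delimiter=None):
    """Hypothetical helper: column names from the last comment line
    that appears before the first line of data, or None if there is none."""
    candidate = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines are ignored
        if stripped.startswith(comments):
            # Remember the most recent header comment seen so far.
            candidate = stripped[len(comments):]
            continue
        break  # first data line reached; comments after it are not considered
    if candidate is None:
        return None
    return [name.strip() for name in candidate.split(delimiter)]


text = "!blah\n!A:B:C\n1:2:3\n!a comment inside the data\n4:5:6\n"
names = names_from_header(text.splitlines(), comments="!", delimiter=":")
print(names)  # ['A', 'B', 'C']
```

Stopping at the first data line is what keeps a comment in the middle of the data from being mistaken for the header.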

Albert Kottke

3:30 p.m.

I agree that "last comment line before the first line of data" is more descriptive.

Regarding the location of the names, I thought taking them from the last comment line before the first line of data made sense because it would permit reading just the data with np.loadtxt(), but also permit creating records with np.recfromtxt().

It would also be good to consider other implementations. For example, pandas and R both use names without a comment character.

Albert

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Pierre GM

4:02 p.m.

On May 31, 2013 at 23:08:18, Albert Kottke (albert.kottke@gmail.com) wrote:

> I noticed that genfromtxt() did not skip comments if the keyword names is not True. If names is True, then genfromtxt() would take the first line as the names. I am proposing a fix to genfromtxt that skips all of the comments in a file, and potentially using the last comment line for names.

I'm quite surprised, as comments are already skipped in my standard numpy version (1.7.0). For example:

    S = StringIO("!blah\n!blah\n!blah\n!A:B:C\n1:2:3\n4:5:6\n")
    np.genfromtxt(S, delimiter=":", comments="!", names=("A","B","C"))

Works as expected, even when using the default `names=None`. Comments are taken care of with the `split_line` function (an instance of `_iotools.LineSplitter`).
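For readers who want to run the example above, here is a self-contained sketch. The imports are an assumption (the original message does not show them), and `io.StringIO` is used as the Python 3 spelling. It checks both cases mentioned: explicit names and the default `names=None`.

```python
from io import StringIO

import numpy as np

DATA = "!blah\n!blah\n!blah\n!A:B:C\n1:2:3\n4:5:6\n"

# Explicit names: the four comment lines are skipped, two data rows remain.
named = np.genfromtxt(StringIO(DATA), delimiter=":", comments="!",
                      names=("A", "B", "C"))

# Default names=None: the same lines are skipped, result is a plain 2-D array.
plain = np.genfromtxt(StringIO(DATA), delimiter=":", comments="!")

print(named.dtype.names)
print(plain.shape)
```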

Albert Kottke

4:24 p.m.

Now try the same thing with np.recfromcsv(). I get the following (Python 3.3):

    import io
    b = io.BytesIO(b"!blah\n!blah\n!blah\n!A:B:C\n1:2:3\n4:5:6\n")
    np.recfromcsv(b, delimiter=':', comments='!')
    ...
    ValueError: Some errors were detected !
        Line #5 (got 3 columns instead of 1)
        Line #6 (got 3 columns instead of 1)
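The failure can be reproduced with `np.genfromtxt` directly by passing `names=True`, which is what `np.recfromcsv` sets by default. The sketch below is an illustration under that assumption, not the proposed patch: the first comment line (`!blah`) is taken as a single column name, so the three-column data rows are rejected. Passing `skip_header=3` is shown as one possible workaround when the number of leading comment lines is known in advance.

```python
from io import StringIO

import numpy as np

DATA = "!blah\n!blah\n!blah\n!A:B:C\n1:2:3\n4:5:6\n"

# names=True reads the names from the first line, here the comment "!blah",
# so genfromtxt expects one column and the data rows do not match.
try:
    np.genfromtxt(StringIO(DATA), delimiter=":", comments="!", names=True)
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# Workaround sketch: skip the three leading comment lines so that "!A:B:C"
# becomes the first line and supplies the names.
rec = np.genfromtxt(StringIO(DATA), delimiter=":", comments="!",
                    names=True, skip_header=3)
print(rec.dtype.names)
```

The workaround depends on counting header lines by hand, which is the inconvenience the proposal in this thread aims to remove.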

