[Numpy-discussion] Proposed change in genfromtxt(..., comments='#', names=True) behaviour

Tue Jul 17 11:47:48 EDT 2012

On Mon, Jul 16, 2012 at 10:39 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
> I don't really have any deep issue with `skip_header=True`, besides not
> really liking having an argument whose type can vary. But that's only a
> matter of personal taste. And yes, we could always check the type…

I guess I still have a small preference for skip_header="comments"
over skip_header=True, since the latter is more opaque for no purpose.
Also it makes me slightly antsy since skip_header is normally an
integer, and True is, in fact, just an integer with a special
__repr__:

In [2]: isinstance(True, int)
Out[2]: True

In [3]: True + True
Out[3]: 2

Not that there are likely to be people using skip_header=True as an
alias for skip_header=1, but if they were it would currently work.
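
To make the concern concrete, here is a minimal sketch of how an argument parser would have to tell True apart from 1. The helper name resolve_skip_header and its return values are hypothetical, made up for illustration; this is not what genfromtxt does.

```python
def resolve_skip_header(skip_header):
    """Hypothetical helper: classify a skip_header argument.

    Returns the string "comments" when the caller asks to skip the
    commented header, otherwise the number of lines to skip.
    """
    # bool has to be tested before int: isinstance(True, int) is True,
    # so an int check placed first would silently treat True as 1.
    if isinstance(skip_header, bool):
        return "comments" if skip_header else 0
    if isinstance(skip_header, str):
        if skip_header != "comments":
            raise ValueError("unknown skip_header value: %r" % (skip_header,))
        return "comments"
    return int(skip_header)


print(resolve_skip_header(True))        # the proposed new meaning
print(resolve_skip_header(1))           # plain line count
print(resolve_skip_header("comments"))  # the explicit spelling
```

With skip_header="comments" the string branch is unambiguous, whereas the True spelling only works if the bool check comes first.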

> Pierre, for a line "# A B C #1 #2 #3" the user gets six columns 'A',
> 'B', 'C', '#1', '#2', '#3', which is messy but what they deserve for
> using such messy input :)
>
> OK, we're on the same page.
>
>
> Also, if you look closely, the use of index()
> you propose is equivalent to my current code, just more verbose.
>
> I'm not convinced by line 1353: unless you change it to
> asbyte(comment).join(first_line.split(comments)[1:])
> you're going to lose the '#', aren't you? With the 'index' way, we just
> pick the first one, as intended. But it's late and multitasking isn't
> really working for me now.

I think you guys are looking for .split(comments, 1).
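
A quick sketch of the difference, using the example line from above. The function name header_names is hypothetical, and it works on str rather than the bytes that the genfromtxt code handles.

```python
def header_names(first_line, comments="#"):
    """Hypothetical helper: column names from a commented header line."""
    # maxsplit=1 cuts only at the first comment marker, so any later
    # '#' characters stay part of the names.
    # Assumes the marker is present; otherwise the unpacking fails.
    _, rest = first_line.split(comments, 1)
    return rest.split()


line = "# A B C #1 #2 #3"

# A plain split cuts at every marker and drops the later '#' characters.
print(line.split("#"))
# ['', ' A B C ', '1 ', '2 ', '3']

print(header_names(line))
# ['A', 'B', 'C', '#1', '#2', '#3']
```

That keeps the six columns without having to join the pieces back together afterwards.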

-n