[Numpy-discussion] Proposed change in genfromtxt(..., comments='#', names=True) behaviour

Mon Jul 16 10:12:56 EDT 2012

To be ultra clear (since I want to code this), you are suggesting that
'first_commented_line' be a *new* accepted value for the kwarg 'names', to
invoke the behaviour you suggest?

Nope, I was just referring to some hypothetical variable name. I meant
that:

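first_values = None
try:
    while not first_values:
        first_line = next(fhd)
        if names is True:
            # keep only the non-blank pieces around the comment marker
            parsed = [m for m in first_line.split(comments) if m.strip()]
            if parsed:
                first_values = split_line(parsed[0])
        else:
            ...
except StopIteration:
    ...  # input exhausted before any usable line was found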

(It's not tested; I'm writing it as it comes. And I didn't even use the
`first_commented_line` name, sorry.)
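
For anyone who wants to try the idea outside of genfromtxt, here is a minimal self-contained sketch of the same loop. It rests on assumptions: `read_names` is a hypothetical helper, plain whitespace splitting stands in for `split_line`, and a list of strings stands in for the file handle `fhd`.

```python
def read_names(lines, comments="#"):
    """Return the fields of the first line with non-blank content.

    Hypothetical helper: str.split() stands in for genfromtxt's split_line.
    """
    for line in lines:
        # Split on the comment marker and drop the empty or blank pieces.
        parsed = [m for m in line.split(comments) if m.strip()]
        if parsed:
            # Only the first non-blank piece is used for the names.
            return parsed[0].split()
    return None  # no usable line found


# Names on a commented line, preceded by an empty line.
print(read_names(["\n", "# a b c\n", "1 2 3\n"]))
# Names on an uncommented line, followed by a trailing comment.
print(read_names(["a b c  # trailing note\n", "1 2 3\n"]))
```

Both calls yield the same three names, which is the point: the names line is accepted whether or not it is commented.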

If this IS what you mean, I'd counter-propose something in the same spirit,
but a bit simpler…we let the kwarg 'skip_header' take some additional
value, say int(0), int(-1), str('auto'), or True.

In this case, instead of skipping a fixed number of lines, it will skip
any number of consecutive empty OR commented lines.

I really like the idea of having `skip_header=-1` skip all the empty or
commented lines (that is, lines whose first non-space character is the
`comments` character). That'd be rather convenient.
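
A rough sketch of what that skipping rule amounts to, written as a standalone function rather than as genfromtxt code. `count_header_lines` is a hypothetical name, and the sample lines are made up for illustration.

```python
def count_header_lines(lines, comments="#"):
    """Count the leading lines that are empty or start with `comments`.

    A line counts as commented when its first non-space character(s)
    match the comment marker.
    """
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comments):
            break  # first line that is neither empty nor commented
        count += 1
    return count


sample = [
    "# free-form description\n",
    "\n",
    "   # indented comment\n",
    "a b c\n",
    "1 2 3\n",
]
print(count_header_lines(sample))  # 3
```

Until something like `skip_header=-1` exists, a count obtained this way could presumably be passed as the usual integer `skip_header`. Note that it would also skip a commented names line, so it only suits files whose names line is not commented.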

The semantics of this are more intuitive, because this is what I am
really after: to *skip* a commented *header* of arbitrary length. So my
four examples below could be parsed with:

1. genfromtxt(..., names=True)
2. genfromtxt(..., names=True, skip_header=True)
3. genfromtxt(..., names=True)
4. genfromtxt(..., names=True, skip_header=True)

…crucially #1 avoids the regression.

Does this seem good to everyone?

Sounds good w/ `skip_header=-1`

But if this is NOT what you mean, then what you say does not actually work
with the simple use-case of my Example #2 below. The first commented line
is "# here is a..." with # as the first non-space character, so the part
after becomes the names 'here', 'is', 'a' etc.

In that case, you could always use `skip_header=2`
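
To illustrate the difference a fixed skip makes, here is a small sketch on a made-up file (it is not one of the examples from the thread). `names_after_skip` is a hypothetical helper that mimics "skip N lines, then read the names from the next one", and it assumes a single-character comment marker.

```python
def names_after_skip(lines, skip_header=0, comments="#"):
    """Skip a fixed number of lines, then read names from the next line.

    Hypothetical helper; assumes `comments` is a single character.
    """
    first_line = lines[skip_header]
    # Drop surrounding whitespace and an optional leading comment marker.
    return first_line.strip().lstrip(comments).split()


# Made-up file: two free-form comment lines, then the names, then data.
sample = [
    "# some free-form remark\n",
    "# another remark\n",
    "a b c\n",
    "1 2 3\n",
]
print(names_after_skip(sample))                 # ['some', 'free-form', 'remark']
print(names_after_skip(sample, skip_header=2))  # ['a', 'b', 'c']
```

Without the skip, the words of the first comment become the names; with it, the intended names line is used.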

In short, the code can't resolve the ambiguity without some extra
information from the user.

It's always best not to let the code guess too much anyway...

Well, no regression, and you have a nice plan. I'm for it.
Anybody else?