[Tutor] mapping header row to data rows in file

Thu Jun 27 08:50:01 CEST 2013

Sivaram Neelakantan wrote:

> On Wed, Jun 26 2013,Peter Otten wrote:
> 
> 
> [snipped 36 lines]
> 
>> from collections import namedtuple
>>
>> def reader(instream):
>>     rows = (line.split() for line in instream)
>>     names = next(rows)
>>     Row = namedtuple("Row", names)
>>     return (Row(*values) for values in rows)
>>
>> with open(FILENAME, "r") as f:
>>     for row in reader(f):
>>         print row.name
>>
> I get these errors with the code above
> 
>     Row = namedtuple("Row", names)
>   File "/usr/lib/python2.7/collections.py", line 278, in namedtuple
>     raise ValueError('Type names and field names can only contain alphanumeric characters and underscores: %r' % name)
> ValueError: Type names and field names can only contain alphanumeric characters and underscores: 'Symbol,Series,Date,Prev_Close'
> 
> 
> 
> --8<---------------cut here---------------start------------->8---
> 
> Symbol,Series,Date,Prev_Close
> STER,EQ,22-Nov-2012,         9
> STER,EQ,29-Nov-2012,        10
> STER,EQ,06-Dec-2012,        11
> STER,EQ,06-Jun-2013,         9
> STER,EQ,07-Jun-2013,         9

The format of the above table differs from the one you posted originally.

line.split()

splits the line on whitespace:

>>> "alpha    beta\tgamma\n".split()
['alpha', 'beta', 'gamma']
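Your new header has no whitespace in it, so split() returns the whole line as a one-element list. That single "field name" contains commas, and namedtuple() rejects it, which is the ValueError in your traceback. Here is a small reproduction that should run under both Python 2.7 and 3:

```python
from collections import namedtuple

header = "Symbol,Series,Date,Prev_Close\n"

# split() with no argument only splits on runs of whitespace,
# so the comma-separated header stays in one piece
names = header.split()
print(names)  # ['Symbol,Series,Date,Prev_Close']

try:
    namedtuple("Row", names)
except ValueError as exc:
    # field names must be valid identifiers; commas are not allowed
    print("ValueError: %s" % exc)
```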

To split the line on commas you can use line.split(","). This keeps the 
whitespace around each field, including the trailing newline, though:

>>> "alpha,   beta,gamma\n".split(",")
['alpha', '   beta', 'gamma\n']
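Stripping each piece after splitting removes that whitespace. Here is a minimal sketch. It assumes no field ever contains a comma of its own: quoted fields are what the csv module handles and this does not. split_fields is a hypothetical helper name:

```python
def split_fields(line, sep=","):
    """Split one line on sep and strip whitespace (including the
    trailing newline) from every field.

    Assumes sep never occurs inside a field (no quoted commas).
    """
    return [field.strip() for field in line.split(sep)]

print(split_fields("alpha,   beta,gamma\n"))
# ['alpha', 'beta', 'gamma']
print(split_fields("STER,EQ,22-Nov-2012,         9\n"))
# ['STER', 'EQ', '22-Nov-2012', '9']
```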

I'd prefer a csv.reader(), and if you have control over the table format, you 
should remove the extra whitespace in the source data.
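If you can't change the source data, csv.reader() accepts a skipinitialspace option. It drops whitespace immediately after each delimiter, which covers the padded Prev_Close column in your sample. It does not remove whitespace that comes before a comma. The sketch below uses Python 3 syntax, with io.StringIO standing in for the real file:

```python
import csv
import io

# In-memory stand-in for AA.csv, using rows from the posted sample
sample = io.StringIO(
    "Symbol,Series,Date,Prev_Close\n"
    "STER,EQ,22-Nov-2012,         9\n"
    "STER,EQ,29-Nov-2012,        10\n"
)

# skipinitialspace=True ignores the spaces right after each comma
rows = list(csv.reader(sample, skipinitialspace=True))
for row in rows:
    print(row)
```

For the real file, the csv documentation recommends open("AA.csv", newline="") on Python 3 and binary mode ("rb") on Python 2.7.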

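import csv
from collections import namedtuple

def reader(instream):
    rows = csv.reader(instream) # removes the newline at the end of
                                # each line, but not other whitespace

    # Optional: remove surrounding whitespace from the fields
    rows = ([field.strip() for field in row] for row in rows)

    names = next(rows) # the header row supplies the field names
    Row = namedtuple("Row", names)
    return (Row(*values) for values in rows)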
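Whichever way you split, the namedtuple's attributes are the header names. This table has no "name" column, so row.name in the loop you posted would raise AttributeError. Use row.Symbol, row.Prev_Close and so on instead. Here is a sketch with one row of your sample built by hand (Python 3 syntax):

```python
from collections import namedtuple

# Field names copied from the header of the posted AA.csv sample
Row = namedtuple("Row", ["Symbol", "Series", "Date", "Prev_Close"])
row = Row("STER", "EQ", "22-Nov-2012", "9")

print(row.Symbol, row.Prev_Close)  # STER 9
print(row._fields)  # ('Symbol', 'Series', 'Date', 'Prev_Close')
print(hasattr(row, "name"))  # False: there is no "name" column
```

Note that every value is still a string. Convert explicitly, e.g. int(row.Prev_Close), if you need numbers.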

> def reader(instream):
>     rows = (line.split() for line in instream)
>     names = next(rows)
>     Row = namedtuple("Row", names)
>     return (Row(*values) for values in rows)
> 
> with open("AA.csv", "r") as f:
>     for row in reader(f):
>         print row.name