Using namedtuples field names for column indices in a list of lists

Peter Otten __peter__ at web.de
Sun Jan 8 08:20:51 EST 2017


Deborah Swanson wrote:

> Peter Otten wrote, on January 08, 2017 3:01 AM
>> 
>> Deborah Swanson wrote:
>> 
>> > to do that is with .fget(). Believe me, I tried every possible way to
>> > use instance.A or instance[1] and no way could I get ls[instance.A].
>> 
>> Sorry, no.
> 
> I quite agree, I was describing the dead end I was in from peeling the
> list of data and the namedtuple from the header row off the csv
> separately. That was quite obviously the wrong path to take, but I
> didn't know what a good way would be.
> 
>> To get a list of namedtuple instances use:
>> 
>> rows = csv.reader(infile)
>> Record = namedtuple("Record", next(rows))
>> records = [Record._make(row) for row in rows]
> 
> This is slightly different from Steven's suggestion, and it makes a
> block of records that I think would be iterable. At any rate all the
> data from the csv would belong to a single data structure, and that
> seems inherently a good thing.
> 
> a = records[i].A , for example
> 
> And I think that this would produce recognizable field names in my code
> (which was the original goal) if the following works:
> 
> records[0] is the header row == ('Description', 'Location', etc.)

Personally I would recommend against mixing data (an actual location) and 
metadata (the column name, "Location"), but if you wish, my code can be 
adapted as follows:

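import csv
from collections import namedtuple

with open("dictreader_demo.csv", newline="") as infile:
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    # records[0] holds the column names, the rest hold the data rows
    records = [Record._make(fieldnames)]
    records.extend(Record._make(row) for row in rows)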
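If you want to try this without a CSV file on disk, the sketch below feeds the same logic the contents of places.csv from the pandas session further down, via io.StringIO (an assumption made only so it runs anywhere). The helper name read_records() is hypothetical. Keep in mind that csv.reader returns every field as a string, so numeric columns need converting, and that with the header stored in records[0] you have to skip it when pulling out a column of real data.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for dictreader_demo.csv;
# io.StringIO lets the sketch run without a file on disk.
SAMPLE = "Location,Description,Size\nhere,something,17\nthere,something else,10\n"

def read_records(infile):
    """Return a list of namedtuples whose first entry is the header row."""
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    records = [Record._make(fieldnames)]  # records[0] holds the column names
    records.extend(Record._make(row) for row in rows)
    return records

records = read_records(io.StringIO(SAMPLE))

print(records[0].Location)  # the column name: Location
print(records[1].Location)  # first data row: here

# Skip the header entry when extracting a column of actual data;
# csv gives strings, so convert Size to int explicitly.
sizes = [int(record.Size) for record in records[1:]]
print(sizes)  # [17, 10]
```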

If you want a lot of flexibility without doing the legwork yourself, you 
might also have a look at pandas. Example session:

$ cat places.csv
Location,Description,Size
here,something,17
there,something else,10
$ python3
Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
>>> places = pandas.read_csv("places.csv")
>>> places
  Location     Description  Size
0     here       something    17
1    there  something else    10

[2 rows x 3 columns]
>>> places.Location
0     here
1    there
Name: Location, dtype: object
>>> places.sort(columns="Size")
  Location     Description  Size
1    there  something else    10
0     here       something    17

[2 rows x 3 columns]
>>> places.Size.mean()
13.5

Be aware that there is a learning curve...
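For comparison, here is roughly the legwork pandas saves you: the three operations from the session (pick a column, sort by Size, average Size) done with namedtuple records and only the standard library. The inline CSV string and the names Place, locations, by_size and mean_size are illustrative assumptions; statistics.mean() is in the standard library from Python 3.4 on.

```python
import csv
import io
from collections import namedtuple
from statistics import mean

# Same contents as places.csv in the session above, read from a string
# so the sketch is self-contained.
PLACES_CSV = "Location,Description,Size\nhere,something,17\nthere,something else,10\n"

rows = csv.reader(io.StringIO(PLACES_CSV))
Place = namedtuple("Place", next(rows))
# csv yields strings, so convert Size to int while building each record.
places = [p._replace(Size=int(p.Size)) for p in map(Place._make, rows)]

locations = [p.Location for p in places]         # like places.Location
by_size = sorted(places, key=lambda p: p.Size)   # like sorting on "Size"
mean_size = mean(p.Size for p in places)         # like places.Size.mean()

print(locations)                        # ['here', 'there']
print([p.Location for p in by_size])    # ['there', 'here']
print(mean_size)                        # 13.5
```

The session above was recorded with an old pandas release; in current pandas, DataFrame.sort() no longer exists, and places.sort_values("Size") is the equivalent call.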
 
> If I can use records[i].Location for the Location column data in row
> 'i', then I've got my recognizable-field-name variables.
> 
>> If you want a column from a list of records you need to
>> extract it manually:
>> 
>> columnA = [record.A for record in records]
> 
> This is very neat. Something like a list comprehension for named tuples?
> 
> Thanks Peter, I'll try it all tomorrow and see how it goes.
> 
> PS. I haven't forgotten your defaultdict suggestion, I'm just taking the
> suggestions I got in the "Cleaning up Conditionals" thread one at a
> time, and I will get to defaultdict. Then I'll look at all of them and
> see what final version of the code will work best with all the factors
> to consider.
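Since the thread started with using field names as column indices into a plain list of lists, note that a namedtuple class can supply those indices itself: Record._fields is the tuple of field names, so Record._fields.index("Location") is the integer position of that column. A minimal sketch; the header order follows the ('Description', 'Location', ...) example quoted above, and the rows and the helper column() are hypothetical. When you have namedtuple records and the field name is only known at run time, getattr() does the same job.

```python
from collections import namedtuple

# Hypothetical header and rows, kept as a plain list of lists.
header = ["Description", "Location", "Size"]
ls = [
    ["something", "here", "17"],
    ["something else", "there", "10"],
]

Record = namedtuple("Record", header)

def column(rows, name):
    """Return one column of a list of lists, looked up by field name."""
    idx = Record._fields.index(name)  # field name -> integer column index
    return [row[idx] for row in rows]

print(column(ls, "Location"))  # ['here', 'there']

# With namedtuple records, getattr() looks a field up by a name
# held in a variable.
records = [Record._make(row) for row in ls]
field = "Size"
print([getattr(record, field) for record in records])  # ['17', '10']
```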




More information about the Python-list mailing list