Namedtuples: some unexpected inconveniences

Fri Apr 14 17:18:30 EDT 2017

On 2017-04-14 20:34, Deborah Swanson wrote:
> Peter,
> 
> Retracing my steps to rewrite the getattr(row, label) code, this is what
> sent me down the rabbit hole in the first place. (I changed your 'rows'
> to 'records' just to use the same name everywhere, but all else is the
> same as you gave me.) I'd like you to look at it and see if you still
> think complete(group, label) should work. Perhaps seeing why it fails
> will clarify some of the difficulties I'm having.
> 
> I ran into problems with values and has_empty. values has a problem
> because row[label] gets a TypeError. has_empty has a problem because a
> list of field values will be shorter with missing values than a full
> list, but a namedtuple with missing values will be the same length as a
> full namedtuple since missing values have '' placeholders.  Two more
> unexpected inconveniences.
> 
In the line:

     values = {row[label] for row in group}

'group' is a list of records; row is a record (namedtuple).

You can get the members of a namedtuple (as with a 'normal' tuple) by
numeric index, e.g. row[0], but the point of a namedtuple is that you
can also get them by name, as an attribute, e.g. row.Location.

As the name of the attribute isn't fixed here, but is passed in as a
string, use getattr(row, label) instead:

     values = {getattr(row, label) for row in group}
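
A minimal sketch of the difference, assuming a made-up Record type whose
field names are borrowed from LABELS in the quoted code (the data values
are invented purely for illustration):

```python
from collections import namedtuple

# Hypothetical record type; field names follow the quoted code.
Record = namedtuple("Record", ["title", "Location", "Kind", "Notes"])

# Invented sample data: two rows for the same title, one with a gap.
group = [
    Record("Flat A", "Seattle", "apartment", ""),
    Record("Flat A", "", "apartment", "near park"),
]

label = "Location"

# Indexing a tuple with a string raises TypeError, as in the quoted code.
try:
    group[0][label]
except TypeError as exc:
    index_error = exc

# getattr looks the field up by its name at run time.
values = {getattr(row, label) for row in group}
```

Here values ends up holding both 'Seattle' and the '' placeholder.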

As for the values, remove the '' placeholder from the set and count
what remains:

     # Remove the missing value, if present.
     values.discard('')

     # There's only 1 value left, so fill in the empty places.
     if len(values) == 1:
         ...
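
Filled out, that check might look like the sketch below. The helper name
single_value is hypothetical, not something from the original code. It
relies on set.discard() doing nothing when '' isn't in the set, so no
separate has_empty test is needed:

```python
def single_value(values):
    """Return the only non-empty value in 'values', or None if there
    isn't exactly one (hypothetical helper, for illustration only)."""
    remaining = set(values)   # copy, so the caller's set is untouched
    remaining.discard('')     # no error if '' is absent
    if len(remaining) == 1:
        return next(iter(remaining))
    # No value at all, or conflicting values: needs manual intervention.
    return None


one = single_value({'Seattle', ''})
none_given = single_value({''})
conflict = single_value({'Seattle', 'Tacoma', ''})
```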

The next point is that namedtuples, like normal tuples, are immutable.
You can't change the value of an attribute, so the assignment inside
the 'for row in group' loop of complete() can't work either.
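
One way around that, sketched here under the assumption that building
a new list of records is acceptable, is the namedtuple method
_replace(), which returns a new record with the given fields changed.
The helper name fill_in and the sample data are made up for
illustration:

```python
from collections import namedtuple

# Hypothetical, cut-down record type for the example.
Record = namedtuple("Record", ["title", "Location"])


def fill_in(group, label, value):
    """Return a new list where empty 'label' fields are set to 'value'."""
    return [
        row._replace(**{label: value}) if getattr(row, label) == '' else row
        for row in group
    ]


group = [Record("Flat A", "Seattle"), Record("Flat A", "")]

# Direct assignment is refused because the record is immutable.
try:
    group[1].Location = "Seattle"
except AttributeError as exc:
    assign_error = exc

filled = fill_in(group, "Location", "Seattle")
```

The original group is left unchanged; the caller would have to store
the returned list in place of the old one.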

> A short test csv is at the end, for you to read in and attempt to
> execute the following code, and I'm still working on reconstructing the
> lost getattr(row, label) code.
> 
> import csv
> from collections import namedtuple, defaultdict
> 
> def get_title(row):
>      return row.title
> 
> def complete(group, label):
>      values = {row[label] for row in group}
>      # get "TypeError: tuple indices must be integers, not str"
>      has_empty = not min(values, key=len)
>      if len(values) - has_empty != 1:
>          # no value or multiple values; manual intervention needed
>          return False
>      elif has_empty:
>          for row in group:
>              row[label] = max(values, key=len)
>      return True
> 
> infile = open("E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv")
> rows = csv.reader(infile)
> fieldnames = next(rows)
> Record = namedtuple("Record", fieldnames)
> records = [Record._make(fieldnames)]
> records.extend(Record._make(row) for row in rows)
> 
> # group rows by title
> groups = defaultdict(list)
> for row in records:
>      groups[get_title(row)].append(row)
> 
> LABELS = ['Location', 'Kind', 'Notes']
> 
> # add missing values
> for group in groups.values():
>      for label in LABELS:
>          complete(group, label)
> 
> Moving 2017 in - test.csv:
> (If this doesn't come through the mail system correctly, I've also
> uploaded the file to
> http://deborahswanson.net/python/Moving%202017%20in%20-%20test.csv.
> Permissions should be set correctly, but let me know if you run into
> problems downloading the file.)
> 
[snip]