Namedtuples: some unexpected inconveniences

Fri Apr 14 20:03:04 EDT 2017

Peter Otten wrote, on Friday, April 14, 2017 2:16 PM
> > def complete(group, label):
> >     values = {row[label] for row in group}
> >     # get "TypeError: tuple indices must be integers, not str"
> 
> Yes, the function expects row to be dict-like. However, when you
> change
> 
> row[label]
> 
> to
> 
> getattr(row, label)
> 
> this part of the code will work...
> 
> >     has_empty = not min(values, key=len)
> >     if len(values) - has_empty != 1:
> >         # no value or multiple values; manual intervention needed
> >         return False
> >     elif has_empty:
> >         for row in group:
> >             row[label] = max(values, key=len)
> 
> but here you'll get an error. I made the experiment to change
> everything necessary to make it work with namedtuples, but you'll
> probably find the result a bit hard to follow:
> 
> import csv
> from collections import namedtuple, defaultdict
> 
> INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
> OUTFILE = "tmp.csv"
> 
> def get_title(row):
>     return row.title
> 
> def complete(group, label):
>     values = {getattr(row, label) for row in group}  
>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         # replace namedtuples in the group. Yes, it's ugly
>         fix = {label: max(values, key=len)}
>         group[:] = [record._replace(**fix) for record in group]
>     return True
> 
> with open(INFILE) as infile:
>     rows = csv.reader(infile)
>     fieldnames = next(rows)
>     Record = namedtuple("Record", fieldnames)
>     groups = defaultdict(list)
>     for row in rows:
>         record = Record._make(row)
>         groups[get_title(record)].append(record)
> 
> LABELS = ['Location', 'Kind', 'Notes']
> 
> # add missing values
> for group in groups.values():
>     for label in LABELS:
>         complete(group, label)
> 
> # dump data (as a demo that you do not need the list of all records)
> with open(OUTFILE, "w") as outfile:
>     writer = csv.writer(outfile)
>     writer.writerow(fieldnames)
>     writer.writerows(
>         record for group in groups.values() for record in group
>     )
> 
> One alternative is to keep the original and try to replace the
> namedtuple with the class suggested by Gregory Ewing. Then it should
> suffice to also change
> 
> >     elif has_empty:
> >         for row in group:
> >             row[label] = max(values, key=len)
> 
> to
> 
> >     elif has_empty:
> >         for row in group:
>               setattr(row, label, max(values, key=len))
> 
> PS: Personally I would probably take the opposite direction and use
> dicts throughout...
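
For comparison, here is a minimal sketch of the dict-based direction mentioned in the PS. It assumes csv.DictReader and csv.DictWriter stand in for the namedtuple plumbing, and it uses a small made-up sample in place of the real input file. The complete() function is the original row[label] version, which works unchanged because dict rows can be assigned to in place.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data standing in for the real input file (an assumption).
SAMPLE = "title,Location,Kind\nSofa,Den,\nSofa,,Furniture\nLamp,Hall,Light\n"


def complete(group, label):
    # Unique values for this label within the group.
    values = {row[label] for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        # Dict rows are mutable, so item assignment works here.
        for row in group:
            row[label] = max(values, key=len)
    return True


reader = csv.DictReader(io.StringIO(SAMPLE))
groups = defaultdict(list)
for row in reader:
    groups[row["title"]].append(row)

# add missing values
for group in groups.values():
    for label in ["Location", "Kind"]:
        complete(group, label)

# write the completed rows to an in-memory buffer instead of a file
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(row for group in groups.values() for row in group)
print(out.getvalue())
```

In this sketch both "Sofa" rows end up with Location "Den" and Kind "Furniture", and no _replace() or slice assignment is needed.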

Ok, thank you. I haven't run it on a real input file yet, but this seems
to work with the test file.

Because the earlier incarnation defined 'values' as

values = {row[label] for row in group}

I'd incorrectly guessed what was going on in 

has_empty = not min(values, key=len)

Now that 

values = {getattr(row, label) for row in group}

works properly as you intended it to, I see you get the set of unique
values for that label in that group, which makes the rest of it make
sense.
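
To make that concrete, here is a small sketch of what 'values' and 'has_empty' look like for one group. The Record fields and the sample rows are made up for illustration.

```python
from collections import namedtuple

# Hypothetical records; the field names and values are made up.
Record = namedtuple("Record", ["title", "Location"])
group = [
    Record("Sofa", "Den"),
    Record("Sofa", ""),
    Record("Sofa", "Den"),
]

# Set comprehension: duplicates collapse, leaving the unique values.
values = {getattr(row, "Location") for row in group}
print(values == {"Den", ""})  # True

# The shortest string in the set is "" whenever a value is missing,
# and not "" is True.
has_empty = not min(values, key=len)
print(has_empty)  # True

# True counts as 1 in arithmetic, so this is the number of
# distinct non-empty values.
print(len(values) - has_empty)  # 1

# The longest string is the single real value used to fill the gaps.
print(max(values, key=len))  # Den
```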

I know it's your "ugly" answer, but can I ask what the '**' in

fix = {label: max(values, key=len)}
group[:] = [record._replace(**fix) for record in group]

means? 

I haven't seen it before. I imagine it's related to the 'kwargs'
(probably short for keyword args) in 'somenamedtuple._replace(kwargs)',
but I have no idea where to look up which 'kwargs' are possible.
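
For reference, '**' in a call unpacks a dict into keyword arguments, and for _replace() the accepted keywords are the namedtuple's own field names. A short sketch, with a made-up Record type:

```python
from collections import namedtuple

# Hypothetical record type; the field names are made up.
Record = namedtuple("Record", ["title", "Location", "Kind"])
record = Record("Sofa", "", "Furniture")

fix = {"Location": "Den"}

# ** unpacks the dict into keyword arguments, so these two calls
# are equivalent.
a = record._replace(**fix)
b = record._replace(Location="Den")
print(a == b)  # True
print(a)       # Record(title='Sofa', Location='Den', Kind='Furniture')

# The valid keywords are exactly the field names.
print(Record._fields)  # ('title', 'Location', 'Kind')

# _replace() returns a new tuple; the original is unchanged.
print(record.Location == "")  # True
```

The dict form is what allows the field name to come from a variable such as 'label'. Writing record._replace(label=...) would look for a field literally named "label".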

Also, I don't see how you get a set for values with the notation you
used. If anything, it looks like a comprehension that should give you a
dict. (But I haven't worked a lot with sets either.)
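
For reference, the two comprehensions differ only in the colon: braces around a single expression build a set, while braces around a 'key: value' pair build a dict. A small sketch with made-up rows:

```python
# Made-up rows for illustration.
rows = [{"Kind": "Chair"}, {"Kind": ""}, {"Kind": "Chair"}]

# Single expression, no colon: a set comprehension.
as_set = {row["Kind"] for row in rows}

# key: value pair: a dict comprehension.
as_dict = {i: row["Kind"] for i, row in enumerate(rows)}

print(type(as_set).__name__)   # set
print(type(as_dict).__name__)  # dict
print(len(as_set))             # 2, because the duplicate "Chair" collapses

# Empty braces are the one exception: {} is a dict, set() is the empty set.
print(type({}).__name__)       # dict
```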

Thanks