Namedtuples: some unexpected inconveniences
Peter Otten
__peter__ at web.de
Fri Apr 14 17:16:10 EDT 2017
Deborah Swanson wrote:
> Peter,
>
> Retracing my steps to rewrite the getattr(row, label) code, this is what
> sent me down the rabbit hole in the first place. (I changed your 'rows'
> to 'records' just to use the same name everywhere, but all else is the
> same as you gave me.) I'd like you to look at it and see if you still
> think complete(group, label) should work. Perhaps seeing why it fails
> will clarify some of the difficulties I'm having.
>
> I ran into problems with values and has_empty. values has a problem
> because row[label] gets a TypeError. has_empty has a problem because a
> list of field values will be shorter with missing values than a full
> list, but a namedtuple with missing values will be the same length as a
> full namedtuple since missing values have '' placeholders. Two more
> unexpected inconveniences.
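As for has_empty: the test in complete() (quoted below) never compares record lengths. It builds the set of distinct values found in one column. A missing field contributes the empty string '', which is always the shortest string, so `not min(values, key=len)` is true exactly when at least one record has a gap. Here is a minimal sketch with made-up data; the Record fields and values are hypothetical:

```python
from collections import namedtuple

# Hypothetical two-column record type standing in for the real CSV header.
Record = namedtuple("Record", ["title", "Location"])

group = [
    Record("Apt A", "Oakland"),
    Record("Apt A", ""),  # Location missing: stored as a '' placeholder
]

# Both records have the same length, but that is never checked...
print(len(group[0]) == len(group[1]))      # True

# ...because complete() looks at the set of distinct values in one column.
values = {rec.Location for rec in group}   # {'Oakland', ''}
has_empty = not min(values, key=len)       # shortest value is '' -> True
print(sorted(values), has_empty)           # ['', 'Oakland'] True
```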
>
> A short test csv is at the end, for you to read in and attempt to
> execute the following code, and I'm still working on reconstructing the
> lost getattr(row, label) code.
>
> import csv
> from collections import namedtuple, defaultdict
>
> def get_title(row):
>     return row.title
>
> def complete(group, label):
>     values = {row[label] for row in group}
>     # get "TypeError: tuple indices must be integers, not str"
Yes, the function expects row to be dict-like. However, when you change

    row[label]

to

    getattr(row, label)

this part of the code will work...
>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)
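but here you'll get an error, because namedtuples are immutable: the item
assignment raises a TypeError, and even setattr(row, label, ...) would raise
an AttributeError. I tried changing everything necessary to make it work
with namedtuples, but you'll probably find the result a bit hard to follow: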
import csv
from collections import namedtuple, defaultdict

INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
OUTFILE = "tmp.csv"


def get_title(row):
    return row.title


def complete(group, label):
    values = {getattr(row, label) for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        # replace namedtuples in the group. Yes, it's ugly
        fix = {label: max(values, key=len)}
        group[:] = [record._replace(**fix) for record in group]
    return True


# newline="" is what the csv module expects for files it reads or writes
with open(INFILE, newline="") as infile:
    rows = csv.reader(infile)
    fieldnames = next(rows)
    Record = namedtuple("Record", fieldnames)
    groups = defaultdict(list)
    for row in rows:
        record = Record._make(row)
        groups[get_title(record)].append(record)

LABELS = ['Location', 'Kind', 'Notes']

# add missing values
for group in groups.values():
    for label in LABELS:
        complete(group, label)

# dump data (as a demo that you do not need the list of all records)
with open(OUTFILE, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(fieldnames)
    writer.writerows(
        record for group in groups.values() for record in group
    )
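Two details of the code above are easy to miss. Record._replace() does not modify a record; it returns a new one. And the slice assignment group[:] = ... replaces the contents of the very list that groups already holds, so the dict sees the fixed records, whereas rebinding with group = [...] inside complete() would not. Here is a minimal sketch with a hypothetical two-field Record and made-up data:

```python
from collections import namedtuple

# Hypothetical record type, standing in for the one built from the CSV header.
Record = namedtuple("Record", ["title", "Kind"])

groups = {"Apt A": [Record("Apt A", "rental"), Record("Apt A", "")]}
group = groups["Apt A"]

# Namedtuple fields are read-only, so attribute assignment fails.
try:
    setattr(group[1], "Kind", "rental")
except AttributeError as exc:
    print("namedtuple fields are read-only:", exc)

# _replace() builds a *new* record with the given fields changed.
fix = {"Kind": "rental"}

# Slice assignment swaps the contents of the existing list object, so the
# list stored in `groups` sees the new records too. A plain
# `group = [...]` would only rebind the name `group`.
group[:] = [record._replace(**fix) for record in group]

print(groups["Apt A"])
```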
One alternative is to keep your original code and replace the namedtuple
with the class suggested by Gregory Ewing. Then it should suffice to also
change
>     elif has_empty:
>         for row in group:
>             row[label] = max(values, key=len)
to
>     elif has_empty:
>         for row in group:
              setattr(row, label, max(values, key=len))
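Here is a minimal sketch of that alternative. Gregory Ewing's class itself isn't reproduced here, so MutableRecord below is a hypothetical stand-in: a plain class with one attribute per column, which, unlike a namedtuple, lets setattr() change a field in place. complete() is the original function with the setattr() change applied. The field names and data are made up.

```python
class MutableRecord:
    """Hypothetical mutable record: one attribute per CSV column.

    Not necessarily the class Gregory Ewing suggested; just a stand-in
    that, unlike a namedtuple, allows setattr().
    """

    def __init__(self, fieldnames, values):
        self._fields = tuple(fieldnames)
        for name, value in zip(self._fields, values):
            setattr(self, name, value)

    def __iter__(self):
        # Yield values in column order, so csv.writer.writerows() accepts records.
        return (getattr(self, name) for name in self._fields)

    def __repr__(self):
        pairs = ", ".join(f"{n}={getattr(self, n)!r}" for n in self._fields)
        return f"MutableRecord({pairs})"


def complete(group, label):
    values = {getattr(row, label) for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        for row in group:
            setattr(row, label, max(values, key=len))
    return True


fields = ["title", "Kind"]  # hypothetical columns
group = [MutableRecord(fields, ["Apt A", "rental"]),
         MutableRecord(fields, ["Apt A", ""])]
print(complete(group, "Kind"))       # True
print([row.Kind for row in group])   # ['rental', 'rental']
```

Because the records are changed in place, the lists in groups need no slice assignment.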
PS: Personally I would probably take the opposite direction and use dicts
throughout...
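Along the lines of the PS, here is a minimal sketch of the dict-based approach, assuming the file's first row holds the column names. csv.DictReader yields one dict per row, so the original row[label] lookups and assignments in complete() work unchanged. The sample data is made up and read from an in-memory io.StringIO instead of the real file.

```python
import csv
import io
from collections import defaultdict


def complete(group, label):
    values = {row[label] for row in group}
    has_empty = not min(values, key=len)
    if len(values) - has_empty != 1:
        # no value or multiple values; manual intervention needed
        return False
    elif has_empty:
        for row in group:
            row[label] = max(values, key=len)  # dicts are mutable
    return True


# Hypothetical sample data; the real file has more columns.
data = io.StringIO(
    "title,Location,Kind,Notes\n"
    "Apt A,Oakland,rental,\n"
    "Apt A,,rental,quiet\n"
)
reader = csv.DictReader(data)
groups = defaultdict(list)
for row in reader:
    groups[row["title"]].append(row)

for group in groups.values():
    for label in ["Location", "Kind", "Notes"]:
        complete(group, label)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(row for group in groups.values() for row in group)
print(out.getvalue())
```

With a real file, open it with newline="" and pass the file object to csv.DictReader / csv.DictWriter in place of the StringIO objects.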