Namedtuples: some unexpected inconveniences
Deborah Swanson
python at deborahswanson.net
Fri Apr 14 20:03:04 EDT 2017
Peter Otten wrote, on Friday, April 14, 2017 2:16 PM
> > def complete(group, label):
> >     values = {row[label] for row in group}
> >     # get "TypeError: tuple indices must be integers, not str"
>
> Yes, the function expects row to be dict-like. However, when you change
>
> row[label]
>
> to
>
> getattr(row, label)
>
> this part of the code will work...
>
> >     has_empty = not min(values, key=len)
> >     if len(values) - has_empty != 1:
> >         # no value or multiple values; manual intervention needed
> >         return False
> >     elif has_empty:
> >         for row in group:
> >             row[label] = max(values, key=len)
>
> but here you'll get an error. I tried changing everything necessary to
> make it work with namedtuples, but you'll probably find the result a bit
> hard to follow:
>
> import csv
> from collections import namedtuple, defaultdict
>
> INFILE = "E:\\Coding projects\\Pycharm\\Moving\\Moving 2017 in - test.csv"
> OUTFILE = "tmp.csv"
>
> def get_title(row):
>     return row.title
>
> def complete(group, label):
>     values = {getattr(row, label) for row in group}
>     has_empty = not min(values, key=len)
>     if len(values) - has_empty != 1:
>         # no value or multiple values; manual intervention needed
>         return False
>     elif has_empty:
>         # replace namedtuples in the group. Yes, it's ugly
>         fix = {label: max(values, key=len)}
>         group[:] = [record._replace(**fix) for record in group]
>     return True
>
> with open(INFILE) as infile:
>     rows = csv.reader(infile)
>     fieldnames = next(rows)
>     Record = namedtuple("Record", fieldnames)
>     groups = defaultdict(list)
>     for row in rows:
>         record = Record._make(row)
>         groups[get_title(record)].append(record)
>
> LABELS = ['Location', 'Kind', 'Notes']
>
> # add missing values
> for group in groups.values():
>     for label in LABELS:
>         complete(group, label)
>
> # dump data (as a demo that you do not need the list of all records)
> with open(OUTFILE, "w") as outfile:
>     writer = csv.writer(outfile)
>     writer.writerow(fieldnames)
>     writer.writerows(
>         record for group in groups.values() for record in group
>     )
>
> One alternative is to keep the original and try to replace the namedtuple
> with the class suggested by Gregory Ewing. Then it should suffice to also
> change
>
> >     elif has_empty:
> >         for row in group:
> >             row[label] = max(values, key=len)
>
> to
>
> >     elif has_empty:
> >         for row in group:
>               setattr(row, label, max(values, key=len))
>
> PS: Personally I would probably take the opposite direction and use dicts
> throughout...

Ok, thank you. I haven't run it on a real input file yet, but this seems
to work with the test file.

Because the earlier incarnation defined 'values' as

    values = {row[label] for row in group}

I'd incorrectly guessed what was going on in

    has_empty = not min(values, key=len)

Now that

    values = {getattr(row, label) for row in group}

works properly as you intended it to, I see you get the set of unique
values for that label in that group, which makes the rest of it make
sense.
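
As a small sketch of that reading (the Record fields and the data here are
made up for illustration, not taken from the real file), the two lines can
be tried on a three-row group where one row has an empty Location:

```python
from collections import namedtuple

# Hypothetical record type and rows, for illustration only
Record = namedtuple("Record", ["title", "Location"])
group = [
    Record("A house", "Seattle"),
    Record("A house", ""),
    Record("A house", "Seattle"),
]

label = "Location"

# Unique values of that label within the group
values = {getattr(row, label) for row in group}

# The shortest value is the empty string, and not "" is True
has_empty = not min(values, key=len)

# True counts as 1, so this is the number of distinct non-empty values
non_empty_count = len(values) - has_empty

print(values, has_empty, non_empty_count)
```

With exactly one non-empty value left over, this is the case where the
empty fields can be filled in from the longest value.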

I know it's your "ugly" answer, but can I ask what the '**' in

    fix = {label: max(values, key=len)}
    group[:] = [record._replace(**fix) for record in group]

means?

I haven't seen it before, and I imagine it's one of the possible
'kwargs' (probably short for keyword args) in
'somenamedtuple._replace(kwargs)', but I have no idea where to look up
the possible 'kwargs'.

Also, I don't see how you get a set for 'values' with the notation you
used. If anything, it looks like a comprehension that should give you a
dict. (But I haven't worked a lot with sets either.)

Thanks
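
For reference, a minimal sketch of the two constructs asked about above.
The Record fields and values are hypothetical stand-ins, not the real data.
It assumes the usual Python behaviour: '**' in a call unpacks a dict into
keyword arguments, and braces around a single expression (no 'key: value'
pair) build a set rather than a dict.

```python
from collections import namedtuple

# Hypothetical record type, for illustration only
Record = namedtuple("Record", ["title", "Location"])
record = Record("A house", "")

label = "Location"
fix = {label: "Seattle"}

# **fix unpacks the dict into keyword arguments, so these two calls
# are equivalent; _replace returns a new tuple and leaves record as is
a = record._replace(**fix)
b = record._replace(Location="Seattle")
same = (a == b)

group = [record, a]

# One expression per item inside braces: a set comprehension
as_set = {getattr(row, label) for row in group}

# A key: value pair inside braces: a dict comprehension
as_dict = {row.title: getattr(row, label) for row in group}

print(same, as_set, as_dict)
```

The dict is needed in the original because the field name is only known
at run time as a string in 'label', so it cannot be written literally as a
keyword in the call.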